
IDE Transfer DMA Code

pearce_jj

I seem to have run out of puff coding this, would REALLY appreciate some help!

I'm trying to code a DMA transfer routine to drop into the xtide-universal BIOS. It can be hard-coded to work with XT architecture, DMA channel 3. I want to use block-mode transfers. Communication with the IDE controller, except for the actual transfer, is via XT-IDE style port 300h access. The card will assert DRQ3 when value 40h is written to port 30Fh and clear it when TC is asserted.

Here we go so far:

Code:
;	Parameters:
;		CX:		Block size in 512 byte sectors
;		DX:		IDE Data port address
;		ES:DI:		Normalized ptr to buffer to receive data
;	Returns:
;		Nothing
;	Corrupts registers:
;		AX, BX, CX, DX

	; work out how much we're transferring.  We can't cross a physical 64KB boundary
	; but since the max transfer size is 64KB, we only ever need to do one or two DMA operations

	mov	ax, 0xffff
	sub	ax, di			; 64k - DI = number of bytes we could transfer, in AX
	xchg	cl, ch			; sectors to words
	shl	cx, 1			; words to bytes; CX has total byte count
.StartDMA:
	cmp	ax, cx			; can we do it in one hit?
	jb	.NextDMA
	; if we're here we need to do only one DMA operation
.LastDMA:
	mov	ax, cx			; move bytes left to ax
.NextDMA:
	mov	bx, ax			; save the byte count
	; set up the DMA controller for this transfer
	cli				; clear interrupts while we set up the actual DMA transfer

	mov	al, 0x07		; mask (4) + channel (3)
	out	0x0a, al		; send to DMA mask register
	xor	al, al			; clear al
	out	0x0c, al		; any write to 0x0c resets the byte-pointer flip-flop (low byte goes first)

	mov	al, 0x87		; required mode is block/inc/non-auto-init/write(to memory)/ch3
	out	0x0b, al		; and send mode to DMA mode register

	mov	ax, di
	out	0x06, al
	mov	al, ah
	out	0x06, al		; send offset to DMA controller address port for ch.3
	
	mov	ax, es
	xchg	al, ah			; high 8 bits of ES now in AL
	shr	al, 1
	shr	al, 1
	shr	al, 1
	shr	al, 1			; shr al, 4 => high 4-bits of ES now in low 4-bits of AL
	out	0x83, al		; send those 4-bits to DMA controller page for ch.3 (XT port)
	
	mov	ax, bx
	dec	ax			; the 8237 moves count+1 bytes, so program byte count - 1
	out	0x07, al		; send low byte of the count...
	xchg	al, ah			; switch...
	out	0x07, al		; ...and high byte to DMA controller count port for ch.3
	mov	al, 0x03		; clear bit mask (0) + channel (3)
	out	0x0a, al		; send to DMA mask register - enable the DMA!
	sti				; enable interrupts; let the CPU see to anything outstanding
		
	; now get the card to trigger the actual transfer
	mov	dx, 0x030f		; XT-CF card control register
	mov	al, 0x40		; special DMA enable code
	out	dx, al			; send to card - this will assert DRQ.. and we're off...
	nop
	nop				; a couple of padding instructions, but once we're here
	nop				; the DMA should be done
	
	; DMA is done - any more to do?
	add	di, bx
	jc	.AddPageToES
.CheckMoreDMATransfers:
	sub	cx, bx			; total bytes - bytes we just transferred
	jnz	.LastDMA		; do next transfer, if there's bytes left
	
	ret
	
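.AddPageToES:
	mov	ax, es			; segment registers can't take an immediate operand,
	add	ax, 0x1000		; so add the 64KB page via AX
	mov	es, ax
	jmp	.CheckMoreDMATransfers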

In particular, I'm not sure about the part dealing with ES:DI + transfer size straddling a physical 64K boundary.

Any thoughts or help on this would be greatly appreciated!
 
I haven't looked at your code yet, but you've got two options for handling the physical boundary "straddle".

You can return an error 9 and hope that the calling routine understands an error that hasn't been around since the XT. (AT and later either use programmed I/O or a different DMA method (e.g. PS/2) for hard drives.)

The other option is to break the transfer up and handle the straddle either with programmed I/O (if you can) or DMA to a 512 byte buffer that doesn't straddle the boundary.
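
The "break the transfer up" option can be sketched as below. This is only a model of the arithmetic, written in C rather than assembly so it is easy to check; the names `split_at_64k` and `struct dma_split` are made up for illustration, and the 64 KB limit on the total length is taken from the discussion in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Result of splitting one transfer at a 64 KB physical boundary. */
struct dma_split {
    uint32_t first;   /* bytes that fit before the boundary */
    uint32_t second;  /* bytes left for a second operation (0 if none) */
};

/* Real-mode segment:offset to 20-bit physical address (XT address space). */
static uint32_t phys_addr(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}

/* Split a transfer of len bytes starting at seg:off. */
static struct dma_split split_at_64k(uint16_t seg, uint16_t off, uint32_t len)
{
    struct dma_split s;
    uint32_t room;

    assert(len > 0 && len <= 0x10000u);   /* assumed: at most 64 KB per request */

    /* Bytes left in the current 64 KB physical page. */
    room = 0x10000u - (phys_addr(seg, off) & 0xFFFFu);

    s.first  = (len <= room) ? len : room;
    s.second = len - s.first;
    return s;
}
```

For example, a 1024-byte buffer at 1FF0:0000 sits at physical 1FF00h, so 256 bytes fit before the boundary and 768 bytes are left for a second operation (DMA into the next page, or programmed I/O).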

Floppy I/O returns an error 9, as the basic mechanism hasn't changed much since the 5150.
 
I would look to see what the original XT hard drive BIOS does and follow that.
 

I thought that I'd said that the 5160 returns error 9. Line 1467 et seq. in the fixed disk BIOS listing. The issue is that not all third-party software checks for that, particularly if written for anything later than an AT.

I believe that the 5160 HD BIOS uses single-mode DMA; i.e., mode 0x47 for a read from disk on channel 3. So if pearce_jj wants to use block-mode, it's not much help.
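
For reference, the mode byte can be put together from its fields as in this small C sketch. The bit layout is the 8237 mode register's; the function and enum names are invented here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 3..2 of the 8237 mode register: transfer type. */
enum dma_xfer { DMA_VERIFY = 0, DMA_WRITE_MEM = 1, DMA_READ_MEM = 2 };

/* Bits 7..6 of the 8237 mode register: transfer mode. */
enum dma_mode { DMA_DEMAND = 0, DMA_SINGLE = 1, DMA_BLOCK = 2, DMA_CASCADE = 3 };

/* Build a mode byte: mode | decrement | auto-init | transfer type | channel. */
static uint8_t dma_mode_byte(enum dma_mode mode, int decrement, int auto_init,
                             enum dma_xfer xfer, unsigned channel)
{
    assert(channel <= 3);
    return (uint8_t)(((unsigned)mode << 6) |
                     ((decrement ? 1u : 0u) << 5) |
                     ((auto_init ? 1u : 0u) << 4) |
                     ((unsigned)xfer << 2) |
                     channel);
}
```

`dma_mode_byte(DMA_SINGLE, 0, 0, DMA_WRITE_MEM, 3)` gives 0x47, the single-mode value mentioned above; the same call with `DMA_BLOCK` gives 0x87 and with `DMA_DEMAND` gives 0x07.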
 
The only applications that are going to issue BIOS calls like that are hard drive utility programs and DOS. And both should be aware of error messages.

I agree that hiding the DMA boundary is probably the right thing to do. But compatibility with the original XT method should be good enough given the limited amount of software that will make that kind of request.
 
I've seen utilities written after the debut of the 5170 that are completely unaware of the DMA boundary issue and don't know what to do if they get an error 9, other than to return the error (sometimes with the message "disk error"--duh). (I once spent the better part of a day debugging someone's disk imaging program before discovering that one).

It's a sore point with me because any utility compiled to run on an 8086 should expect XT-type error conditions and an XT-restricted API. Granted, the DMA thing shouldn't happen very often if the utility reads in single sectors, but still... If you compile for 8086, support the platform. Otherwise, code/compile for 80286.
 
Thanks for the replies. The maximum number of sectors to be transferred is 128 according to the comments in the xtide-universal BIOS, so a transfer could either be 1 or 2 DMA transfers always. Rather than revert to PIO I'd like to try and do the DMA(s) if I can, the question really is how to check for it and handle it.

Is it too simplistic to just compare DI against FFFFh:

Code:
	mov	ax, 0xffff
	sub	ax, di			; 64k - DI = number of bytes we could transfer, in AX
	xchg	cl, ch			; sectors to words
	shl	cx, 1			; words to bytes; CX has total byte count
.StartDMA:
	cmp	ax, cx			; can we do it in one hit?
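
One way to sanity-check that comparison is to model what the DMA controller actually sees. A rough C sketch follows (C only so the numbers can be verified; `dma_addr_from_far` and `room_from_offset` are hypothetical names). It assumes the usual XT arrangement of a 16-bit address register plus a 4-bit page register.

```c
#include <assert.h>
#include <stdint.h>

/* Values destined for the 8237 address register and the XT page register. */
struct dma_addr {
    uint8_t  page;    /* physical address bits 19..16 */
    uint16_t offset;  /* physical address bits 15..0  */
};

/* Convert a real-mode far pointer into page + offset for the DMA hardware. */
static struct dma_addr dma_addr_from_far(uint16_t seg, uint16_t off)
{
    uint32_t phys = (((uint32_t)seg << 4) + off) & 0xFFFFFu;
    struct dma_addr a;

    a.page   = (uint8_t)(phys >> 16);
    a.offset = (uint16_t)(phys & 0xFFFFu);
    return a;
}

/* Bytes available before a 16-bit offset wraps to the next 64 KB page. */
static uint32_t room_from_offset(uint16_t offset)
{
    return 0x10000u - offset;
}
```

With ES:DI = 1234:0008 the DMA offset is 2348h in page 1, leaving DCB8h bytes before the boundary, while DI alone would suggest FFF8h. The two only agree when the low 12 bits of ES are zero, so comparing against DI is safe only if the pointer has first been converted to that page-aligned form.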
 
Well, here's what I use to test a buffer for a 64K "straddle":

Code:
;*	Straddle - Test buffer for 64K DMA straddle.
;	--------------------------------------------
;
;       On entry, a long pointer to the buffer and the length in bytes.
;
;	Returns 0 if no straddle, otherwise the number of bytes that fit
;	below the 64K boundary (length minus the bytes that spill over).
;

Straddle	proc	public buffer:far ptr, buflen:word
	xor	ax,ax			; say no straddle
	mov	dx,word ptr (buffer+2)	; get segment
	mov	cl,4
	shl	dx,cl			; to byte address
	add	dx,word ptr (buffer)	; + base
	add	dx,buflen		; + length
	jnc	Straddle2		; if no straddle
 
	mov	ax,buflen
	sub	ax,dx			; buffer - bytes over 64K
Straddle2:
	ret
Straddle      endp

Similar; maybe it'll help.
 
My DMA software/hardware is working :) Initially I tried to do entire sectors in block mode, which worked (fast!) but disk accesses made sound playback run slow. Block mode can't be interrupted, so that figures.

Next I changed the hardware to release DRQ after 16 bytes and re-coded the transfer code to use demand mode. This also works well enough, a little slower given there's a 10-cycle port command every 16-bytes to set off another transfer. But, this made no difference to the sound playback issue.

Surely if DMA demand mode has been suspended by DRQ being released, a higher-priority channel should be able to jump in and make a transfer?
 
For the benefit of the search, higher-priority channels can jump in once DRQ is released, but wait-states seem to be needed to allow everything time to catch up. For example JMP $+2 after a transfer request is made.
 
The silly timing makes sense if you remember that the 8237 is really an 8085 peripheral.

Although not an issue in MS-DOS, I wonder if your DMA transfer code would clobber a floppy DMA transfer in progress? The 765 FDC basically has a one-byte buffer.
 
Thanks - is it even possible to access two drives truly concurrently under DOS? How could I test it?

EDIT - here is some condensed transfer code; entry is with the byte count in CX, and ES is >> 12 to suit the DMA registers.

Code:
.NextDemandBlock:
	; Whilst DACK is set the card counts to 16 (bytes) before releasing DRQ
	; thereby transferring 16 bytes at a time
	; No unrolling since we purposely want some time between transfers, to allow
	; any competing tasks time to step in if they need to (i.e. other DMA activities)
	out		dx, al			; transfer 16 bytes by raising DRQ
	sub		cx, 16			; update count
	jc	.CleanUp			; underflow - we're done (block wasn't divisible by 16)
	jnz	.NextDemandBlock		; repeat if there's more

.CleanUp:
	; check the transfer is actually done - in case another DMA operation messed things up
	in		al, 0x08		; get DMA status register
	and		al, 0x08		; test DMA ch.3 TC bit
	jz	.MoreToDo			; it wasn't set so get more bytes
	pop		cx			; get back byte count
	mov		dx, es			; 
	add		di, cx			; update pointer
	adc		dx, 0			; ...
	mov		es, dx			; ...increment ES if needed	
	ret

.MoreToDo:					; set up extra transfers when we need to
	mov		al, 0x40		; XT-CF DMA enable command
	mov		cx, 16			; so that the transfer logic works
	jmp	.NextDemandBlock

EDIT2 - It looks like the FDD data rate is 125,000 bps, so about 300 clocks between consecutive bytes. With 16-byte transfers the DMA controller is busy with DRQ3 for only about 100 cycles, so it should be OK. It doesn't bother the Sound Blaster running at 22kHz anyway.
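
The clock budget can be checked with a couple of lines of C. This is just the arithmetic, assuming a 4.77 MHz (4,772,727 Hz) system clock and taking the 125,000 bps figure above at face value; `clocks_per_fdd_byte` is a made-up helper name.

```c
#include <assert.h>

/* CPU clocks that elapse between consecutive bytes from the floppy controller. */
static double clocks_per_fdd_byte(double cpu_hz, double bits_per_second)
{
    assert(bits_per_second > 0.0);
    /* 8 bits per byte, so bytes/second = bits/second / 8. */
    return cpu_hz / (bits_per_second / 8.0);
}
```

`clocks_per_fdd_byte(4772727.0, 125000.0)` comes out at roughly 305 clocks. If the drive actually runs at 250,000 bps the window halves to roughly 153 clocks, which is still more than the ~100 cycles a 16-byte burst is estimated to take.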
 
It's possible if some multitasking software is installed. On the AT and above, there are BIOS calls that enable time-slicing.

I've written code that allows for running the floppy and hard disk concurrently--and I'm not the only one. Search around for my old program ConFormat, that uses a popup to format floppies in the background. There's also at least one backup package that backs to floppies concurrently with hard disk access. I think most tape backup programs do this also--it's deadly to throughput if you have to stop the tape motion.
 
Hi Chuck, could you post a link to your background formatter? I found a utility "bgformat" but it double-steps the 360K drives in the 5155 for some reason, plus the screen is impossible to read on the 5155 screen setup. It does though behave consistently with or without transfers running concurrently.

Re AT, the DMA stuff is only really of use on a 4.77MHz 8088 - even with just a V20, the gains are marginal. But the card can be switched on-the-fly between IO ports, memory-mapped or DMA by simply updating its status register:

Code:
	mov		dx, 0x30f	; XT-CF board control register address (assuming 300h base)
	in		al, dx		; get control register
	cmp		al, 0		; if it's zero, use...
	jz	.PortIO			; ...port-based IO (the default)
	cmp		al, 0xA0	; if it's 0 < al < A0h use...
	jb	.DMAIO			; ...DMA (hard-wired to channel 3)
					; otherwise, it's >= A0h so fall to memory-mapped IO...
.MemMapIO:				; cx has sector count, al has high 8-bits of mem-mapped base
					; address, which we read from the card config register
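
The same decision, restated in C purely as a readable model of the dispatch above. The enum and function names are invented, and `xtcf_mem_base` assumes that "high 8 bits of the base address" means bits 19..12 of a 20-bit physical address.

```c
#include <assert.h>
#include <stdint.h>

enum xtcf_mode { XTCF_PORT_IO, XTCF_DMA, XTCF_MEM_MAPPED };

/* Pick the transfer method from the value read at the control register. */
static enum xtcf_mode xtcf_mode_from_control(uint8_t control)
{
    if (control == 0)
        return XTCF_PORT_IO;       /* zero: port-based I/O, the default */
    if (control < 0xA0)
        return XTCF_DMA;           /* 01h..9Fh: DMA on channel 3 */
    return XTCF_MEM_MAPPED;        /* A0h and up: memory-mapped window */
}

/* Assumed mapping: control value = physical address bits 19..12. */
static uint32_t xtcf_mem_base(uint8_t control)
{
    assert(xtcf_mode_from_control(control) == XTCF_MEM_MAPPED);
    return (uint32_t)control << 12;
}
```

Under that assumption a control value of D0h would select a memory window at physical D0000h, and 40h (the DMA enable code) lands in the DMA range.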
 
Rather than prowl the web for a copy that's over 20 years old, I've just attached it here. We dropped the product in 1991 because PM GUIs were becoming popular and it would have been a lot of work to design something like this to work in both PM and real-mode text. The other reason was that just about everyone had a hard disk, so diskette formatting wasn't that important anymore.

But here it is--it should work fine on a 5155.
 

Attachment: CONFM106.ZIP (33.5 KB)
Chuck, thanks for posting that. What a great utility; the hours that would have saved me, back in the day!

Anyway it seems to work OK - I tried formatting a 360K floppy with the system idle then running performance and pattern tests (against the C: drive) and everything seems good. C: throughput dropped about 15KB/s when run with a format running of course (to 480KB/s or so). Disks were blank and appeared good in every case, meanwhile pattern tests produced no errors either.

Then I tried 8088 Corruption at 30fps whilst formatting - and to my surprise that also went well and with no pauses. That would have been driving all four DMA channels concurrently.
 
That's great to hear.

One used to be able to clobber a floppy format on NT 4.0 by running a CPU-intensive task at the same time. It seems that the floppy kernel driver (which runs floppy I/O as a separate thread) forgot to boost the run priority of its thread.
 
I've found that I need to check the DMA status register to determine if a DMA block transfer really has been completed:

Code:
.CleanUp:
	; check the transfer is actually done - in case another DMA operation messed things up
	in		al, 0x08		; get DMA status register
	and		al, 0x08		; test DMA ch.3 TC bit
	jz	.MoreToDo			; it wasn't set so get more bytes
	... other end-of-function code ...
	ret

.MoreToDo:					; set up extra transfers when we need to
	mov		al, 0x40		; XT-CF DMA enable command
	out		dx, al			; get up to 16 bytes
	jmp	.CleanUp			; go back to check (this jump is the wait-state too)

But, I'm wondering why? This is *only* needed if there are competing DMA tasks, and then only occasionally.

To clarify what's going on, demand-mode DMA transfers are made by the hardware immediately asserting DRQ3 when 40h is written to its control register. It then counts 16 transfers by counting /IOW or /IOR rising edges when /DACK3 is asserted. Once 16 transfers are done, DRQ3 is de-asserted (DRQ3 will also be de-asserted if DMA TC is raised with /DACK3 also asserted).

Now everything seems to work well exactly as it is, BUT there is a significant performance enhancement possible for compact flash cards if this can be solved.

Since CF cards support only single-sector transfers, the DMA controller needs to be re-programmed for each sector (and all the corresponding boundary checks). If the above check could be eliminated, the DMA controller could be programmed once only for the entire transfer - up to 64 sectors.
 
CF cards don't support multi-sector operation?

Most do. It depends on what comes back from the IDENTIFY command. See this for example.

For those that don't, the performance hit isn't too bad; they're mostly the older, smaller cards.
 