DMA

From SpecNext official Wiki
Revision as of 15:56, 6 January 2024 by Intrepidis (Layer2 auto-slowdown was lifted in Core 3.0)

Overview

The ZX Spectrum Next DMA (zxnDMA) is a single-channel DMA device that implements a subset of the Z80 DMA functionality. The subset is large enough to be compatible with common uses of the similar Datagear interface available for standard ZX Spectrum computers and compatibles. It also adds a burst mode capability that can deliver audio at programmable sample rates to the DAC device.

Accessing the zxnDMA

The zxnDMA was originally mapped to a single read/write IO port, 0x6B, which is the same port used by the Datagear. It was not also mapped to a second port, 0x0B, as used by the MB-02 interface.

Since core 3.1.2, zxnDMA mode is mapped to the Datagear DMA Port ($xx6B / 107) and Zilog DMA mode is mapped to the MB02 DMA Port ($xx0B / 11).

Description

The normal Z80 DMA (Z8410) chip is a pipelined device, and because of that it has numerous off-by-one idiosyncrasies and requirements on the order in which certain commands must be carried out. These issues are not duplicated in the zxnDMA. You can still program the zxnDMA as if it were a Z8410 DMA device, but it can also be programmed in a simpler manner.

The single channel of the zxnDMA chip consists of two ports named A and B. Transfers can occur in either direction between ports A and B, each port can describe a target in memory or IO, and each can be configured to autoincrement, autodecrement or stay fixed after a byte is transferred.
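The port model can be illustrated with a small host-side sketch in Python (not code that runs on the Next). It assumes both ports are memory, ignores timing and uses a 64K bytearray as the address space; `simulate_transfer` and the `INCREMENT`/`DECREMENT`/`FIXED` constants are hypothetical names chosen for this illustration.

```python
# Host-side model of one block transfer between two memory ports (hypothetical helper).
# Address steps correspond to the WR1/WR2 address modes.
DECREMENT, INCREMENT, FIXED = -1, 1, 0


def simulate_transfer(mem: bytearray, src: int, src_step: int,
                      dst: int, dst_step: int, length: int) -> None:
    """Copy `length` bytes one at a time, stepping each address after every byte."""
    for _ in range(length):
        mem[dst & 0xFFFF] = mem[src & 0xFFFF]  # addresses wrap at 16 bits
        src += src_step                        # FIXED (0) leaves the address unchanged
        dst += dst_step


# Usage: a fill is a copy whose source address is fixed.
mem = bytearray(0x10000)
mem[0x8000] = 0x22                             # the fill value
simulate_transfer(mem, 0x8000, FIXED, 0x4000, INCREMENT, 16)
print(mem[0x4000:0x4010].hex())
```

A fill is simply a copy from a fixed source address to an incrementing destination, which is also how the `DMAFill` routine in the assembly example works.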

A special feature of the zxnDMA can force each byte transfer to take a fixed amount of time so that the zxnDMA can be used to deliver sampled audio.

Modes of Operation

The zxnDMA can operate in a z80-DMA compatibility mode.

REMOVED in core 3.1.2: The Z80 DMA compatibility mode was selected by setting bit 6 of nextreg 0x06. In this mode, all transfers involve length+1 bytes, which is the same behaviour as the Z80 DMA chip. In zxnDMA mode, the transfer length is exactly the number of bytes programmed. This mode is mainly present to accommodate existing Spectrum software that uses the Z80 DMA, and CP/M programs that may have a Z80 DMA option.

Since core 3.1.2, the DMA mode is selected by port number: the Datagear DMA Port ($xx6B / 107) works in zxnDMA mode, and the MB02 DMA Port ($xx0B / 11) works in Zilog mode. Bit 6 of the Peripheral 2 Register ($06) is no longer DMA-related and will be reused for another purpose.
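The practical difference between the two ports is the meaning of the block length. A minimal Python sketch, assuming only the low byte of the port address is decoded (as the $xx6B / $xx0B notation suggests); the helper names are hypothetical.

```python
# Host-side model (hypothetical helpers) of port-based mode selection, core 3.1.2+.
ZXN_DMA_PORT = 0x6B    # Datagear DMA port: zxnDMA mode
ZILOG_DMA_PORT = 0x0B  # MB02 DMA port: Zilog (Z80 DMA) mode


def dma_mode(port: int) -> str:
    """Return "zxn" or "zilog" from the low byte of the IO port address."""
    low = port & 0xFF
    if low == ZXN_DMA_PORT:
        return "zxn"
    if low == ZILOG_DMA_PORT:
        return "zilog"
    raise ValueError(f"{port:#06x} is not a DMA port")


def bytes_transferred(block_length: int, port: int) -> int:
    """zxnDMA mode moves exactly block_length bytes; Zilog mode moves one more."""
    return block_length + (1 if dma_mode(port) == "zilog" else 0)


def length_to_program(wanted: int, port: int) -> int:
    """Block length to write to WR0 so that exactly `wanted` bytes are moved."""
    return wanted - (1 if dma_mode(port) == "zilog" else 0)
```

Code written for the Zilog port therefore programs one less than the number of bytes it wants moved, while zxnDMA code programs the count directly.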

The zxnDMA can also operate in either burst or continuous modes.

Continuous mode means the DMA chip runs to completion without allowing the CPU to run. When the CPU starts the DMA, the DMA operation will complete before the CPU executes its next instruction.

Burst mode nominally means the DMA lets the CPU run if either port is not ready. This condition can't happen in the zxnDMA chip except when operated in the special fixed time transfer mode. In this mode, the zxnDMA will let the CPU run while it waits for the fixed time to expire between bytes transferred.

Note that there is no byte transfer mode as in the Z80 DMA.

Programming the zxnDMA

Like the Z80 DMA chip, the zxnDMA has seven write registers named WR0-WR6 that control the device. Each register WR0-WR6 can have zero or more parameters associated with it.

On the first write to the zxnDMA port, the value written is compared against a bitmask to determine which of WR0-WR6 is the target. The remaining bits of the value can contain data as well as parameter bits. The parameter bits determine whether further writes are expected to deliver parameter values. If several parameter bits are set, the parameter values must be written in order of bit position from right to left (bit 0 through bit 7). Once all parameters have been written, the zxnDMA again expects a regular register write selecting one of WR0-WR6.

The table below describes the registers and the bitmask required to select them on the zxnDMA.

Register group  Function                                        Bitmask   Notes
WR0             Direction, operation and Port A configuration   0XXXXXAA  AA must NOT be 00
WR1             Port A configuration                            0XXXX100
WR2             Port B configuration                            0XXXX000
WR3             Activation                                      1XXXXX00  It's best to use WR6
WR4             Port B, timing and interrupt configuration      1XXXXX01
WR5             Ready and stop configuration                    10XXX010
WR6             Command register                                1XXXXX11
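The selection rules in the table can be expressed as a pattern match. This Python sketch treats `X` and `A` as don't-care bits and applies the extra "AA must NOT be 00" rule for WR0; `select_register` is a hypothetical helper, not part of any Next API.

```python
# Host-side decoder for the base byte of a zxnDMA write (hypothetical helper).
# Patterns are written D7 first; '0'/'1' must match, 'X'/'A' are don't-care bits.
PATTERNS = [
    ("WR0", "0XXXXXAA"),
    ("WR1", "0XXXX100"),
    ("WR2", "0XXXX000"),
    ("WR3", "1XXXXX00"),
    ("WR4", "1XXXXX01"),
    ("WR5", "10XXX010"),
    ("WR6", "1XXXXX11"),
]


def matches(value: int, pattern: str) -> bool:
    """True if every fixed bit of the pattern equals the corresponding bit of value."""
    for i, ch in enumerate(pattern):
        bit = (value >> (7 - i)) & 1
        if ch in "01" and int(ch) != bit:
            return False
    return True


def select_register(value: int):
    """Return the register group selected by a base byte, or None if nothing matches."""
    for name, pattern in PATTERNS:
        if matches(value, pattern):
            if name == "WR0" and (value & 0b11) == 0:
                continue  # AA = 00 is not WR0; those bytes belong to WR1/WR2
            return name
    return None
```

Checking WR0 first and skipping it when AA is 00 keeps the table order while letting WR1 and WR2 claim the bytes that end in `00`.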

zxnDMA Registers

These are described below following the same convention used by Zilog for its DMA chip:

WR0 – Write Register Group 0

D7  D6  D5  D4  D3  D2  D1  D0  BASE REGISTER BYTE
 0   |   |   |   |   |   |   |
     |   |   |   |   |   0   0  Do not use
     |   |   |   |   |   0   1  Transfer (Prefer this for Z80 DMA compatibility)
     |   |   |   |   |   1   0  Do not use (Behaves like Transfer, Search on Z80 DMA)
     |   |   |   |   |                       
     |   |   |   |   |   1   1  Do not use (Behaves like Transfer, Search/Transfer on Z80 DMA)
     |   |   |   |   |                      
     |   |   |   |   0 = Port B -> Port A (Byte transfer direction)
     |   |   |   |   1 = Port A -> Port B
     |   |   |   V
D7  D6  D5  D4  D3  D2  D1  D0  PORT A STARTING ADDRESS (LOW BYTE)
     |   |   V
D7  D6  D5  D4  D3  D2  D1  D0  PORT A STARTING ADDRESS (HIGH BYTE)
     |   V
D7  D6  D5  D4  D3  D2  D1  D0  BLOCK LENGTH (LOW BYTE)
     V
D7  D6  D5  D4  D3  D2  D1  D0  BLOCK LENGTH (HIGH BYTE)

Several registers are accessible from WR0. The first write to WR0 is to the base register byte. Bits D6:D3 are optionally set to indicate which associated registers in this group will be written next. The parameter writes come in order from D3 to D6 (right to left). For example, if bits D6 and D3 are set, the next two writes go to PORT A STARTING ADDRESS (LOW BYTE) followed by BLOCK LENGTH (HIGH BYTE).
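As a worked example of the WR0 layout, this Python sketch builds a base byte for a Transfer and lists the parameter bytes the zxnDMA will then expect, in D3-to-D6 order. The short parameter names (`A_LO`, `A_HI`, `LEN_LO`, `LEN_HI`) and both helpers are hypothetical.

```python
# Host-side encoder/decoder for the WR0 base byte (hypothetical helpers).
WR0_PARAMS = [          # (bit, parameter) in the order the parameter bytes are written
    (3, "A_LO"),        # PORT A STARTING ADDRESS (LOW BYTE)
    (4, "A_HI"),        # PORT A STARTING ADDRESS (HIGH BYTE)
    (5, "LEN_LO"),      # BLOCK LENGTH (LOW BYTE)
    (6, "LEN_HI"),      # BLOCK LENGTH (HIGH BYTE)
]


def wr0(a_to_b: bool, params: set) -> int:
    """Build a WR0 base byte for a Transfer (D1:D0 = 01) announcing the given parameters."""
    value = 0b01                    # D1:D0 = 01: Transfer
    if a_to_b:
        value |= 1 << 2             # D2 = 1: Port A -> Port B
    for bit, name in WR0_PARAMS:
        if name in params:
            value |= 1 << bit
    return value


def follow_order(base: int) -> list:
    """Parameters expected after a WR0 base byte, lowest bit first."""
    return [name for bit, name in WR0_PARAMS if base & (1 << bit)]
```

With all four parameter bits set and direction A -> B, `wr0` returns `0x7D` (`%01111101`), the WR0 byte used in the assembly example, and `follow_order(0x49)` reproduces the D6/D3 example above.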

WR1 – Write Register Group 1

D7  D6  D5  D4  D3  D2  D1  D0  BASE REGISTER BYTE
 0   |   |   |   |   1   0   0
     |   |   |   |
     |   |   |   0 = Port A is memory
     |   |   |   1 = Port A is IO
     |   |   |
     |   0   0 = Port A address decrements
     |   0   1 = Port A address increments
     |   1   0 = Port A address is fixed
     |   1   1 = Port A address is fixed
     |
     V
D7  D6  D5  D4  D3  D2  D1  D0  PORT A VARIABLE TIMING BYTE
 0   0   0   0   0   0   |   |
                         0   0 = Cycle Length = 4
                         0   1 = Cycle Length = 3
                         1   0 = Cycle Length = 2
                         1   1 = Do not use

The cycle length is the number of cycles used in a read or write operation. The first cycle asserts signals and the last cycle releases them. There is no half cycle timing for the control signals.
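WR1 and WR2 share the same layout apart from the selector bits in D2:D0, so one Python sketch can encode both. It covers only the base byte and the optional timing byte; the WR2 prescalar (D5 of the port B timing byte) is deliberately left out. `port_config` and its string arguments are hypothetical.

```python
# Host-side encoder for the WR1 (port A) / WR2 (port B) base byte and timing byte.
ADDR_MODE = {"decrement": 0b00, "increment": 0b01, "fixed": 0b10}
CYCLES = {4: 0b00, 3: 0b01, 2: 0b10}   # 0b11 is "Do not use"


def port_config(port: str, is_io: bool, mode: str, cycles=None) -> list:
    """Bytes to send: the base byte, plus the timing byte when `cycles` is given."""
    base = 0b100 if port == "A" else 0b000   # D2:D0 select WR1 or WR2
    if is_io:
        base |= 1 << 3                        # D3 = 1: port is IO
    base |= ADDR_MODE[mode] << 4              # D5:D4: address mode
    if cycles is None:
        return [base]
    return [base | 1 << 6, CYCLES[cycles]]    # D6 = 1: timing byte follows
```

For example, port A incrementing memory with 2-cycle timing gives `[0x54, 0x02]`, and port B as fixed IO with 2-cycle timing gives `[0x68, 0x02]`, matching the `%01010100` and `%01101000` bytes used in the assembly example.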

WR2 – Write Register Group 2

D7  D6  D5  D4  D3  D2  D1  D0  BASE REGISTER BYTE
 0   |   |   |   |   0   0   0
     |   |   |   |
     |   |   |   0 = Port B is memory
     |   |   |   1 = Port B is IO
     |   |   |
     |   0   0 = Port B address decrements
     |   0   1 = Port B address increments
     |   1   0 = Port B address is fixed
     |   1   1 = Port B address is fixed
     |
     V
D7  D6  D5  D4  D3  D2  D1  D0  PORT B VARIABLE TIMING BYTE
 0   0   |   0   0   0   |   |
         |               0   0 = Cycle Length = 4
         |               0   1 = Cycle Length = 3
         |               1   0 = Cycle Length = 2
         |               1   1 = Do not use
         |
         V
D7  D6  D5  D4  D3  D2  D1  D0  ZXN PRESCALAR (FIXED TIME TRANSFER)

The ZXN PRESCALAR is a feature of the zxnDMA implementation. If non-zero, a delay will be inserted after each byte is transferred such that the total time needed for each transfer is determined by the prescalar. This works in both the continuous mode and the burst mode. If the DMA is operated in burst mode, the DMA will give up any waiting time to the CPU so that the CPU can run while the DMA is idle.

The rate of transfer is given by the formula “Frate = 875kHz / prescalar” or, rearranged, “prescalar = 875kHz / Frate”. The formula is framed in terms of a sample rate (Frate) but Frate can be inverted to set a transfer time for each byte instead. The 875kHz constant is a nominal value assuming a 28MHz system clock; the system clock actually varies from this depending on the video timing selected by the user (HDMI, VGA0-6) so for complete accuracy the constant should be prorated according to documentation for nextreg 0x11.

In a DMA audio setting, selecting a sample rate of 16kHz would mean setting the prescalar value to 55. This sample period is constant across changes in CPU speed.
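The 16kHz figure can be checked with a short Python sketch of the formula. It uses the nominal 875kHz constant, rounds to the nearest integer and assumes the usable range is 1-255 (the prescalar is one byte and zero switches it off). The function names are hypothetical, and for exact results `base_hz` would need to be prorated according to nextreg 0x11 as described above.

```python
# Host-side helpers for the ZXN PRESCALAR (hypothetical names).
NOMINAL_HZ = 875_000  # nominal constant, assumes a 28MHz system clock


def prescalar_for(sample_rate_hz: float, base_hz: float = NOMINAL_HZ) -> int:
    """Nearest prescalar for a sample rate: prescalar = 875kHz / Frate."""
    p = round(base_hz / sample_rate_hz)
    if not 1 <= p <= 255:
        raise ValueError("sample rate out of range for an 8-bit prescalar")
    return p


def actual_rate(prescalar: int, base_hz: float = NOMINAL_HZ) -> float:
    """Rate actually obtained: Frate = 875kHz / prescalar."""
    return base_hz / prescalar
```

875000 / 16000 = 54.6875, which rounds to 55; the rate actually obtained is 875000 / 55, roughly 15.9kHz.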

WR3 – Write Register Group 3

D7  D6  D5  D4  D3  D2  D1  D0  BASE REGISTER BYTE
 1   |   0   0   0   0   0   0
     |
     1 = DMA Enable

The Z80 DMA defines more fields in this group, but the zxnDMA ignores them. The two further registers that the Z80 DMA defines in this group (selected by D3 and D4) are implemented by the zxnDMA, but they have no effect.

It is preferred to start the DMA by writing an 'Enable DMA' command to WR6.

WR4 – Write Register Group 4

D7  D6  D5  D4  D3  D2  D1  D0  BASE REGISTER BYTE
 1   |   |   0   |   |   0   1
     |   |       |   |
     0   0 = Do not use (Behaves like Continuous mode, Byte mode on Z80 DMA)
     0   1 = Continuous mode
     1   0 = Burst mode
     1   1 = Do not use
                 |   |
                 |   V
D7  D6  D5  D4  D3  D2  D1  D0  PORT B STARTING ADDRESS (LOW BYTE)
                 |
                 V
D7  D6  D5  D4  D3  D2  D1  D0  PORT B STARTING ADDRESS (HIGH BYTE)

The Z80 DMA defines three more registers in this group, selected through D4, that control interrupt behaviour. Interrupts and pulse generation are not implemented in the zxnDMA, and these registers are not available for writing.
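A Python sketch of the WR4 base byte as described above (zxnDMA subset only, so no interrupt parameters). `wr4` is a hypothetical helper. Following the diagram, D2 announces the port B address low byte and D3 the high byte.

```python
# Host-side encoder for the WR4 base byte (hypothetical helper).
WR4_MODE = {"continuous": 0b01, "burst": 0b10}  # 00 and 11 are "Do not use"


def wr4(mode: str, port_b_lo: bool = True, port_b_hi: bool = True) -> int:
    """Build a WR4 base byte; D4 stays 0 because interrupts are not implemented."""
    value = 0b1000_0001              # D7 = 1 and D1:D0 = 01 select WR4
    value |= WR4_MODE[mode] << 5     # D6:D5: operating mode
    if port_b_lo:
        value |= 1 << 2              # D2: PORT B STARTING ADDRESS (LOW BYTE) follows
    if port_b_hi:
        value |= 1 << 3              # D3: PORT B STARTING ADDRESS (HIGH BYTE) follows
    return value
```

`wr4("continuous")` returns `0xAD` and `wr4("burst")` returns `0xCD`, the `DMA_CONTINUOUS` and `DMA_BURST` values defined in the assembly example.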

WR5 – Write Register Group 5

D7  D6  D5  D4  D3  D2  D1  D0  BASE REGISTER BYTE
 1   0   |   |   0   0   1   0
         |   |
         |   0 = /ce only
         |   1 = /ce & /wait multiplexed
         |
         0 = Stop on end of block
         1 = Auto restart on end of block

The /ce & /wait mode is implemented in the zxnDMA but is not currently used. In this mode, an external device uses the DMA's /ce pin to insert wait states during the DMA's transfer.

With auto restart enabled, when a transfer finishes the DMA automatically reloads its source and destination addresses and resets its byte counter to zero, so that the last transfer is repeated.
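A Python sketch of the WR5 base byte; `wr5` is a hypothetical helper. Since D5 = 0 selects stop on end of block, the `%10000010` byte used in the assembly example stops at the end of the block rather than restarting.

```python
# Host-side encoder for the WR5 base byte (hypothetical helper).
def wr5(auto_restart: bool, ce_wait: bool = False) -> int:
    """Build a WR5 base byte from the stop/restart and /ce options."""
    value = 0b1000_0010          # D7:D6 = 10 and D2:D0 = 010 select WR5
    if ce_wait:
        value |= 1 << 4          # D4 = 1: /ce & /wait multiplexed
    if auto_restart:
        value |= 1 << 5          # D5 = 1: auto restart on end of block
    return value
```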

WR6 – Command Register

D7  D6  D5  D4  D3  D2  D1  D0  BASE REGISTER BYTE
 1   ?   ?   ?   ?   ?   1   1
     |   |   |   |   |
     1   0   0   0   0 = 0xC3 = Reset
     1   0   0   0   1 = 0xC7 = Reset Port A Timing
     1   0   0   1   0 = 0xCB = Reset Port B Timing
     0   1   1   0   0 = 0xB3 = Force Ready (irrelevant for zxnDMA)
     0   1   1   1   1 = 0xBF = Read Status Byte
     0   0   0   1   0 = 0x8B = Reinitialize Status Byte
     0   1   0   0   1 = 0xA7 = Initialize Read Sequence
     1   0   0   1   1 = 0xCF = Load
     1   0   1   0   0 = 0xD3 = Continue
     0   0   0   0   1 = 0x87 = Enable DMA
     0   0   0   0   0 = 0x83 = Disable DMA
 +-- 0   1   1   1   0 = 0xBB = Read Mask Follows
 |
D7  D6  D5  D4  D3  D2  D1  D0  READ MASK
 0   |   |   |   |   |   |   |
     |   |   |   |   |   |   V
D7  D6  D5  D4  D3  D2  D1  D0  Status Byte
     |   |   |   |   |   |
     |   |   |   |   |   V
D7  D6  D5  D4  D3  D2  D1  D0  Byte Counter Low ("High" with core 3.0.5 = bug in core)
     |   |   |   |   |
     |   |   |   |   V
D7  D6  D5  D4  D3  D2  D1  D0  Byte Counter High ("Low" with core 3.0.5 = bug in core)
     |   |   |   |
     |   |   |   V
D7  D6  D5  D4  D3  D2  D1  D0  Port A Address Low
     |   |   |
     |   |   V
D7  D6  D5  D4  D3  D2  D1  D0  Port A Address High
     |   |
     |   V
D7  D6  D5  D4  D3  D2  D1  D0  Port B Address Low
     |
     V
D7  D6  D5  D4  D3  D2  D1  D0  Port B Address High

Unimplemented Z80 DMA commands are ignored.

Prior to starting the DMA, a LOAD command must be issued to copy the Port A and Port B addresses into the DMA's internal pointers. Then an 'Enable DMA' command is issued to start the DMA.

The 'Continue' command resets the DMA's byte counter so that a following 'Enable DMA' allows the DMA to repeat the last transfer but using the current internal address pointers. I.e. it continues from where the last copy operation left off.
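The difference between LOAD and Continue can be modelled in a few lines of Python. This sketch assumes a memory-to-memory A -> B transfer with both ports incrementing, zxnDMA length semantics (exactly `length` bytes) and, as on the Z80 DMA, that LOAD also clears the byte counter; the class and method names are hypothetical.

```python
# Host-side model of LOAD / Continue / Enable DMA pointer handling (hypothetical names).
class DmaPointers:
    def __init__(self, a_start: int, b_start: int, length: int):
        self.a_start, self.b_start, self.length = a_start, b_start, length
        self.a = self.b = None       # internal pointers, undefined until LOAD
        self.counter = 0

    def load(self):
        """LOAD: copy the programmed start addresses into the internal pointers.
        Clearing the counter here is an assumption based on the Z80 DMA."""
        self.a, self.b = self.a_start, self.b_start
        self.counter = 0

    def cont(self):
        """Continue: reset only the byte counter; the pointers stay where they stopped."""
        self.counter = 0

    def enable(self) -> list:
        """Enable DMA: run to the end of the block and return the port B addresses written."""
        if self.b is None:
            raise RuntimeError("issue LOAD first")
        written = []
        while self.counter < self.length:
            written.append(self.b)
            self.a += 1
            self.b += 1
            self.counter += 1
        return written


d = DmaPointers(0x8000, 0x4000, 4)
d.load()
first = d.enable()    # writes 0x4000-0x4003
d.cont()
second = d.enable()   # carries on at 0x4004-0x4007
```

Issuing LOAD again instead of Continue would restart at 0x4000.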

Registers can be read via an IO read from the DMA port after setting the read mask (at power-up the read mask is set to 0x7f). The values returned are the DMA's current internal counter values, so 'Port A Address Low' is the lower 8 bits of Port A's next transfer address. Once the end of the read mask is reached, further reads wrap around to the first selected register.
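The read-back order and wrap-around can be sketched in Python. The bit-to-register mapping follows the READ MASK diagram above (D0 = Status Byte through D6 = Port B Address High); `read_sequence` is a hypothetical helper that returns register names rather than values.

```python
# Host-side model of the read-back order selected by the read mask (hypothetical helper).
READ_REGISTERS = [            # index = bit in the read mask
    "Status Byte",
    "Byte Counter Low",
    "Byte Counter High",
    "Port A Address Low",
    "Port A Address High",
    "Port B Address Low",
    "Port B Address High",
]


def read_sequence(mask: int, reads: int) -> list:
    """Registers returned by `reads` consecutive IO reads, wrapping after the last one."""
    enabled = [name for bit, name in enumerate(READ_REGISTERS) if mask & (1 << bit)]
    if not enabled:
        return []
    return [enabled[i % len(enabled)] for i in range(reads)]
```

With the power-up mask 0x7f, the eighth read returns the status byte again.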

The format of the DMA status byte is as follows:

00E1101T

E is set to 0 if the total block length has been transferred at least once.

T is set to 1 if at least one byte has been transferred.
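Decoding the status byte is a matter of testing D5 and D0. A Python sketch with a hypothetical `decode_status` helper; keep in mind that core 3.0.5 is reported not to set the T bit.

```python
# Host-side decoder for a status byte of the form 00E1101T (hypothetical helper).
def decode_status(status: int) -> dict:
    return {
        # E (D5) = 0: the total block length has been transferred at least once
        "block_completed": ((status >> 5) & 1) == 0,
        # T (D0) = 1: at least one byte has been transferred
        "byte_transferred": (status & 1) == 1,
    }
```

For example, `0x3A` (`00111010`) means nothing has been transferred yet, while `0x1B` (`00011011`) means a complete block has been transferred.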

Operating speed

The zxnDMA operates at the same speed as the CPU, that is 3.5MHz, 7MHz, 14MHz or 28MHz. This is a contended clock that is modified by the ULA and by the Layer2 auto-slowdown (which only occurred in Next cores 1 and 2; the limitation was lifted in core 3.0).

The (pre-core 3.0) auto-slowdown occurred without user intervention if the speed exceeded 7MHz and the active Layer2 display was being generated (higher speed operation resumed when the active Layer2 display was not being generated). Programmers do NOT need to account for speed differences in DMA transfers, as this happens automatically.

Because of this, the cycle lengths for Ports A and B can be set to their minimum values without ill effects. The cycle lengths specified for Ports A and B are intended to selectively slow down read or write cycles for hardware that cannot operate at the DMA's full speed.

The DMA and Interrupts

The zxnDMA cannot currently generate interrupts.

The other side of this is that while the DMA controls the bus, the Z80 cannot respond to interrupts. On the Z80, the NMI interrupt is edge triggered so if an NMI occurs the fact that it occurred is stored internally in the Z80 so that it will respond when it is woken up. On the other hand, maskable interrupts are level triggered. That is, the Z80 must be active to regularly sample the /INT line to determine if a maskable interrupt is occurring. On the Spectrum and the ZX Next, the ULA (and line interrupt) are only asserted for a fixed amount of time ~30 cycles at 3.5MHz. If the DMA is executing a transfer while the interrupt is asserted, the CPU will not be able to see this and it will most likely miss the interrupt. In burst mode, with large-enough prescalar value, the CPU will never miss these interrupts, although this may change if multiple channels are implemented.

Programming examples

A simple way to program the DMA is to walk down the list of registers WR0-WR5, sending desired settings to each. Then start the DMA by sending a LOAD command followed by an ENABLE_DMA command to WR6. Once more familiar with the DMA, you will discover that the amount of information sent can be reduced to what changes between transfers.
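Walking down WR0-WR5 and finishing with LOAD and Enable DMA can be sketched as a host-side Python function that builds the same 16-byte stream as the `DMACode` table in the assembly example (memory to memory, A -> B, 2-cycle timing, continuous mode, stop on end of block). The function name `copy_program` is hypothetical; a build script could emit its output as a `db` table.

```python
# Host-side builder for a complete memory-to-memory copy program (hypothetical helper).
DMA_DISABLE, DMA_LOAD, DMA_ENABLE = 0x83, 0xCF, 0x87


def copy_program(src: int, dst: int, length: int) -> bytes:
    """Bytes to OTIR to the zxnDMA port for an A -> B copy with both ports incrementing."""
    return bytes([
        DMA_DISABLE,
        0x7D, src & 0xFF, (src >> 8) & 0xFF,            # WR0: A -> B transfer, port A address
        length & 0xFF, (length >> 8) & 0xFF,            #      and block length follow
        0x54, 0x02,                                     # WR1: port A memory, increment, 2 cycles
        0x50, 0x02,                                     # WR2: port B memory, increment, 2 cycles
        0xAD, dst & 0xFF, (dst >> 8) & 0xFF,            # WR4: continuous, port B address follows
        0x82,                                           # WR5: stop on end of block
        DMA_LOAD, DMA_ENABLE,                           # WR6: load pointers, then start
    ])


prog = copy_program(0x0000, 0x4000, 0x800)  # same parameters as the example's first call
```

Once a transfer has been set up, later transfers can resend only the registers that change, as the paragraph above suggests.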

Assembly

A short example program that uses the DMA to copy memory to the screen, then to copy a sprite image from memory to sprite RAM, and then shows that sprite scrolling across the screen.

;------------------------------------------------------------------------------
    ; sjasmplus extra options to enable Z80N, stricter syntax and Next device
    opt --zxnext --syntax=abf : device zxspectrumnext
;------------------------------------------------------------------------------
;     DEFINE testing        ; uncomment to produce NEX file (instead of DOT)
;------------------------------------------------------------------------------
; DMA (Register 6)
;
;------------------------------------------------------------------------------
;zxnDMA programming example
;------------------------------------------------------------------------------
;(c) Jim Bagley
;------------------------------------------------------------------------------
DMA_RESET                      equ $c3
DMA_RESET_PORT_A_TIMING        equ $c7
DMA_RESET_PORT_B_TIMING        equ $cb
DMA_LOAD                       equ $cf ; %11001111
DMA_CONTINUE                   equ $d3
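DMA_DISABLE_INTERUPTS          equ $af ; Z80 DMA only, ignored by zxnDMA
DMA_ENABLE_INTERUPTS           equ $ab ; Z80 DMA only, ignored by zxnDMA
DMA_RESET_DISABLE_INTERUPTS    equ $a3 ; Z80 DMA only, ignored by zxnDMA
DMA_ENABLE_AFTER_RETI          equ $b7 ; Z80 DMA only, ignored by zxnDMA
DMA_READ_STATUS_BYTE           equ $bf
DMA_REINIT_STATUS_BYTE         equ $8b
DMA_START_READ_SEQUENCE        equ $a7
DMA_FORCE_READY                equ $b3
DMA_DISABLE                    equ $83
DMA_ENABLE                     equ $87
DMA_WRITE_REGISTER_COMMAND     equ $bb ; "Read Mask Follows"
DMA_BURST                      equ %11001101 ; WR4: burst, port B address follows
DMA_CONTINUOUS                 equ %10101101 ; WR4: continuous, port B address follows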
ZXN_DMA_PORT                   equ $6b
SPRITE_STATUS_SLOT_SELECT      equ $303B
SPRITE_IMAGE_PORT              equ $5b
SPRITE_INFO_PORT               equ $57
;------------------------------------------------------------------------------

    IFDEF testing
        org $5800
        block 32*24, $38              ; default ULA attributes
        org $6000
    ELSE
        org $2000
    ENDIF

start
    ld   hl,$0000
    ld   de,$4000
    ld   bc,$800
    call TransferDMA                  ; copy some random data to the screen pointing
                                      ; to ROM for now, for the purpose of showing 
                                      ; how to do a DMA copy.
    ld   a,0                          ; sprite image number we want to update
    ld   bc,SPRITE_STATUS_SLOT_SELECT
    out  (c),a                        ; set the sprite image number
    ld   bc,1*256                     ; number to transfer (1)
    ld   hl,testsprite                ; from 
    call TransferDMASprite            ; transfer to sprite ram

    nextreg 21,1                      ; turn sprite on. for more info on this check 
                                      ; out https://www.specnext.com/tbblue-io-port-system/
    ld   de,0
    ld   (xpos),de                    ; set initial X position (this demo doesn't
                                      ; need it, but if you run .loop again it
                                      ; will continue from where it was)
    ld   a,$20
    ld   (ypos),a                     ; set initial Y position

.loop
    ld   a,0                          ; sprite number we want to position
    ld   bc,SPRITE_STATUS_SLOT_SELECT
    out  (c),a
    ld   de,(xpos)
    ld   hl,(ypos)                    ; ignores H so doing this rather than 
                                      ; ld a,(ypos):ld l,a
    ld   bc,(image)                   ; not flipped or palette shifted
    call SetSprite

    halt

    ld   de,(xpos)
    inc  de
    ld   (xpos),de
    ld   a,d
    cp   $01
    jr   nz,.loop                     ; if high byte of xpos is not 1 (right of 
                                      ; screen )
    ld   a,e
    cp   $20+1
    jr   nz,.loop                     ; if low byte is not $21 just off the right of
                                      ; the screen, $20 is off screen but as the 
                                      ; INC DE is just above and not updated sprite
                                      ; after it, it needs to be $21
    xor  a
    ret                               ; return back to basic with OK

xpos dw 0                             ; x position
ypos db 0                             ; y position
                                      ; these next two BITS and IMAGE are swapped 
                                      ; as bits needs to go into B register
image db 0+$80                        ; use image 0 (the image we transferred)
                                      ; +$80 to set the sprite to active
bits db 0                             ; not flipped or palette shifted

c1 = %11100000
c2 = %11000000
c3 = %10100000
c4 = %10000000
c5 = %01100000
c6 = %01000000
c7 = %00100000
c8 = %00000000

testsprite
    db c1,c1,c1,c1,c1,c1,c1,c1,c1,c1,c1,c1,c1,c1,c1,c1
    db c1,c2,c2,c2,c2,c2,c2,c2,c2,c2,c2,c2,c2,c2,c2,c1
    db c1,c2,c3,c3,c3,c3,c3,c3,c3,c3,c3,c3,c3,c3,c2,c1
    db c1,c2,c3,c4,c4,c4,c4,c4,c4,c4,c4,c4,c4,c3,c2,c1
    db c1,c2,c3,c4,c5,c5,c5,c5,c5,c5,c5,c5,c4,c3,c2,c1
    db c1,c2,c3,c4,c5,c6,c6,c6,c6,c6,c6,c5,c4,c3,c2,c1
    db c1,c2,c3,c4,c5,c6,c7,c7,c7,c7,c6,c5,c4,c3,c2,c1
    db c1,c2,c3,c4,c5,c6,c7,c8,c8,c7,c6,c5,c4,c3,c2,c1
    db c1,c2,c3,c4,c5,c6,c7,c8,c8,c7,c6,c5,c4,c3,c2,c1
    db c1,c2,c3,c4,c5,c6,c7,c7,c7,c7,c6,c5,c4,c3,c2,c1
    db c1,c2,c3,c4,c5,c6,c6,c6,c6,c6,c6,c5,c4,c3,c2,c1
    db c1,c2,c3,c4,c5,c5,c5,c5,c5,c5,c5,c5,c4,c3,c2,c1
    db c1,c2,c3,c4,c4,c4,c4,c4,c4,c4,c4,c4,c4,c3,c2,c1
    db c1,c2,c3,c3,c3,c3,c3,c3,c3,c3,c3,c3,c3,c3,c2,c1
    db c1,c2,c2,c2,c2,c2,c2,c2,c2,c2,c2,c2,c2,c2,c2,c1
    db c1,c1,c1,c1,c1,c1,c1,c1,c1,c1,c1,c1,c1,c1,c1,c1

;-------------------------------------------------
; de = X
; l = Y
; b = bits
; c = sprite image
SetSprite
    push bc
    ld bc,SPRITE_INFO_PORT
    out (c),e ; Xpos
    out (c),l ; Ypos
    pop hl
    ld a,d
    and 1
    or h
    out (c),a
    ld a,l:or $80
    out (c),a ; image
    ret

;--------------------------------
; hl = source
; de = destination
; bc = length
;--------------------------------
TransferDMA
    di
    ld (DMASource),hl
    ld (DMADest),de
    ld (DMALength),bc
    ld hl,DMACode
    ld b,DMACode_Len
    ld c,ZXN_DMA_PORT
    otir
    ei
    ret

DMACode db DMA_DISABLE
        db %01111101                  ; R0-Transfer mode, A -> B, write address 
                                      ; + block length
DMASource dw 0                        ; R0-Port A, Start address 
                                      ; (source address)
DMALength dw 0                        ; R0-Block length (length in bytes)
        db %01010100                  ; R1-write A time byte, increment, 
                                      ; memory, bitmask
        db %00000010                  ; R1-Cycle length port A (2 cycles)
        db %01010000                  ; R2-write B time byte, increment, 
                                      ; memory, bitmask
        db %00000010                  ; R2-Cycle length port B (2 cycles)
        db DMA_CONTINUOUS             ; R4-Continuous mode (use this for block 
                                      ; transfer), write dest address
DMADest dw 0                          ; R4-Dest address (destination address)
        db %10000010                  ; R5-Stop on end of block, /ce only
        db DMA_LOAD                   ; R6-Load
        db DMA_ENABLE                 ; R6-Enable DMA
        
DMACode_Len                    equ $-DMACode

;------------------------------------------------------------------------------
; hl = source
; bc = length
; set port to write to with TBBLUE_REGISTER_SELECT
; prior to call
;------------------------------------------------------------------------------
TransferDMAPort
    di
    ld (DMASourceP),hl
    ld (DMALengthP),bc
    ld hl,DMACodeP
    ld b,DMACode_LenP
    ld c,ZXN_DMA_PORT
    otir
    ei
    ret

DMACodeP db DMA_DISABLE
        db %01111101                  ; R0-Transfer mode, A -> B, write address 
                                      ; + block length
DMASourceP dw 0                       ; R0-Port A, Start address (source address)
DMALengthP dw 0                       ; R0-Block length (length in bytes)
        db %01010100                  ; R1-write A time byte, increment, 
                                      ; memory, bitmask
        db %00000010                  ; R1-Cycle length port A
        db %01101000                  ; R2-write B time byte, fixed, IO,
                                      ; bitmask
        db %00000010                  ; R2-Cycle length port B
        db %10101101                  ; R4-Continuous mode (use this for block 
                                      ; transfer), write dest address
        dw $253b                      ; R4-Dest address (nextreg data port)
        db %10000010                  ; R5-Stop on end of block, /ce only
        db DMA_LOAD                   ; R6-Load
        db DMA_ENABLE                 ; R6-Enable DMA
        
DMACode_LenP                   equ $-DMACodeP
;------------------------------------------------------------------------------
; hl = source
; bc = length
;------------------------------------------------------------------------------
TransferDMASprite
    di
    ld (DMASourceS),hl
    ld (DMALengthS),bc
    ld hl,DMACodeS
    ld b,DMACode_LenS
    ld c,ZXN_DMA_PORT
    otir
    ei
    ret

DMACodeS db DMA_DISABLE
        db %01111101                   ; R0-Transfer mode, A -> B, write address 
                                       ; + block length
DMASourceS dw 0                        ; R0-Port A, Start address (source address)
DMALengthS dw 0                        ; R0-Block length (length in bytes)
        db %01010100                   ; R1-write A time byte, increment, 
                                       ; memory, bitmask
        db %00000010                   ; R1-Cycle length port A
        db %01101000                   ; R2-write B time byte, fixed, IO,
                                       ; bitmask
        db %00000010                   ; R2-Cycle length port B
        db %10101101                   ; R4-Continuous mode (use this for block
                                       ; transfer), write dest address
        dw SPRITE_IMAGE_PORT           ; R4-Dest address (sprite image port)
        db %10000010                   ; R5-Stop on end of block, /ce only
        db DMA_LOAD                    ; R6-Load
        db DMA_ENABLE                  ; R6-Enable DMA
DMACode_LenS                   equ $-DMACodeS
;------------------------------------------------------------------------------
; de = dest, a = fill value, bc = length
;------------------------------------------------------------------------------
DMAFill
    di
    ld (FillValue),a
    ld (DMACDest),de
    ld (DMACLength),bc
    ld hl,DMACCode
    ld b,DMACCode_Len
    ld c,ZXN_DMA_PORT
    otir
    ei
    ret

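FillValue db 22                        ; source byte, overwritten with A on entry
DMACCode db DMA_DISABLE
        db %01111101                   ; R0-Transfer mode, A -> B, write address
                                       ; + block length
DMACSource dw FillValue                ; R0-Port A address: the fill byte
DMACLength dw 0                        ; R0-Block length (length in bytes)
        db %00100100                   ; R1-Port A fixed, memory (timing byte
                                       ; omitted, previous setting kept)
        db %00010000                   ; R2-Port B increment, memory (timing byte
                                       ; omitted, previous setting kept)
        db %10101101                   ; R4-Continuous mode, write dest address
DMACDest dw 0                          ; R4-Dest address (destination address)
        db DMA_LOAD,DMA_ENABLE         ; R6-Load, then Enable DMA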
DMACCode_Len equ $-DMACCode

;------------------------------------------------------------------------------
; End of file
;------------------------------------------------------------------------------

    IFDEF testing
        savenex open "DMAtest.nex", start, $FF00
        savenex bank 5
    ELSE
fin
        savebin "DMATEST",start,fin-start
    ENDIF

Based on original text by: Allen Albright & Mike Dailly with input by Jim Bagley, Lyndon J Sharp and Phoebus R. Dokos

Technical details (core 3.1.3+)

The Zilog/zxnDMA mode is now selected by using the particular I/O port number (Datagear DMA Port ($xx6B / 107) for zxnDMA mode, MB02 DMA Port ($xx0B / 11) for Zilog mode). The bit 6 in Peripheral 2 Register ($06) is not DMA related any more and will be reused for something different.

The "counter" RR1-RR2 read back after transfer has correct byte order since core 3.1.4.

The other differences described below under "3.0.5" remain (but from a practical point of view the Zilog DMA emulation in 3.1.4 is near-perfect; all the remaining differences are very minor).

Technical details (core 3.0.5)

Zilog DMA compatibility mode

In Zilog DMA compatibility mode (bit 6 of Peripheral 2 Register ($06)), the zxnDMA will mostly work as expected, but there are a few differences in behaviour which may throw off some rare software. Here is the list of the known differences (most of them also describe how zxnDMA mode works):

The LOAD command must be issued with the correct transfer direction; loading the addresses in the opposite direction and flipping the direction afterwards will mismatch the source/destination address data. (Zilog/UA858D DMA chips are also sensitive to a direction flip after LOAD, but the resulting transfer misbehaves in a different way: the source data byte is read after the write, which offsets the whole transfer by one and damages the start/end of the sequence.) (does apply also to zxnDMA mode)

The LOAD command will NOT destroy the already issued "Initialize Read Sequence" - this is how even the original Zilog documentation describes the DMA chip operation, but the real Zilog DMA and UA858D (clone chip) both destroy read sequence upon LOAD command (zxnDMA is better). (does apply also to zxnDMA mode)

The content of registers read back after finished transfer differs: the counter has swapped LSB with MSB byte, and both addresses will be adjusted length+1 times (Zilog/UA858D will return destination address adjusted only length-many times). (does apply also to zxnDMA mode, except addresses are adjusted only "length" times of course, counter has still swapped bytes)

Any read of zxnDMA port without pending read request (commands "Read Status Byte" or "Initialize Read Sequence") will return status byte (Zilog will return random value vaguely similar to status byte, but incorrect, UA858D will return zero). (does apply also to zxnDMA mode)

Status byte doesn't have bit 0 set (the "T" bit in description above). (does apply also to zxnDMA mode)

Be aware that both the custom timing cycle counts and the prescalar value are preserved in the zxnDMA even when a later write to WR1/WR2 skips these particular bytes. To reset the prescalar or cycle timing, explicitly write zero into the prescalar register or use the reset/reset-port-timing commands. (does apply also to zxnDMA mode, the prescalar works only in zxnDMA mode)

The destination port address is LOAD-ed even when it is "fixed" type (Contrary to Zilog DMA, which requires you to load such port as "source", flip the direction after and re-LOAD again with correct direction. UA858D chip does also load destination port address in any case, just like zxnDMA). (does apply also to zxnDMA mode)
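On cores where the counter bytes come back swapped, code that reads the counter has to know which byte arrives first. A minimal Python sketch, assuming (from the notes above) that cores before 3.1.4 behave like core 3.0.5 and that core versions are compared as tuples; `read_back_counter` is a hypothetical name.

```python
# Host-side helper combining the two counter bytes in the order they were read.
def read_back_counter(first: int, second: int, core: tuple) -> int:
    """Return the 16-bit byte counter from two consecutive counter reads."""
    if core < (3, 1, 4):
        # Core 3.0.5 behaviour (assumed for all pre-3.1.4 cores): HIGH byte comes first
        return (first << 8) | second
    # Core 3.1.4 and later: LOW byte comes first, as documented in the read mask diagram
    return (second << 8) | first
```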

zxnDMA mode vs Zilog mode

In zxnDMA mode, the length of the transfer is equal to the length written to the WR0 register, and the port addresses are also adjusted only length times.

The prescalar value affects the speed of the transfer (use zero to switch the prescalar off). In burst mode, the CPU receives control back during the extra idle time and can execute further instructions; in continuous mode, the transfer keeps blocking the CPU even when a "slow" transfer is being done.