How to implement bus sharing / DMA on a 6502 system

Question

Was it possible for an NMOS-based 6502 system to operate in the following fashion: while the 6502 is not touching the bus, one custom chip sets up a memory read operation, then two chips read the data off the bus in the same cycle? If yes, what else besides a signal from the master chip to the slave chip was needed? The chips must have retained the ability to use the system bus separately. As far as I understand, ANTIC and CTIA never read from their data pins at the same time.

Raffzahn · Accepted Answer · 2019-11-10 17:24:03Z

The 6502 is one of the few CPUs without built-in capabilities to share the bus (*1,2). It always drives the bus. Any sharing needs external logic to detach the CPU and let others drive the bus (see CMOS Enhancements below as well).

Full bus access can be gained (*3) by waiting for any non-write cycle (R/W = high during PHI2 = high) and halting the CPU by pulling RDY low prior to the falling edge of PHI2 (see CMOS Enhancements below as well). The CPU will halt, and external logic can then decouple the bus control signals, allowing external circuitry to access the bus. Once the bus is acquired, timing can be completely independent of the CPU clock, allowing asynchronous as well as faster access. After RDY is released, the CPU continues as before (*4).
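
A small model of why the NMOS part may need a few cycles before it actually stops: it ignores RDY during write cycles, so the halt lands on the first read cycle at or after the request. This is a sketch under stated assumptions, not real hardware logic; the trace format and the function name are hypothetical, and the `rdy_honored_on_write` flag stands in for the CMOS behaviour described further down.

```python
# Sketch: when does an NMOS 6502 actually stop after RDY is pulled low?
# Assumption (hypothetical trace format): one entry per CPU cycle,
# True = read cycle (R/W high), False = write cycle. The NMOS part ignores
# RDY during writes, so the halt takes effect on the first read cycle at
# or after the request.
from typing import Optional, Sequence


def halt_cycle(rw_is_read: Sequence[bool], request: int,
               rdy_honored_on_write: bool = False) -> Optional[int]:
    """Return the index of the cycle in which the CPU stops,
    or None if the trace ends before that happens."""
    for i in range(request, len(rw_is_read)):
        if rw_is_read[i] or rdy_honored_on_write:
            return i
    return None


# Hypothetical trace: two reads, three writes (e.g. an interrupt sequence
# pushing PCH, PCL and P), then reads again. DMA is requested in cycle 2.
trace = [True, True, False, False, False, True, True]
print(halt_cycle(trace, 2))                             # NMOS behaviour
print(halt_cycle(trace, 2, rdy_honored_on_write=True))  # 65C02-style RDY
```

With three back-to-back writes the NMOS CPU only stops three cycles after the request, which is exactly the latency the bus arbitration logic has to tolerate.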

Now, since the 6502 only uses the bus during half of its cycle (PHI2 = high), there is a possibility to offer hidden access by restricting non-CPU access to the PHI2 = low time. This should always be possible (*5). Here the bus simply gets detached at the start of PHI2 = low and reattached when PHI2 goes high again.
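
To make the phase split concrete, here is a sketch of which address reaches the RAM in each half-cycle. It assumes a hypothetical video address counter that owns the PHI2 = low half and advances once per CPU cycle; in real hardware this is a multiplexer or buffer pair switched by PHI2, and the function names are illustrative only.

```python
# Sketch of "hidden access": the RAM address is multiplexed by PHI2.
# Assumption: a hypothetical video counter supplies addresses while
# PHI2 = low, the CPU while PHI2 = high. Real designs use mux/buffer
# chips driven by PHI2; this only models which address reaches the RAM.

def ram_address(phi2_high: bool, cpu_addr: int, video_addr: int) -> int:
    """Address presented to RAM in the current half-cycle."""
    return cpu_addr if phi2_high else video_addr


def run_cycles(cpu_addrs, video_start):
    """Each CPU cycle starts with PHI2 low (video fetch) followed by
    PHI2 high (CPU access). Return the resulting RAM address sequence."""
    seq = []
    video = video_start
    for cpu in cpu_addrs:
        seq.append(("video", ram_address(False, cpu, video)))
        seq.append(("cpu", ram_address(True, cpu, video)))
        video += 1  # hypothetical video counter advances once per cycle
    return seq


for who, addr in run_cycles([0xC000, 0xC001], 0x0400):
    print(who, hex(addr))
```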

While this seems simple at first, it carries the disadvantage that every access must be guaranteed to finish within this time, as well as being in sync with the CPU. While the first may be no problem for a single-byte access, it becomes a distinct hurdle when more than one byte must be accessed.

The second means that if the access has to be hard-coupled to an external clock (like video), the CPU has to be clocked accordingly, either by running it at the video access clock (or multiples thereof), as for example the C64 does, or by applying a variable-speed clock, as the Apple II did.
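
As a worked example of the variable-speed approach, the arithmetic below uses the commonly cited Apple II figures, which are an assumption here and not part of this answer: a 14.31818 MHz master clock, normal CPU cycles of 14 master ticks, and one stretched cycle of 16 ticks in every 65-cycle scan line.

```python
# Worked arithmetic for a variable-speed CPU clock.
# Assumption: commonly cited Apple II figures, 14.31818 MHz master clock,
# normal CPU cycle = 14 master ticks, one "long" cycle of 16 ticks per
# 65-cycle scan line.
MASTER_HZ = 14_318_180
NORMAL_TICKS = 14
LONG_TICKS = 16
CYCLES_PER_LINE = 65

ticks_per_line = (CYCLES_PER_LINE - 1) * NORMAL_TICKS + LONG_TICKS
average_cpu_hz = MASTER_HZ * CYCLES_PER_LINE / ticks_per_line

print(ticks_per_line)         # 912 master ticks per scan line
print(round(average_cpu_hz))  # 1020484, i.e. an average of ~1.02 MHz
```

The CPU thus stays locked to the video timing, at the cost of one slightly longer cycle per line.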

In the case of the C64, tying the CPU to the video clock to enable hidden access for the VIC could only solve part of the problem, as memory bandwidth still wasn't sufficient for full display resolution. Here the VIC had to stop the CPU every 8th line to refill its buffer, in addition to accessing RAM during PHI2 = low.
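
Those "every 8th line" stops are usually called badlines. The sketch below applies the commonly documented VIC-II rule, which is an assumption here: with the display enabled, a raster line in $30..$F7 is a badline when its low three bits equal YSCROLL ($D011 bits 0-2, default 3). The figure of roughly 40 stolen cycles per badline is likewise an approximation, and sprite DMA is not counted.

```python
# Sketch of the VIC-II "badline" rule (lines on which the CPU is stopped
# so the VIC can refill its buffer).
# Assumption: display enabled; commonly documented condition: raster line
# in $30..$F7 and (raster & 7) == YSCROLL ($D011 bits 0-2).

def is_badline(raster: int, yscroll: int) -> bool:
    return 0x30 <= raster <= 0xF7 and (raster & 7) == yscroll


PAL_LINES = 312
badlines = [r for r in range(PAL_LINES) if is_badline(r, 3)]
print(len(badlines))       # one in every 8 of the 200 display lines
# Approximation: ~40 CPU cycles lost per badline, sprite DMA not counted.
print(len(badlines) * 40)  # cycles lost per frame
```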

The 8-bit Atari setup of SALLY, (G/C)TIA and ANTIC is a more complex beast, but more straightforward at the same time. To start with, while SALLY was basically an NMOS 6502 CPU, it could be stopped directly by asserting a /HALT signal on pin 35 (*6). When asserted, the CPU not only halted operation but also put all bus signals into tri-state to allow full external access. In addition, the address (and data) bus was always tri-stated during PHI2 = low, automatically freeing the bus for hidden access without external logic.

Now, while ANTIC accesses video memory, it's not just a stupid graphics chip but a processor in its own right, executing a program: the display list. For each of its accesses it halts the CPU. This is done in four cases:

Program access (Display list)
Graphics data (Playfield)
Sprite data (Player/Missile)
RAM refresh

(Once again, it's quite obvious where the Amiga design originated.)
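
To illustrate the "processor executing a program" point, here is a sketch of a display list decoder. It relies on the commonly documented ANTIC instruction encoding, which is an assumption in this context: low nibble 0 means blank lines (count in bits 4-6), 1 means jump (bit 6 set = JVB, wait for vertical blank), and 2-F are mode lines, where bit 6 (LMS) means a 2-byte memory scan address follows. The DLI and scroll bits are deliberately ignored, and the addresses in the example list are hypothetical.

```python
# Sketch: decode an ANTIC display list.
# Assumption: commonly documented encoding; DLI (bit 7) and scroll bits
# (bits 4-5 on mode lines) are ignored here.

def decode_display_list(dl: bytes):
    """Decode until the first jump; return a list of instruction tuples."""
    out, pc = [], 0
    while pc < len(dl):
        op = dl[pc]
        mode = op & 0x0F
        if mode == 0:  # blank lines, count in bits 4-6
            out.append(("blank", ((op >> 4) & 7) + 1))
            pc += 1
        elif mode == 1:  # jump; bit 6 set means JVB
            addr = dl[pc + 1] | (dl[pc + 2] << 8)
            out.append(("jvb" if op & 0x40 else "jmp", addr))
            break
        elif op & 0x40:  # mode line with LMS: 2-byte address follows
            addr = dl[pc + 1] | (dl[pc + 2] << 8)
            out.append(("mode", mode, addr))
            pc += 3
        else:  # plain mode line, continues from the current address
            out.append(("mode", mode, None))
            pc += 1
    return out


# Hypothetical text-mode display list: 24 blank lines, 24 rows of mode 2
# (the first one with LMS), then JVB back to the start of the list.
dl = bytes([0x70, 0x70, 0x70, 0x42, 0x40, 0x9C] + [0x02] * 23
           + [0x41, 0x20, 0x9C])
decoded = decode_display_list(dl)
print(len(decoded), decoded[3], decoded[-1])
```

Every byte ANTIC fetches here (instructions, LMS addresses, and then the screen data they point to) is a DMA cycle taken away from the CPU.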

Using DMA of course slows the CPU down, to anywhere between ~90% of full speed with low-res modes (mode 8/9) and ~60% with high-res ones. Then again, the CPU is clocked at 1.8 MHz, so this translates to 1.0 to 1.6 MHz, which is still quite fast compared to other machines of the same or even later time (*7).
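
The conversion from CPU share to effective speed is simple multiplication. This sketch assumes the NTSC Atari CPU clock of 1.7897725 MHz and takes the ~90% / ~60% shares above as given estimates rather than measurements.

```python
# Worked arithmetic for the figures above.
# Assumption: NTSC CPU clock of 1.7897725 MHz; the ~90% / ~60% CPU shares
# are the estimates quoted in the answer, not measurements.
ATARI_NTSC_MHZ = 1.7897725


def effective_mhz(clock_mhz: float, cpu_share: float) -> float:
    """Average CPU speed when DMA takes (1 - cpu_share) of all cycles."""
    return clock_mhz * cpu_share


for label, share in (("low res", 0.90), ("high res", 0.60)):
    print(f"{label}: {effective_mhz(ATARI_NTSC_MHZ, share):.2f} MHz")
```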

So if there's an intention to build an Atari-like system, a standard NMOS 6502 will only fit with a lot of external components. An R65102 might save some of them but may be hard to come by, while a 65C02 is probably the closest thing to a SALLY that was (and still is) sold, though it does need some tweaking to work like in an Atari.

CMOS Enhancements

While all of this is true for the NMOS version, the 1982 CMOS version does aid bus sharing by including tri-state buffers, controlled by the 'new' BE signal (*2). It puts R/W as well as all address and data lines into high impedance. BE operates exactly like external buffers, in that it is not synchronized with clock or bus cycles, so all the prior cautions about synchronous bus operation still apply.

In addition, the RDY signal is now obeyed during write cycles as well, removing the need to create a bus-available signal that includes R/W. Bus access can now be requested basically at any time.

Using a CMOS version saves at least 4 external TTL chips for bus detachment and removes the need to wait for write operations to finish when acquiring the bus. It's also the only one available today, so I'd go for it.

General advice for system design with DMA:

It is always a good idea to check which memory regions need DMA at all and only share those. Instead of stopping the CPU every time a DMA access happens, let both operate independently as long as they do not interfere, that is, as long as they do not access the same component. Sinclair's Spectrum is a great example, as the CPU only gets stopped when accessing the lower 16 KiB of RAM, which it shares with video. All other RAM and ROM accesses run at full speed, even while video reads its RAM. Of course this only works so well because the Spectrum has two independent RAM devices. But even with a single RAM (16 KiB Spectrum), the method still provides a considerable speedup for ROM code. The same is true for a 6502, as more than 70% of all reads are code fetches.

With concurrent access restricted to RAM, ROM code (like BASIC) is only slowed when a collision happens during a RAM access. With device-specific sharing, the CPU halt during the VIC buffer load mentioned above would have hurt BASIC programs or ROM-based games much less.

So when using external buffers, placing them not around the CPU but in front of the RAM may bring additional performance gains without much additional hardware.
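
The device-specific idea boils down to an address decode: the CPU only waits when it targets the shared RAM while video is using it. The sketch below assumes a hypothetical 6502 memory map with the shared video RAM at $4000-$7FFF (the same range the Spectrum uses for its contended RAM), and it counts collisions as a rough proxy for lost cycles; real hardware may stall longer per collision.

```python
# Sketch of device-specific sharing: only accesses to the shared video RAM
# can collide with video; ROM and other RAM run at full speed.
# Assumption: hypothetical memory map with video RAM at $4000-$7FFF.
SHARED_START, SHARED_END = 0x4000, 0x7FFF  # hypothetical video RAM window


def cpu_must_wait(addr: int, video_busy: bool) -> bool:
    """True if this CPU access collides with a video access."""
    return video_busy and SHARED_START <= addr <= SHARED_END


def collisions(addresses, video_busy):
    """Count colliding CPU accesses in a trace (a rough proxy for lost
    cycles; real hardware may stall longer per collision)."""
    return sum(cpu_must_wait(a, b) for a, b in zip(addresses, video_busy))


# Video is busy the whole time, yet only the access to $4100 collides:
# ROM fetches ($E000...) and other RAM ($0200) run at full speed.
trace = [0xE000, 0x0200, 0x4100, 0xE001]
print(collisions(trace, [True] * len(trace)))  # 1
```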

*1 - The original NMOS 6501 did feature a protocol for bus arbitration.

*2 - While the 6502 didn't offer it, Rockwell's NMOS R65102/R65112 did - and so did Atari's (NMOS) SALLY.

*3 - Strictly speaking, pulling RDY does not stop the CPU but extends the current cycle. So this isn't really a feature for bus sharing, but an access extension for low-speed peripherals that can be repurposed to insert external bus cycles.

*4 - It might still be advisable to synchronize this with a rising edge of PHI2 to make sure the CPU (and memory) get a full phase to finish the resumed bus operation.

*5 - Unless that time is already used by other mechanics, like DRAM refresh.

*6 - In addition, the R/W pin was moved from pin 34 to pin 36.

*7 - The C64 runs at ~0.98/1.02 MHz (PAL/NTSC), slowing down to less than 0.9 MHz when using full graphics.

There are some nice schematics available here atariage.com/forums/topic/… on using a standard 65(C)02 in an 8-bit Atari, showing the HALT and tri-state logic (straightforward TTL) needed for DMA bus access in those systems. – StarCat, Commented Nov 11, 2019 at 8:06
@StarCat yes, it's a nice hack. It would in theory work for an NMOS CPU, as /HALT is done by stopping the clock, but in reality it only works with a 65C02. The NMOS CPU uses dynamic storage and can (by spec) only be halted for 10 us. The longest continuous DMA sequence (IIRC) is ~90 clocks or 45 us in mode 2, so it won't work with an NMOS CPU. Now, since a 65C02 is already required to make it work, the buffers (244/246) can be replaced by pulling BE low. In addition, the logic to stop the clock can be replaced by much simpler handling of RDY. – Raffzahn, Commented Nov 11, 2019 at 8:31
I don't know how long the DMA sequences in the original Atari were, but you are probably right about the fact that it might have stopped the clock for too long on an NMOS 6502. I've adapted and built the above (or similar) schematic for a 65C816 many years ago and it certainly did work with this CMOS CPU. It looks like in the Atariage schematic they did not make use of the BE pin because they were simply unaware of its existence (even left pin 36 floating). — StarCat, Commented Nov 11, 2019 at 9:58
Ok, just checked: real continuous DMA only happens in (character) modes 2..5, longest with a wide playfield, resulting in 96 clocks or 54 us of DMA (normal is 80 clk, narrow is 64 clk). All other modes have at most 8 clocks in a row (for display list and player/missile access at the beginning of a line). – Raffzahn, Commented Nov 11, 2019 at 10:37
Thank you very much for such a detailed reply! So much useful info to digest.. — Anton, Commented Nov 29, 2019 at 14:00
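
The arithmetic behind the comments above, as a sketch: it assumes the NTSC CPU clock of 1.7897725 MHz, the clock counts given in the comments, and the 10 us maximum halt time quoted for the NMOS 6502's dynamic registers.

```python
# Worked arithmetic: continuous ANTIC DMA runs vs. the NMOS halt limit.
# Assumptions: NTSC CPU clock 1.7897725 MHz; clock counts from the comments
# above; 10 us maximum halt time quoted for the NMOS 6502.
CLOCK_HZ = 1_789_772.5
NMOS_MAX_HALT_US = 10.0

runs = {
    "wide playfield": 96,
    "normal playfield": 80,
    "narrow playfield": 64,
    "line-start burst": 8,
}
durations = {name: clocks / CLOCK_HZ * 1e6 for name, clocks in runs.items()}

for name, us in durations.items():
    verdict = "ok" if us <= NMOS_MAX_HALT_US else "too long"
    print(f"{name}: {runs[name]} clocks = {us:.1f} us ({verdict} for NMOS)")
```

Only the short bursts at the start of a line stay below the limit, which is why a clock-stopping /HALT substitute needs a static (CMOS) core.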

Chromatix · Accepted Answer · 2019-11-10 11:06:37Z

The 6502 drives the address bus full-time, but the "Phi2 low" or "Phi1" phase of the bus can be (and often was) used for non-CPU data transfers without interfering with the CPU's performance. Most often the Phi1 phase was simply dedicated to video display reads, with the side-effect of performing DRAM refresh cycles.

If your system design includes this type of display logic, you'll need to suppress your other DMA cycles while the display is actively driving pixels, and perform them only during blanking intervals.

You will in any case need memory that's fast enough to perform a complete read or write cycle in just the Phi2 phase of the CPU cycle, and buffers to isolate the CPU's address and data pins from the bus so that the DMA transfer can be substituted. You'll also need to account for the NMOS CPU's relatively weak drive towards logic high when handing control of the bus back to it.
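
A rough timing budget shows what "fast enough" means. This sketch assumes symmetric PHI1/PHI2 phases and a hypothetical fixed overhead per phase for buffer/mux propagation and setup margin; the 100 ns used in the example is illustrative, not a datasheet value.

```python
# Rough timing budget for memory shared between the two clock phases.
# Assumptions: symmetric PHI1/PHI2 phases; a hypothetical fixed overhead
# (buffer/mux propagation plus setup margin) per phase.

def phase_ns(cpu_mhz: float) -> float:
    """Length of one clock phase (half a CPU cycle) in nanoseconds."""
    return 1000.0 / cpu_mhz / 2


def max_memory_access_ns(cpu_mhz: float, overhead_ns: float) -> float:
    """Time left for the memory itself within one phase."""
    return phase_ns(cpu_mhz) - overhead_ns


for mhz in (1.0, 2.0):
    print(mhz, phase_ns(mhz), max_memory_access_ns(mhz, overhead_ns=100))
```

Doubling the CPU clock halves the window, which is why shared memory quickly becomes the limiting factor.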

Using a CMOS CPU might be easier, not least because many versions of it came with a "bus enable" or "data bus enable" signal by which the address/data or only data lines could be tri-stated at the CPU itself, saving you a buffer.

It was commonplace for Atari 800 code to actually read images from disk this way, using the ANTIC co-processor. Wrote such a program myself once (the eponymous Teddraw). — T.E.D., Commented Nov 11, 2019 at 18:51

Richard Broadhurst · Accepted Answer · 2019-11-18 08:23:08Z

There was a simpler but much more expensive solution used in the 1981 BBC Micro: it used 4 MHz RAM and a 2 MHz 6502A, and let the 6845/VideoULA have the other 2 MHz (every other 4 MHz cycle). There is a modern blitter for the BBC Micro, but it wouldn't have been available back in the day. https://stardot.org.uk/forums/viewtopic.php?t=14125
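
The interleave can be pictured as alternating RAM slots. This sketch assumes even 4 MHz slots go to the CPU and odd ones to the video logic; which phase gets which slot in the real machine is not stated above, so treat the assignment as illustrative.

```python
# Sketch of the interleave described above: RAM runs at 4 MHz and
# alternate slots go to the 2 MHz CPU and to the video logic.
# Assumption: even slots = CPU, odd slots = video (illustrative only).
RAM_SLOTS_PER_SECOND = 4_000_000


def slot_owner(slot: int) -> str:
    return "cpu" if slot % 2 == 0 else "video"


owners = [slot_owner(s) for s in range(8)]
print(owners)
cpu_share = sum(o == "cpu" for o in owners) / len(owners)
print(RAM_SLOTS_PER_SECOND * cpu_share)  # RAM accesses per second for the CPU
```

Neither side ever waits for the other, at the price of RAM twice as fast as the CPU alone would need.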

That's the 'hidden access' solution using the first half of the 6502 cycle. The majority of 6502 systems used this, from the PET and Apple on. – Raffzahn, Commented Nov 18, 2019 at 11:04
