RAMdisk under CP/M 2.2

Bruce Tomlin · Feb 28, 2023

If you want something that doesn't use address space and has its own counter, then do like the TMS 9918, only with more memory of course. OUT to a command port to set the address, then do INIR or OTIR to the data port. You could even output just the LBA, and have that reset a 7-bit counter to zero.
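For illustration, here is a minimal Python sketch of that port model. It is only a simulation of the idea, not anyone's actual hardware: the class name `PortRam`, the method names, and the 128-byte sector size are assumptions made for the example.

```python
class PortRam:
    """Toy model of RAM reached through two I/O ports (hypothetical design)."""

    SECTOR_SIZE = 128  # assumption: CP/M 2.2 logical sector size

    def __init__(self, sectors):
        self.mem = bytearray(sectors * self.SECTOR_SIZE)
        self.lba = 0      # latched by a write to the command port
        self.counter = 0  # 7-bit byte counter inside the sector

    def out_command(self, lba):
        """OUT to the command port: latch the sector, reset the counter."""
        self.lba = lba
        self.counter = 0

    def _address(self):
        return self.lba * self.SECTOR_SIZE + self.counter

    def out_data(self, value):
        """OUT to the data port (what OTIR repeats): store, then step."""
        self.mem[self._address()] = value & 0xFF
        self.counter = (self.counter + 1) & 0x7F

    def in_data(self):
        """IN from the data port (what INIR repeats): fetch, then step."""
        value = self.mem[self._address()]
        self.counter = (self.counter + 1) & 0x7F
        return value


def write_sector(ram, lba, data):
    """Set the sector once, then stream the bytes to the data port."""
    ram.out_command(lba)
    for b in data:
        ram.out_data(b)


def read_sector(ram, lba):
    """Set the sector once, then stream one sector from the data port."""
    ram.out_command(lba)
    return bytes(ram.in_data() for _ in range(PortRam.SECTOR_SIZE))
```

The point of the sketch is that the CPU only ever supplies the sector number; the byte address inside the sector comes from the counter on the card.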

Eudimorphodon · Feb 28, 2023

cj7hawk said:
I considered just paging, but I wanted a compatible architecture, with backwards compatibility to older apps, and I have no idea where the DMA might end up - and unless I want some freaky code to keep track of it all, I might end up paging out the DMA - which would be a bit messy. Also I figure people are going to do strange things like moving the DMA around while making calls, so having the DMA source in I/O land makes a lot of sense.

FWIW, if you’re using the CP/M disk routines to drive the data transfers this isn’t going to be a problem? (IE, the code is already written so you know where DMA resides going into a disk access.) There’s a defined call to set the DMA location and you can’t change it literally in the middle of a sector read. You don’t have actual DMA hardware doing this so you can’t task switch and do another process.

An elegant way to handle this might be to have the disk code always just switch to its own system context, so instead of switching the disk page over a page of the application you double-map the page of the disk-using application that has the disk buffer into the system context. Didn’t CP/M Plus use a strategy of keeping most of the OS and driver code in its own paging region, so pretty much every system call had the potential to do a page switch?

cj7hawk said:
4) Accessing the screen without paging - yep, you guessed it - I/O space or as a file. Direct bitmapped high resolution 128K video memories can be accessed by a 64K program and it works with character I/O or high resolution bitmaps ( and hardware graphics, but that's another element ) - I/O is one thing, but what if the video moves around? Why not just open "VIDEO.MEM" as a file, and random I/O ? Now even less memory is required.

Isn’t performance going to take a pretty serious hit here? I don’t know what your video system is going to look like but if you really need to maintain a 128K bitmap adding the overhead of packing every screen update into a 128 byte disk call seems really expensive. (What would it look like to, say, draw a diagonal line over existing content on the screen? That’s an operation that might mean looking in a different sector for every pixel to be written?) Even just scrolling a text screen (if there’s no hardware scrolling, figure something like a TRS-80 or MDA text buffer) will mean reading in 8-32 sectors into a buffer in TPA RAM, shifting the contents over, and writing them back out, incurring all the overhead of disk calls. Seems like if you paged it in you could do that same scroll operation with one set of memory string moves.

cj7hawk · Mar 1, 2023

Bruce Tomlin said:
If you want something that doesn't use address space and has its own counter, then do like the TMS 9918, only with more memory of course. OUT to a command port to set the address, then do INIR or OTIR to the data port. You could even output just the LBA, and have that reset a 7-bit counter to zero.

Thanks for the suggestion, but I have a very specific screen architecture planned so as to meet the original program objectives. Also I am really looking forward to designing the hardware vector graphics routines in TTL

I want to get a few megapixels per second draw rate on the screen if I can

There's going to be a lot of counters involved.

cj7hawk · Mar 1, 2023

Hi @Eudimorphodon , the 128K will rarely be used and only in special lower-speed modes, where high resolution or color are required, and most high speed graphics would occur within around 56K of screen space. To meet the original specs, I'll probably have to play with the vector routines so that there's a mix of 256x192 and 512x192 onscreen at the same time - which makes sense if you consider that I'm designing for low-resolution monitors, and the bit clock will be the same for both modes, with the number of colors simultaneous being affected by the color depth per pixel - eg, 1 byte = 2 pixels at 512, with 4 bits per pixel, and 1 pixel at 256 with 8 bits per pixel.

And yeah, that's pretty slow, but part of the spec is hardware acceleration around that, so characters can hopefully be drawn in about 8-16 µs with a two dimensional DMA, and vector graphics much the same - probably around 250 ns per pixel or so... I've sketched out some block diagrams and it looks doable - I just need to find the best way to assemble it with the least hardware and work out what is important. Actually, it will probably take longer just to set up a 2D DMA than to execute it.

Scrolling won't be too bad - similar, I can just set the scroll to happen via the hardware copy, and move things around pretty quickly. I'm also trying to work out whether I can preprogram various transfers into the DMA via a cache - but that might make it too complex for the PCB size constraints.

So I'm hoping that the screen routines should be reasonably fast, and since screen output can be via serial (eg, to a terminal or external computer ) or via a hook to some software that drives the hardware accelerator, I'm hoping it should be pretty quick... Or I could cheat and have multi-plane graphics, with text overlaid a graphics plane that is bit addressable.

It all depends on how I end up rolling the final design, but speed is the objective. Still, the raster can be accessed directly, and either opened as a file, or the program can page it in, in 4K sections and access directly. Either way, it is, as you note, a LOT of memory to throw around just to get a few characters on screen.

The screen isn't emulated at all, so all I/O is via simulated console port at the moment, while I write the software. I still have to finish more of the hardware design and order some PCBs and assemble pieces yet before I can get to the next stage.

Eudimorphodon · Mar 1, 2023

cj7hawk said:
Hi @Eudimorphodon , the 128K will rarely be used and only in special lower-speed modes, where high resolution or color are required, and most high speed graphics would occur within around 56K of screen space.

Even then… to use my example of splatting a diagonal line onto a bitmap screen on top of existing content (so you’ll need to read the target bytes and do an OR operation or whatever) onto a *monochrome* 512x192 linearly-organized bitmap (12k of RAM) you’d have to do 96 sector reads and writes in order to draw a line from the top to the bottom of the screen using the disk paradigm. (Each mono line of pixels is 64 bytes, so you get two per sector.) That means the operation will cost 24 thousand times 21 t-states for the disk operations alone. Then it’s 384 memory read/writes to the copied buffers to actually set the pixels, which is the same number of total operations that you’d need if you paged the memory in. So that’s a roughly 500,000 cycle overhead for this (admittedly worst case, but not unreasonable example) to use the CP/M disk driver instead of paging.
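The arithmetic behind those figures can be laid out as a short Python sketch. It rests on assumptions: 128-byte CP/M sectors, one read plus one write-back per touched sector, the "24 thousand" read as bytes moved, and 21 t-states per byte as the cost of a repeating block instruction such as LDIR or INIR.

```python
# Worked version of the estimate above; all constants are assumptions
# taken from, or inferred from, the post.
SCREEN_WIDTH = 512           # pixels, monochrome
ROWS = 192
SECTOR_SIZE = 128            # bytes per CP/M logical sector
T_STATES_PER_BYTE = 21       # per-byte cost of a repeating block move

bytes_per_row = SCREEN_WIDTH // 8                     # 64 bytes
screen_bytes = bytes_per_row * ROWS                   # 12 KB bitmap
rows_per_sector = SECTOR_SIZE // bytes_per_row        # 2 rows per sector
sectors_touched = ROWS // rows_per_sector             # one pixel per row
sector_transfers = sectors_touched * 2                # read, then write back
bytes_moved = sector_transfers * SECTOR_SIZE
overhead_t_states = bytes_moved * T_STATES_PER_BYTE
pixel_accesses = ROWS * 2                             # read + write per pixel

print(sectors_touched, bytes_moved, overhead_t_states, pixel_accesses)
```

This counts only the byte copying; any per-call overhead in the BDOS and BIOS would come on top of it.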

Bruce Tomlin · Mar 1, 2023

cj7hawk said:
Thanks for the suggestion, but I have a very specific screen architecture planned so as to meet the original program objectives. Also I am really looking forward to designing the hardware vector graphics routines in TTL. I want to get a few megapixels per second draw rate on the screen if I can. There's going to be a lot of counters involved.

I wasn't talking about graphics, I was talking about ramdisks, but whatever.

cj7hawk · Mar 1, 2023

Bruce Tomlin said:
I wasn't talking about graphics, I was talking about ramdisks, but whatever.

Ahh, I think I see what you're suggesting - I'm not sure the 9918 supports the kind of memory passthrough I'm looking for, and I'm using the block counter inside the Z80 to provide addresses automatically already, which is why the only command I can use is INDR followed by an immediate IND to achieve the same result - it places the B register on A8 through A14 which is mapped to A0 to A6 on the RAM, so there's no setup or registers to change outside of an OUT to the sector and track registers.

It's a purely Z80 based quirk that lets me use a port address that is automatically counted by the processor with a specific command.
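As a rough illustration of why INDR needs a trailing IND, here is a small Python model of the address sequence. It assumes that B is driven onto A8-A15 before it is decremented on each input transfer, that only the low 7 bits reach the RAM, and that B starts at 127. The start value and the function name are assumptions for the example, not details given in the post.

```python
def in_block_offsets(b_start=127):
    """Return the 7-bit RAM offsets seen during INDR followed by one IND.

    Assumption: B appears on A8-A15 before the decrement, and A8-A14
    feed RAM A0-A6 as described in the post.
    """
    offsets = []
    b = b_start
    # INDR: transfer a byte, decrement B, repeat until B reaches zero.
    while True:
        offsets.append(b & 0x7F)
        b = (b - 1) & 0xFF
        if b == 0:
            break
    # INDR stops with B == 0, so offset 0 has not been read yet.
    # A single IND performs that last transfer.
    offsets.append(b & 0x7F)
    return offsets
```

Under those assumptions the sector arrives from offset 127 down to offset 0, with every offset visited exactly once.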

But it does make me curious to go and have a look again at the TMS 9918 - thanks.

cj7hawk · Mar 1, 2023

Eudimorphodon said:
Even then… to use my example of splatting a diagonal line onto a bitmap screen on top of existing content (so you’ll need to read the target bytes and do an OR operation or whatever) onto a *monochrome* 512x192 linearly-organized bitmap (12k of RAM) you’d have to do 96 sector reads and writes in order to draw a line from the top to the bottom of the screen using the disk paradigm. (Each mono line of pixels is 64 bytes, so you get two per sector.) That means the operation will cost 24 thousand times 21 t-states for the disk operations alone. Then it’s 384 memory read/writes to the copied buffers to actually set the pixels, which is the same number of total operations that you’d need if you paged the memory in. So that’s a roughly 500,000 cycle overhead for this (admittedly worst case, but not unreasonable example) to use the CP/M disk driver instead of paging.

Hi @Eudimorphodon ,

I've replied here: https://forum.vcfed.org/index.php?t...tch-compatible-with-cp-m2-2-programs.1242138/ to avoid hogging Myke's thread further - Thanks, David.

Eudimorphodon · Mar 1, 2023

Bruce Tomlin said:
I wasn't talking about graphics, I was talking about ramdisks, but whatever.

@Chuck(G) mentioned earlier using SPI (P)SRAMs for a RAMdisk, if you just wanted a RAM disk, and that inspired me to take a look at the datasheet for one. And I've got to say, if you were willing to deal with the hassle of doing the parallel bus->SPI interface (and voltage shifters) you could sure get a heck of a lot of bang for the buck out of that. Here's a chip that Digikey has in stock for $3.66 quantity one; it's four megabytes in size, and it uses an autoincrement read/write system where it looks like after being fed a single 24 bit starting address you can access up to 1024 bytes in a row without feeding it another address.

Seems like writing a RAMDISK driver for this would be trivial. Just convert your track/sector address into a 24 bit memory address by shifting them over the appropriate number of steps, write the starting address, and go to town reading and writing however many bytes you chose for your sectors. Even in strictly serial mode it looks like this thing could potentially run at around 4 megabytes/sec transfer rates with a 33 MHz SPI clock, which is far faster than any Z80 computer needs.
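A minimal Python sketch of that track/sector conversion follows. The geometry (128-byte sectors, 32 sectors per track) and the most-significant-byte-first address order are assumptions for illustration; the real values depend on the disk parameter block you define and on the chip's datasheet.

```python
SECTOR_SIZE = 128        # assumption: CP/M 2.2 logical sector
SECTORS_PER_TRACK = 32   # assumption: a power of two, so shifts suffice
CHIP_BYTES = 4 * 1024 * 1024  # the four megabyte part mentioned above

SECTOR_SHIFT = SECTOR_SIZE.bit_length() - 1                      # 7
TRACK_SHIFT = SECTOR_SHIFT + SECTORS_PER_TRACK.bit_length() - 1  # 12


def sector_address(track, sector):
    """Turn a track/sector pair into a linear byte address."""
    if not 0 <= sector < SECTORS_PER_TRACK:
        raise ValueError("sector out of range")
    addr = (track << TRACK_SHIFT) | (sector << SECTOR_SHIFT)
    if track < 0 or addr + SECTOR_SIZE > CHIP_BYTES:
        raise ValueError("track out of range")
    return addr


def address_bytes(addr):
    """Split the address into three bytes, most significant first
    (assumed order; check the datasheet for the actual part)."""
    return bytes([(addr >> 16) & 0xFF, (addr >> 8) & 0xFF, addr & 0xFF])
```

With this geometry the chip would hold 1024 tracks, and the driver sends one three-byte address per sector before streaming 128 data bytes.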

I have some dinky little DIP versions (only 32KB) of these chips lying around, I should breadboard something up to try reading/writing one from the TRS-80.

Bruce Tomlin · Mar 2, 2023

cj7hawk said:
I'm not sure the 9918 supports

I said like the 9918, not to actually use one. And proactively I'm also going to point out I didn't mean any of its other registers either, just a port to set the address and a port to read/write data.

Eudimorphodon said:
could potentially run at around 4 megabytes/sec

Yes, but I'm not aware of a Z80 with a built-in SPI controller.

Eudimorphodon · Mar 2, 2023

Bruce Tomlin said:
Yes, but I'm not aware of a Z80 with a built-in SPI controller.

There are schematics to cobble one together with shift registers/a UART and some other discrete parts. (The exact complexity depends on how much you want it to look just like a parallel I/O port.) But yes, it's definitely less straightforward than parallel memory. There used to be some single chip parallel master-to-SPI bridge chips intended for this, I'd seen datasheets, but when I last looked they were either no longer in production and/or whatever stock that might be around had positively loony-toon pricing associated with it. (SPI *slave* to 8-bit chips are out there, annoyingly. Apparently more people want to use ancient I/O chips with modern CPUs than the reverse.)

Of course, some MCUs, like various PIC family members, have support for acting like an 8-bit bus peripheral; just wedge one between your Z80 and the SPI devices and bob's your uncle, but obviously that feels like cheating.

EDIT: SPI interface for a Z80. This is driving an SD card, but one of those PSRAM chips should work just fine with it. This requires a "do the thing" out command on every byte; I'm thinking you could probably fit a state machine to do that part with a GAL so you can just bang-bang-bang reads or writes.

Bruce Tomlin · Mar 2, 2023

I wasn't asking how to build one, I was trying to hint at the point that it's not something native to the Z80, and requires a lot more than just a few loadable counter chips.

Eudimorphodon · Mar 2, 2023

Bruce Tomlin said:
and requires a lot more than just a few loadable counter chips.

The linked example is five chips including the decoder. That’s not a lot considering preloadable counter chips are usually four bits wide. So figure for the parallel RAM version you’ll need two for sector addressing, two eight bit latches for the upper addresses, and something to glue the counters to the read/write signals to implement the auto increment? With decoding looks about like a tie to me.

MykeLawson · Mar 4, 2023

cj7hawk said:
I should add that as a RAM disk - and an O/S, it does allow some crazy behavior, since a called routine can see the program that called it and all of its data - So for example, a word processor could embed a document with a basic editor, and call a "printer" routine, that would open the "document" file as a file, then go in, extract the document, and print it, then return control to the document. Or it could execute a "spell check" file that again opens the original workspace that is still executing as a file, then corrects the document, and returns control. It's a crazy idea that would have allowed for some powerful programming architectures that never existed... And of course, simple stuff like task switching is easy to do. It's a super-modular system in which the OS wears its heart on its CP/M generated sleeve.
Now I'm going to apologize to Myke for hijacking his thread.
I'll open up a new thread to show some of how it works so far this weekend.

David.

David, sorry I overlooked this. Nothing to apologize for. You didn't hijack it, in fact I enjoy when things get people's thinking going in lots of directions. Heck, I pick up all sorts of tidbits along the way.

MykeLawson · Mar 4, 2023

Well, I have been wrapping the wires on this thing, and only have the internal data bus stuff left to do. Then I can start the rudimentary testing before writing up the machine code to use. Anyway, progress, somewhat intelligently, being made.

cj7hawk · Mar 5, 2023

Thanks Myke - but I did take it a fair way offtopic, so an apology was still due

I'm waiting to see how your new project comes out - It's a shame that it's not still the 80s or that there are no longer any modern CP/M machines... I wonder if it might ever make a resurgence, after all, the occasional latecomer such as you and I still pick it up after all these years.

We'll probably have to wait until the human race switches to CP/M en masse to confuse our new AI overlords, as it's probably the only language/OS they don't understand and that is inconsistent enough to trip them up.

David

MykeLawson · Mar 13, 2023

Okay, the hardware is built, and the rudimentary testing (via MBASIC) has been done; I'm able to write & read the memory. However, if my calculations are correct, it will take over 14 hours to test all 12 bit patterns I'd like to test in the 196K of memory space. And that ain't gonna happen. I figured that, and have already started the mental design of a machine language test. But I do have a 'best practice' question as it relates to cycling through the 12 bit patterns....

The loop I'm going to go with will:
1) set the track number (upper address latch)
2) set the sector number (middle address latch)
3) set the byte number (low address latch)
4) write the first bit pattern
5) read it back and compare
6) if there is a mis-match, jump to error routine
7) increment the byte number and jump to 3 unless byte equals 0, in which case increment the sector and jump to 2
8) if both byte and sector equal 0 then jump to 1
9) if all three equal 0 then load the next of the 12 bit patterns and jump to 1

I'm good with all of that, but I'm trying to find the best and easiest way to cycle through the 12 bit patterns I want to use: 00, 01, 02, 04, 08, 10, 20, 40, 80, FF, AA, & 55. That has me kinda stumped. The ideas I have had so far, using DEFW or DEFB seem to be kinda clumsy.... So, I'm open to some insight. Thanks, Myke
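One common way to handle the pattern cycling is a plain byte table walked by an outer loop (in assembly terms, a DEFB table stepped with a pointer and a count of 12). The Python sketch below shows only the loop structure. `FakeRam`, its `write`/`read` methods, and the simulated stuck bit are hypothetical stand-ins for the three address latches and the data port.

```python
PATTERNS = bytes([0x00, 0x01, 0x02, 0x04, 0x08, 0x10,
                  0x20, 0x40, 0x80, 0xFF, 0xAA, 0x55])


class FakeRam:
    """Hypothetical stand-in for the latched RAM hardware."""

    def __init__(self, bad_address=None):
        self.cells = {}
        self.bad_address = bad_address  # optional cell with bit 0 stuck low

    def write(self, track, sector, byte_no, value):
        addr = (track, sector, byte_no)
        if addr == self.bad_address:
            value &= 0xFE  # simulate the stuck bit
        self.cells[addr] = value

    def read(self, track, sector, byte_no):
        return self.cells.get((track, sector, byte_no), 0)


def run_pattern_test(ram, tracks, sectors, sector_size=128):
    """Return None on success, or (pattern, track, sector, byte) on the
    first mismatch."""
    for pattern in PATTERNS:          # outer loop: next table entry
        for track in range(tracks):
            for sector in range(sectors):
                for byte_no in range(sector_size):
                    ram.write(track, sector, byte_no, pattern)
                    if ram.read(track, sector, byte_no) != pattern:
                        return (pattern, track, sector, byte_no)
    return None
```

Keeping the patterns in a table means adding or reordering them later touches only the data, not the loop.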

durgadas311 · Mar 13, 2023

You might want to think about what you want to test: the RAM, the mapping hardware, etc. Also, writing the same byte to all of memory may miss some types of errors. You might think about a rolling pattern, even something as simple as an incrementing BCD byte (0x00 ... 0x99) - which is pretty easy to do with the DAA instruction. You keep a "seed" byte that is the first value, and then you can vary the pattern on each pass. This pattern is pretty much guaranteed not to repeat when used on any power-of-two sized storage (i.e. will catch errors that repeat on address-line patterns). You also probably want to write a pattern to all of RAM first (different pattern to each sector, or a rolling pattern that doesn't repeat on address lines), then check it, as that ensures the mapping is working correctly (otherwise, you can't tell if you're re-writing the same sector over and over).
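A minimal Python sketch of the rolling BCD idea follows, with a `bytearray` standing in for the RAM disk. The function names are made up for the example, and the seed is assumed to be a valid packed-BCD byte. The fill pass writes everything first and the check pass runs afterwards, so a sector that aliases onto another one shows up as a mismatch.

```python
def bcd_next(value):
    """Increment a packed-BCD byte, wrapping 0x99 to 0x00
    (the effect of adding 1 and applying DAA on a Z80)."""
    low = (value & 0x0F) + 1
    high = value >> 4
    if low > 9:
        low = 0
        high = (high + 1) % 10
    return (high << 4) | low


def fill(size, seed):
    """Write the rolling pattern to a fresh buffer, starting at seed."""
    mem = bytearray(size)
    value = seed
    for i in range(size):
        mem[i] = value
        value = bcd_next(value)
    return mem


def check(mem, seed):
    """Return the first offset that does not match, or None if clean."""
    value = seed
    for i, byte in enumerate(mem):
        if byte != value:
            return i
        value = bcd_next(value)
    return None
```

Because the pattern repeats every 100 bytes and no power of two is a multiple of 100, two addresses that differ by a single address line never hold the same expected value. Changing the seed on each pass varies which value lands where.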

Eudimorphodon · Mar 13, 2023

MykeLawson said:
However, if my calculations are correct, it will take over 14 hours to test all 12 bit patterns I'd like to test in the 196K of memory space. And that ain't gonna happen.

Why not? 14 hours is a good overnight burn-in.

cj7hawk · Mar 14, 2023

Congratulations on another working project !

Is the test running under MBASIC, or is it in assembly? 14 hours seems a bit much for 512K of Ramdisk... 14 minutes I could understand.

Did you manage to integrate it with the shell, or are you using commands from the command line to access it?

I'd love to see the code you generated to test it -

David
