Introducing LokiOS. A new version of CP/M, written from Scratch, Compatible with CP/M2.2 Programs.

cj7hawk · Feb 28, 2023

Hi All,

Thought I'd stick this in its own thread as I progress, since it's come up in many past threads... So I've been writing ( written? ) a new CP/M for Z80 based on CP/M 2.2 in much the same way as DOS was created - I began with the system calls and recreated it, then worked to build compatibility. It's pretty much like CP/M 2.2, but does change a little bit, and there are still bugs.

So I figure I'd start with what the Loki was.

In 1985, Sir Clive Sinclair ( who built the ZX80, ZX81, ZX Spectrum, QL, etc ) was about to go bankrupt. Trying to stave off his creditors, he proposed a new computer system that would outperform most machines and equal the very best at whatever each of them did best. He intended to steal all the great ideas and build them into a new computer called the Loki, and then sell it for around 10 to 20% of what those computers cost.

It had interesting specs.

512 x 192 graphics, 64 colours per pixel, four pixel planes, and built in Bit Blit as seen on the Amiga 1000
Synthesizer quality sound ( think Sound Blaster which came out much later )
128K minimum of memory, heading up towards 512K.
ZX Spectrum Backwardly compatible.
8MHz z80 or maybe even a 12MHz z280
Modular design
And Hardware Vector Graphics.
Light Pen

All for 200 GBP.

It was said to be impossible at the time... however after examining all of the records, there were some specifications that made sense given the level of hardware development of the ZX Spectrum at the time, and I believe it was entirely possible to achieve the goal within the technology limitations in late 1985. All by literally stealing the best ideas from every other computer, just as Sir Clive said he would.

We never got to see the Loki, and it's been a long term goal of mine to finally build it, using only the technology of the era, and within the price constraints of the era.

Oh, and unlike all of Sir Clive's other computers, this one was going to run CP/M software.

Seems he was finally learning from his earlier missteps and realized the best ideas are stolen from others... Or, as the master once said, Absorb what is useful, Discard what is not and add something of your own.

Because I haven't had much time to work on the hardware, I began writing an assembler, then an emulator, and finally a new version of CP/M which I simply call LokiOS. It looks like CP/M, runs like CP/M, and so far, seems to run CP/M software OK. I've been testing it with various BASICs for CP/M so far. Working in an emulator helped me accelerate software development, and has been a great base for learning new skills, as well as better understanding the day to day skills I need for my day job.

The architecture of this intends to aim for up to 1Mb of memory, in a truly abstract model, that is simple to make in Hardware, to achieve the low cost goals of the project. The memory uses what I call Memory As Disk, reusing CP/Ms disk handling routines to control and populate memory management tables, and allow programs to easily break the 64K limit set by the default z80 architecture, even if they don't know anything about paging, or don't support it.
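
A minimal sketch of the Memory As Disk idea in Python, not the actual LokiOS code. It assumes 4K allocation blocks that line up one-to-one with MMU pages, and a directory entry whose block list holds one physical page number per block. The 16-slot table, the function names and the page numbers are all made up for illustration.

```python
PAGE_SIZE = 4096   # assumed: one CP/M allocation block == one 4K MMU page
SLOTS = 16         # 64K logical Z80 address space / 4K pages

def build_page_table(block_list):
    """Copy a directory entry's block list straight into a page table."""
    if len(block_list) > SLOTS:
        raise ValueError("more blocks than fit in a 64K logical space")
    return list(block_list)

def translate(page_table, logical_addr):
    """Turn a 16-bit logical address into a physical address."""
    slot, offset = divmod(logical_addr & 0xFFFF, PAGE_SIZE)
    if slot >= len(page_table):
        raise IndexError("logical address falls in an unmapped slot")
    return page_table[slot] * PAGE_SIZE + offset

# Hypothetical "file" made of three scattered physical pages.
table = build_page_table([0x21, 0x07, 0x5A])
print(hex(translate(table, 0x1234)))   # slot 1 -> page 0x07 -> 0x7234
```

The point of the sketch is that the program only ever sees a linear 64K, while the pages behind it can sit anywhere in the 1Mb.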

The basic architecture of the memory system looks something like this ( below ).

I am still building it, getting rid of the bugs, and even changing my mind at times. All thoughts and comments or questions welcomed

Thanks
David

granzeier · Feb 28, 2023

Interesting.

With all that RAM, do you intend to increase the disk storage capabilities, over standard CP/M?

cj7hawk · Mar 1, 2023

I intend to implement a multi-BIOS architecture, so that each storage device can have its own BIOS that contains the code it needs to run, plus any other test routines, like a built-in mini-disk. With a small hardware interface, a drive adapter can then be truly hardware agnostic, so there is no built-in drive architecture outside of the necessity to conform to CP/M standards. It should support floppies, hard disks and just about anything.

Any hardware that is installed will get some "execute time" on cold boot, and can hook itself into the BIOS as well. It can either monitor output ports for track/sector information, or read it from the DPH ( where I'll formalize the scratchpad a little bit as other CP/Ms did ) when it executes on a read or write function. At that point, it can page into the FDOS area and do pretty much whatever it needs to do, allowing for mixed hardware. There will also be a function for serial-connected storage (network) based on common IP protocols of the era (SLIP) as an option. Because the L: (local) drive is a boot ROM that also stores commands, I'll implement the IP protocols as commands and then just need a hook to make them present as a disk interface. I'm still mulling over the best way to do that, but the hardware architecture is the same however I implement the code.

Drive L: and M: have been formalised for Local and Memory disk structures, and I'm thinking about formalising N: for network mounts - which will follow a very simple protocol of sending a single request for read/write with embedded track/sector/... I've even seriously considered making drive A: external like that, and running all drives over high speed serial bus... A small IP network using SLIP as the framing protocol and then make all storage devices serial-attached IP based devices, working over normal asynchronous serial protocols.
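
A rough Python sketch of what a single framed request could look like. The SLIP framing bytes are the standard RFC 1055 values; everything else is hypothetical, since the request layout ( op byte, drive letter, 16-bit track, 8-bit sector ) is invented here for illustration and the IP header is left out entirely.

```python
END, ESC, ESC_END, ESC_ESC = 0xC0, 0xDB, 0xDC, 0xDD  # RFC 1055 framing bytes

def slip_encode(payload: bytes) -> bytes:
    """Wrap a payload in SLIP framing, escaping END and ESC bytes."""
    out = bytearray([END])          # leading END flushes any line noise
    for b in payload:
        if b == END:
            out += bytes([ESC, ESC_END])
        elif b == ESC:
            out += bytes([ESC, ESC_ESC])
        else:
            out.append(b)
    out.append(END)
    return bytes(out)

def slip_decode(frame: bytes) -> bytes:
    """Undo slip_encode for a single frame."""
    out = bytearray()
    escaped = False
    for b in frame:
        if escaped:
            out.append(END if b == ESC_END else ESC if b == ESC_ESC else b)
            escaped = False
        elif b == ESC:
            escaped = True
        elif b != END:
            out.append(b)
    return bytes(out)

def make_request(op: str, drive: str, track: int, sector: int) -> bytes:
    """Hypothetical layout: op byte, drive letter, 16-bit track, 8-bit sector."""
    return bytes([ord(op), ord(drive)]) + track.to_bytes(2, "little") + bytes([sector])

request = make_request("R", "N", track=0x00C0, sector=0xDB)
frame = slip_encode(request)
print(frame.hex(" "))
```

The track and sector values in the usage lines were picked so that both escape cases get exercised.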

I'm also looking for ways to implement subdirectories at some point, and am leaning towards opening a file in the root that leads to a directory image that contains its own directory, because the idea of retrofitting a directory structure like MSDOS has to CP/M is a little bit more complicated while trying to maintain compatibility.

Where I'm currently up to:

* Built the cross-assembler to work under Windows 10 command line (written in freebasic)
* Built the emulator to run under Windows 10 command line ( written in freebasic also )
* Written the basic CP/M operating system, including the CCP and a built-in basic monitor ( to which I want to add an assembler )
* Written the FDOS ( BIOS and BDOS )
* Tested with Microsoft Basic ( version 5 or thereabouts ).
* Created an updated emulator to support the MAD disk function so I can write that code. It also supports the MMU emulation.
* Completed the Bootstrap that builds a file I can burn onto an eprom, and should boot the computer when hardware is made.

The bootstrap is the latest part, and with it, drives A: (external DMA based FDD 720K emulator), Drive M: and Drive L: (ROM) are all working for the first time, and it's interesting to see how they interplay. It's the first time I've had more than one drive running, and there are some bugs in my code for copy, and also I can't execute COM files from a non-logged disk, but I can set a search path for COM files and I can perform local disk functions OK.

It's slow going - but it's coming along. So far the debugging suggests that the Memory as Disk aligns perfectly with the MMU so I'm pretty happy the way it's going... And just hoping I won't jump to real hardware only to find I've emulated something wrong and all my code is messed up. That happened once at the start when I wrote some instructions wrong, then got my flags set up, and when I fixed the emulator it took me a full day to get my code working again.

David

usotsuki · Mar 1, 2023

Wouldn't be the first time, MSX-DOS implements the CP/M 2.2 APIs on FAT12.

granzeier · Mar 1, 2023

Very cool ideas.

cj7hawk said:
...
* Built the cross-assembler to work under Windows 10 command line (written in freebasic)
* Built the emulator to run under Windows 10 command line ( written in freebasic also )
...

David

See, and stuff like this is why I call serious BS on people who claim that BASIC is a terrible language and you cannot do anything with it.

cj7hawk · Mar 1, 2023

usotsuki said:
Wouldn't be the first time, MSX-DOS implements the CP/M 2.2 APIs on FAT12.

I assume you mean at the FDOS level, rather than the BIOS level? That would be a difficult thing to do with a bios alone translating.

edit: OK, looked it up - Interesting - I should spend a little more time looking at that. What I am doing is definitely CP/M and works like CP/M and uses the CP/M disk format. I'll depart somewhere along the line, but all of the CP/M functions should work normally... Or close enough that most programs still work. Thanks for mentioning that.

cj7hawk · Mar 1, 2023

granzeier said:
Very cool ideas.

See, and stuff like this is why I call serious BS on people who claim that BASIC is a terrible language and you cannot do anything with it.

Thanks for that, it made me laugh. Because I learned BASIC when I was a kid, it ( and assembly ) are native languages to me, and I can pick them up and dust them off, and use them straight away without learning curve. All other languages I have to spend a lot of time learning and relearning when I don't use them.

Freebasic is a pretty nice BASIC, and it's free. Compiles well under Windows 10 and 11, and I hate operating in a virtual machine, so I write my own CLI routines....

Though I can't promise my code is all that great

It assembles correctly. Most of the time. ( I still find the odd bug in it )

And the emulator works pretty well... At least for documented codes.

David.

cj7hawk · Mar 1, 2023

Eudimorphodon said:
Even then… to use my example of splatting a diagonal line onto a bitmap screen on top of existing content (so you’ll need to read the target bytes and do an OR operation or whatever) onto a *monochrome* 512x192 linearly-organized bitmap (12k of RAM) you’d have to do 96 sector reads and writes in order to draw a line from the top to the bottom of the screen using the disk paradigm. (Each mono line of pixels is 64 bytes, so you get two per sector.) That means the operation will cost 24 thousand times 21 t-states for the disk operations alone. Then it’s 384 memory read/writes to the copied buffers to actually set the pixels, which is the same number of total operations that you’d need if you paged the memory in. So that’s a roughly 500,000 cycle overhead for this (admittedly worst case, but not unreasonable example) to use the CP/M disk driver instead of paging.
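
To sanity-check the numbers in that quote, here is the same arithmetic as a few lines of Python. It only uses figures from the quote itself ( 512x192 mono, 128-byte sectors, 21 t-states per byte moved ); the variable names are mine.

```python
WIDTH, HEIGHT = 512, 192      # monochrome bitmap, 1 bit per pixel
SECTOR = 128                  # CP/M sector size in bytes
T_PER_BYTE = 21               # t-states per byte transferred, as quoted

bytes_per_line = WIDTH // 8                   # 64 bytes per scanline
framebuffer = bytes_per_line * HEIGHT         # 12288 bytes, i.e. 12K
sectors = framebuffer // SECTOR               # 96 sectors cover the screen
bytes_moved = sectors * SECTOR * 2            # each sector is read and written back
overhead = bytes_moved * T_PER_BYTE           # t-states spent purely on transfers

print(bytes_per_line, framebuffer, sectors, bytes_moved, overhead)
# 64 12288 96 24576 516096
```

So the "roughly 500,000 cycle" figure holds up for a top-to-bottom line that touches every sector once.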

Hi @Eudimorphodon - Do you mind if I answer this here in the LokiOS thread and move it out of the RAMDISK thread, which was there for Myke's RAMDISK?

I hope no one would do vector graphics in the way you describe - and it would be even worse if they tried to do it in a high level language - but the best way to draw lines should be with line drawing hardware...

And if local bitmap access is required for something specific, and software vector routines were used, then the optimal ways would be;
1) Page it in.
2) Use the I/O mapped method ( Out address bits 12 to 19, Out bits 8 to 11, LD B,(Bits 0-6), LD C,(port), IN A,(C), Modify A, OUT (C),A ). Then maybe adjust the vector routines to play with these three items depending on which way the line is to be drawn, which is similar in speed to paging, just different maths and an IN operation instead of an LD operation. ( The disk ALSO uses this as the underlying transport layer, just with overlaid disk handling routines. )
3) Use the DISK method. Highly impractical, but if you're writing in MBASIC or something, it's still an option at least.
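
For option 2, a small Python sketch of how a 20-bit physical address might be carved up into the values that get sent out. This is an assumption-heavy illustration: the list above names bits 12 to 19, bits 8 to 11 and bits 0 to 6, and does not say where bit 7 goes, so the sketch assumes 128-byte sectors and lets bit 7 travel with the sector field. The function names are hypothetical.

```python
def split_address(addr: int):
    """Split a 20-bit physical address into (track, sector, offset)."""
    if not 0 <= addr < 1 << 20:
        raise ValueError("address must fit in 20 bits")
    track = addr >> 12            # bits 12-19: which 4K page
    sector = (addr >> 7) & 0x1F   # bits 7-11: 128-byte sector in the page (assumed)
    offset = addr & 0x7F          # bits 0-6: byte within the sector, loaded into B
    return track, sector, offset

def join_address(track: int, sector: int, offset: int) -> int:
    """Inverse of split_address, handy for checking the bit layout."""
    return (track << 12) | (sector << 7) | offset

print(split_address(0x7234))      # (7, 4, 52)
```

A line-drawing routine would then step the offset for horizontal moves and the sector/track values for vertical ones, which is the "different maths" compared with paging.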

However, what *would* be practical, even if it was being done in basic - would be to open a file on disk, open the SCREEN file, and copy the contents from one to the other to make a picture show up on screen... That's a lot more practical. Would still be slow, but is practical.

Also, in case the screen isn't in the usual place, finding the screen would absolutely have to be done with the disk routines - which would identify where in real memory the screen raster bitmap was located, and if there was more than one screen, then also which one was the main screen.

At some point I want to modify the Sinclair Basic to make it use the hardware acceleration for lines and vector graphics draws too. The original Loki would have used SuperBasic, but I don't have SuperBasic source and even if I did, it's for 68008 and not z80, and I think that the actual path Sinclair would have taken was to use an existing basic for CP/M and then make a backwards compatible version for Spectrum Sinclair Basic ( for which there are a lot of good z80 disassemblies ).

Also, I need to make up a set of BIOS calls to do stuff like draw lines, and I might see whether I can align those calls with the CP/M graphics call extensions, which would mean the whole lot could still be accessed via the TPA without seeing the screen raster at all, while maintaining CP/M compatibility - though I still have to spend more time learning about the graphics extensions for CP/M.

Building a vector routine in hardware as a DMA of sorts is a little more complex, but way faster.
There are two possible modes.

1) Write the location with a new byte ( single cycle )
2) Read the location, perform an arithmetic function on the result and write the location. ( Two Cycle )

The address in this case would be supplied by two up/down counters, which would be fed by some adders that added cyclically, and generated a clock pulse every time they overflowed. This would allow something like a DMA mode where the counters took over the bus and provided addresses at the highest rate the RAM could take, while allowing for not interfering with the video ram, or using the video ram time in the overscan time. This would theoretically allow a pixel to be written as rapidly as it could be scanned as raster, which would be very very fast, and all that is required is to set up the counters with the right numbers, and have the correct number of elements.
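
A software model of that counter-and-adder arrangement, as a hedged Python sketch. Each axis has an accumulator that is added to on every clock, and an overflow steps that axis's up/down counter. To keep the arithmetic exact the accumulators here overflow at `steps` rather than at a power of two, which is a simplification; real hardware would more likely use fixed-width adders. All names are hypothetical.

```python
def draw_line(x0, y0, x1, y1):
    """Return the pixels visited by two overflow-driven up/down counters."""
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx = 1 if x1 >= x0 else -1        # count direction for the X counter
    sy = 1 if y1 >= y0 else -1        # count direction for the Y counter
    steps = max(dx, dy)               # one clock per pixel on the major axis
    x, y = x0, y0
    points = [(x, y)]
    acc_x = acc_y = steps // 2        # preload so the rounding is symmetric
    for _ in range(steps):
        acc_x += dx
        if acc_x >= steps:            # X adder overflowed: clock the X counter
            acc_x -= steps
            x += sx
        acc_y += dy
        if acc_y >= steps:            # Y adder overflowed: clock the Y counter
            acc_y -= steps
            y += sy
        points.append((x, y))
    return points

print(draw_line(0, 0, 4, 2))
# [(0, 0), (1, 1), (2, 1), (3, 2), (4, 2)]
```

The only per-line setup is loading `dx`, `dy` and the two directions, which matches the "set up the counters with the right numbers" step.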

I'm also examining whether I can build a high speed table of these results with an EPROM to perform scaling and trigonometric functions so that I could avoid having to calculate those on the z80... But the hardware acceleration will be the final part of the project. I haven't fully planned it out yet. Still, the same routines we use in vector calculations on the z80 should be possible to implement in counters that have direct access to memory.

But the OS and Memory As Ramdisk will still work with just a console output - and no video raster is required at all then... So the video memory locations could just be filled with another 256K of RAM.

David

Svenska · Mar 1, 2023

I'd like to chime in to remind you that CP/M applications really want at least an 80x25 (text) screen resolution. A character width of 8 pixels allows both decent fonts and fast drawing routines in graphics mode. Therefore, I'd recommend at least a 640 pixel wide screen resolution if you really want to support CP/M applications well.

This is the reason why I didn't bother finishing my TMS9929A-based Z80 system: A barely legible 4x8 font provides 64 columns only, with half the color resolution. Obviously, this is less of a concern for games, but I don't think you are aiming at that...?

Alternatively, your video system could provide both text and graphics; some monochrome LCD panels supported both a text and a graphics plane which were mixed together.

cj7hawk · Mar 1, 2023

Hi Svenska,

Yeah, you are absolutely correct - and it is something that has irked me from the beginning. And as you noted, the primary objective here is a games machine - but it *still* has to run CP/M.

Normally, that would be 64 characters ( 12 more than the Osborne 1 ) but if I use 6 x 8 pixels, it still looks right on the screen and I can get a good character set design into 5 pixels across, while achieving 85 characters/line... Then I could just enforce a return at 80 or 85. ( since it's about 85.3 characters across )

I considered 640x200 ( PC resolution ) and that was one of the supposed resolutions of the Loki, but it's not where I wanted to go. All of the evidence I can find suggests it would have been 512x192, with maybe a high resolution interlaced mode of 512x384 ( same as later Macs. ) so I wanted to stay there, and I'm pretty sure the original design concept in the 80s had the same problem. Which is where I think the alternative resolutions of 320x200 and 640x200 came from, but they would have been far more complicated to build, and may have been impossible at the price... There are some design quirks you get in hardware with specific resolutions that reduce the chip count, board size and make things easier. This was also used in early Macs.

Also, as I mentioned, there is a serial console capability, so it would be entirely possible to run serial for the general screen, and leave the high resolution screen for graphics. Multimonitor capabilities was definitely an objective. CAD applications were starting to emerge in the mid 80s that used two monitors - one for text and one for graphics.

Also, the project is modular. The motherboard is a powered but otherwise passive backplane with 10 slots, and the base system consumes about half. So there would have been no problems installing a monochrome character display or other display even then... Or changing later for 640x200 - though it's not a card I intend to build as a project proof. ( While serial console, with two screens is... And another with 32x20 characters, which is even worse for CP/M but fits another requirement ). Also I'm following the PC BIOS design that allows for other types of video card to be supported, just like how any kind of disk system can also be supported.

stepleton · Mar 1, 2023

For 80x25-hungry CP/M apps: can you do the Osborne 1 trick and use the display as a smaller window that can scroll around a larger virtual screen?
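
A tiny Python sketch of that scrolling-window idea, assuming a 64-column physical display over an 80-column virtual screen ( both numbers are just defaults for illustration, and the function name is made up ). The window's left edge moves only when the cursor would otherwise leave the visible area.

```python
def follow_cursor(origin, cursor_col, window=64, virtual=80):
    """Return the new left-edge column so the cursor stays inside the window."""
    if cursor_col < origin:
        origin = cursor_col                   # cursor ran off the left edge
    elif cursor_col >= origin + window:
        origin = cursor_col - window + 1      # cursor ran off the right edge
    return max(0, min(origin, virtual - window))

print(follow_cursor(0, 10))   # 0  (already visible, no scroll)
print(follow_cursor(0, 70))   # 7  (scroll right just enough)
print(follow_cursor(7, 3))    # 3  (scroll back left)
```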

Eudimorphodon · Mar 1, 2023

cj7hawk said:
I hope no one would do vector graphics in the way you describe - and it would be even worse if they tried to do it in a high level language - but the best way to draw lines should be with line drawing hardware...

But back in the 80's almost no "home computer"-grade computers had vector graphics hardware(*), and in fact drawing a line by fishing through a bitmap and modifying the bytes holding the pixel data is still a thing you're going to be doing to this day if you're stuck rendering on an unaccelerated framebuffer, which is still a pretty common situation to find yourself in. (For instance, if you need to run an OS in a virtual machine or in a fail-safe VESA graphics mode situation.) And this same observation applies to more than just lines; if you use the disk method every time you want to splat something onto the screen you're probably going to be moving much, much more data back and forth in these "disk sectors" than you're actually going to be manipulating. It seems like the only situation where this approximates efficient is if you're blind writing the contents of an already rendered bitmap from memory into the framebuffer. In that case it's about a tie with doing a memory copy to a paged in section of the framebuffer, although you're still going to be hitting the (disk) paging registers every 128 bytes instead of every 4K.

(* There were some business computers based around the NEC 7220, which was a pretty talented chip, but definitely not the sort of thing that would appear in a Sinclair product. A video system built around it is of similar complexity to a whole IBM PC-XT motherboard.)

Also moving from the other thread, relevant to the idea of hitting graphics memory via I/O ports instead of mapping:

cj7hawk said:
Ahh, I think I see what you're suggesting - I'm not sure the 9918 supports the kind of memory passthrough I'm looking for, and I'm using the block counter inside the z80 to provide addresses automatically already, which is why the only command I can use is INDR followed by an immediate IND to achieve the same result - it places the B register on A8 through A14 which is mapped to A0 to A6 on the RAM, so there's no setup or registers to change outside of an OUT to the sector and track registers.

It's a purely z80 based quirk that lets me use a port address that is automatically counted by the processor with a specific command -

But it does make me curious to go and have a look again at the TMS 9918 - thanks.

There were a number of 8080/Z80 machines with text-only displays and no flexible paging scheme that later had high-resolution graphics shoehorned into them and used I/O to access the RAM, but they usually used a system where there'd be a set of latches that directly at least *kind of* correlated to the X/Y pixel grid, so to write an individual pixel you either didn't need to calculate it at all (just program the X/Y latches and set/reset/read), or the latches would point you to the byte holding the pixel so you just had to do a bit shift to get what you needed. The Matrox ALT-256 S-100 graphics card is a good example of a card that resolves memory down to the individual pixel, and the graphics boards Radio Shack sold for the TRS-80 Model III and 4s are typical of the "you still need to figure out the exact byte/bit" version.

As mentioned, the TMS9918 pretty much worked this way, but it also included an auto-increment function so if you wanted to access more than an individual byte (like, for instance, you wanted a way to high-speed dump new tile or sprite data into the graphic chip's private memory... which is usually what you were doing since strictly speaking the chip didn't have a normal bitmap mode) you could just set the starting address and loop through a series of OUTs. (If you had a Z80 you could certainly use INDR/OTDR, it'd just be the hardware in the chip doing the increment independently of the Z80's register contents. A bonus here is you won't have to care if you cross a memory page border, you can just do an arbitrary number of IN/OUTs in a row and the autoincrement hardware keeps you in the right place. On the downside, memory operations on this thing were *slow*; you usually had to wait 8-16 microseconds between each access because of refresh contention.)

The hardware you put in there for the disk access could certainly work, for individual pixel access, similarly to the TRS-80 graphics card example, in the sense that you could program your "track" and "sector" latches to get you within 128 bytes of the desired pixel, letting you do an INDR/OTDR to manipulate the individual target byte. (Or to read/write a streak of up to 128 horizontal pixels if that's useful, assuming a linear frame buffer layout.) That's not going to be any *more* efficient than paging in the frame buffer, it'll still be less because you'll have to touch the disk page registers more often if you're trying to "blit in" an object that covers pixels more than 128 page-aligned bytes apart from each other (think about moving a software sprite, for instance), but it's at least a vast improvement over treating it as a file...

But on the flip side, whatever is gained by the "abstraction" goes away, you're banging straight on the hardware for speed just like you'd normally do on Z80-grade hardware. If you're writing a draw routine for a high level language the user's not going to care what happens behind the scene, and once, by definition, we get into talking with graphics any idea of this being agnostic to any existing CP/M computer goes out the window.

Eudimorphodon · Mar 1, 2023

Apparently I wrote a novel and had to break it in two, sorry about that...

cj7hawk said:
Also, in case the screen isn't in the usual place, finding the screen would absolutely have to be done with the disk routines - which would identify where in real memory the screen raster bitmap was located, and if there was more than one screen, then also which one was the main screen.

Is keeping track of where screen memory is really such a huge problem? I get that your method for using CP/M disk blocks in a 4K allocation size lets you play fun tricks with populating the memory page register directly with the block address that's in the CP/M directory entry and therefore assemble a "linear" TPA out of blocks of RAM that are physically discontinuous, but...

Is video RAM going to be dedicated to the video refresh, or is all memory in the computer sharable with video and the video chip maintains its own list of pages that make up the active framebuffer? (IE, is this a "uniform memory architecture" or not?) I guess let's tackle both:
1. If it's a non-uniform architecture, IE, the framebuffer(s) are a linear block of memory, but your concern is that maybe you want to be able to have more than one, or have one big block of memory that has multiple virtual screens and each "process" gets mapped one of the two... what's the problem at the application level with knowing which pages you can flip? Just have an OS call that returns the base page address and the number of pages assigned to the process' graphics context.
2. If it IS a UMA (which, honestly, I'd be careful about committing to, that's harder to implement on the Z80 than some other CPUs without taking a massive wait-state hit from refresh contention) and each process's framebuffer could be a random list of 4K pages cobbled together in a directory entry that the video hardware parses to keep track of what it should be converting into pixels at any given time, is that actually any different? Instead of an OS call that tells you base page+extent to access the VRAM directly you need an OS call that says "You've been assigned 16 video pages, ask me to page in which one you want" and *that* handles the "scatter/gather" operation of switching in the correct physical page of a "virtually linear" frame buffer from the pages associated with the directory entry.
There are plenty of normal computers that have multiple possible locations for the frame buffer or multiple pages. Even the humble Apple II had two possible locations for high-res graphics, and machines like the TRS-80 Color Computer with the 6847+SAM can move their framebuffers around pretty much at will. Maybe more to the "abstraction" point, the original Macintosh hardware also had two possible graphics pages, and which one was active was chosen by a global that just pointed to the base address of the page you're supposed to be using. This was actually leveraged by later models of the Macintosh in a way that makes it extremely straightforward to make a dumb framebuffer video card for one; QuickDraw only supported linear framebuffers and "fat pixels", but as long as your framebuffer hardware fit those restrictions you pretty much just needed to say in the driver/ID ROM "here's the memory address, dimensions, and color depth of the thing you should draw the desktop on" and the OS will use it with no other fiddling.

I guess the point here is there's a lot of strategies you can use to say "you wanna know where to draw? OVER HERE!". Since you're not emulating any existing computer and any graphics programs are going to be written *for this system* you're free to impose whatever strategy you want...

For completeness' sake, and since you alluded to it, yes, there was already a "Portable" graphics library for CP/M (Plus) called GSX-80. If you wanted to target that then that would be an option for running existing software, although that doesn't really change anything; your GSX driver/library will just need to use whatever method you chose to know where its framebuffer fragments are (or where the vector drawing hardware is and how to talk to it) like anything else. GSX-80 probably isn't that interesting since there's not much software written for it and its feature set is heavily slanted around "plotter-like" applications, but apparently some hardcore folks have written games using it...

This again kind of points to maybe you should consider targeting CP/M Plus, or at least something like it, if abstraction is really what you want? GSX is essentially a BIOS extension that appears above the TPA, per the usual CP/M architecture, but in a system that could really use it properly the actual code/extensions for doing the drawing is going to reside in a system context that's paged in in response to the API call. (A full CP/M Plus system with a functioning GSX extension that didn't have paging would probably have a pretty cramped TPA.) If you want the rule to be that graphics have to be abstracted through an API but still want them to be fast a BIOS layer like that is going to be far more performant than treating everything like a disk, I'd think, except in very rare cases. (Like just chucking a BMP into a framebuffer.) By definition system-level drivers should be designed to wring the hardware for everything they're worth if you don't want user applications to do it.

cj7hawk said:
Building a vector routine in hardware as a DMA of sorts is a little more complex, but way faster.

Maybe what you want for a game machine is a general purpose blitter instead of a vector draw-er. If you had that then you could also use it for your disk transfers; I'm guessing it might be possible to substantially beat 21 machine cycles per byte transferred.

Eudimorphodon · Mar 1, 2023

cj7hawk said:
Normally, that would be 64 characters ( 12 more than the Osborne 1 ) but if I use 6 x 8 pixels, it still looks right on the screen and I can get a good character set design into 5 pixels across, while achieving 85 characters/line... Then I could just enforce a return at 80 or 85. ( since it's about 85.3 characters across )

Are you planning to have an actual hardware character mode, or is it all graphics all the time and characters will be drawn as bitmaps? (No shame for the latter, that wasn't that uncommon on the rare CP/M-period machines that supported full bitmapped graphics. It's only that 6-bit wide characters are kind of awkward because you have to put 4 characters into every three bytes, which means bit shifting every time you write a glyph.) For a video system I've been working on in my completely inadequate spare time I wrote GAL code to drive pixel generation that's switchable between 8 and 6 bit wide cell modes, so it's easy enough to implement if you want a special 80 column "pure" text mode in 480 pixels separate from a 512 pixel wide linear bitmap layout.
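
To make the "4 characters into every three bytes" point concrete, here is a hedged Python sketch of writing one 6-pixel-wide glyph row into a packed 1-bit-per-pixel scanline. It assumes the leftmost pixel is the most significant bit of each byte, which is a guess at the layout, and the function name is made up.

```python
def put_glyph_row(scanline: bytearray, col: int, bits6: int) -> None:
    """Write a 6-bit glyph row at character column `col` (MSB = leftmost pixel)."""
    byte, shift = divmod(col * 6, 8)      # shift cycles through 0, 6, 4, 2
    has_next = byte + 1 < len(scanline)
    # Work on a 16-bit window so glyphs straddling two bytes need no special case.
    window = (scanline[byte] << 8) | (scanline[byte + 1] if has_next else 0)
    mask = 0x3F << (10 - shift)
    window = (window & ~mask & 0xFFFF) | ((bits6 & 0x3F) << (10 - shift))
    scanline[byte] = window >> 8
    if has_next:
        scanline[byte + 1] = window & 0xFF

line = bytearray(3)                       # room for exactly four 6-bit cells
put_glyph_row(line, 1, 0b101010)
print(line.hex())                         # 02a000
```

Three of every four columns land on a non-zero shift, which is where the extra work over an 8-pixel cell comes from.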

(Since the 9918 came up it's worth noting that its "normal" tile layout was 32x24 8x8 cells, which would imply a 32x24 text mode, but it also had a specific 40x24 hardware text mode with 6 bit wide characters. When running in this mode tile-mode features like the sprites were disabled.)

I'm definitely interested in what you have in mind for this video system, especially if it's going to be relatively "low transistor count" (IE, discrete TTL and/or low integration GALs) design instead of an FPGA. (Or a heavy duty MCU/SOC that's just programmed to act as a GPU/terminal versus actually memory-mapped.)

Chuck(G) · Mar 2, 2023

25 x 80 line serial terminals were uncommon in the 1970s--most were 24 x 80 or less. Many early programs were even intolerant of lowercase.

FWIW.

cj7hawk · Mar 2, 2023

@Eudimorphodon

I'm planning on having raster graphics all the time, and drawing the characters, in 6x8.
The planned hardware is like an Amiga BitBlT, but with a tweak, combining raster BLTs and Vector BLTs into the same hardware.
First consider that the source is always a contiguous stream of data. The output is two dimensional - in that there is an X counter and a Y counter to count how many bytes are sent.
Then there's a read operation on the first range, and a write operation on the second range. The first counter is incremented, and the X and Y of the destination are incremented according to the transfer characteristics, eg, Increment Y for every X bytes transferred and reset X when you get to the end of X count and complete when you get to the end of Y count.
So far so good. But we have another function - Instead of, say, just increasing X then Y for each set of X, why not simultaneously increase X and Y, with another counter that adds a fixed amount to a counter, and the overflow is either funneled to the X or Y counter instead of the clock depending on a quadrature analyser, which also changes the direction of the count in some quadrants.
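
The plain rectangular case described so far can be modelled in a few lines of Python. This is only a sketch of the counter behaviour, with a hypothetical function name and a made-up `dest_stride` parameter for the destination's bytes per row.

```python
def blit(source, dest, dest_stride, dest_start, width, height):
    """Copy a contiguous source stream into a width x height block of dest."""
    src = 0                                    # source counter: only increments
    for y in range(height):                    # Y counter: done at end of count
        row = dest_start + y * dest_stride
        for x in range(width):                 # X counter: resets on every row
            dest[row + x] = source[src]
            src += 1
    return src                                 # number of bytes transferred

screen = bytearray(16)                         # toy 4x4 byte "screen"
blit(bytes([1, 2, 3, 4]), screen, dest_stride=4, dest_start=5, width=2, height=2)
print(list(screen))
# [0, 0, 0, 0, 0, 1, 2, 0, 0, 3, 4, 0, 0, 0, 0, 0]
```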
Now the output will form a vector, and will draw a line in a different direction.
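One way to read the counter-plus-overflow idea is as a fixed-point DDA: the major axis steps on every clock, a fixed amount is added to an accumulator each clock, and each carry out of the accumulator steps the minor axis. The sketch below is a software model under that reading. The 16-bit accumulator width, the half-way preload, and the function name are assumptions, and the sign/axis selection at the top stands in for the "quadrature analyser".

```python
def vector_points(x0, y0, x1, y1, acc_bits=16):
    """Return the pixels of a line drawn by an accumulator-overflow DDA."""
    dx, dy = x1 - x0, y1 - y0
    sx = 1 if dx >= 0 else -1          # count direction for X
    sy = 1 if dy >= 0 else -1          # count direction for Y
    adx, ady = abs(dx), abs(dy)
    x_major = adx >= ady               # which counter is clocked every step
    major, minor = (adx, ady) if x_major else (ady, adx)

    modulus = 1 << acc_bits
    step = (minor * modulus) // major if major else 0   # fixed amount per clock
    acc = modulus // 2                 # assumed preload, rounds to nearest pixel

    x, y = x0, y0
    points = [(x, y)]
    for _ in range(major):
        acc += step
        carry = acc >= modulus         # overflow out of the accumulator
        acc &= modulus - 1
        if x_major:
            x += sx
            if carry:
                y += sy
        else:
            y += sy
            if carry:
                x += sx
        points.append((x, y))
    return points
```

For instance, `vector_points(0, 0, 4, 2)` yields `[(0, 0), (1, 1), (2, 1), (3, 2), (4, 2)]`. With a narrower accumulator the truncated step can leave long lines one pixel short of the end point, which is why 16 bits is assumed here.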
And imagine instead of just dropping this down the Y axis on subsequent passes, you have another proportional counter, which adjusts the start position.
Now you have a way to draw vectors at the clock speed of the counters, and if you're copying a block, then you can rotate it to any angle based on the counter contents as well as copy it, and perform logical functions on the counter.
One final section - imagine that instead of repeating the same counter angle control for each "x" line, you adjust this counter so that the output varies slightly on each line, perhaps based on an angle or other trigonometric input.
Now you have a way to draw with respect to a vanishing line, with auto scaling, and you can draw 3D faces or facets.
With potential bitmaps on them.
So you end up with a 3D engine that can draw vectors very fast, and can draw objects as a series of vectors, to create 3D polygons. All with just a few counters and some memory access.
Does the above all make sense? You'd draw characters that way too. Just a table of predefined values that get blitted to the screen as needed. And the same hardware would handle scrolling, and clearing the screen, as well as drawing the various shapes of the CP/M graphics calls.
I think it would be pretty cool. And not that slow - certainly it would be blisteringly fast compared to CPU bound similar routines during the 80s.
What I haven't decided on is whether to do this within the raster time between scans, or to just hog the bus and perform it on the CPU side.
I also haven't decided whether I keep it to a smaller objective - eg, single-line vectors and blits, with CPU stepping and changing of counter registers manually, or fully automated.
Fully automated is faster, but also has a slightly higher chip count, and I also need to decide which side of the bus it sits on, how I handle "pages" of video memory, and whether I extend it to make the vectors handle 512 graphics in two steps, or I block both bits together... As I get to designing the circuit, I'll see how far I can push it.
At the moment, it should take around 20 GALs to do all of the hardware elements, but it should also be roughly the equivalent of a single uncommitted logic array from the 80s, so I'll design it with GALs.
As for the memory architecture, I was thinking to buffer RAM writes, multiplex the RAM, and devote half to the video circuit and half to the CPU. Then the writes can happen asynchronously. Read would have to happen with wait states, but would also be latched to minimise wait states... So the z80 can directly access video memory as pages, or I/O, or via the hardware routines ( and would still have to access as either I/O or Disk to write the bitmaps etc, since the BLT has to copy from somewhere ) - Or I might just use dual-access video ram. Again, still deciding that point.

Other chips could definitely be added later, but as you point out, there's no way they would add a high-end video chip to a Sinclair that is supposed to sell for the price of a high-end video chip alone.

@stepleton
I definitely don't want to scroll the screen - The Spectrum +3 did that, and I don't like it. I think it would be looked down upon. Even in the 80s, and even with the Osborne 1, I think most people assumed a CP/M screen needed to be 80 characters.

@Chuck(G) now you mention it, what did CP/M programs expect vertically as a character resolution?

Svenska · Mar 2, 2023

cj7hawk said:
@Chuck(G) now you mention it, what did CP/M programs expect vertically as a character resolution?

Chuck(G) is correct; the regular terminal resolution is 80x24 instead of 80x25. My mistake.
However, many programs can use additional vertical resolution, either by accident or configuration.

cj7hawk · Mar 2, 2023

Thanks Svenska - 24 I can manage without too much complexity. Do you know if there's a minimum, or is 24 pretty much what is expected?

Svenska · Mar 2, 2023

I'd say it depends purely on how much usability you are willing to give up.
If I remember correctly, Wordstar requires at least 16 rows to function. Ladder is fixed at 80x24.

Line-based programs should work at any resolution, but that is because they implicitly assume an infinite screen height (paper trail). Not sure about the minimum number of lines to enjoy a text adventure without backscrolling.

Experience in programming (if I remember correctly, that was a study by FreeBSD) has shown that the number of bugs in source code goes up significantly as soon as the relevant code block (function) exceeds the length of the screen. Any additional line will improve quality of life for text editing work.

My AVR-based terminal implements a physical 80x30 screen (the world is PAL), but can enforce 80x24, 80x25 and 80x30 limits.

Eudimorphodon · Mar 2, 2023

cj7hawk said:
Does the above all make sense? You'd draw characters that way too. Just a table of predefined values that get blitted to the screen as needed. And the same hardware would handle scrolling, and clearing the screen, as well as drawing the various shapes of the CP/M graphics calls.

It sounds very ambitious, certainly. I am not educated enough in graphics hardware design to say how realistic it sounds to do that with 20 GALs.

What color depth are you planning to use, and is the organization planar or fat pixels?
