Hi Eudimorphon
Your observations are correct. Here's the thinking behind it:
1) 128-byte transfers: this really irked me, but it's a limit of the DMA space in CP/M, so I figure I might as well roll with it. Once that limitation is accepted, anything bigger with deblocking is just a compromise, since RAM doesn't care, as you point out. The only thing I considered was whether to make it 256 bytes, since I'm writing the OS from scratch, but compatibility concerns won out, as did some vague rumours of software not liking more than 32 sectors. Then I got a nice coincidence: 7 + 5 = 12 (128-byte sectors take 7 address bits, 32 sectors take 5 more), which makes 4K blocks, so I went with it, even if it is clunky. Also, the supporting hardware/latches fit on just 2 PAL/GAL chips, and aligning with the hardware is also an objective.
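The 7 + 5 = 12 arithmetic is easier to see as bit fields. Below is a minimal Python sketch, assuming a linear address is simply carved into 4K blocks of 32 records of 128 bytes each; `split_address` and `join_address` are hypothetical names used only for illustration, not part of the actual BIOS.

```python
# Hypothetical helper illustrating the 7 + 5 = 12 bit split:
# 2**7 = 128-byte records, 2**5 = 32 records per block, 2**12 = 4096-byte blocks.
RECORD_BITS = 7
RECORDS_PER_BLOCK_BITS = 5
BLOCK_BITS = RECORD_BITS + RECORDS_PER_BLOCK_BITS  # 12

RECORD_SIZE = 1 << RECORD_BITS   # 128
BLOCK_SIZE = 1 << BLOCK_BITS     # 4096


def split_address(addr):
    """Return (4K block number, record within block, byte within record)."""
    block = addr >> BLOCK_BITS
    record = (addr >> RECORD_BITS) & ((1 << RECORDS_PER_BLOCK_BITS) - 1)
    offset = addr & (RECORD_SIZE - 1)
    return block, record, offset


def join_address(block, record, offset):
    """Inverse of split_address."""
    return (block << BLOCK_BITS) | (record << RECORD_BITS) | offset


if __name__ == "__main__":
    print(BLOCK_SIZE)               # 4096
    print(split_address(0x1F0A5))   # (31, 1, 37)
```

Every boundary is a power of two, so the whole split is shifts and masks, which is the kind of decode that fits in a couple of PAL/GAL chips without any adders.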
2) I considered just paging, but I wanted a compatible architecture, backwards compatible with older apps, and I have no idea where the DMA might end up. Unless I want some freaky code to keep track of it all, I might end up paging out the DMA buffer, which would be a bit messy. I also figure people are going to do strange things like moving the DMA around while making calls, so having the DMA source in I/O land makes a lot of sense. I did consider paging it into the BIOS area, but it's 4K blocks and I have scratchpad DMA for directories in the BIOS, so I/O seemed a better choice.
3) Paging is *not* required. It's an elegant solution in which the OS or a program can byte-step through memory, up to 256Mb, via clean I/O commands in 128-byte chunks, and I use the upper address lines to avoid a latch. It seems a more powerful architecture: it keeps code space and I/O space separate, and makes memory easier for programmers to address.
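One way to picture "the upper address lines instead of a latch" is the Z80's `IN r,(C)` / `OUT (C),r`, which put register B on A8-A15 while C selects the port. The Python sketch below models that idea under loudly hypothetical assumptions: "256Mb" is read as 256 megabytes (a 28-bit byte address), the 21-bit chunk number is latched by three writes to made-up ports 40h-42h, and the 7-bit offset within the 128-byte chunk rides on B. The port numbers and function names are illustrative only, not the actual hardware map.

```python
# Hypothetical model of a 128-byte I/O window onto a large physical memory.
# ASSUMPTIONS (not from the original design): 28-bit byte addresses (256 MB),
# chunk number latched via made-up ports 40h-42h, data via made-up port 43h,
# and the offset inside the chunk carried in B, which the Z80 drives onto
# A8-A15 during IN r,(C) / OUT (C),r, so no offset latch is needed.

ADDR_BITS = 28
CHUNK_BITS = 7
CHUNK_SIZE = 1 << CHUNK_BITS             # 128 bytes, one CP/M record
CHUNK_PORTS = (0x40, 0x41, 0x42)         # hypothetical chunk-select ports
DATA_PORT = 0x43                         # hypothetical data port


def chunk_select_writes(phys_addr):
    """(port, value) writes that select the 128-byte chunk holding phys_addr."""
    if not 0 <= phys_addr < (1 << ADDR_BITS):
        raise ValueError("address outside the assumed 28-bit space")
    chunk = phys_addr >> CHUNK_BITS      # 21-bit chunk number
    return [(port, (chunk >> (8 * i)) & 0xFF) for i, port in enumerate(CHUNK_PORTS)]


def data_bc(phys_addr):
    """BC for IN/OUT (C): B = offset within the chunk, C = data port."""
    offset = phys_addr & (CHUNK_SIZE - 1)
    return (offset << 8) | DATA_PORT


if __name__ == "__main__":
    addr = 0x0ABCDEF
    for port, value in chunk_select_writes(addr):
        print(f"OUT ({port:02X}h), {value:02X}h")
    print(f"then IN A,(C) with BC = {data_bc(addr):04X}h")
```

Once a chunk is selected, stepping B from 0 to 127 walks the whole record without touching the chunk latch, which is what makes byte-stepping cheap in this model.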
4) Accessing the screen without paging: yep, you guessed it, I/O space or as a file. A 64K program can access directly bitmapped, high-resolution 128K video memory, and it works with character I/O or high-resolution bitmaps (and hardware graphics, but that's another element). I/O is one thing, but what if the video moves around? Why not just open "VIDEO.MEM" as a file and use random I/O? Now even less memory is required. And if the video memory moves around, the file moves with it, and the user doesn't care. It creates a very clean abstraction layer when compatibility with future hardware matters more than a fixed mapping. As long as "VIDEO.MEM" exists, everyone can find the video map. This avoids the "Osborne 1 60K problem". Besides, I can't find an elegant way to page 128K of video memory into 64K... Sure, paging still works, and programmers can use whichever way they prefer.
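To make the "VIDEO.MEM as a file" idea concrete, here is a Python sketch that turns a pixel coordinate into a CP/M 2.2 random record number and fills in the FCB's r0/r1/r2 field. The FCB layout (36 bytes, name at 1-8, type at 9-11, random record at 33-35) is standard CP/M 2.2; the screen geometry (1024 x 1024 at 1 bit per pixel, which is exactly 128K) and the function names are assumptions for illustration only.

```python
# Hypothetical sketch: reach a pixel in "VIDEO.MEM" with CP/M 2.2 random-record I/O.
# ASSUMED geometry (not from the original): 1024 x 1024 pixels, 1 bit per pixel,
# row-major, leftmost pixel in bit 7, i.e. exactly 128K bytes = 1024 records.
# The 36-byte FCB layout and the r0/r1/r2 random record field are standard CP/M 2.2.

RECORD_SIZE = 128
WIDTH, HEIGHT = 1024, 1024
BYTES_PER_ROW = WIDTH // 8


def pixel_location(x, y):
    """Return (random record number, byte within record, bit mask) for pixel (x, y)."""
    if not (0 <= x < WIDTH and 0 <= y < HEIGHT):
        raise ValueError("pixel outside the assumed bitmap")
    byte_offset = y * BYTES_PER_ROW + x // 8
    return byte_offset // RECORD_SIZE, byte_offset % RECORD_SIZE, 0x80 >> (x % 8)


def make_fcb(name, ext, record, drive=0):
    """Build a 36-byte FCB; drive 0 = current drive, 1 = A:, 2 = B:, ..."""
    if not 0 <= record <= 0xFFFF:
        raise ValueError("CP/M 2.2 random records run from 0 to 65535")
    fcb = bytearray(36)
    fcb[0] = drive
    fcb[1:9] = name.upper()[:8].ljust(8).encode("ascii")
    fcb[9:12] = ext.upper()[:3].ljust(3).encode("ascii")
    fcb[33] = record & 0xFF          # r0: low byte of the record number
    fcb[34] = (record >> 8) & 0xFF   # r1: high byte
    fcb[35] = 0                      # r2: overflow byte, left at zero
    return fcb


if __name__ == "__main__":
    rec, byte, mask = pixel_location(10, 3)
    fcb = make_fcb("video", "mem", rec)
    print(rec, byte, hex(mask))      # 3 1 0x20
```

A program would open the file (BDOS function 15), point the DMA at a 128-byte buffer (function 26), then use function 33 to read the record and function 34 to write it back, setting r0/r1 before each call. It only ever holds one 128-byte slice of the 128K bitmap, wherever the video memory physically lives.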
5) Wastage: for the MMU, yes. Each extent only has four allocations too, which is a pain, but there's a 1:1 mapping for memory, and it doesn't matter where memory is. Got two video cards? Want to use some upper memory for programs? Now you have 640K, or 706K, or even 770K, and use serial for the console... Also, if you want a more efficient RAM disk for a specific application, you can do that too. The 64K disk at F0000 is a 27512-EPROM-sized block that contains the boot code, has its own 2K directory, and uses 1K allocations. It's seen as a single block in M:, but L: sees the files, and that's where I store extended functions like BASIC and other desirable commands like FORMAT. It can also be deleted or ignored, but it gives a fast boot and makes the system more like a home computer. CP/M boots from ROM (it can also boot from a disk), and any command is searched for on the local disk first, then on L:, which is useful, because you can't run MS BASIC any other way under CP/M.
Also, it might be a bit wasteful, but it's RAM, so I imagine M: won't see a lot of files in normal use. But M: can be used like a normal RAM disk too, so files and system processes all mix and work together, with clean resource management provided for free by CP/M... which I think is also kind of elegant. My current CCP is 4K and the FDOS is also 4K, so there's a lot of spare space too.
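The 1:1 mapping means an M: block number is just a physical 4K page number. A small Python sketch of that, plus the L: ROM disk geometry, under the assumption of a 1 MB (20-bit) physical map, as the F0000 address suggests; all names here are hypothetical.

```python
# Hypothetical model of the 1:1 mapping between M: allocation blocks and RAM,
# plus the geometry of the L: ROM disk. ASSUMPTIONS: a 1 MB (20-bit) physical
# map, as the F0000 address suggests, and M: block n == physical 4K page n.

M_BLOCK = 4096      # M: allocation size
L_BLOCK = 1024      # L: allocation size
DIR_ENTRY = 32      # bytes per CP/M directory entry


def m_blocks(start, length):
    """M: allocation blocks covering physical addresses [start, start + length)."""
    first = start // M_BLOCK
    last = (start + length - 1) // M_BLOCK
    return list(range(first, last + 1))


def block_to_phys(block):
    """With a 1:1 mapping, a block number is just the physical page number."""
    return block * M_BLOCK


ROM_BASE, ROM_SIZE, L_DIR_SIZE = 0xF0000, 0x10000, 2048

rom_blocks = m_blocks(ROM_BASE, ROM_SIZE)          # 0xF0 .. 0xFF on M:
l_total_blocks = ROM_SIZE // L_BLOCK               # 64 blocks on L:
l_dir_entries = L_DIR_SIZE // DIR_ENTRY            # 64 directory entries
l_dir_blocks = L_DIR_SIZE // L_BLOCK               # 2 blocks of directory
l_data_blocks = l_total_blocks - l_dir_blocks      # 62K left for files

if __name__ == "__main__":
    print([hex(b) for b in rom_blocks[:2]], len(rom_blocks), l_data_blocks)
```

The same 64K of ROM costs 16 allocations on M: but 64 on L:, so the finer 1K granularity is only paid for on the small ROM disk, where individual files matter.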
6) Yes, I will write a program to run "snapshots", including starting and stopping them under NMI or via switchouts, and a way to have RSTs automatically map to processes to create drivers. For example, video drivers can be called by RST10, which pages in the driver process; the driver is aware of this and knows how to return to the original process. Initially the user process is process 0, and the user can do whatever they like, but it would also make for a powerful multiuser system. And yes, it should be possible to snapshot a program, save it to disk, load it back into a completely different memory space and start execution again, or even move it to another computer and execute it there, like a container.
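The reason a snapshot can be reloaded anywhere is that a process only sees logical pages; which physical blocks back them is the MMU's business. The Python sketch below models that, plus an RST-vector-to-driver table, under hypothetical assumptions: 16 logical 4K pages per process, a snapshot consisting of page contents plus registers, and made-up names (`Machine`, `rst_table`, `spawn`). It is an illustration of the idea, not the actual snapshot format.

```python
# Hypothetical model of snapshots, relocation and RST-driven driver calls.
# ASSUMPTIONS: every process sees 16 logical 4K pages (0000-FFFF) mapped through
# a page table onto any free physical 4K block; a snapshot is the page contents
# plus saved registers; rst_table maps an RST vector to a driver process.

PAGE = 4096
PAGES = 16


class Machine:
    def __init__(self, n_blocks):
        self.ram = [bytearray(PAGE) for _ in range(n_blocks)]
        self.free = list(range(n_blocks))
        self.rst_table = {}        # RST vector -> driver process
        self.current = None        # process whose page map is active

    def _alloc(self):
        if not self.free:
            raise MemoryError("no free 4K blocks")
        return self.free.pop(0)

    def spawn(self):
        return {"map": [self._alloc() for _ in range(PAGES)], "regs": {"pc": 0x0100}}

    def poke(self, proc, addr, value):
        self.ram[proc["map"][addr // PAGE]][addr % PAGE] = value

    def peek(self, proc, addr):
        return self.ram[proc["map"][addr // PAGE]][addr % PAGE]

    def snapshot(self, proc):
        """Capture contents and registers; physical block numbers are not saved."""
        return {"pages": [bytes(self.ram[b]) for b in proc["map"]],
                "regs": dict(proc["regs"])}

    def restore(self, snap):
        """Load a snapshot into whichever blocks are free now."""
        proc = {"map": [], "regs": dict(snap["regs"])}
        for content in snap["pages"]:
            block = self._alloc()
            self.ram[block][:] = content
            proc["map"].append(block)
        return proc

    def rst(self, vector, handler):
        """Map in the driver for this RST, run it, then return to the caller."""
        caller = self.current
        self.current = self.rst_table[vector]
        try:
            return handler(self, self.current)
        finally:
            self.current = caller


if __name__ == "__main__":
    m = Machine(48)
    p = m.spawn()                       # blocks 0..15
    m.poke(p, 0x0100, 0xC3)
    q = m.restore(m.snapshot(p))        # blocks 16..31, same contents
    print(q["map"][0], hex(m.peek(q, 0x0100)))   # 16 0xc3
```

Because the snapshot deliberately drops physical block numbers, the restored copy lands in different blocks yet reads back identically, which is the "container" property described above.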
If I went multitasking, I'd add one more piece of hardware: a comparator on the output from the MMU to generate an NMI when a program accesses memory that doesn't exist. Then all I need to start a small CP/M program is 4K for the task plus a shared CCP and FDOS; I can leave the rest of it sparse, and I can still save it, move it around, or duplicate it, with only a single moving allocation. When it accesses memory it didn't load into (e.g. data memory), I allocate a new block through the file system and just keep going... But that's not something I intend to do this time; it's just an idea that came up that would make a very cool "second generation" model...
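Here is a rough Python model of that sparse, grow-on-demand task. Assumptions, all hypothetical: 16 logical 4K pages, the comparator's NMI is stood in for by a method call, and new blocks come from a plain free list rather than the real M: allocation vector.

```python
# Hypothetical model of a sparse task with demand-allocated 4K pages.
# ASSUMPTIONS: 16 logical pages per task; a missing page is detected by the
# proposed MMU comparator (modelled here as a method call rather than an NMI);
# new blocks come from a plain free list standing in for the M: allocation vector.

PAGE = 4096
PAGES = 16


class SparseTask:
    def __init__(self, free_blocks, loaded_pages):
        self.free = list(free_blocks)
        self.map = [None] * PAGES          # None = no physical block behind this page
        for page in loaded_pages:
            self.map[page] = self.free.pop(0)
        self.faults = 0

    def translate(self, addr):
        """Logical 16-bit address -> physical address, allocating on first touch."""
        page = (addr >> 12) & 0xF
        if self.map[page] is None:
            self.on_missing_page(page)     # where the comparator's NMI would fire
        return self.map[page] * PAGE + (addr & 0xFFF)

    def on_missing_page(self, page):
        if not self.free:
            raise MemoryError("no free 4K blocks left")
        self.faults += 1
        self.map[page] = self.free.pop(0)

    def resident_pages(self):
        return sum(block is not None for block in self.map)


if __name__ == "__main__":
    task = SparseTask(free_blocks=range(0x100, 0x110), loaded_pages=[0])
    print(hex(task.translate(0x0100)), task.resident_pages())   # 0x100100 1
    print(hex(task.translate(0x8000)), task.faults)             # 0x101000 1
```

The task starts with a single 4K page resident, and only the pages it actually touches ever get backed by real blocks.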
I may have a trigger on jumps and calls to 0005, though, to optionally page in the FDOS and page it back out when done. That means the FDOS can be called from anywhere as long as it's not trying to use a DMA above F000, and the TPA then becomes 64K, minus the zero page of course. Even that can be overwritten, but the lower 64 bytes will ALWAYS be located at 10000 to 1003F, so that when images page in, any drivers using RSTs will still work.
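A minimal Python sketch of the two rules in that paragraph, under stated assumptions: the 4K FDOS is assumed to page in over logical F000-FFFF during the call, "a DMA above F000" is read as any 128-byte buffer that would overlap that page, and logical 0000-003F (the RST 00h-38h vectors) is assumed to be decoded to physical 10000-1003F regardless of the active page map. Function names are hypothetical.

```python
# Hypothetical model of the CALL 0005 trap and the pinned low 64 bytes.
# ASSUMPTIONS: the 4K FDOS pages in over logical F000-FFFF while a call is in
# progress; the DMA buffer is one 128-byte record; logical 0000-003F is always
# decoded to physical 10000-1003F no matter which page map is active.

PAGE = 4096
RECORD = 128
FDOS_BASE = 0xF000
PINNED_BYTES = 0x40
PINNED_PHYS = 0x10000


def fdos_trap_ok(dma):
    """True if the whole 128-byte DMA buffer stays visible with the FDOS paged in."""
    return dma + RECORD <= FDOS_BASE


def translate(addr, page_map, fdos_block=None):
    """Logical -> physical; fdos_block, when given, replaces page F for the call."""
    if addr < PINNED_BYTES:
        return PINNED_PHYS + addr          # RST 00h-38h vectors never move
    page = addr >> 12
    if fdos_block is not None and page == 0xF:
        block = fdos_block
    else:
        block = page_map[page]
    return block * PAGE + (addr & 0xFFF)


if __name__ == "__main__":
    user_map = [0x20 + i for i in range(16)]
    print(hex(translate(0x0038, user_map)))           # pinned RST 38h vector
    print(hex(translate(0xF123, user_map, 0x0E)))     # FDOS paged in over F000
    print(fdos_trap_ok(0x0080), fdos_trap_ok(0xF000))
```

Pinning the vectors outside the page map is what lets a driver reached through an RST keep working no matter which process image is currently paged in.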
And being able to snapshot CP/M software while it's running opens up all kinds of possibilities. It's a lot of fun. And don't forget this was intended as a GAMES machine, so you could pause your game, save the state, and put it away on disk when you shut down. I was thinking back to a time when this was difficult to do, and when the power went out, you came back the next day and did it all again.
Oh, and to cover the bonus question: let's say a 64K program extends itself with normal disk writes to 256K. It's still a single file and can exist as a single process, even if it's running in the default user TPA (process 0). What happens is that the first 16 pages are mapped to memory from 0000 to FFFF, and the ENTIRE file is mapped to DISK random I/O from 00000 to 3FFFF. How it's mapped in the actual allocations and blocks of memory doesn't matter, but as you pointed out, you can look at an allocation and map it into memory anywhere you like.
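The numbers in the bonus-question answer can be checked with a short Python sketch. It assumes 4K pages and 128-byte records, with the first 16 pages being the part the CPU sees directly at 0000-FFFF; `locate` is a hypothetical helper name.

```python
# Hypothetical model of the "bonus question": a program file grown to 256K.
# ASSUMPTIONS: 4K pages, 128-byte records, the first 16 pages (64K) are what
# the CPU sees at 0000-FFFF, and the whole file is reachable by random record.

FILE_SIZE = 256 * 1024
PAGE = 4096
RECORD = 128
CPU_WINDOW = 0x10000


def locate(file_offset):
    """Describe how one byte of the 256K file can be reached."""
    if not 0 <= file_offset < FILE_SIZE:
        raise ValueError("offset outside the 256K file")
    return {
        "page": file_offset // PAGE,                                    # 0..63
        "cpu_addr": file_offset if file_offset < CPU_WINDOW else None,  # direct access
        "record": file_offset // RECORD,                                # 0..2047
        "byte": file_offset % RECORD,
    }


if __name__ == "__main__":
    print(FILE_SIZE // PAGE, FILE_SIZE // RECORD)   # 64 2048
    print(locate(0x3FFFF))
```

So the file is 64 pages, 16 of which the CPU addresses directly, while all 2048 records (00000-3FFFF) stay reachable through ordinary random I/O.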
Svenska - Yes, that is something I didn't want to impose with this architecture.
Super bonus question: yes, I actually want to swap out the zero page, so that any snapshots taken while the 005C-00FF region is in use will still work. And I'll use this memory to save the snapshot state and any peculiarities it wants, e.g. shared memory, fixed allocations etc., since it's not always desirable to assign resources randomly.
Chuck - I thought about going more granular, but decided on 4K since, at some point, managing the memory becomes more of a problem than a benefit. Most systems seemed to settle on 8K pages, so I went a little further than that to align with my default "track". But 1K pages would have allowed better use of the MMU directory structure... I only get 4 allocations per extent (now you know why I was asking those crazy questions about that earlier).
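The "4 allocations per extent" figure falls out of the CP/M 2.2 disk parameter rules: a logical extent is 16K, and a directory entry holds 16 one-byte block pointers when DSM < 256, or 8 two-byte pointers otherwise. A small Python sketch of that arithmetic (function names are hypothetical, the rules are the standard 2.2 ones):

```python
# Allocation arithmetic behind "4 allocations per extent", using the CP/M 2.2
# disk parameter rules: a directory entry holds 16 one-byte block pointers when
# DSM < 256, or 8 two-byte pointers otherwise; a logical extent is 16K.

LOGICAL_EXTENT = 16 * 1024


def allocs_per_extent(bls):
    """Allocation blocks needed to cover one 16K logical extent."""
    return LOGICAL_EXTENT // bls


def extent_mask(bls, dsm):
    """EXM for block size BLS and highest block number DSM (None = not allowed)."""
    pointers = 16 if dsm < 256 else 8
    span = pointers * bls
    if span < LOGICAL_EXTENT:
        return None
    return span // LOGICAL_EXTENT - 1


if __name__ == "__main__":
    for bls in (1024, 4096, 8192):
        blocks_per_mb = (1024 * 1024) // bls
        print(bls, allocs_per_extent(bls), blocks_per_mb,
              extent_mask(bls, blocks_per_mb - 1))
```

Run over a 1 MB map, it shows the trade-off: 4K blocks give 256 MMU entries and 4 allocations per extent, while 1K blocks give 16 allocations per extent but 1024 blocks. That needs two-byte pointers, leaving one directory entry covering only 8K, which these 2.2 rules flag as not allowed.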
At the moment, I have my "default" CP/M built and the emulator functioning. I've recently added the MAD ports to the emulator, and disks L: and M: work. A: through H: are reserved for external and physical media, and I: through P: are reserved for internal mappings, with L: and M: fixed. L: is the BIOS image, and M: is the memory map (which only shows the default TPA at the moment, but needs more default files: one for missing RAM and one for the boot disk image, L:). But it boots like a real machine should, i.e. it loads the bootstrap and copies the BIOS and BDOS; next I need to get the BIOS to load the CCP from disk. The CCP and BDOS don't require the additional hardware to run, and I want to keep it that way, so that it could also run a straight CP/M image.
But if I can write a backwards-compatible version of CP/M that can lay the smackdown on DOS, I will be happy.