Thought experiment: CP/M directories

hjalfi · Mar 3, 2023

It's Friday, I'm bored, so...

Not long ago I wrote a 6502 port of CP/M. I ended up getting so heavily invested with the details that I had cpmfs dirents oozing out of every pore. A question bugged me: how difficult would it be to add proper directories to CP/M 2.2 with as few changes as possible?

The way this would work is:

- directories are represented as simple zero-length directory entries. One of the status bits identifies them. (You could probably get away with just using the S bit for this.)
- the user code field on the on-disk dirent is repurposed. Instead of being a 0..15 value representing the user code, it now identifies that entry's enclosing directory.
- a directory's code would be its directory entry index (with some mapping, see below).
- directory code 0xe5 is still special and still marks a deleted file.
- the get/set user code BDOS call becomes a get/set current directory code call.
- there would be entries in the root directory for directories 1..15.

There would be very few BDOS changes required, mostly limited to the code which turns an FCB into a dirent and back again.
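To make the scheme above concrete, here is a minimal Python sketch of how a directory-aware tool might read dirents. It assumes the standard 32-byte CP/M 2.2 dirent layout, and it assumes the directory flag lives in the high bit of the second type byte (the SYS attribute), which the post only floats as a possibility. The names parse_dirent, list_directory and make_raw are invented for illustration.

```python
DELETED = 0xE5  # byte 0 value that still marks a deleted entry


def make_raw(parent, name, ext="", is_dir=False):
    """Build a 32-byte dirent for experiments (hypothetical helper)."""
    type_bytes = bytearray(ext.ljust(3).encode("ascii"))
    if is_dir:
        type_bytes[1] |= 0x80  # assumption: SYS attribute bit flags a directory
    return (bytes([parent]) + name.ljust(8).encode("ascii")
            + bytes(type_bytes) + bytes(20))


def parse_dirent(raw, index):
    """Decode the fields of one 32-byte dirent that the scheme cares about."""
    if len(raw) != 32:
        raise ValueError("a dirent is 32 bytes")

    def text(field):
        # Attribute flags live in the high bits, so mask them off.
        return bytes(b & 0x7F for b in field).decode("ascii").rstrip()

    return {
        "index": index,
        "parent": raw[0],                # was the user code, now the enclosing directory
        "name": text(raw[1:9]),
        "ext": text(raw[9:12]),
        "is_dir": bool(raw[10] & 0x80),  # assumed directory flag
    }


def list_directory(dirents, dircode):
    """Return the live entries whose enclosing directory is dircode."""
    if dircode == DELETED:
        return []
    return [d for d in dirents if d["parent"] == dircode]
```

Listing a directory is then just a filter on byte 0, which is the same comparison the BDOS already does for user codes.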

Pros:

- non-directory-aware programs on a directory-aware BDOS would work fine. Anything which didn't use user codes would use the current directory; anything which did would get access to the root directory and special directories USER1...USER15.
- using directory-aware disks in non-directory-aware BDOSes would _mostly_ work. You'd be able to see files in directories 0..15. Other directories would be inaccessible.

Cons:

- thoroughly incompatible with disk labels, timestamps, CP/M 3 passwords, etc.
- resolving a path into a current-directory/FCB pair would be a reasonably chunky amount of user code. It'd make sense to try and embed this into the BDOS because nearly every program which uses filenames will require it, but I doubt there's room.
- you'd need to add a CD command to the CCP.
- it'd be trivially possible to put files into non-existent directories, at which point they get lost.
- there's a single current directory for all drives. This is pretty problematic. You might need to repurpose one of the DPH bytes to hold a drive's current directory, but that would need the BDOS to use a different algorithm for block-to-CHS conversion to free up space.
- you're limited to ~255 directories. That's probably enough for any reasonable CP/M filesystem, but on systems with a directory bigger than 255 entries, it would be possible to run out of candidate dirents to place a directory in. You'd probably want to map the directory code to the dirent index by spreading them through the directory, e.g. dircode = directoryindex/4 + 1. (The +1 is so that the root directory, with code 0, doesn't occupy a directory slot.)
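The mapping in the last point can be sketched as follows. This is only an illustration of the dircode = directoryindex/4 + 1 example; the divisor of 4 and the function names are assumptions, and a real implementation would pick the divisor to suit the directory size.

```python
DELETED = 0xE5  # still reserved for deleted entries, so it cannot be a directory code


def dircode_for_slot(index):
    """Directory code implied by the dirent slot a directory entry sits in."""
    return index // 4 + 1


def candidate_slots(dircode, total_dirents):
    """Dirent slots in which a directory with this code could be stored."""
    if not 1 <= dircode <= 0xFF or dircode == DELETED:
        return []  # 0 is the root, which needs no slot; 0xE5 is reserved
    first = (dircode - 1) * 4
    return [i for i in range(first, first + 4) if i < total_dirents]


def usable_codes(total_dirents):
    """All directory codes that have at least one slot on this disk."""
    return [c for c in range(1, 256) if candidate_slots(c, total_dirents)]
```

With this spreading, each directory code has four possible slots, so creating a directory only fails when all four of some free code's slots are taken by ordinary files.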

This all seems quite feasible; fiddly if you want to maintain backwards compatibility, but pretty simple otherwise. The directory parsing code would be the single biggest job. Architecturally there's nothing really to it --- Apple's MFS, which was used on early Macs, used essentially the same scheme.
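As a rough idea of what that parsing code has to do, here is a sketch of resolving a path into a directory-code/filename pair. The "/" separator, the dictionary-based dirent records and the function names are all assumptions made for the example; the post does not specify a path syntax, and ".." handling is left out.

```python
ROOT = 0  # directory code of the root directory


def split_name(component):
    """Turn 'zork.com' into the padded 8+3 form used in an FCB."""
    name, _, ext = component.upper().partition(".")
    return name[:8].ljust(8), ext[:3].ljust(3)


def resolve(path, cwd, dirents):
    """Return (dircode, name, ext) for the last component of path.

    dirents is a list of dicts with keys parent, name, ext, is_dir, code.
    """
    current = ROOT if path.startswith("/") else cwd
    parts = [p for p in path.split("/") if p]
    if not parts:
        raise ValueError("empty path")
    for part in parts[:-1]:
        wanted = split_name(part)
        for d in dirents:
            if (d["parent"] == current and d["is_dir"]
                    and (d["name"], d["ext"]) == wanted):
                current = d["code"]  # descend into the matching directory
                break
        else:
            raise FileNotFoundError(part)
    name, ext = split_name(parts[-1])
    return current, name, ext
```

The caller would set the current directory code to the first element and fill an FCB with the other two, so the rest of the BDOS never sees the path.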

...which all kinda begs the question: why hasn't anyone tried this before?

Robbbert · Mar 3, 2023

I don't think standard CP/M 2.2 had user numbers - at least it didn't on my late Kaypro.

However I've often thought that CP/M 3 with its user numbers could conceivably regard them as directories instead. The only issue is you can't DIR to see which user numbers are in use - that's something that needs to be overcome.

Chuck(G) · Mar 3, 2023

User areas have been around in CP/M since at least 2.0. Just not well documented prior to CP/M 3.0 and MP/M.

GeoffB17 · Mar 3, 2023

I think also that CP/M 3 introduced the ability to show the User No as part of the drive prompt, as in 3A:, which earlier systems did not - although I've seen mention of a patch to enable this. On some systems?

I'm using JonB's uIDE unit on my PCW (CP/M 3) and in effect I'm using the User areas as sub-dirs, and this works fine.

Long ago, I downloaded a small assembly prog called DIRDUMP or something like it. I made a couple of variants of this for use with the PCW's LocoScript, which used User areas 0 to 7 routinely, and 8 to 15 for 'limbo' files (deleted versions). The prog listed the files on the disk User by User. Further variants could list one specified User ONLY (but that can be done with DIR anyway for the current user). Also an option to list just the Users free, or just the Users in use?

Geoff

geowar1 · Mar 3, 2023

A few years ago I replaced all my (failing) floppy disk systems with s100computers.com’s dual CF/IDE cards.

FYI: An 8 GB CF card can hold 1024 8 MB “hard” CP/M partitions.

Using this I came up with a hierarchical directory system.
All I needed was a way for a “CD” application to change the current partition for each logical drive.
Likewise “PWD” would print out the current working directory (path) for a specific drive.
The “MKDIR” command would create new directories.
If interested you can see my project (with notes & sources!) at HFS4CPM.

Chuck(G) · Mar 3, 2023

Anent user numbers--tidbit. There was a CP/M clone out there that used 96 as a user number.

ChickenMan · Mar 3, 2023

ZCPR 3 also allows you to name your User areas. This is ZCPR3 running on my Microbee, where A0: is named BASE and User 15 is named ROOT.

cj7hawk · Mar 3, 2023

Creating subdirectories as image files is quite possible. I built that intent into the LokiOS version of CP/M that I've been writing and this much is all standard CP/M - however I haven't fully implemented it as a subdirectory system yet for a few reasons, and there are significant impediments to doing this under CP/M that don't exist with the FAT kind of file system.

But I have still implemented subdirectories by manipulating the Disk Offset parameter in the DPH. This allows CP/M to locate a new directory structure anywhere on the disk, and seems both supported and intentional in CP/M's design.

The biggest problem I encountered is what do you do when you get directories within directories (path related): how do you change directories? And how do you handle paths? At the moment I just fix their sizes and mount them as new drives as I still don't have a solution for this, but it would be practical to stay on the same drive, translate the AV tables, and implement as a true subdirectory. It also allows for a direct increase in the number of files that can be stored on a disk system without changing the root directory size, using later space for additional directory structures, and I believe it is fully CP/M compliant in its implementation.

It may not be the solution you're looking for however. I'll explain the drawbacks, strengths, requirements and how it might be practical depending on your objectives, so you may find parts of this strategy useful for implementing true subdirectories within CP/M on any CP/M system.

Firstly, if you do it this way, you need to use VERY large allocations. How large? One block per track. You can have as many records as you want in an allocation, and you're going to have some problems with disk access routines, and may want to use long-multiplication for track finding on a very large disk ( eg, 256Mb with 4K allocations ) but it does seem supported by CP/M, and it is supported by the version I've been writing. You can fudge it by skipping blocks if you have more than one block per track, but that's not going to work if you have a mostly full and very fragmented drive, so it's a realistic limitation.

Once you have this, you can create offsets of 16 bits anywhere in the disk structure to create subdirectories. This is reflected in the DPH in the track offset, which tells CP/M where to find the subdirectory. I think it was originally intended for "Partitioning" large disks into multiple smaller disks, and it does that well, but it's also good for shifting directories throughout the disk structure, and there are no rules about where you can stick a directory, how big it can be or how it has to work. CP/M doesn't care where it finds its directories - it just finds them based on adding the track offset to the base of the disk, and that's where the directory allocation is, and CP/M doesn't care if these allocations exist within other disk spaces. It doesn't do any checking for that.
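A small sketch of the offset arithmetic, under the one-block-per-track assumption described above. In stock CP/M 2.2 the reserved-track count is the OFF field of the DPB that the DPH points to; the DriveCwd class, its method names and the idea of keeping a stack of offsets are illustrative assumptions, not part of any existing BDOS.

```python
def block_to_track(off, block, blocks_per_track=1):
    """Track on which a block starts, given the drive's track offset."""
    return off + block // blocks_per_track


class DriveCwd:
    """Tracks the track offset to use for one drive as directories are entered."""

    def __init__(self, root_off):
        self.offsets = [root_off]  # one entry per directory level, root first

    @property
    def off(self):
        return self.offsets[-1]

    def enter(self, first_block):
        """Enter a subdirectory whose covering file starts at first_block."""
        new_off = block_to_track(self.off, first_block)
        if new_off > 0xFFFF:
            raise ValueError("track offset must fit in 16 bits")
        self.offsets.append(new_off)

    def leave(self):
        """Go up one level; the root has no parent, so stay there."""
        if len(self.offsets) > 1:
            self.offsets.pop()
```

Because each block is exactly one track, entering a subdirectory is a single addition, which is why the scheme gets awkward as soon as several blocks share a track.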

And if the root has a file that covers the directory (eg, MYSPACE.DIR ) and the allocations for the entire subdirectory and all file contents are marked here, nothing else will use those allocations. It also means you can use existing CP/M commands to deal with the directory, eg, "PIP NEWSPACE.DIR=OLDSPACE.DIR" should make a new subdirectory and copy all of the contents from the old subdirectory and will automatically set it up as a subdirectory - No changes to CP/M are required to support that. Subdirectories can also be deleted by this same method, renamed etc. and shifting between subdirectories can be handled by the CCP, so doesn't affect the TPA.

At the moment, I pre-allocate the size of the subdirectory and mount it as a drive, which is where the limits of standard CP/M end - I will note that you don't need the same allocation size in the subdirectories ( I use 1K blocks in subdirectories, and 4K in the root ) - however if you are on the same disk system, you must use 1 block per track for any subdirectory that needs to have its own deeper level subdirectories. That's a limit of CP/M, so if you want multilevel subdirectories, and not just root-subdirectories, you have to keep that in mind.

So far, that's all normal CP/M. Here's where it diverges, and where I haven't worked out how to implement it in a better way yet.

To make for truly dynamic subdirectories, you need a way to mark ALL of the blocks allocated back through the existing directory tree, not just a single large linked list in a FAT like DOS uses. Changing directories is possible under CP/M - and it's relatively easy - just edit the disk DPH, and shift the track offset, and boom, you're in a working directory. If the space is preallocated and you change the number of tracks, you're done. You could even implement a call in BDOS that recognises where the track offset is and knows when to shift down to the root, but that's all extensions - it's not covered in the standard BDOS.

Also, if it's a dynamic subdirectory, the BDOS needs to be aware that it's a subdirectory, and that when allocations ( which are offset also for the subdirectory, but that's calculable ) are set for blocks, it has to add the allocation to both the root file and the subdirectory so that another subdirectory, or access to the root, doesn't overwrite it. Also, it needs to pick up new allocations from the Root AV tables, which are going to become your de facto un-linked FAT structure.

That much represents changes to CP/M. As does a method to calculate and remember paths if you're going to have "Meta" subdirectories, because you will have to change the allocation of every directory file in the path and add to them one by one.

This can be done through recursive code I imagine, but the BDOS would still need to be aware of it when read/write sequential or random, and only maintain a single AV table for the entire disk, not one for each directory you may be in. A path is going to require at least 2 bytes per level, since it needs to track the block that is allocated to directory structure for each directory traversed in the system, so that the BDOS can recursively update each directory structure - and allocations are going to be eventually deleted within existing file structures, so you have to support sparse files since it's going to get messy, and you're going to need a new way to write the first available sparse allocation as a part of this routine also, to avoid problems, but it is doable, and would support changing directories up and down with BDOS extensions, while allowing most of the BIOS/BDOS routines to stay exactly the same.
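One way to picture the bookkeeping described above is the sketch below. It models the single disk-wide AV table as a list of booleans and each directory's covering file as a list of block numbers; these structures and the name allocate_block are assumptions for illustration only, and translating block numbers into each subdirectory's own offset numbering is deliberately left out.

```python
def allocate_block(av, path):
    """Take the first free block and record it at every level of the path.

    av   -- list of bools, True meaning 'in use' (the one AV table for the disk)
    path -- list of block lists, one per covering directory file,
            ordered from the root-most directory file to the current one
    """
    for block, used in enumerate(av):
        if not used:
            av[block] = True
            # Every ancestor's covering file must also claim the block,
            # otherwise another directory could hand it out again.
            for covering_file in path:
                covering_file.append(block)
            return block
    raise OSError("disk full")
```

The cost is visible here: a write deep in the tree touches one directory file per level, which is where the per-level path state and the extra BDOS code would come from.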

The next problem you're going to hit is TPA erosion if this is all maintained in the BDOS, but if implemented as above, it might not be too bad, and could still work with existing software which is not aware of the directory structure.

It also allows subdirectories to be mounted as root directories to an extent if the system is aware they are subdirectories, so for software that doesn't support any path structures, you can separate program and file storage space.

But from the above, I think it would be very practical to implement a directory structure in CP/M and even to implement a dynamic one. Of course, you could also just write files, and mount them as per ram disks, but using virtual memory, however that requires a driver and shims and is a bit messy, while the above fits elegantly within the base CP/M structure... It's also compatible with earlier CP/Ms, right back to 1.4 I think - and I'm doing it under a system designed to be compatible with CP/M 2.0.

Regards
David

Chuck(G) · Mar 3, 2023

You do know that there are versions of CP/M-86 that can use the FAT filesystem. John Elliott gives some details. If you're going to implement subdirectories, it might be useful to follow CP/M-86 conventions.

cj7hawk · Mar 3, 2023

Hi Chuck,

From what I can see, that's pushing CP/M over the top of a different file system through a translation level implementation - But if that's the case, the question arises as to whether it's just a second OS component wedged in over the top of the existing one, or whether it is a single component with two capabilities - ie, how does it handle a mix of both file systems simultaneously?

I imagine there would be ways to do both - I wonder if it's a better approach. Certainly moving to the x86 with its segment architecture, it would be fine memory wise - Do you know what the memory overhead would be if it all existed within the 64k space without paging?

@durgadas311's implementation of CP/M under JAVA would also be another way to go with that kind of approach - it just emulates CP/M from the call level and he uses the allocations in the file table to refer to the links to real files, and then it just translates everything. I never found anything it didn't work with, and it was incredibly helpful while I was debugging my own implementation.

A big challenge is how are you going to store direct path structures under CP/M with an FCB? There's 16 bytes of data in the FCB that can be used in a translation system, and could be used as block pointers for a few levels of subdirectory if the filenames are symbolic and kept to a minimum, eg, the file system might translate to a single letter directory name and single letter filename - or even a byte-allocated directory and character based filename. That would provide up to 15 levels of subdirectory within a translated structure with a relatively normal FCB... As long as nothing tried to access allocations directly or use that information directly either.

And how might you ascend a level with BIOS calls? Open file to go deeper and close to reverse out a level? And would you use a directory extension in the +3 to indicate it's a directory and not a file?

Regards
David.

geowar1 · Mar 3, 2023

cj7hawk said:
But I have still implemented subdirectories by manipulating the Disk Offset parameter in the DPH. This allows CP/M to locate a new directory structure anywhere on the disk, and seems both supported and intentional in CP/M's design.

See my notes on this at HFS4CPM

Chuck(G) · Mar 3, 2023

cj7hawk said:
From what I can see, that's pushing CP/M over the top of a different file system through a translation level implementation - But if that's the case, the question arises as to whether it's just a second OS component wedged in over the top of the existing one, or whether it is a single component with two capabilities - ie, how does it handle a mix of both file systems simultaneously?

Both filesystems interface through the FCB, so no big deal. You can treat the FAT filesystem as a flat implementation without hazard. You have to remember that the CP/M filesystem was originally designed for and optimized for floppy access, where seek times are considerable and rotational latency is significant.
As a matter of fact, why not jettison the CP/M filesystem completely? You can always have a utility to translate from floppies using the old system. A lot of 8-bit word processors use it, after all. You can keep the FCB interface as-is.

FAT doesn't require a lot of storage. On my MCU work, I use Chan's FATFS quite a bit--and it can be selectively configured to support all sorts of extensions, such as exFAT for very large volumes.

cj7hawk · Mar 4, 2023

Chuck(G) said:
FAT doesn't require a lot of storage. On my MCU work, I use Chan's FATFS quite a bit--and it can be selectively configured to support all sorts of extensions, such as exFAT for very large volumes.

MCU=Microcontroller? Just went looking - Chan's FATFS has been eliminated from the Internet - Fortunately I found some nice examples elsewhere with logic analyser outputs... I'll have to go looking on archive.org later to see if I can find earlier versions.

Thanks for the pointer

Chuck(G) · Mar 4, 2023

Nonsense--elm-chan.org is still very much online. You may want to take a look at his Petit FATFS code.

cj7hawk · Mar 4, 2023

Chuck(G) said:
Nonsense--elm-chan.org is still very much online. You may want to take a look at his Petit FATFS code.

I get a not-found error trying to access anything on that domain... I assume then that it's only causing me issues so it might be a server fault or something in its distribution internationally. I saw Archive.org could access it as recently as a week ago and I assume you can get to it just fine?

Chuck(G) · Mar 4, 2023

Yup, the whole site works here in the US. A search on the IP address (211.13.196.134) shows that the host is located in Tokyo. Can you get to this link on archive.org?

In the microcontroller universe, FATFS is extremely popular with some hardware vendors using it.

cj7hawk · Mar 4, 2023

Bizarre, same DNS resolution here. Just tried Microsoft Edge - Works. Tried Firefox, Failed. Seems to be a Firefox issue with my machine - Archive.org worked fine with both browsers, but the server throws up an error when my Firefox tries to access it. I checked the tcpdump just to make sure there wasn't something weird in the middle like a proxy ( it's port 443, so I can't see the contents anyway ) and it all looks like Mr Chan has a server configuration error or something on his site doesn't like my Firefox.

Robbbert · Mar 4, 2023

I can access that site just fine with Firefox - you must have a local issue.

Make sure you're using Google's DNS rather than the default (your ISP).

cj7hawk · Mar 4, 2023

DNS is fine, I checked it was resolving correctly. And Firefox is going to the correct IP - It just gets a different server result. Could be related to almost anything server side - Maybe a no tracking request, cookie issues, or a browser version string. Impossible to know since it just returns a server error.

Only seems to affect Firefox. It is most likely specific to my FF config - but who knows what the trigger is. Only the server logs know for certain.

Chuck(G) · Mar 4, 2023

I browse the site using FF 110.0.1, but I've accessed it over the years using a variety of browsers. Never a problem.
Of course, I'm running Linux...
