Relocatable Z-80 Code and a puzzle

gp2000 · May 19, 2017

Writing Z-80 code that works no matter where it is loaded is difficult as the Z-80 has no relative CALL instruction and no direct access to the program counter. Most of the time there's no need for such flexibility, but driver programs do need to relocate themselves and it can be handy if you embed Z-80 code in BASIC strings or variables.

I remembered 80 Microcomputing had a good article about this and while looking for it I found a follow-up:

80 Microcomputing, April 1983, "Understanding Relocatable Code"
80 Microcomputing. April 1985, "Relocatable Programs: Microcomputing's Hoboes"

Besides some good techniques it points out that the BASIC ROM has a critically important subroutine for relocation at $000B. A "CALL $000B" will return with HL set to the address after the call. The routine itself is just two instructions: "POP HL; JP (HL)". A self-relocating program can't include this subroutine itself because it wouldn't know where it is. On a Model 4 LDOS and TRSDOS provide the @WHERE system call (SVC-07) that does the same thing.

Here's the puzzle part: Can you figure out a way to get the program counter in code without using these known subroutines?

I came up with one technique. I'm ever hopeful that there is some other way but in any case I'll post my solution in a few days so as not to spoil anyone's fun.

I know from a practical perspective it is almost a given that HL or some other register will contain the location of loaded code. After all, whoever loaded you had to know here to start and "JP (HL)" is as good a way to get to your code as anything else, but my solution doesn't depend on anything like that.

Also be careful not to depend on any hidden addresses. It might seem like "label: LD HL,label" is an easy way around the problem but "label" will be put in as an absolute address at assembly time and won't tell you anything about where the code was actually loaded.

Although it lacks in PC-relative addressing modes, the Z-80 has lots of registers and a relative jump so at least writing the startup code in a relocatable fashion isn't too hard itself.

Chuck(G) · May 19, 2017

Without relying on external hardware (e.g capturing the bus address at an M1 cycle in an external latch) or using the absolute location of something in memory (an RST trap for example), no. A CALL or RST (or even an interrupt) will push the current value of the PC onto the stack, but the call or interrupt or RST has to reference some absolute location. Likely another inheritance from the 8008.

Petrofsky · May 19, 2017

gp2000 said:
Here's the puzzle part: Can you figure out a way to get the program counter in code without using these known subroutines?

I've always figured you need to either: (1) know in advance of a fixed address that will always point to three bytes of RAM (on any hardware configuration you support) and that lies outside the range that the loader might ever place your own code; or (2) know that the loader will always set SP for you to such a location, and also know of a fixed ROM address that has a RET instruction.

The "CALL $000B" is just a fancy version of option (2). Simply knowing about the RET at $0043 is enough. You use the bare RET by doing "CALL $0043; DEC SP; DEC SP; POP HL".

****SPOILER***** (Rest of text is white; highlight to see it)

For option (1), on a Model I/III, the RAM at $4000 suffices. You can use it like this:

Code:

LD HL, $4000
DI
LD A, (HL)
LD (HL), $C9
LD HL, ($4001)
LD SP, $4003
CALL $4000
DEC SP
DEC SP
POP DE
PUSH HL
LD ($4000), A
EI
EX DE, HL

Edit: here's cleaner version. It depends on 4 bytes of known RAM at $4000 (rather than 3), but uses only 17 bytes of code (rather than 25).

Code:

[CODE]
LD SP, $4000
LD HL, $E9E1	; POP HL / JP (HL)
DI
POP BC
PUSH HL
POP HL
POP DE
CALL $4000
PUSH DE
PUSH BC
EI

Chuck(G) · May 19, 2017

Again, as I said, unless you know the absolute location of something, you can't get the value of PC. Period.

Were it possible, we'd see self-relocating ROM code. Which we don't.

Petrofsky · May 19, 2017

Chuck(G) said:
Again, as I said, unless you know the absolute location of something, you can't get the value of PC. Period.

Were it possible, we'd see self-relocating ROM code. Which we don't.

How about if I showed you code that depended on no hardware (other than the Z-80 itself), used no absolute addresses, and returned the lower 14 bits of the PC? I'm not sure that's useful, but it's kind of interesting.

Chuck(G) · May 19, 2017

I'd like to see that.

Petrofsky · May 19, 2017

Chuck(G) said:
I'd like to see that.

Oops, I can't do it. I was thinking you could do a JR, which copies the PC to the undocumented WZ (a/k/a MEMPTR) register, and then read the lower 14 bits of WZ by counting how many increments of WZ (accomplished by the CPI instruction) it takes before bit 13 of WZ changes (bit 13 can be copied into bit 5 of F with the BIT n, (HL) instruction).

I was forgetting you need to already know your place (i.e. where you were loaded) before you can read WZ. First, because to read bit 5 of F you need to PUSH AF into memory somewhere, and second because the loop around the CPI instruction cannot use JR (because that overwrites WZ), and must instead use the JP (HL) instruction (with the branching accomplished by arithmetic on L before the JP), and that requires we know what address to initialize HL to.

Oh well.

gp2000 · May 19, 2017

Petrofsky said:
I've always figured you need to either: (1) know in advance of a fixed address that will always point to three bytes of RAM (on any hardware configuration you support) and that lies outside the range that the loader might ever place your own code; or (2) know that the loader will always set SP for you to such a location, and also know of a fixed ROM address that has a RET instruction.

I think it's fair to assume that the stack pointer is pointing somewhere reasonable -- not overlapping your loaded code and has at least 16 bytes of free space available. My guess is that any solution making that assumption can be parlayed into code that copes with a possibly overlapping stack pointer.

I'd thought about using a fixed address but discounted it as insufficiently general. I like to think of the problem in terms of a Z-80 running with a nice, pure 64K of RAM where the code could be running anywhere. However, I think the idea can be made to work even if the fixed address chosen happens to be right on top of your running code.

Nonetheless, that's different from what I had in mind. And it occurs to me now that in a 64K RAM environment one could similarly work around the overlap problems of using a RST instruction.

MikeS · May 19, 2017

Chuck(G) said:
I'd like to see that.

You'd think by now people would know better than to argue with you ;-) ; I'm still learning...

Chuck(G) · May 19, 2017

There's an excellent reason behind the inaccessibility of PC on the x80 (8080, 8085, Z80 and NSC800) CPUs--and it comes from the 8008. There, the "stack" was simply a set of 8 14-bit registers with a 3-bit register (inaccessible to programs) pointing to the currently active PC--so a PC and a 7-deep stack that could only be accessed by CALL and RETurn instructions. No pushing or popping the general registers--and if your nesting got too deep, the oldest return address would be overwritten.

This was improved in the 8080 by a memory-resident stack and instructions to push and pop general registers on it. But the 8008 architecture still was the template.

The inability to get the value of the PC counter without "tricks" is very common in older mainframes. Some didn't even let you get it even with tricks---the IBM 1620 had a "branch and transmit" instructions that would store the P-counter into an invisible register and a "branch back" instruction that would restore it to the PC, but there was no way to read that value. Since it was a single-level affair, it wasn't used much.

The CDC6600 CPU had a "Return jump" instruction, that, like the PDP-8 JMS instruction, would store the PC counter (and an unconditional jump) at a subroutine entry point, but otherwise no way for a user program to discover the P-counter value.

On the other hand the IBM S/360 used base-displacement addressing, so it was important that a program discover where it was. This could be done with a 'Branch and Link" instruction, specifying register 0 as the target address, which the hardware interpreted as "no jump". So a program might establish its base register as R15 by coding "BALR 15,0" and then telling the assembler to assume that all addresses were displacements from the next instruction.

The PDP-11 had perhaps the most elegant solution--simply make the P-counter (and the stack pointer) one of the general registers. R7 is PC and R6 is SP. Simple and elegant. The GI CP1600 copied this model.

Petrofsky · May 20, 2017

Okay, here's a solution that doesn't assume anything about SP, uses no absolute addresses, requires no special hardware, uses no undocumented features, and only takes 14 bytes of code.

However, it assumes that: (1) the code itself is running from RAM, not ROM; (2) no harm will come from reading any memory address; and (3) if reading from a memory address returns B7, then no harm will come from quickly writing 37 then B7 to the same address. This is obviously true for George's all-RAM scenario, and also holds for any combination of RAM/ROM/video/keyboard. I'm not sure whether it's safe for all Model I configurations.

***SPOILER***

Code:

        LD A, $B7       ; Opcode for OR A
	DI
Loop    INC HL
        CP (HL)
        JR NZ, Loop
        LD (HL), $37    ; Opcode for SCF
SomeOp  OR A
        LD (HL), A
        JR NC Loop
	EI
        ; Now HL = SomeOp

gp2000 · May 20, 2017

Petrofsky said:
Okay, here's a solution that doesn't assume anything about SP, uses no absolute addresses, requires no special hardware, uses no undocumented features, and only takes 14 bytes of code.

Well done. You can save a byte by using "XOR A" and "NOP" but I haven't found anything smaller than that.

Petrofsky said:
However, it assumes that: (1) the code itself is running from RAM, not ROM; (2) no harm will come from reading any memory address; and (3) if reading from a memory address returns B7, then no harm will come from quickly writing 37 then B7 to the same address. This is obviously true for George's all-RAM scenario, and also holds for any combination of RAM/ROM/video/keyboard. I'm not sure whether it's safe for all Model I configurations.

Yeah, on a Model I you'd want to season the code to avoid danger areas. A similar method might make some headway on working from ROM but at best I imagine it could only give a probabilistic answer.

Chuck(G) · May 20, 2017

Doesn't help at all in the case of code in ROM (as I mentioned)--and if you're moving code to RAM, supposedly you know where you moved it in the first place. And woe betide you if the system uses memory-mapped I/O.

I don't see how this solves the basic problem of reading PC.

gp2000 · May 21, 2017

Chuck(G) said:
I don't see how this solves the basic problem of reading PC.

Well, once the routine completes it gives you the value of "SomeOp" in HL. Knowing the value of one location in your program you can compute other locations using relative offsets. For example, if you want to find the address of a label called "table" you would:

Code:

	ld	de,table-SomeOp
	add	hl,de
	; now HL == table
...
table:

If table wasn't data but code and you want to jump to it then "JP (HL)" will work quite nicely. And you can construct relative calls like so:

Code:

	ex	de,hl
	ld	hl,cret-SomeOp
	add	hl,de
	push	hl
	ld	hl,subroutine-SomeOp
	add	hl,de
	push	hl
	ex	de,hl
	ret
cret:
	...
subroutine:
	...
	ret

Alternatively you could take the address of "SomeOp", use it to find a table of offsets that denote where your code must be patched in order to operate at the current address. Much less cumbersome than doing address calculations constantly and keeping the address of "SomeOp" available in a register pair.

Chuck(G) · May 21, 2017

My point is that something had to load the code into RAM--and that something 'knew' where the code was being loaded. The loader could just as easily pass the location to the routine. Of course, a good loader would keep a relocation table (e.g. PRL type file) and relocate all of the code.

What I want to see is a way for a ROM-based routine to get the value of the P-counter without relying on finding some key string in the code. And that's a real problem--writing a run-anywhere ROM.

gp2000 · May 21, 2017

Chuck(G) said:
My point is that something had to load the code into RAM--and that something 'knew' where the code was being loaded. The loader could just as easily pass the location to the routine. Of course, a good loader would keep a relocation table (e.g. PRL type file) and relocate all of the code.

If you read the 80 Microcomputing articles they give some examples where code is loaded but TRS-DOS or BASIC isn't going to do any relocation. Indeed, you could still do something to pass the address of the code but really all that was just background information meant to motivate the puzzle which, honestly, I'd only meant for entertainment purposes.

Chuck(G) said:
What I want to see is a way for a ROM-based routine to get the value of the P-counter without relying on finding some key string in the code. And that's a real problem--writing a run-anywhere ROM.

I think I can solve that problem under some fairly reasonable constraints. Though you'd have to do the discovery on every entry into the ROM which would be rather inconvenient. I rather suspect the solution won't be deemed practical but here's the rough outline.

At a minimum the ROM will have to be called with a valid stack with at least two bytes of space available. We'll also need to be able to run uninterrupted for at least a brief period so that we can modify one or two bytes of RAM. The amount of RAM can vary, but it makes things a lot easier if the RAM is large and the ROM code knows where it is. The ROM can be switched in over RAM but some reasonable amount of RAM must be left exposed.

All the ROM needs is some number of CALLs to fixed RAM locations. The number of CALLS will depend on the possible system configurations. Suppose we have a 64 K of RAM and a 16 KB ROM can be switched in at any address. No matter where it is put there will be at least two 16K banks of RAM that it will not overlap. So we'll have 4 CALLs, each of which calls some location in each bank. $0000, $4000, $8000, $c000 will do fine.

The discovery code will check for RAM at each call destination (with interrupts disabled during each check, just to be safe). Track that as a series of bits in a register, setting bit N if there is ROM at N << 14. It will also set bit M to indicate which bank holds the stack pointer. That's just in the very unlucky case that the stack pointer happens to be right on top of one of our call locations.

With this particular system configuration we know at least one one of the 4 bits will not be set. Now a straightforward bit of code can test each of the 4 bits in turn. For a clear bit we save what is in the byte at the start of that bank, disable interrupts, write a $C9 (Z-80 return) byte there and branch to our CALL for that bank. Upon return we restore the byte we wrote, decrement the stack pointer twice and pop HL which will contain the return address of that CALL. Enable interrupts again and we've got a known address in our code.

Obviously this gets very involved if there is only a small amount of RAM and its location is unknown. And at some point we might run out of ROM space or registers bits to track all the possibilities. But I'll leave it as an exercise for the reader to map out the space of possible ROM/RAM configurations and the accompanying code size needed to support it.

And if we want a ROM that can support different systems with serious differences in possible configurations, well, I'd say that's just one goalpost too far.

Oh, here's another approach. For very large ROMs start off by having a contiguous chunk 512 bytes of the ROM filled with $C9. This will guarantee at least one 256 byte page of memory will be nothing but $C9. If the ROM can only be loaded on page boundaries you can get away with only 256 bytes of $C9. Now loop through memory 256 bytes at a time checking for a $C9 at $0000, $0100, $0200, $0300 ... $ff00. Use HL for that loop and break out when you find a $C9 and verify that it is a ROM location. Well, you could use a $C9 you find in RAM as you'll need to run uninterrupted for the search, but then you'd have to check that SP isn't in the way. With HL pointing to a $C9 in ROM and L == 0 you can then drop into a big ladder of conditional calls:

Code:

	dec	h
	call	z,$0100
	dec	h
	call	z,$0200
	dec	h
	call	z,$0300
	...
	dec	h
	call	z,$fe00
	dec	h
	call	z,$ff00
	dec	h
	call	z,$0000

Then take two from the stack pointer, pop DE and you have an address in the ROM. At that point you can scan forward or backward to find the end of the $C9 block and the fixed point.

Hmm, if you instead use $00C9, $01C9, $02C9, ... $FFC9 as the call addresses then you don't need the contiguous $C9 block. Just change the search loop to look at 4 consecutive bytes in each page to find the $C9. So a pretty general scheme can be put together for a little over 2 KB in overhead. Definitely not practical, but it at least puts an upper bound on the code needed to locate one's ROM.

Chuck(G) · May 22, 2017

That's straining a bit at the problem.

Compare this awkward scheme with the x86 option ROMs. No fancy tricks--calls are relative as are jumps. Absolute calls or jumps are uncommon.

I'd rather just have a resident handler in memory to handle relative calls. Much more efficient--and with no more memory used if you can use a one-byte RST for invocation.

Back the day, when we had to do a multi-user system, we simply resorted to P-code and let the interpreter handle the relocation details. It also enabled implementation of strong typing. It was fast enough to implement 4 WYSIWYG word processor sessions on a 3.5MHz 8085.

lowen · May 22, 2017

My solution? Upgrade to a Z280. Then use the 16-bit relative CALL (FDCDlluu) (and the conditional kin) and you're done.

At some point the constraints need to be realistic. There are ways of getting the program counter; if there were a 'LD HL,PC' instruction you could always set a constraint of 'get the PC without using LD HL,PC' too.

This use case is precisely why LS-DOS provides the RST 28H SVC @WHERE.

I've written a few relocating (as opposed to relocatable) routines in my day, mostly for device drivers. True relocatable code runs without needing to know where it's running.

All of these issues are solved with the Z280 instruction set, but that's a bit of a moot point.

Chuck(G) · May 22, 2017

Which mostly goes to my point that the Z80 and 8080/8085 carry the burden of the Datapoint 2200 architecture, where even the content of the call stack was not accessible to user code. It also goes a way toward explaining the awkwardness of having stack-resident variables on the x80 processors.

lowen · May 22, 2017

Chuck(G) said:
Which mostly goes to my point that the Z80 and 8080/8085 carry the burden of the Datapoint 2200 architecture, where even the content of the call stack was not accessible to user code. It also goes a way toward explaining the awkwardness of having stack-resident variables on the x80 processors.

This was another thing the Z280 fixed, IMHO. SP-relative addressing was supported, as was system-vs-user mode dual stacks. The Z280 could have been a great processor, had it been delivered as the Z800 when Z80 machines still mattered for the mainstream. As it was, at least one vendor of smart serial I/O cards used the Z280 as an I/O processor (Specialix XIO; a few on eBay right now).

But none of this solves gp2000's challenge..... which was about Z80, not Datapoint 2200/8008/8080/8085. The history is good, and reading the reasoning about why things are the way they are is nice, but the challenge remains.

I'm a bit of a fan of the 'search for a known pattern' technique and have an embedded signature that would be very unlikely in real code. Something similar to this was actually used by the TRSDOS/LDOS/LS-DOS driver module system, where each loadable module had a fairly standard header that included the module's name or abbreviation, allowing the system to walk the module linked-list and print out the module names (the 'MEMORY' command in LS-DOS 6.3.1 does this).

VCF West	Aug 01 - 02 2025,	CHM, Mountain View, CA
VCF Midwest	Sep 13 - 14 2025,	Schaumburg, IL
VCF Montreal	Jan 24 - 25, 2026,	RMC Saint Jean, Montreal, Canada
VCF SoCal	Feb 14 - 15, 2026,	Hotel Fera, Orange CA
VCF Southwest	May 29 - 31, 2026,	Westin Dallas Fort Worth Airport
VCF Southeast	June, 2026	Atlanta, GA

Relocatable Z-80 Code and a puzzle

Experienced Member

25k Member

Experienced Member

25k Member

Experienced Member

25k Member

Experienced Member

Experienced Member

Veteran Member

25k Member

Experienced Member

Experienced Member

25k Member

Experienced Member

25k Member

Experienced Member

25k Member

Veteran Member

25k Member

Veteran Member