Self Modifying Code... Best Practice or avoid?

cj7hawk · Jan 17, 2024

Hi All,

Just wondering about the group's thoughts on self-modifying code on CP/M systems.

Some people regard this as contentious while others see it as a valuable way to save space and extend the system.

In this context, I'm talking about either modifying the instruction itself, or modifying the bytes extending the instruction, so that when the instruction executes, it's functionally changed - It could be as simple as setting up a label over the top of the displacement byte in an IX/IY operation, or as complex as changing the code function that executes within a routine - eg, Changing ADD to SBC etc.

I've heard arguments supporting and denouncing it over time, but was wondering on how common a practice this is within CP/M and what people think about the practice itself.

Thanks
David

krebizfan · Jan 17, 2024

I tend to be in the opposed to self modifying code camp. Six months after it is written, no one including the author is exactly sure what the code is doing.

If there is no other method to get the code to work, then use it. Try alternatives first.

durgadas311 · Jan 17, 2024

I was also "brought up" on the opposing side. I'm pretty sure DRI avoided it completely (although I have not scoured every line). There were some vendors that used it, sometimes in a fairly controlled way like to be able to make an I/O port variable/configurable. Some have used it to obfuscate the code from reverse-engineering. I do dislike any code that can't cleanly be represented in the assembler, like when an address or other operand ends up being an opcode under certain circumstances.

Plasmo · Jan 17, 2024

I try not to do self modifying code unless I absolutely have to. My most recent example is generating 800Khz pulse code modulation with Z80 to drive WS2812 (Neopixel) display. I imagine there might be self modifying CP/M BIOS code dealing with the floppy disk.
Bill

WSM · Jan 17, 2024

After 50+ years of writing assembler code, I'm very much against self-modifying code. Having written a LOT of S/370 assembler code in a large real time multitasking environment, it was always written as reentrant code. From this experience I learned to avoid programming "tricks" and adopted the philosophy that someone else would probably be trying to understand and/or modify the code at a later date. One unique instruction in the S/360 set is the EXecute instruction which effectively modifies another instruction for one execution i.e. setting the length of a move instruction.

With microprocessor code, I try to keep the code and data areas separate plus not use self-modifying code. Besides avoiding the obvious difficulty in debugging self-modifying code, this allows the code to be directly executed from either ROM or RAM. It's also fairly easy to implement a form of overlays when required for something like common areas in a banking environment. Code to be used for multitasking takes a bit more effort to only use data areas that are either on the stack or obtained via some kind of ALLOC mechanism.

resman · Jan 17, 2024

Self modifying code or just-in-time compilation? Kind of depends on how you think about it. On a small computer, generating/modifying code on the fly can be advantageous to save space and/or improve performance. But it requires a great deal of thought to pull it off properly. Not something you should consider without weighing the alternatives and understanding the implications of your choice. However, I use it in my bytecode interpreter to great effect. It is a very small routine that runs in a well defined area of memory, so pretty easy to contain the implementation details.

Just another perspective…

1944GPW · Jan 17, 2024

There's at least one programming environment where self modifying code is not only useful, but actively encouraged

...
Back in the mid-80s Scientific American had an article about a new game called Core War where two programs battle it out in memory, each trying to hunt down and destroy the other program.
Sort of like a software version of Battle Bots I guess, but they came much later.
It was a huge lot of fun and a friend and I really got into writing our programs in Redcode, the assembly-like language the fighting programs are written in and running them against each other. Self modifying code could be used to improve the program for things like enemy detection hits etc. as it moved about in memory.
https://en.wikipedia.org/wiki/Core_War

whartung · Jan 18, 2024

resman said:
Self modifying code or just-in-time compilation?

...

Just another perspective…

Actually, as in common use, they're fundamentally different.

I don't think anyone considers code compiled on the fly to be "self modifying code". While the code is generated, the executive managing that code is unchanged, and generated code is also, typically, treated as temporal in nature. It's an artifact of a larger process (i.e. the compiler).

Self modifying code has historically been done to save space, or for very performance sensitive code.

For example, rather than putting a calculated value into memory, that can then be loaded into a register, the instruction stream is modified to update that variable in place. Thus instead of (in 6502 parlance) storing 0xBC into address 0x55, having LDA $55, you instead use the immediate LDA #00, and replace the 00 with 0xBC.

This saves a cycle (LDA from zero page is 3 cycles, LDA immediate is 2 cycles). Perhaps this in important for a tight loop or an interrupt handler.

6502 FIG Forth. It's NEXT routine is an indirect JMP. We store the next words address into a memory location, and us the 6502 indirect JMP instruction. You could easily just replace that with an absolute JMP, and keep changing the JMP address. Mechanically its identical -- store two bytes in RAM, and JMP to the JMP instruction. Semantically, they're quite different. At a minimum, now the kernel is ROMable.

The only other folks that use self modifying code are folks trying to obscure and obfuscate their code (copy protection routines, etc.).

As a general rule, for routine programming, self modifying code is not worth the long term headache it creates.

Svenska · Jan 18, 2024

cj7hawk said:
Just wondering about the group's thoughts on self-modifying code on CP/M systems.

There are multiple schools of thought. Self-modifying code can be very useful, but it has some pretty steep disadvantages restricting its use.

cj7hawk said:
I've heard arguments supporting and denouncing it over time, but was wondering on how common a practice this is within CP/M and what people think about the practice itself.

Code which modifies itself is hard to read, write and modify as part of development, and obviously never ROMable. Advanced runtime systems, such as overlay managers, also need to be aware of it, as it prevents some optimizations related to swapping/paging. Some CPU architectures (such as AVR) do not support self-modifying code at all, others require jumping through some hoops (manually invalidating instruction caches, etc) to make it work. In modern, deeply pipelined and heavily cached CPUs, self-modifying code will run with heavy speed penalties. In other words, it is always a headache, even when used properly.

On the other hand, self-modifying code can save a lot of code duplication, work around instruction set deficiencies (especially on 6502), and significantly improve performance when used well. Sometimes, you simply have no choice but to write such code. If you know your environment well and you have no intention of ever porting the code to a different environment, self-modifying code is a good tool have access to. Otherwise, it's not worth doing.

To a lesser extent, this also applies to code generation / just-in-time compiling. If you really need the performance, it is good to have the option. But it is always better to start without using it - and it will stay useful as a fall-back solution even when you switch over.

edit: As far as your thread title goes: Self-modifying code is never Best Practice. It is more a tool of last resort (except on 6502).

Hugo Holden · Jan 18, 2024

It might depend on what the program does.

I wrote a program for filling memory with bytes in my PET as part of a DRAM diagnostic system, it is blocked moved from ROM to a small area of RAM so that it can use self modifying code. The real disadvantage of it struck me; is that the program cannot live and operate from ROM.

Also not being an experienced programmer at all, it was the only way I could see to get it to work. But it was more about modifying data contained in the program as immediate values, rather than the instruction itself, but the program code still gets modified to make it work. Also, many programs can be human modified, by poking a data value into the program, that is also what I did so that the memory could be filled with any particular byte, so that I guess would be called "Human modifying" code.

But, it is a one pass affair and after the program terminates it "repairs" itself by going back to the original state/values, so its ready for use again.

Maybe in a case like this, it is not such a crime, because I could not see how it could result in any harm or significant risk of malfunction.

But I did wonder with large programs with self modifying code everywhere, especially instruction changes, it could turn into a complete nightmare to understand it or debug it, especially when it was part way along some complex process. So maybe try to avoid it, unless forced into it.

cj7hawk · Jan 18, 2024

I've never disliked self-modifying code, though nearly all my early work was embedded so there was little reason to use it on eproms. I still program both the harvard and von-neuman architectures and mix techniques from time to time.

Also, like many, I've avoided SMC, mainly because I was taught to always do that... Yet on reconsidering it, it's difficult to find a valid reason to always avoid it. Likewise I was taught never to use GOTO because it wasn't "structured" yet it's entirely valid in structured programming and can be quite useful at times. Mostly the reasons for avoiding SMC are the same as the reasons for avoiding Goto - because they make it more complicated to follow how a program works.

Like Whartung, sometimes the instruction isn't there - it's not just a case of a shortened instruction time, and is more of a "Why can't I do this with my registers" issues.

One of the two opcodes I wish z80 had was LD (IX+R),r/n - That is, to specify an offset from a register rather than a physical byte in memory when using index registers.

This opcode *does* lend itself to self-modifying code, since you can do something like this;

LD (IX+0),$00
EQU OFFSET1,PC-2 ; Set a label at the offset so that I can change it's offet value (Changing the displacement, not the value written).

As Hugo pointed out, you reset the variables pre-execution and there's no fault in the program. It's not like there's an equivalent set of longer instructions to do this. It's pretty much a case of use HL and DE and (HL) to achieve the same outcome, which is a workaround at best.

I see the feelings around self-modifying code haven't changed much from what I remember back in the 80s, so I guess then the question is when *should* SMC be acceptable?

And when used, what *are* the best practices for documenting etc?

Badly documented code is difficult to read whether it's Self Modifying or not. Well documented code should be no more difficult for someone else to read than well-documented self-modifying code.

Plasmo · Jan 19, 2024

I wonder whether you'll consider this as self-modifying program when a program creates another program in memory then executes the resulting program? This is commonly done in bootstrapping when a small bootstrap loads & executes application program from mass storage or when a loader program decrypt or unzip an application program then run it. The process can become rather elaborate when a processor executes a serial data stream (or compact flash data stream) as instructions to create a small program in memory and then executes that small program to load & run larger application program.

I've designed ROM-less computers based on serial-bootstrap or CF bootstrap.
Bill

cj7hawk · Jan 19, 2024

@Plasmo The Amstrad PCW was romless. It bootstrapped from raw data from the floppy... Even the CRTC does not work until it boots... Software defined everything. It can't even write a message to tell you to insert a disk.

If your applicaiton just moves code around and executes it, then it doesn't sound like it's self-modifying unless the instruction contents get changed along the way, or some of the opcodes are conditionally written depending on state (eg, changing the payload of a LD statement ). Though an application unpacking it's own executables might still be self-modifying depending on the reasons. At the end of the day, I guess if you decide it's far enough from the norm, then it's in the self modifying camp.

At the lighter end of the scale, hooking jumps is self modifying code without a doubt, as is relocating absolute code, but I doubt anyone would question those specific examples despite that it's a clear and unambiguous example.

On the other hand, changing a few instructions each execution in a routine to change the function of the routine seems to be at the far end of the spectrum, and is definitely frowned upon.

Makefile · Jan 19, 2024

Plasmo said:
I wonder whether you'll consider this as self-modifying program when a program creates another program in memory then executes the resulting program? This is commonly done in bootstrapping when a small bootstrap loads & executes application program from mass storage or when a loader program decrypt or unzip an application program then run it. The process can become rather elaborate when a processor executes a serial data stream (or compact flash data stream) as instructions to create a small program in memory and then executes that small program to load & run larger application program.

I've designed ROM-less computers based on serial-bootstrap or CF bootstrap.
Bill

Optimizing BitBlt by generating code on the fly - The Old New Thing

Artisanal bit block transfers made to order.

devblogs.microsoft.com

resman · Jan 19, 2024

whartung said:
Actually, as in common use, they're fundamentally different.

I don't think anyone considers code compiled on the fly to be "self modifying code". While the code is generated, the executive managing that code is unchanged, and generated code is also, typically, treated as temporal in nature. It's an artifact of a larger process (i.e. the compiler).

Self modifying code has historically been done to save space, or for very performance sensitive code.

As there may be a semantic difference, the truth is both are doing fundamentally the same thing, and for the same reasons: performance and even size, depending on the reason. The EGA/VGA Windows driver used a compiling BitBLT routine that generated the ROP code on the fly, as there were just too many permutations to have static versions. Just in time compilation really wan't used on small 8 bit computers, but in certain instances self-modifying code can be very useful, especially if certain opcodes just didn't have the right addressing mode needed. So, a difference is semantics? Perhaps, but less than one might think.

whartung said:
For example, rather than putting a calculated value into memory, that can then be loaded into a register, the instruction stream is modified to update that variable in place. Thus instead of (in 6502 parlance) storing 0xBC into address 0x55, having LDA $55, you instead use the immediate LDA #00, and replace the 00 with 0xBC.

This saves a cycle (LDA from zero page is 3 cycles, LDA immediate is 2 cycles). Perhaps this in important for a tight loop or an interrupt handler.

6502 FIG Forth. It's NEXT routine is an indirect JMP. We store the next words address into a memory location, and us the 6502 indirect JMP instruction. You could easily just replace that with an absolute JMP, and keep changing the JMP address. Mechanically its identical -- store two bytes in RAM, and JMP to the JMP instruction. Semantically, they're quite different. At a minimum, now the kernel is ROMable.

Unless your name is Bill Ragsdale, *he* actually put the indirect JMP in zero page, right in front of W. So it is actually using self-modifying code, even though it's very small.

whartung said:
The only other folks that use self modifying code are folks trying to obscure and obfuscate their code (copy protection routines, etc.).

As a general rule, for routine programming, self modifying code is not worth the long term headache it creates.

I'm not saying one should use self-modifying code willy-nilly, however don't discount it as a useful tool in your toolbox. Kind of like saying you should NEVER use GOTO.

Someone · Jan 19, 2024

Self modifying code isn't the only hurdle. There are many sneaky techniques to minimise the program size and maximise the performance of Z80/8080 programs.
When someone used self modifying code from ROM a template is kept in ROM then copied to RAM during initialisation with the RAM copy being self modified prior to its execution.

Remember "It takes twice the genius to decipher a genius's program"

cjs · Jan 19, 2024

Sometimes it is best practice; sometimes (probably far more often) it is not. Sometimes it should be avoided, sometimes not.

Like so many things, you can't generalise this to an "always" case: self-modifying code has a utility and and a cost, and both the utility and the cost depend on the particular situation in which it's used.

So if you're tempted to use it, examine in more detail the utility and especially the costs (e.g., difficulty in understanding what's going on, chances of failure, etc.) for the particular situation and then make a value judgement.

The most recent version of this I saw was in the TK-85 trainer board ROM: to support the IN and OUT routines in the monitor (which take a port number and a value) it created "dyn_in" and "dyn_out" routines (my names) in RAM that it would patch to do an IN or OUT to a particular port. This was costly not only in terms of understanding and risk (something could trash the "static" parts of the routine after it was set up), but it also used up 6 bytes of memory, which doesn't sound so bad until you release that's reduced the amount of user RAM on the machine from 918 bytes to 912 bytes.

But there's no other reasonable way to do this, since the 8085 doesn't have the ability to do an IN or OUT with the port specified in a register, and this monitor functionality is quite useful. So there's little question that this was the way to go.

whartung · Jan 19, 2024

resman said:
As there may be a semantic difference, the truth is both are doing fundamentally the same thing, and for the same reasons: performance and even size, depending on the reason. The EGA/VGA Windows driver used a compiling BitBLT routine that generated the ROP code on the fly, as there were just too many permutations to have static versions.

The distinction, I feel, between SMC and JIT compiling is that the former "changes" the code. If you look at the source code, and then look at the binary, notably after it has been modified, the code has changed. Using the Forth example, it no longer JMPs to where the source code says it would jump. This changes the flow and logic from what is listed in the source code.

JIT transforms the code but maintains the semantics and logic. It doesn't change where to jump (at least logically, the address may be different but the "code" at the destination is the same as it was before the transformation), just how the jump is done.

Like global variables and every other "never do that" trope, SMC is simply something that may belong in a toolkit, but is best left at the very bottom, underneath heavy things like the pipe wrenches, and pulled out for very limited and specific purposes.

resman · Jan 19, 2024

whartung said:
The distinction, I feel, between SMC and JIT compiling is that the former "changes" the code. If you look at the source code, and then look at the binary, notably after it has been modified, the code has changed. Using the Forth example, it no longer JMPs to where the source code says it would jump. This changes the flow and logic from what is listed in the source code.

There is no indirect JMP in the 6502 figForth source code. It is actually built during initialization in RAM so there is no change to any code from a listing. Your Forth example would be more a rudimentary combination of JIT *and* SMC. But, to your point, in this day and age one could say there is a distinction between SMC and JIT. However, back in the day there were (as per your own example) cases that blurred the distinction.

whartung said:
JIT transforms the code but maintains the semantics and logic. It doesn't change where to jump (at least logically, the address may be different but the "code" at the destination is the same as it was before the transformation), just how the jump is done.

Like global variables and every other "never do that" trope, SMC is simply something that may belong in a toolkit, but is best left at the very bottom, underneath heavy things like the pipe wrenches, and pulled out for very limited and specific purposes.

Sure, but when you need it, it's quite useful.

Dwight Elvey · Jan 19, 2024

Self modifying on today's processor would be a pain. Code cache and out of order execution would make it hard to do.
For your CP/M machine, and personal use, do what works for you. Just don't expect others to understand what you are doing.
I've been guilty of sch tricks when pushed into a corner.
Dwight

Self Modifying Code... Best Practice or avoid?

Veteran Member

Veteran Member

Veteran Member

Experienced Member

Experienced Member

Veteran Member

Veteran Member

Veteran Member

Veteran Member

Veteran Member

Veteran Member

Experienced Member

Veteran Member

Experienced Member

Veteran Member

Experienced Member

Experienced Member

Veteran Member

Veteran Member

Veteran Member