Writing Assemblers... What should a good assembler do?

I would suggest that you want macros to take preference over internal mnemonics. There can be good reasons for wanting to "replace" a mnemonic, and allowing the user to do that does not remove other functionality, since doing so is completely optional.

It would be very inefficient on a z80 based system to search up to 50K of label space before decoding opcodes but could be done with a command line switch... Can you give me an example where you would want to replace an existing functional opcode with a macro, yet still have it look the same? If I can understand the context, I might be able to improve on the concept.

This sounds awkward and unnecessary. Unless I've missed something about Z80 assembly language you can always tell from context whether an instruction, location or a register name is required for any particular value, so there's no need to have instruction mnemonics or register names taking up label space or interfering with user labels.

Because when I am looking for the pattern, it's easy to remember, eg;

; Opcodes;

LD: DB 2,'LD'
PUSH: DB 4,'PUSH'
POP: DB 3,'POP'

EX: DB 2,'EX'
EXX: DB 3,'EXX'
LDI: DB 3,'LDI'
LDIR: DB 4,'LDIR'
LDD: DB 3,'LDD'
LDDR: DB 4,'LDDR'
CPI: DB 3,'CPI'
CPIR: DB 4,'CPIR'
CPD: DB 3,'CPD'
CPDR: DB 4,'CPDR'

ADD: DB 3,'ADD'
ADC: DB 3,'ADC'
SUB: DB 3,'SUB'
SBC: DB 3,'SBC'
AND: DB 3,'AND'
OR: DB 2,'OR'
XOR: DB 3,'XOR'
CP: DB 2,'CP'
INC: DB 3,'INC'
DEC: DB 3,'DEC'

DAA: DB 3,'DAA'
CPL: DB 3,'CPL'
NEG: DB 3,'NEG'
CCF: DB 3,'CCF'
SCF: DB 3,'SCF'
NOP: DB 3,'NOP'
HALT: DB 4,'HALT'
DI: DB 2,'DI'
EI: DB 2,'EI'
IM: DB 2,'IM'

It's just how I wrote the decoder in my case. Not the most efficient way in retrospect, but works well enough and is easy to edit and read back in my code.
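As a sketch of the idea only (Python standing in for the assembler's own Z80 code), a linear scan over a length-prefixed table like the DB entries above could look like this; the table here holds just the first three entries:

```python
# Hypothetical sketch of scanning a length-prefixed mnemonic table like the
# DB entries above: each entry is one length byte followed by the text.
TABLE = bytes([2]) + b'LD' + bytes([4]) + b'PUSH' + bytes([3]) + b'POP'

def find_mnemonic(table: bytes, word: bytes) -> int:
    """Return the offset of the entry matching `word`, or -1 if absent."""
    i = 0
    while i < len(table):
        n = table[i]                       # length byte
        if table[i + 1:i + 1 + n] == word:
            return i
        i += 1 + n                         # skip straight to the next entry
    return -1
```

The length byte is what makes "move on to the next label directly" cheap: no terminator scan is needed to find the next entry.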

How do you distinguish between 12 as a number indicating a location and as a label defined as, e.g., 12 equ 456?
Decimal numbers are evaluated before label values. There's an order of precedence.

But 12_ and _12 would be valid (as examples). And some invalid decimal numbers would be legitimate labels.

There are few assemblers out there that do not allow any non-alphanumerics, and I doubt that there are any for microprocessors that do not allow underscore and several others.

Have you defined character sets for your assemblers? In particular, I would expect a cross assembler running on a modern system to accept Unicode (typically UTF-8) input, and correctly determine which non-ASCII characters are alphabetic. I don't know if I've ever used accented characters in labels (e.g., arrêt), but I've definitely used Greek letters, and have regularly wanted to use the prime characters.

It's intended for 8 bit systems circa 1985, so Unicode isn't relevant to my project. I hadn't considered other character sets, but limit the code to 7-bit ASCII common to CP/M, though of course it handles data as data. The cross-assembler won't be any different, since 8 bit systems don't handle Unicode, but IIRC most programmers in those cases use romanized expressions. The character set supports some Greek letters, but generally I wouldn't count on that for US-ASCII or CP/M systems.

This sounds overly complex and confusing compared to just having scopes and simple scoping rules.

The user *can* do that - I just don't enforce it. And translatory labels are a convenient medium. Besides, the example you list is over 100 times larger than my assembler - even without the graphics interface - and runs on computers more than a million times faster than my target architecture. I still have to be mindful that I intend this to run as a CP/M application primarily. :)

Nesting macros is pretty much a basic requirement of any modern macro-assembler.
Not older ones. Not even for Macro Assembler, which couldn't even chain them.

However I really liked the arguments presented about why they *should* and I can't disagree, so I'm adding the functionality.

That sounds confusing, too. Why not just do what any sensible programming language does and have the user provide a formal parameter list for the macro or function, and to these symbols bind the argument values supplied at the call site?
As I mentioned, that's entirely possible. You just declare them with EQUs or SETs in the MACRO - which is only syntactically different from declaring them elsewhere. I do it that way for memory and performance reasons.

And as mentioned, I don't enforce it, so you can just reference it directly, and you face the same risks as you do in other languages when you choose to do it that way.

The programmer can decide which way they prefer. Both are supported. Are there examples you can think of where both ways shouldn't be supported?

As we all said: if you're writing two separate versions of the assembler, they will be different assemblers.

Well, yes, they are a native assembler and a cross assembler. With a common syntax, and paired releases. And can reasonably be expected to produce the same binaries, at least as well as multiple revisions of the same assembler should.

Yes, this is a very common thing to do. It's also normal in these files to use conditional assembly to allow the client (the program including SYSTEM.ASM or whatever) to configure exactly what the included file will generate.

I've honestly never seen that before - with a set of the assembly routines and an assembler and editor included on the boot disk for exactly that purpose. I'd love to hear about other systems that did the same, to find out how well used it was and how the users perceived it - I can't imagine it was a particularly common architecture, and I wouldn't do it either if I didn't plan on having a ROM-based boot disk.

BBC BASIC was the closest I could find to that previously, where you could type in assembly from the OS or in BASIC.

If the included file creates macros, then there is no need for conditional assembly outside of memory management. If the macro isn't called, the code isn't generated. Though I'd definitely use conditional assembly flags on library files if included.
 
It would be very inefficient on a z80 based system to search up to 50K of label space before decoding opcodes...
Not really, no, if you use appropriate data structures.

Because when I am looking for the pattern, it's easy to remember, eg;

LD: DB 2,'LD'
I see no reason that any of that has to go into your label dictionary.

Decimal numbers are evaluated before label values. There's an order of precedence.
In other words, you allow programmers to define labels that are invalid. I don't know about you, but I'm not terribly pleased when a program hides errors from me.

The user *can* do that - I just don't enforce it.
But the user could also just write his entire program in machine code. The point of programs like assemblers is to make life easier for developers, not just to "allow" them to do things.

Not older ones. Not even for Macro Assembler, which couldn't even chain them.
Again, wrong. Merlin, for example, supports 15 levels of macro nesting. The Microsoft Macro Assembler (circa 1980) isn't explicit on this, but I suspect that it supports nesting as well, given their documentation for EXITM.

As I mentioned, that's entirely possible. You just declare them with EQUs or SETs in the MACRO....
That is not a formal parameter list.

The programmer can decide which way they prefer. Both are supported. Are there examples you can think of where both ways shouldn't be supported?
Yes. Basically every programming language ever.

And can reasonably be expected to produce the same binaries, at least as well as multiple revisions of the same assembler should.
Multiple revisions of the same assembler are the same source code with relatively few changes. That's a very different thing from two entirely different programs.

I think you mean "hoped" instead of "expected" there. This is very much a, "what could go wrong?" approach.

I've honestly never seen that before - with a set of the assembly routines and an assembler and editor included on the boot disk for exactly that purpose.
I am not surprised. It should be clear by now that there's a lot of stuff that even old assemblers do that you've not seen before.

If the included file creates macros, then there is no need for conditional assembly outside of memory management.
Macros are conditional assembly. It doesn't matter whether you call a macro to generate a bunch of data definitions and functions that you then call, or you set a flag to generate a bunch of data definitions and functions that you then call. But generally, it's easier to use defined constants and "if" to parametrize conditionally generated code than a macro, especially if there's any substantial amount of it.
 
If you can nest INCLUDEs there's no reason you can't nest MACRO expansion. Mind, not MACRO DEFINITION. Doesn't make a whole lot of sense to put a DEFMACRO within a DEFMACRO. But, going back to point 1 (about INCLUDEs), "eh", why not? (Just be cognizant that MACROs may always have GLOBAL scope, but...that's ok.)

You should absolutely allow DEFMACRO inside of INCLUDE.

Conditional assembly, outside of trivial cases, is an absolute requirement. There are all sorts of scenarios where it's valid. Support for different hardware is one example.

There are all sorts of scenarios where conditional assembly will apply to INCLUDEd files. A simple contrived case is some kind of "DEBUG" flag that interlaces a bunch of extra debug or tracing code into the final program if it is set.

The typical limitation of nesting is usually stack space.

At a casual glance of the Z80 instruction set, I don't readily see examples where there would be a conflict between a register usage and a memory usage.

So, that should allow you to use labels like HL or BC if you want.

I certainly wouldn't recommend it.

But, again, still, finally, it's your project. "If it assembles your assembler, then it's feature complete".
 
If you can nest INCLUDEs there's no reason you can't nest MACRO expansion. Mind, not MACRO DEFINITION. Doesn't make a whole lot of sense to put a DEFMACRO within a DEFMACRO. But, going back to point 1 (about INCLUDEs), "eh", why not? (Just be cognizant that MACROs may always have GLOBAL scope, but...that's ok.)

Macros don't have a global scope. All macros in my assembler are labels, so if you can hide a label, you can hide a macro. The only exception is that you can't copy macro contents in a formula, for obvious reasons.

Attempting to use a macro in a formula will only reflect its length, though that can be used if someone wants to do something unpredictable like report on their macro use as a percentage of memory or similar.

You should absolutely allow DEFMACRO inside of INCLUDE.

100% agree. It's also possible to include an include within a macro, though I can't figure out a good reason to allow or disallow that yet. It's just a default behavior.

Conditional assembly, outside of trivial cases, is an absolute requirement. There are all sorts of scenarios where it's valid. Support for different hardware is one example.

At the moment I support four conditional statements.

IFZ
IFNZ
IFPOS (16 bit) - Includes zero as positive.
IFNEG (16 bit)

The conditional line checks the subsequent value, which can be a formula, and acts on it, and there is no equals, so to do something like if LABEL=5 the statement would be
IFZ LABEL-5

Which assembly programmers should have no difficulty with.
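For anyone following along, the trick of reducing comparisons to zero/sign tests on a difference can be sketched like this (Python standing in for the assembler's 16-bit arithmetic; the function names are mine, not the assembler's):

```python
# Sketch of the four conditionals over 16-bit wrapped arithmetic.
def ifz(value):
    return (value & 0xFFFF) == 0          # IFZ: result is zero

def ifnz(value):
    return (value & 0xFFFF) != 0          # IFNZ: result is non-zero

def ifneg(value):
    return (value & 0x8000) != 0          # IFNEG: bit 15 set

def ifpos(value):
    return (value & 0x8000) == 0          # IFPOS: zero counts as positive

LABEL = 5
# "IFZ LABEL-5" is the equality test LABEL == 5:
equal = ifz(LABEL - 5)
```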

It's a little primitive, but I can't think of anything it can't do with just those tests. It should be possible to turn macros into IF statements also, so something like;

MACRO IFBIT7
IFNZ ARG1@$80 ; I could declare the value here as something other than ARG, but it's a single line so I'm going to be lazy and reference it like this.
MEND

And it would be called just like the other IF statements, eg;

IFZ <value>
IFBIT7 <value>

Though I haven't tested that specific case yet, I can't think of any reason it wouldn't work that way given the existing code.

After that, it would work just like any other IF statement, except it's "If bit 7 is set then conditionally execute the following".

It would also be equally possible with macros to do something like, IFBIT 7,<value> so that the bit being tested could be changed. Though thinking of that makes me realize I have another mathematical operator I need to add for up to 16 levels of indices.

I will include such examples though in the manual so that they give an idea of how the macros can be used.

Interestingly, it takes less memory to define that as a macro than it does to code it in the source, so I nearly left out IFPOS and IFNEG.


There are all sorts of scenarios where conditional assembly will apply to INCLUDEd files. A simple contrived case is some kind of "DEBUG" flag that interlaces a bunch of extra debug or tracing code into the final program if it is set.

I use that specific example in the assembler code myself. If debug is non-zero, it turns on a very, very verbose mode of operation. It's very helpful at times and shows me exactly what the assembler is doing.

The typical limitation of nesting is usually stack space.

I use the stack for the nesting of includes, since it saves the previous file state in memory, and I share the system stack and nesting stack. Nesting halts the current process file, saves it on the stack along with all the assembler state variables, loads up a new file, and then starts off the assembly process as though it was just starting, including changing how it reports on the file causing an error, etc. When it completes, instead of exiting, it returns to what it was doing previously. Some system variables such as the PC are affected, of course, but most return to their original state.

Macros just have three variables: a pointer, a counter and a pointer to the macro itself so it can refer to itself if necessary. 6 bytes. So I need to reserve space for its own "stack", but that's a permanent memory loss, since the system stack grows from the top down and the label space grows from the bottom up, and assembly gives up with an error when they get close enough in the middle that a single operation could crash the system stack.

At a casual glance of the Z80 instruction set, I don't readily see examples where there would be a conflict between a register usage and a memory usage.

So, that should allow you to use labels like HL or BC if you want.

I certainly wouldn't recommend it.

But, again, still, finally, it's your project. "If it assembles your assembler, then it's feature complete".

Yes, that one's a personal preference, so I don't imagine everyone would do that the same :) The only labels that can't be used are system labels and argument labels, and if the programmer really wants to, they can reuse even those, so only the system marker is sacrosanct in the end, since it's used to turn system labels on and off. But that shouldn't cause too many issues. I think in a later iteration, I did shortcut IX to HL to add the extra undocumented commands without too much extra code, so IX may not work as expected in some of those cases. But that's not one I intend to change, so I'll document it.

I still have a few gotchas in my code, but it's a small enough list that I can note it on a few lines. Another is that I don't support EX AF,AF' - only EX AF,AF. Same thing, but the syntax drops the prime on the exchange register.

Thanks again for the thoughts and support.
 
Or do you mean an "ADD" label as in something like ADD: or EQU ADD,VALUE? An ADD label is possible on my assembler, but not an ADD macro since the ADD opcode would take preference in the parser. Well, it is possible to have an ADD macro too, but it would be impossible to call it.
Yes. I am especially thinking about a label named "LOOP" conflicting with the x86 instruction "LOOP". That has always struck me as a stupid restriction for no good reason. On the Z80, there is no loop instruction - but there is still no good reason to disallow all mnemonics implicitly.

Given it's a z80 assembler, you can imagine that I have labels already like A, HL, DE, BC etc as well as labels line INC, DEC, LD and all the other opcodes are labels.
Yes, that's an unnecessary restriction. Your parser should know whether it expects an instruction or not, and then have separate tables.

Have you defined character sets for your assemblers. In particular, I would expect a cross assembler running on a modern system to accept Unicode (typically UTF-8) input, and correctly determine which non-ASCII characters are alphabetic.
I am not particularly fond of full character set freedom. In the past, I've seen Japanese source code and being unable to read symbol names crippled any good attempt at dealing with it. It is not fun. Even transcribed names are much easier to deal with.

Proper Unicode support is painful and expensive to implement and not compatible with 8-bit systems. They can neither input nor render all Unicode characters. I'd recommend simply staying ignorant, treating everything as ASCII (and relying on the guarantees of UTF-8 to make it work in the common case).

CP/M itself is a 7-bit system (not all systems are even 8-bit transparent, and the CP/M API itself is not). Theoretically, that rules out UTF-8 support completely.

Personally, I would never use greek or 'special' characters in labels and consider it a Bad Idea(TM).

It would be very inefficient on a z80 based system to search up to 50K of label space before decoding opcodes but could be done with a command line switch...
That means your code is most likely not scaling efficiently. Do you use hash tables?
 
Yes. I am especially thinking about a label named "LOOP" conflicting with the x86 instruction "LOOP". That has always struck me as a stupid restriction for no good reason. On the Z80, there is no loop instruction - but there is still no good reason to disallow all mnemonics implicitly.

With the latest update, only opcodes and assembler directives can't be used as labels, and group labels and normal labels can coexist with the same name. The latest update supports local and global labels with any degree of granularity, and it's even possible to have groups within groups. Routines can turn individual labels on and off when required - eg, a subroutine or an include ( or even a macro ) can have the same labels as every other subroutine. Anything in a closed group is bypassed in search. All system labels can also be turned on and off as required, even mid-macro.

Yes, that's an unnecessary restriction. Your parser should know whether it expects an instruction or not, and then have separate tables.

Each instruction handler knows exactly what to expect and goes looking for it. If it doesn't find it, then it looks for alternatives and so on. It's not a restriction that I can see. I was saying it's perfectly acceptable to have labels like HL: and DE: and to mix them in instructions, where the register isn't expected. Of course, if the instruction expects a register, then a register takes precedence. Though all tables are shared. It keeps memory requirements down ( and it's not a table driven assembler ).

That means your code is most likely not scaling efficiently. Do you use hash tables?

I don't. The data tables are a two way function for debugging purposes so labels are stored as ASCII in a linked list. Hash tables would increase memory efficiency, but decrease search efficiency. Unless I sorted them or something like that.
 
Given it's a z80 assembler, you can imagine that I have labels already like A, HL, DE, BC etc as well as labels line INC, DEC, LD and all the other opcodes are labels.
This is what Svenska was talking about. They are "tokens", but they don't need to be in the same symbol table with labels. When you get the next token from the line, the "get word" or whatever gets a whole chunk of alphanumeric characters, or just a single character, and returns a type of whether it's valid for a symbol name. It can also try to identify "reserved words" (like HL, DE, BC, etc.) and those can be possible return values. Special characters like parens and commas would be returned as a single character. If a symbol is appropriate, then you can go look it up from the symbol table.

In my own assembler I usually call a function with the expected reserved words for the context when needed, then it can return a value from a list of register names, etc. to be used in a switch statement. Something like findreg("BC DE HL SP") that returns 1-4 for a match. Also, opcodes are in their own table, along with whatever hints they need (usually an instruction format type and a parameter), separate from symbols. Macro names are also in another table, they only matter when doing the opcode lookup.

As for hash tables vs linked lists, etc., all that is just optimization that can be hand-waved away from the pov of the assembler. Either way, it should just be addsym/findsym functions that don't reveal how they work inside. My own assembler still uses linked lists underneath because it has never been anything but instantaneous on even a 1990s CPU, so I have never had the need to improve it. And for sorting to print the symbol table, I use a linked list bubble sort because it was so easy to write and understand.
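The findreg idea above can be sketched in a few lines (Python, hypothetical; a real version would presumably return register encodings rather than positions):

```python
def findreg(expected: str, token: str) -> int:
    """Return the 1-based position of `token` in the space-separated
    list of expected reserved words, or 0 for no match - mirroring the
    findreg("BC DE HL SP") -> 1-4 convention described above."""
    words = expected.upper().split()
    token = token.upper()
    return words.index(token) + 1 if token in words else 0
```

The caller then switches on the returned index, and a different `expected` string is passed for each instruction context.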
 
Using slow and non-scaling algorithms (linear search, bubble sort) is fine as long as the problems are small. A cross-assembler for an 8-bit CPU will always be fast enough. But repeatedly searching a large symbol table on a slow Z80 processor using a bad algorithm will hurt.

Hash tables can substantially improve search speed at the cost of a little memory. In general, algorithmic improvements will always beat code optimizations (except for pathologic cases).
 
This is what Svenska was talking about. They are "tokens", but they don't need to be in the same symbol table with labels.

Ahh, got it - my bad description is at fault here. The assembler doesn't hold normal instructions in the label table. I just use the same name as the opcode for my labels for memory purposes when writing the assembly source, since they are small and remove any interpretation errors when debugging.

There are some system labels I keep in the label table so that the source can reference these (e.g., the program counter, pass counter, etc.). It does cause some interesting "meta" problems when the assembler assembles itself, since it's trying to create labels that already exist.

But the instructions are not in this list.

In my own assembler I usually call a function with the expected reserved words for the context when needed, then it can return a value from a list of register names, etc. to be used in a switch statement. Something like findreg("BC DE HL SP") that returns 1-4 for a match. Also, opcodes are in their own table, along with whatever hints they need (usually an instruction format type and a parameter), separate from symbols. Macro names are also in another table, they only matter when doing the opcode lookup.

I do the same. And I have one for BC DE HL AF and one for BC DE HL SP. Also a single one for A B C D E H L (HL), and I just logically add these to an opcode frame. The only thing I do a little weirdly is how I process IX and IY, which both get contextually translated to HL; then I add a prefix and a suffix and sometimes a displacement. So the assembler literally sees something like A,(IX+n) as A,(HL+n) with added flags for prefix/suffix and displacement, since the getword function knows when to translate IX/IY to HL. It's a great way to add all of the undocumented opcodes, and I'm pretty sure I have a few undocumented opcodes possible to assemble without error that are not listed because of that.

As for hash tables vs linked lists, etc., all that is just optimization that can be hand-waved away from the pov of the assembler. Either way, it should just be addsym/findsym functions that don't reveal how they work inside. My own assembler still uses linked lists underneath because it has never been anything but instantaneous on even a 1990s CPU, so I have never had the need to improve it. And for sorting to print the symbol table, I use a linked list bubble sort because it was so easy to write and understand.

A good hashing function also takes time on an 8 bit CPU, which reminds me I need an algorithm for CRC16 generation for a project I'm working on.
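On the CRC16 note: one common variant is CRC-16/XMODEM (polynomial 0x1021, initial value 0). A bit-at-a-time version, shown here as a Python sketch, maps almost directly onto 8-bit shift-and-XOR code:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM: polynomial 0x1021, initial value 0, no final XOR.
    The inner shift-and-XOR loop is what an 8-bit CPU would do in
    registers, one bit per iteration."""
    crc = 0
    for byte in data:
        crc ^= byte << 8                  # feed the next byte into the top
        for _ in range(8):
            if crc & 0x8000:              # top bit falls out: apply the poly
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc
```

Other CRC-16 variants differ only in polynomial, initial value and bit order, so it's worth checking which one the project actually needs before committing.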

Linked lists work well for label tables, especially given I use mine to store macros and marked labels also, and for storage of some system variables. The main driver was code reuse. I use the same label matching code for every kind of symbol that isn't an opcode. It's not that inefficient since if it gets a mismatch, it jumps on to the next label without checking further, so in many cases, it only checks a single character and moves on to the next label directly.

It doesn't need to calculate the start of the next label either, because it's a linked list, and only when I get a full match do I check for other things like whether it's been tagged, or if it's a macro or has other content (e.g., closed groups). This makes for a very versatile data structure, and means I can expand into other memory where available, since I only use two data structures that grow towards each other in memory: the system stack and the label table. I might reorder instructions to reduce the line to line delays, but I've yet to try the assembler on a real Z80-based CP/M machine. What's missing then will be other assemblers to compare it to.

Using slow and non-scaling algorithms (linear search, bubble sort) is fine as long as the problems are small. A cross-assembler for an 8-bit CPU will always be fast enough. But repeatedly searching a large symbol table on a slow Z80 processor using a bad algorithm will hurt.

Hash tables can substantially improve search speed at the cost of a little memory. In general, algorithmic improvements will always beat code optimizations (except for pathologic cases).


The idea of using a hash table for testing labels never entered my mind when designing the assembler. Using a linked list that is dynamically manipulated isn't a bad way to search for labels, though, since entire sections of the list can be disabled from search and the algorithm just skips over them, jumping directly to the recent labels. It's still linear, but that's more due to the fact the source code is linear.

Thinking that through, making a reverse linked list would have made more sense, since statistically, I'm more likely to see a recently declared label than an earlier declared one as I get further through the source code pass.

Those kinds of ideas were things I was looking for when I started this thread.

For practical purposes once the assembler is complete, I'll likely never write the z80 version again, but sometimes I do wonder at the "What if" scenario had I written this during the 80s.
 
Using slow and non-scaling algorithms (linear search, bubble sort) is fine as long as the problems are small.
Actually, at least on modern machines, sometimes it's better. On modern systems memory locality usually has far more effect on performance than how many instructions you're running: having to go to memory for something that's not in the cache can lose you hundreds of machine cycles, and it just gets worse if the memory is mutable and shared with other threads of execution. This is why back when I was programming a high frequency trading system in Haskell a lot of my smaller (20-40 item) data structures were straight linked lists.

A good hashing function also takes time on an 8 bit cpu...
Not necessarily. The effectiveness of hashing algorithms depends on the data that they're hashing; very small and fast algorithms can be fine, even optimal, on the correct data. Though for instruction mnemonic lookup I'd have a look at using a trie, which, with some careful design, should be not only very fast but very compact. (After all, you have only five bits per character to index, which you might split into 2 for the first level and 3 for the level below, and for the first 2-4 levels you can probably save even more space by not storing indices for those levels but just calculating an offset into full expansions of 4 or 8 values for the next level.)

That all wants to be pre-calculated, which you can do with a fixed set of opcodes. Macro names cannot be, because you need to insert on the fly, but there are likely to be few enough that a straight linked list (or even array) will be plenty fast, allowing you to do macro lookup before opcode lookup so that macros can override opcodes.
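To make the trie idea concrete, here's a dict-based sketch in Python; a packed 5-bit-index version for a Z80 would be pre-generated, as described, rather than built with dictionaries:

```python
# A prefix trie over a fixed mnemonic set: lookup cost is bounded by the
# length of the word, not the number of entries in the table.
MNEMONICS = ["LD", "LDI", "LDIR", "LDD", "LDDR", "PUSH", "POP", "ADD", "ADC"]

def build_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = w                     # terminal marker: a complete word
    return root

def lookup(trie, word):
    node = trie
    for ch in word:
        node = node.get(ch)
        if node is None:
            return None
    return node.get("$")                  # None if `word` is only a prefix

TRIE = build_trie(MNEMONICS)
```

Note the terminal marker: it is what distinguishes LD (a complete mnemonic) from L (merely a prefix of one).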

Linked lists work well for label tables, especially given I use mine to store macros and marked labels also, and for storage of some system variables. The main driver was code reuse. I use the same label matching code for every kind of symbol that isn't an opcode.
You can separate all of these and still use the same search code if you simply parametrise it with the address of the head of the list.

Thinking that through, making a reverse linked list would have made more sense, since statistically, I'm more likely to see a recently declared label than an earlier declared one as I get further through the source code pass.
Oh, lord, are you appending to the tail of your linked list? All that does is make you keep an extra pointer. (And, as you point out, probably greatly increase your search time as the list grows, because of locality of use.)

Additionally, if you're not putting much or anything else on the heap, you can significantly reduce space and increase efficiency by, when you allocate a new entry, checking to see if it's adjacent to current head and leaving out the link pointer if it is. (You just need a tag bit in one of your other fields to indicate whether or not this has been done.)
 
A good hashing function also takes time on an 8 bit cpu,
You don't need a particularly good hashing function. On one hand, a Z80 doesn't have much memory to spare for a large table, so there will be collisions - on the other hand, your hash keys (symbols, instructions, registers) are not random data. You could take a few bits off the first character, which is cheap.

A 3x or 5x speedup in symbol searching will be noticeable on a Z80. On a modern system, it won't be.

Actually, at least on modern machines, sometimes it's better. On modern systems memory locality usually has far more effect on performance than how many instructions you're running: having to go to memory for something that's not in the cache can lose you hundreds of machine cycles, and it just gets worse if the memory is mutable and shared with other threads of execution.
Very much true. In general, scalability considerations tend to ignore the constant part, and a cheap O(n²) may outperform an expensive O(log n) for quite large problems. After all, even randomized algorithms with a worst-case O(∞) are used in practice.

But hey, we are discussing a Z80 host here. I promise that it won't suffer much from cache misses. :)
 
You don't need a particularly good hashing function. On one hand, a Z80 doesn't have much memory to spare for a large table, so there will be collisions; on the other hand, your hash keys (symbols, instructions, registers) are not random data. You could take a few bits off the first character, which is cheap.
My random thought on this is to use a 256-entry table, and generate the key by just looping through all the characters in the symbol, for each XORing it into the current hash value and then rotating the hash value left by 1 bit.
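A minimal Python sketch of that scheme (all names here are hypothetical): XOR each character into an 8-bit accumulator, then rotate left one bit, which on a Z80 is a single RLCA. The 8-bit result indexes the 256-entry bucket table directly.

```python
def hash8(name):
    h = 0
    for c in name.upper():                     # case-fold so LD and ld agree
        h ^= ord(c)
        h = ((h << 1) | (h >> 7)) & 0xFF       # 8-bit rotate left (RLCA on a Z80)
    return h

buckets = [[] for _ in range(256)]             # 256-entry table of chains

def insert(name, value):
    buckets[hash8(name)].append((name.upper(), value))

def find(name):
    for n, v in buckets[hash8(name)]:          # only this one chain is searched
        if n == name.upper():
            return v
    return None
```

With a few hundred symbols spread over 256 buckets, most chains hold zero or one entry, so the string compare usually runs at most once per lookup.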

Though I'm also attracted to a prefix trie. The problem with that is, the code to update a trie is probably substantial, and needs more sophisticated memory management. None of that is an issue with using a trie for mnemonics, though, since that is constant and so can be pre-generated.
 
My random thought on this is to use a 256-entry table, and generate the key by just looping through all the characters in the symbol, for each XORing it into the current hash value and then rotating the hash value left by 1 bit.
I was thinking about hjalfi's approach, which uses 5 bits of the first character and hand-codes (!) the initial hash table for all instructions and registers, for a 32-entry table. New symbols are added to the same table.

A prefix tree for instructions probably won't be efficient: Instruction mnemonics are four letters at most, same as two 16-bit pointers (and there are not many instructions to start with). I guess loading the pointers to follow the tree takes longer than loading the actual values. The 8080/Z80 isn't very good at pointer-chasing.
 
I'm also curious as to how much more efficient different schemes would be.

And what size hashes are we talking about? 32-bit? 64-bit? I'm not sure anything over 24-bit is going to be significantly faster.

Many mnemonics are only 2 to 3 bytes, so a hash requires an extra encoding operation, is computationally impractical if decoding is needed, and would have to beat a direct compare for things like LD, INC, IN, OUT, RRA, RLA, etc. Very few opcodes exceed three characters in length or start with the same character, so taking a 2-byte code, hashing it, and then looking through a hash table is just a match against a similar number of bits. It really doesn't provide any advantage.

I can mask out 4 characters as the maximum opcode length and send everything else to label processing, and provide shortcuts for functions like that. Any label that is encountered where an instruction is intended is assumed to be a macro until proven otherwise.

So if we're only searching 2 to 3 bytes on average, and we stop comparing once a non-match is confirmed, and we match on length as the first test, and if we break mnemonics up into groups of 2, 3 and 4, then I wonder if a hashing algorithm will be any improvement over that at all.

Especially if it could do 2 byte, then 3 byte then 4 byte mnemonics and then labels at the first step.

And for searching I have a table, but its format is something like:

FIRSTLABEL: DW SECONDLABEL ; Next label location.
LENGTH: DB $00 ; Length of the label text (not the total record length).
LABEL: DB 'LABEL' ; Text of the label here.
VALUE: DW $0000 ; The value of the label.
EXTRA: DB '?????' ; Macro contents or other special data.
SECONDLABEL:
And so on.

So the algorythm checks the length byte first. If it doesn't match, it jumps to the next label.

Then if the first character doesn't match, it moves on, and so on; it only keeps comparing while it keeps matching.

It's not a complex algorythm, but at the same time we're still only matching a couple to a few bytes per opcode. The objective of hashing here would likely be consistent opcode spacing (e.g., a regular 6 bytes per opcode), which saves space; but if that's ignored, then again the short size of labels and the extra work to create the hash may not provide a significant advantage. And having the labels in memory lets me do other things, like list all the labels in a group, or list the table in its entirety, something I'm not sure how to approach with a hash table.
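Modelled in Python as a hedged sketch (class and field names are hypothetical, and the EXTRA field is omitted), the record layout above and the length-first search look like:

```python
class Label:
    """One record: link, length byte, name text, 16-bit value."""
    def __init__(self, name, value, nxt=None):
        self.length = len(name)    # the LENGTH byte, tested before any text compare
        self.name = name
        self.value = value
        self.next = nxt            # the DW link to the next record

def find_label(head, name):
    entry = head
    while entry is not None:
        # Cheap length test first; Python's == then stands in for the
        # character-by-character compare that stops at the first mismatch.
        if entry.length == len(name) and entry.name == name:
            return entry.value
        entry = entry.next
    return None

# Most recently defined label at the head of the chain:
labels = Label("LOOP", 0x8010, Label("START", 0x8000))
```

Because the whole chain lives in memory as real records, walking it for a full symbol listing is the same loop with the match test removed.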
 
And what size hashes we're talking about? 32 bit? 64 bit? I'm not sure anything over 24 bit is going to be significantly faster.
A hash table with 32 buckets uses 5 bits to select the bucket. Hashes are used to cheaply skip comparisons, not to make the string comparisons more efficient.

Example: Instead of a linked list of symbols, you create 32 of them. As a hash function, you take the low five bits of the first byte (which is similar to "the first character, case-folded"). The hash (5 bits) selects the list to search in/add to. You never need to touch the other 31 lists, because they will never match.
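A small Python sketch of that scheme (all names hypothetical): the low five bits of the first character pick one of 32 chains, and only that chain is ever walked.

```python
def bucket(name):
    # Low five bits of the first character: 'A' (0x41) and 'a' (0x61) both give 1,
    # so the case-folding comes for free.
    return ord(name[0]) & 0x1F

table = [[] for _ in range(32)]       # 32 independent chains

def define(name, value):
    table[bucket(name)].append((name, value))

def resolve(name):
    for n, v in table[bucket(name)]:  # the other 31 lists are never touched
        if n == name:
            return v
    return None
```

On average this cuts the linear search to a fraction of the full table, for the cost of one AND instruction and a table of 32 head pointers.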

I've seen people use the stack pointer for those kinds of objectives.
That's a neat hack, but still doesn't mean the CPU is good for pointer-chasing. It also doesn't work with interrupts enabled.

algorythm
It's spelled "algorithm".
 
A hash table with 32 buckets uses 5 bits to select the bucket. Hashes are used to cheaply skip comparisons, not to make the string comparisons more efficient.

Example: Instead of a linked list of symbols, you create 32 of them. As a hash function, you take the low five bits of the first byte (which is similar to "the first character, case-folded"). The hash (5 bits) selects the list to search in/add to. You never need to touch the other 31 lists, because they will never match.

For opcodes, you can still do that with unhashed characters, and the search space is quite limited even just matching first letters; you can drop it down to 4 bits and 16 buckets if you ignore bit 0 and are case-agnostic. Given the number of 2-byte opcodes, I'm not sure significant gains would be made, but then I haven't tried my assembler on a real Z80 system yet, so I may yet change my mind.

I never thought of using multiple linked lists for my labels, though. Implementing something like this would mean rethinking my data structure, which isn't practical with the way I wrote the present iteration of the code.

Hashes might still be a far more efficient approach than the one I've used, but I'd have to work out some rough numbers to better understand it. It's a bit late to plan adding it to my code now, but it might be useful in future; and, as mentioned, I have yet to field-test my code to determine how it runs in the wild.

Though I finally thought of a good benchmark for my existing code... well, a functional benchmark. I converted CP/M's BDOS/CCP to my assembler format, so I can see how well my current assembler performs against the original 8080 assembler going from source to binary. Given that my assembler doesn't require manual linking, if I ignore keyboard time it should be a good comparison. I'll have to get things set up on a real system to try it out.

That's a neat hack, but still doesn't mean the CPU is good for pointer-chasing. It also doesn't work with interrupts enabled.

Though it makes me wonder, as I never tried out extended Z80 architectures. Did they ever come up with things like LD DE,(HL)?
 
I was thinking about hjalfi's approach, which uses 5 bits of the first character and hand-codes (!) the initial hash table for all instructions and registers, for a 32-entry table. New symbols are added to the same table.
I am pretty sure that you're better off in many ways using separate tables for instructions/pseudo-ops, macros and non-macro symbols. In particular, you can never have anything other than an instruction, pseudo-op or macro in the instruction field, so this lets you use size and efficiency tricks for the constant table, and the macro table will typically be very small.

A prefix tree for instructions probably won't be efficient: Instruction mnemonics are four letters at most, same as two 16-bit pointers (and there are not many instructions to start with). I guess loading the pointers to follow the tree takes longer than loading the actual values. The 8080/Z80 isn't very good at pointer-chasing.
Ah, but who said anything about using any pointers? :)

One obvious way of doing a trie that contains no pointers is to use the index of the entry at any level to find the table of entries for the next level. A small (and silly) example would be to have five-entry tables containing ABCDE; this exact top-level table would be followed by five more tables also containing ABCDE, which in turn would be followed by twenty-five pointers to linked lists for the lists containing strings starting with AA, strings starting with AB, etc. etc.

With some care, I think it may be possible to construct an entirely pointer-free trie for instruction mnemonics, though it might use a little bit of linear search at the very end.

And just in case it's not obvious, there's no reason that any level in the trie needs to match the way you happen to break down your data. E.g., instead of using all the letters A, B, C, ..., Z on the first level, you could leave two bits for the next level, making the top level @-G, H-O, P-W, X-_. And at the next level there's no need to limit yourself to those last two bits; you could include further bits from the next letter. Nor does the key size even need to be the same from level to level.
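The five-entry ABCDE example from the post above, sketched in Python (names hypothetical) to show the index arithmetic standing in for pointers: `first * 5 + second` selects one of 25 flattened leaf lists directly, with no pointer-chasing until the leaves.

```python
ALPHA = "ABCDE"                          # toy alphabet for the two-level example
leaves = [[] for _ in range(25)]         # 5 x 5 second-level tables, flattened

def slot(name):
    # The index at each level is computed, not fetched through a pointer.
    return ALPHA.index(name[0]) * 5 + ALPHA.index(name[1])

def add(name, value):
    leaves[slot(name)].append((name, value))

def get(name):
    # The "little bit of linear search at the very end":
    for n, v in leaves[slot(name)]:
        if n == name:
            return v
    return None
```

ADD and ADC land in the same leaf (both begin "AD"), so the short tail list is searched linearly, exactly as described for the real mnemonic trie.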
 
I know it's fun to argue about what's the best way to do a hash, but the simple fact is, it's premature optimization. The hash doesn't matter if the rest of the assembler doesn't work, but the assembler will still work if it uses a simple linked list.
 