The "Ahl Benchmark" of BASIC performance

Chuck(G) · 2024-05-17T05:19:56+0100

cjs said:
Opcode mnemonics are part of the syntax, and very often don't match the CPU documentation. I can't remember which of LDA A,#0 or LDAA #0 is official Motorola 6800 syntax...

For a non-isomorphism, consider 8086 assembly. What does "MOV" translate to? One thing that I found moderately amusing is that the same mnemonic and syntax translates to a different binary rendition when run through, say, MASM vs. DEBUG.

One thing that I found annoying about MASM 2.0 or thereabouts was the automatic and silent transmogrification of LEA reg, imm to MOV reg, imm. Damnit--when I say "LEA" I mean "LEA".
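Presumably the attraction for the assembler is size. When the operand is just a 16-bit address, both instructions leave that number in the register, and the MOV form is one byte shorter. A minimal Python sketch of the two standard 8086 encodings (the helper names are made up for illustration; the opcodes are 8D /r for LEA and B8+r for MOV r16,imm16):

```python
# 8086 16-bit register numbers, as used in ModR/M fields and in the B8+r opcode.
REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3, "SP": 4, "BP": 5, "SI": 6, "DI": 7}


def lea_direct(reg: str, addr: int) -> bytes:
    """LEA r16,[disp16]: opcode 8D, ModR/M with mod=00 rm=110 (direct address)."""
    modrm = (REG16[reg] << 3) | 0b110
    return bytes([0x8D, modrm]) + addr.to_bytes(2, "little")


def mov_imm16(reg: str, value: int) -> bytes:
    """MOV r16,imm16: opcode B8+r followed by the little-endian immediate."""
    return bytes([0xB8 + REG16[reg]]) + value.to_bytes(2, "little")


print(lea_direct("BX", 0x1234).hex(" "))  # 8d 1e 34 12
print(mov_imm16("BX", 0x1234).hex(" "))   # bb 34 12
```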

voidstar78 · 2024-05-17T05:59:51+0100

Hmm, but when you do a LIST, you are effectively "de-compiling" the set of BASIC tokens back into a set of symbolic keywords. So, from a parsing perspective it just seems a lot of similar concepts: if your opcode or p-code is one byte, then you have a set of 256 symbols that get interpreted into some kind of operation or action.

I'm just saying there is a reason BASIC became popular on those late 70s micros rather than something like FORTRAN or C. With very limited ROM resources and effectively no (affordable) file system (the 76-79 timeframe), something like C gets a lot harder to pull off even if you have 64K. Implementing BASIC is similar to an assembler, in just being a kind of token-translator (but of course opcodes get implemented by some microcode, while BASIC tokens are implemented in a ton of opcodes).

Most BASICs (at least the ones I'm aware of) aren't compiled; they are interpreted continuously as they run. Looking at that 1974 "Illinois" BASIC, for example, they chose byte value 134 for "IF". So if you write this code on the screen:

Code:

10 IF A = 5 THEN PRINT "FIVE"

BASIC's "workspace" stores byte 134 somewhere, corresponding to the "IF" token. The interpretation of that token changes depending on your "mode" (i.e. doing a LIST versus RUN). LIST just expands byte 134 back to the letters "IF", while during RUN a whole slew of things has to happen (starting with looking at the operands following that token), with the support for those tokens all implemented in ROM.
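The LIST half of that round trip can be sketched in a few lines of Python. This assumes a tokenizer that crunches keywords anywhere outside string literals and stores everything else as 7-bit characters; only IF = 134 comes from the post, the other token values are invented, and real BASICs differ in details such as line-number storage and spacing.

```python
# Hypothetical token table: only IF = 134 comes from the post; the rest are invented.
TOKENS = {"IF": 134, "THEN": 135, "PRINT": 136, "GOSUB": 137, "RETURN": 138,
          "FOR": 139, "TO": 140, "NEXT": 141, "END": 142}
KEYWORDS = {code: word for word, code in TOKENS.items()}
LONGEST_FIRST = sorted(TOKENS, key=len, reverse=True)


def tokenize(text: str) -> bytes:
    """Crunch keywords to single bytes; everything else (and anything inside quotes) is copied."""
    out = bytearray()
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == '"':
            in_string = not in_string
        word = None
        if not in_string:
            word = next((w for w in LONGEST_FIRST if text.startswith(w, i)), None)
        if word:
            out.append(TOKENS[word])
            i += len(word)
        else:
            if ord(ch) >= 128:
                raise ValueError("only 7-bit ASCII source text is supported")
            out.append(ord(ch))
            i += 1
    return bytes(out)


def detokenize(data: bytes) -> str:
    """What LIST does: expand each token byte back to its keyword."""
    return "".join(KEYWORDS.get(b, chr(b)) for b in data)


line = 'IF A = 5 THEN PRINT "FIVE"'
stored = tokenize(line)
print(stored[0])                   # 134
print(detokenize(stored) == line)  # True
```

Note that nothing here checks syntax: a nonsense line such as `=IF)7` tokenizes and lists back unchanged.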

Similar stuff for an assembler - it encounters the symbolic sequence "ADD" and narrows down a set of opcodes, then parses a few more operands to resolve addressing mode and such, to make a final decision about the corresponding byte code.
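As a small illustration of that narrowing step, here is a Python sketch that handles only three Z80 `ADD` forms (the real instruction set has more, such as the IX/IY variants). The mnemonic alone says nothing about the opcode; the operands pick it.

```python
# Z80 register encodings used in the opcode bit fields.
R8 = {"B": 0, "C": 1, "D": 2, "E": 3, "H": 4, "L": 5, "(HL)": 6, "A": 7}
R16 = {"BC": 0, "DE": 1, "HL": 2, "SP": 3}


def assemble_add(operands: str) -> bytes:
    """Choose the ADD opcode from the operands, as an assembler has to."""
    dst, src = (part.strip() for part in operands.split(","))
    if dst == "A" and src in R8:
        return bytes([0x80 | R8[src]])            # ADD A,r: 10 000 rrr
    if dst == "A":
        # ADD A,n: C6 then the immediate byte (decimal or 0x-prefixed here)
        return bytes([0xC6, int(src, 0) & 0xFF])
    if dst == "HL" and src in R16:
        return bytes([0x09 | R16[src] << 4])      # ADD HL,rr: 00 rr1 001
    raise ValueError(f"no ADD form for operands {operands!r}")


for ops in ["A,B", "A,(HL)", "A,5", "HL,DE"]:
    print("ADD", ops, "->", assemble_add(ops).hex(" "))
# ADD A,B -> 80
# ADD A,(HL) -> 86
# ADD A,5 -> c6 05
# ADD HL,DE -> 19
```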

I know some "Tiny C" and "Tiny Pascal" implementations were written for micros; I'm just not sure how effective they were. For the SuperPET, I can't recall whether its Pascal was in ROM. Or just mentally think about how to implement a "real" high-level language without a file system: you're parsing, say, 10KB of plain text that's in RAM. That plain text is parsed into multiple assembler files (say one for each function, so maybe five of them), and you have to decide where in RAM each one gets stored, because "who knows when" any of those functions gets called {yes, by analysis you could figure that out and decide whether to inline}. Now you have to load your assembler. Maybe that's in ROM, so you can just initialize a vector of the addresses where your assembly sits in RAM and pass that to the assembler, all while not stomping over the original 10K of plain text (maybe you could flush that to cassette and just make users reload it; I think C on the C64 was like that {a very painful experience}). That assembler is still going to need its own RAM for buffering... Finally you need a linker that stitches it all together, either with relocatable code or by carefully not interfering with any of the RAM addresses your dev tools are using.

Not impossible, but I'd say a lot more to coordinate and tackle than tokenized BASIC. And that's why I can appreciate BASIC for what it is: a way to make some logic decisions, control output, and store stuff in arrays, in a much easier symbolic fashion, even on very resource-limited systems.

krebizfan · 2024-05-17T06:27:40+0100

Some of the Microsoft BASIC descendants used double byte tokens, piggybacking off one of the higher value characters to permit an extra 128 possible tokens. The Color Computer had only one set of double byte tokens. IBM BASIC for the IBM PC had 3 sets of double byte tokens providing a potential of 512 tokens in addition to the 128 characters of 7-bit ASCII. AFAICT, no MS BASIC variant tried giving a token a value under 128, even as the second byte of a two byte token.
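Decoding a scheme like that is a small step up from the single-byte case. A Python sketch follows; the prefix bytes and token values are invented for illustration and are not the real Color Computer or IBM BASIC tables. In line with the observation above, second bytes are kept at 128 or higher.

```python
# Invented token values for illustration; not the real MS BASIC tables.
SINGLE = {0x81: "END", 0x82: "FOR", 0x83: "NEXT"}
PREFIXED = {
    0xFE: {0x81: "FILES", 0x82: "FIELD"},
    0xFF: {0x81: "LEFT$", 0x82: "RIGHT$"},
}


def detokenize(data: bytes) -> str:
    """Expand one- and two-byte tokens; bytes below 128 are plain ASCII characters."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in PREFIXED:
            # The prefix byte selects a second table; the following byte indexes it.
            out.append(PREFIXED[b][data[i + 1]])
            i += 2
        elif b in SINGLE:
            out.append(SINGLE[b])
            i += 1
        else:
            out.append(chr(b))
            i += 1
    return "".join(out)


print(detokenize(bytes([0x82, 0x20, 0x49])))              # FOR I
print(detokenize(bytes([0xFF, 0x81, 0x28, 0x41, 0x24])))  # LEFT$(A$
```

The trade-off shows up in the tables: each prefix byte gives up one single-byte token value but opens a second table of up to 128 entries.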

BASIC wasn't the only ROM choice for micros. Forth was famously used by the Jupiter Ace and a number of trainers*. Some trainers had a simple editor and assembler in ROM. The Nascom had an option for Blue Label Pascal in ROM, which saved 16K of RAM. Blue Label became Turbo Pascal. Later, computers reached the point where ROMs held applications rather than development tools.

* The Multitech MPF-1/88 might be a leader in this. ROMs could have BASIC, Assembler with Editor, and Forth installed, with a menu to decide which was in use. Plus, with the addition of an expansion board, graphics cards, floppy controllers, and memory cards could be added, resulting in a system that could run some IBM PC software. Multitech became Acer and they switched to more conventional clones fairly soon.

Plasma · 2024-05-17T13:48:36+0100

cjs said:
Opcode mnemonics are part of the syntax, and very often don't match the CPU documentation. I can't remember which of LDA A,#0 or LDAA #0 is official Motorola 6800 syntax (I think the former), but many programs use the latter and there's no issue there. For a less trivial example, I use only Z80 mnemonics for 8080 programming, even though they don't match the documentation of the 8085 CPU on my trainer board where I run them.

The key again is isomorphisms: there is a strict one between an 8080 opcode and its operands and its 8080-syntax assembly language representation, and the same between an 8080 opcode and its operands and its Z80-syntax assembly language representation. So all three represent the exact same thing, and you can derive any one from any of the others.

If you want others to be able to read your code, they should match or at least be close (LDA vs LDAA). I would consider the Z80 vs 8080 mnemonics a special case, and you are still matching the documentation for one of those. It would be very unwise for an assembler author to "invent" their own mnemonics that are completely different from any CPU docs.
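The three-way correspondence cjs describes can be checked mechanically by converting through the opcode. This Python sketch covers only the 63 register-to-register moves (8080 `MOV`, Z80 `LD`), which share the `01 ddd sss` encoding; opcode 0x76, which would be `MOV M,M`, is the halt instruction instead. The function names are illustrative.

```python
import re

# Values 0-7 of the ddd/sss fields in the shared "01 ddd sss" register-move opcode.
REGS_8080 = ["B", "C", "D", "E", "H", "L", "M", "A"]
REGS_Z80 = ["B", "C", "D", "E", "H", "L", "(HL)", "A"]
HLT = 0x76  # would be "MOV M,M"; both CPUs use this opcode for halt instead


def encode(dst: int, src: int) -> int:
    opcode = 0x40 | dst << 3 | src
    if opcode == HLT:
        raise ValueError("there is no memory-to-memory move")
    return opcode


def assemble_8080(text: str) -> int:
    m = re.fullmatch(r"MOV ([BCDEHLMA]),([BCDEHLMA])", text)
    if not m:
        raise ValueError(f"not an 8080 register move: {text!r}")
    return encode(REGS_8080.index(m[1]), REGS_8080.index(m[2]))


def assemble_z80(text: str) -> int:
    reg = r"([BCDEHLA]|\(HL\))"
    m = re.fullmatch(rf"LD {reg},{reg}", text)
    if not m:
        raise ValueError(f"not a Z80 register move: {text!r}")
    return encode(REGS_Z80.index(m[1]), REGS_Z80.index(m[2]))


def disassemble(opcode: int, regs: list, mnemonic: str) -> str:
    if not 0x40 <= opcode <= 0x7F or opcode == HLT:
        raise ValueError(f"{opcode:#04x} is not a register move")
    return f"{mnemonic} {regs[opcode >> 3 & 7]},{regs[opcode & 7]}"


op = assemble_8080("MOV A,M")
print(hex(op), disassemble(op, REGS_Z80, "LD"))  # 0x7e LD A,(HL)
```

Every one of the 63 opcodes survives a round trip through either syntax, which is exactly what makes the two mnemonic sets interchangeable for this group of instructions.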

cjs · 2024-05-17T14:57:10+0100

Chuck(G) said:
For a non-isomorphism, consider 8086 assembly. What does "MOV" translate to? One thing that I found moderately amusing is that the same mnemonic and syntax translates to a different binary rendition when run through, say, MASM vs. DEBUG.

This is why I was careful to say that "Opcode mnemonics are part of the syntax" and "there is a strict one between an 8080 opcode and its operands." MOV itself could translate to multiple different opcodes; you need the operands in order to know the opcode.

voidstar78 said:
Hmm, but when you do a LIST, you are effectively "de-compiling" the set of BASIC tokens back into a set of symbolic keywords. So, from a parsing perspective it just seems a lot of similar concepts: if your opcode or p-code is one byte, then you have a set of 256 symbols that get interpreted into some kind of operation or action.

No, tokenised BASIC is very different from P-code. Tokenised BASIC is just another form of source code; P-code is object code generated by a compiler, and it is only one of many possible compilations of that source code.

That there's a direct and easy isomorphism (performed by the tokeniser and the LIST command) should make it clear that tokenisation is source code. And you can also observe that tokenisation does no checks of even syntax, much less semantics: you can type 10 =IF)7 into a C64, which will happily accept it, and show it back to you with LIST, though this is nonsense as a BASIC program.
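To make the contrast concrete, here is a toy compiler for a single BASIC assignment such as `J = J + I`, in Python. The stack-machine opcodes (`PUSH_VAR`, `PUSH_CONST`, `ADD`, `STORE`) are invented for this sketch and are not UCSD p-code. Unlike a tokeniser, it has to parse the statement, so it rejects `=IF)7`, and the original spacing cannot be recovered from its output.

```python
def compile_assignment(stmt: str) -> list:
    """Compile 'VAR = term + term ...' into invented stack-machine instructions."""
    target, _, expr = stmt.partition("=")
    target = target.strip()
    terms = [t.strip() for t in expr.split("+")]
    if not target.isidentifier() or not all(t.isdigit() or t.isidentifier() for t in terms):
        raise SyntaxError(f"cannot compile {stmt!r}")
    code = []
    for n, term in enumerate(terms):
        code.append(("PUSH_CONST", int(term)) if term.isdigit() else ("PUSH_VAR", term))
        if n:
            code.append(("ADD", None))  # add each term after the first to the running sum
    code.append(("STORE", target))
    return code


def execute(code: list, variables: dict) -> dict:
    """Run the instructions on a value stack; unset variables read as 0."""
    stack = []
    for op, arg in code:
        if op == "PUSH_CONST":
            stack.append(arg)
        elif op == "PUSH_VAR":
            stack.append(variables.get(arg, 0))
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "STORE":
            variables[arg] = stack.pop()
    return variables


code = compile_assignment("J = J + I")
print(code)  # [('PUSH_VAR', 'J'), ('PUSH_VAR', 'I'), ('ADD', None), ('STORE', 'J')]
print(execute(code, {"J": 3, "I": 2}))  # {'J': 5, 'I': 2}
```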

voidstar78 said:
I'm just saying there is a reason BASIC became popular on those late 70s micros rather than something like FORTRAN or C.

Yes. And that would be because Paul Allen, Monte Davidoff and Bill Gates wrote MS-BASIC right at the start of the personal computer revolution, marketed it well, and sold enough of it early on that it essentially became a standard that soon (nearly) every manufacturer decided to follow.

They chose BASIC because it was what they happened to know; there were other options that were both more powerful and easier to implement, such as Lisp.

voidstar78 said:
Implementing BASIC is similar to an assembler, in just being a kind of token-translator....

No, that's very, very wrong. To see why that's so wrong, write a little "token-translator" that can deal with something even as simple as:

Code:

10 J=0
20 GOSUB 100
30 END
100 IF J > 100 THEN RETURN
110 FOR I = 1 TO 3: J = J + I : NEXT
120 GOSUB 100

What you will come up with will be very much different from an assembler.
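For the curious, here is roughly what such a "translator" turns into: a Python sketch that runs only the statement forms used in that listing (integer `+`, `>` in `IF`, `FOR`/`NEXT`, `GOSUB`/`RETURN`, `END`). It needs a line-number index, a GOSUB return stack, and a FOR stack, runtime state an assembler never has. As in Microsoft BASIC, a false `IF` skips the rest of the line and running past the last line ends the program; the sketch's own simplifications are that there is no error checking beyond unknown statements and no cleanup of `FOR` frames left open inside a subroutine.

```python
import re

# The listing from the post, unchanged.
PROGRAM = """
10 J=0
20 GOSUB 100
30 END
100 IF J > 100 THEN RETURN
110 FOR I = 1 TO 3: J = J + I : NEXT
120 GOSUB 100
"""


def parse(src):
    """Return a sorted list of (line number, [statements]); ':' separates statements."""
    lines = []
    for raw in src.strip().splitlines():
        number, _, body = raw.strip().partition(" ")
        lines.append((int(number), [s.strip() for s in body.split(":")]))
    return sorted(lines)


def evaluate(expr, variables):
    """Sum of integer literals and variables, e.g. 'J + I'; unset variables read as 0."""
    total = 0
    for term in expr.split("+"):
        term = term.strip()
        total += int(term) if term.isdigit() else variables.get(term, 0)
    return total


def run(src):
    lines = parse(src)
    index_of = {number: i for i, (number, _) in enumerate(lines)}  # line number -> position
    variables = {}
    gosub_stack = []  # where to resume after RETURN: (line index, statement index)
    for_stack = []    # (loop variable, limit, line index, statement index of loop body)
    max_depth = 0
    li, si = 0, 0     # current line index and statement index within that line
    pending = None    # THEN clause to run in place of the current statement
    while True:
        # Move past the end of a line; running off the last line ends the program.
        while li < len(lines) and si >= len(lines[li][1]):
            li, si = li + 1, 0
        if li >= len(lines):
            break
        stmt = pending or lines[li][1][si]
        pending = None
        next_li, next_si = li, si + 1
        if stmt == "END":
            break
        elif m := re.fullmatch(r"GOSUB (\d+)", stmt):
            gosub_stack.append((li, si + 1))
            max_depth = max(max_depth, len(gosub_stack))
            next_li, next_si = index_of[int(m.group(1))], 0
        elif stmt == "RETURN":
            next_li, next_si = gosub_stack.pop()
        elif m := re.fullmatch(r"IF (.+?) > (.+?) THEN (.+)", stmt):
            if evaluate(m.group(1), variables) > evaluate(m.group(2), variables):
                pending, next_li, next_si = m.group(3), li, si
            else:
                next_li, next_si = li + 1, 0  # a false IF skips the rest of the line
        elif m := re.fullmatch(r"FOR (\w+) = (.+?) TO (.+)", stmt):
            variables[m.group(1)] = evaluate(m.group(2), variables)
            for_stack.append((m.group(1), evaluate(m.group(3), variables), li, si + 1))
        elif stmt == "NEXT":
            var, limit, body_li, body_si = for_stack[-1]
            variables[var] += 1
            if variables[var] <= limit:
                next_li, next_si = body_li, body_si
            else:
                for_stack.pop()
        elif m := re.fullmatch(r"(\w+) ?= ?(.+)", stmt):
            variables[m.group(1)] = evaluate(m.group(2), variables)
        else:
            raise SyntaxError(f"unsupported statement: {stmt!r}")
        li, si = next_li, next_si
    return variables, max_depth


print(run(PROGRAM))  # ({'J': 102, 'I': 4}, 18)
```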

voidstar78 said:
Or just mentally think about how to implement a "real" high level language without a file system: you're parsing say 10KB of plain-text that's in RAM. That plain-text is parsed into multiple assembler files....

Well, yes, that would be very difficult, if you chose such a ridiculous way to implement it. Not even compilers with a file system available are quite so foolish.

If you're going to implement a compiled high-level language without a filesystem, you'd generally do it the same way assemblers without filesystems were done: you have an area of RAM for source code, another area of RAM for object code, and you directly assemble/compile the source to object code. There's no need for or point to a linker in such a situation; linkers are useful when you have library archives, which clearly you don't in that situation.

And, in fact, at least one (somewhat) high-level language was done in exactly this way. I can't recall the name of it now, but I believe it was something from Motorola designed for doing mathematical programs.
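A minimal sketch of that source-area-to-object-area approach, in Python, using a toy subset of 8080 instructions. The instruction choice, the 0x100 origin, and the `;` comment syntax are assumptions for illustration, not a description of the Motorola tool or of Turbo Pascal. Forward references are handled by remembering where an address belongs and patching it once the pass is done, so nothing like a linker is involved.

```python
# Tiny subset of 8080 instructions, enough for the example below.
NO_OPERAND = {"HLT": 0x76, "DCR A": 0x3D}
IMMEDIATE8 = {"MVI A": 0x3E}
ADDRESS16 = {"JMP": 0xC3, "JNZ": 0xC2}


def assemble(source: str, org: int = 0x100) -> bytes:
    """One pass over the source area, emitting straight into an object-code area."""
    obj = bytearray()  # stands in for the object-code region of RAM
    labels = {}        # label -> absolute address
    fixups = []        # (offset in obj, label) for forward references
    for raw in source.splitlines():
        line = " ".join(raw.split(";")[0].split())  # drop comments, normalize spaces
        if ":" in line:
            name, _, line = line.partition(":")
            labels[name.strip()] = org + len(obj)
            line = line.strip()
        if not line:
            continue
        head, _, arg = line.partition(",")
        words = line.split(" ")
        if line in NO_OPERAND:
            obj.append(NO_OPERAND[line])
        elif head in IMMEDIATE8 and arg:
            obj += bytes([IMMEDIATE8[head], int(arg, 0) & 0xFF])
        elif words[0] in ADDRESS16 and len(words) == 2:
            obj.append(ADDRESS16[words[0]])
            if words[1] not in labels:
                fixups.append((len(obj), words[1]))  # patch once the label is defined
            obj += labels.get(words[1], 0).to_bytes(2, "little")
        else:
            raise SyntaxError(f"cannot assemble {raw!r}")
    for offset, name in fixups:
        obj[offset:offset + 2] = labels[name].to_bytes(2, "little")
    return bytes(obj)


SOURCE = """
        JMP START   ; forward reference, patched after the pass
START:  MVI A,3
LOOP:   DCR A
        JNZ LOOP    ; backward reference, already known
        HLT
"""
print(assemble(SOURCE).hex(" "))  # c3 03 01 3e 03 3d c2 05 01 76
```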

Chuck(G) · 2024-05-17T16:00:27+0100

cjs said:
This is why I was careful to say that "Opcode mnemonics are part of the syntax" and "there is a strict one between an 8080 opcode and its operands." MOV itself could translate to multiple different opcodes; you need the operands in order to know the opcode.

But that was my point--MOV syntax being exactly the same between MASM and DEBUG won't generate the same code in some instances. Note that there are special-cased "short" forms of MOV as well as long forms to accomplish the same thing. MASM will pick the short form, while DEBUG always uses the long general form.
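To make the short/long distinction concrete, here is a Python sketch of two such pairs using the standard 8086 encodings. Both byte sequences in each pair do the same thing; the sketch only shows the two possibilities and does not encode which form any particular assembler or DEBUG version emits. The helper names are illustrative.

```python
REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3, "SP": 4, "BP": 5, "SI": 6, "DI": 7}


def mov_reg_imm(reg: str, value: int, short: bool = True) -> bytes:
    """MOV r16,imm16 in its special-cased form (B8+r) or general form (C7 /0)."""
    imm = (value & 0xFFFF).to_bytes(2, "little")
    if short:
        return bytes([0xB8 + REG16[reg]]) + imm
    return bytes([0xC7, 0b11_000_000 | REG16[reg]]) + imm  # ModR/M: mod=11 (register), /0


def mov_ax_mem(addr: int, short: bool = True) -> bytes:
    """MOV AX,[addr] in the accumulator-only form (A1) or general form (8B /r)."""
    disp = addr.to_bytes(2, "little")
    if short:
        return bytes([0xA1]) + disp
    return bytes([0x8B, 0b00_000_110]) + disp  # ModR/M: mod=00 reg=AX rm=110 (direct)


for short in (True, False):
    print(mov_reg_imm("AX", 0x1234, short).hex(" "), "|", mov_ax_mem(0x1234, short).hex(" "))
# b8 34 12 | a1 34 12
# c7 c0 34 12 | 8b 06 34 12
```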

krebizfan · 2024-05-17T16:08:51+0100

cjs said:
If you're going to implement a compiled high-level language without a filesystem, you'd generally do it the same way assemblers without filesystems were done: you have an area of RAM for source code, another area of RAM for object code, and you directly assemble/compile the source to object code. There's no need for or point to a linker in such a situation; linkers are useful when you have library archives, which clearly you don't in that situation.

And, in fact, at least one (somewhat) high-level language was done in exactly this way. I can't recall the name of it now, but I believe it was something from Motorola designed for doing mathematical programs.

Turbo Pascal did that in its early versions as befits a design that started off on a cassette only system.

The heavily documented p-code system was UCSD's, and information on it can be found at http://pascal.hansotten.com/ucsd-p-system/more-on-p-code/
