redefining 32-bit

kerravon

Experienced Member
Joined
Oct 31, 2021
Messages
137
I only have a vague goal, so take this with a grain of salt.

Basically with the benefit of hindsight, I would like to have bought (or rented time on perhaps - I had only just started working in 1986) an 80386 computer in 1986 or close, and written a Win32 clone (what PDOS/386 is currently). I do need some supporting software though, like a C compiler. I had access to a mainframe running MVS/XA, so cross-compiling could have been done. Basically the hardware existed, and I "just" need software support - and it's not a "lot" of software to support the "minimal" goals I have in mind.

And what I am looking for is for those Win32 applications (a subset - where I set the rules), to still run on Windows 11, and now I have a new "opportunity" - it appears that those Win32 applications, if carefully written, can be run on a fairly minimal layer upon 64-bit UEFI (itself an OS, basically). This involves running 32-bit code on LM64, something that was nominally not possible, but now the jury is basically out on the definition of "possible".

Here is what I have ...

This is a simple Win32 program designed to call puts, as provided by msvcrt.dll. It is written in assembler instead of C because I don't currently have a C compiler capable of generating the required assembler:

https://sourceforge.net/p/pdos/gitcode/ ... demo32.asm

I have this loader code (search for the second occurrence of w32puts):

https://sourceforge.net/p/pdos/gitcode/ ... /exeload.c

And this is the assembler stub that is referenced:

https://sourceforge.net/p/pdos/gitcode/ ... 32hack.asm

I have tested this on both qemu and real hardware. But it's still proof of concept, and I'm not sure if I'm missing anything.

There will be issues for something like printf, where I don't know how many parameters there are. I could potentially get around that by making it mandatory for (my) Win32 executables to call getmainargs, and if argc is (faked to be) greater than x'80000000' (I did something similar in PDPCLIB for the Amiga) then it means that argv is also faked, and is really a structure, and you need to go there to fetch the real values of argc and argv, as well as a "global" variable that contains the number of arguments when you call a variable-argument function. The compiler would generate code to say that if that global pointer is not NULL then set it to the argument count, otherwise take no action, as you are running under a normal Win32 environment.
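A rough sketch of that startup convention in C, just to make the idea concrete. Every name here (FAKE_ARGC_LIMIT, struct fake_startup, resolve_startup, note_vararg_call) is made up for illustration and is not taken from PDOS or PDPCLIB; the structure layout is likewise an assumption.

```c
#include <stddef.h>

/* Hypothetical threshold: an argc above this means argv is faked. */
#define FAKE_ARGC_LIMIT 0x80000000UL

/* Hypothetical structure that a faked argv would really point to. */
struct fake_startup {
    int real_argc;
    char **real_argv;
    int *vararg_count;   /* where callers record the argument count */
};

/* Stays NULL when running under a normal Win32 environment. */
static int *g_vararg_count = NULL;

/* Work out the real argc/argv. Returns 1 if the faked convention was
   detected, 0 if argc/argv were ordinary values. */
static int resolve_startup(unsigned long argc, char **argv,
                           int *out_argc, char ***out_argv)
{
    if (argc > FAKE_ARGC_LIMIT) {
        struct fake_startup *s = (struct fake_startup *)(void *)argv;
        *out_argc = s->real_argc;
        *out_argv = s->real_argv;
        g_vararg_count = s->vararg_count;
        return 1;
    }
    *out_argc = (int)argc;
    *out_argv = argv;
    return 0;
}

/* What the compiler would emit before each variable-argument call:
   record the count only if the loader asked for it. */
static void note_vararg_call(int nargs)
{
    if (g_vararg_count != NULL)
        *g_vararg_count = nargs;
}
```

Under a normal Win32 environment resolve_startup returns 0 and note_vararg_call does nothing, so the same executable would behave as usual there.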

You can download a disk image containing the system from http://pdos.org at the bottom of the University Challenge x64 section.

Basically I could have started programming in C90 in 1986, and produced 32-bit executables that worked fine then, and still work fine now, with minimal fuss on a UEFI machine, and not be dependent on Microsoft.

I mainly needed some "rules" on how to write 32-bit code, knowing in advance that x64 would obsolete something as basic as push eax. I.e. AMD could have set out this "roadmap" in 1986.

I'm mainly wanting to know of any show-stoppers. I thought there might have been one when my code was working on qemu but not real hardware, but 2 fixes later it started working on real hardware. I.e. are there any instructions that are not available and cannot be worked around? printf was another example of something I initially thought I couldn't reasonably work around, but I think (haven't proven) that it can be worked around.
 
I'm not entirely sure I understand what you're trying to do here. Generate machine code that runs the same in both 32- and 64-bit mode? Why not use (or create, if running on UEFI?) a compatibility segment descriptor for the 32-bit code?

In long mode, any update to a 32-bit register will zero the upper bits. So the code in w32hack.asm can be made shorter:

Code:
push rbp
mov rbp,rsp

mov ecx, 12[rbp]
call puts
xor eax,eax

mov ecx, 8[rbp]
pop rbp
add rsp, 8
jmp rcx #ret

But I don't see why the upper bits of the return address would ever not be zero at this point, so you might be able to just use "ret".

Basically, all this seems to be doing is converting from stack-based to register-based calling convention. And ensuring the return code is zero, for some reason. Am I missing anything here?

-------------

msdemo32.asm: This isn't RISC, no need to load everything into a register first!

Code:
...
        sub     esp, 16
        mov     dword ptr [esp + 4], offset lc0
        mov     dword ptr [esp], offset retaddr1
        jmp     dword ptr [offset _imp__puts]
retaddr1:
        add     esp, 16
        
        sub     esp, 16 ;low-hanging fruit for peephole optimization, but going to leave it in
        mov     dword ptr [esp], 0
        jmp     dword ptr [_imp__exit]
;retaddr2:
;does not return

And I'm not convinced that emulating the 32-bit 'push'/'call' serves any real purpose here. If it's calling a 64-bit library directly without glue code, it would have to use the register calling convention instead of the stack.

Oh wait, I think I got it now, it's calling your 'w32puts' even though it doesn't use that name anywhere in the code. And loading the return address into (E/R)CX is then of course necessary, because the upper bits are occupied by the string parameter.

But why do it like this? You could have a stack-based calling convention in 64-bit too (with glue to translate it before calling any external library), while still using the normal 'push'/'call'. And if your own code doesn't use SSE, then there is no need to bother with stack alignment outside of the glue code.
 
Thanks for the technical information. I don't understand this though. I can't use push in my 32-bit code because a "push eax" will actually be "push rax" if I am in LM64, won't it? And thus the stack will change by 8 bytes, but the C-generated code will be looking for parameters at 4-byte steps.
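To illustrate the mismatch, here is a small C model of a stack (not real machine code). It pushes two arguments with 4-byte slots and then with 8-byte slots, and reads them back at the offsets 32-bit C-generated code would use. The names sim_stack, sim_push and sim_load32 are invented for this sketch, and it assumes the upper half of each 8-byte slot is zero.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define STACK_BYTES 64

/* Toy downward-growing stack; sp is an offset into mem. */
struct sim_stack {
    unsigned char mem[STACK_BYTES];
    size_t sp;
};

static void sim_init(struct sim_stack *s)
{
    memset(s->mem, 0xCC, sizeof s->mem);
    s->sp = STACK_BYTES;
}

/* Push a 32-bit value into a slot of `width` bytes (4 models a 32-bit
   push, 8 models what "push eax" turns into in 64-bit mode), stored
   little-endian and zero-extended. */
static void sim_push(struct sim_stack *s, uint32_t value, size_t width)
{
    uint64_t v = value;
    size_t i;

    s->sp -= width;
    for (i = 0; i < width; i++)
        s->mem[s->sp + i] = (unsigned char)(v >> (8 * i));
}

/* Read a little-endian 32-bit value at [sp + offset]. */
static uint32_t sim_load32(const struct sim_stack *s, size_t offset)
{
    uint32_t v = 0;
    size_t i;

    for (i = 0; i < 4; i++)
        v |= (uint32_t)s->mem[s->sp + offset + i] << (8 * i);
    return v;
}
```

With 4-byte pushes the second argument sits at offset 4, where 32-bit code expects it. With 8-byte pushes offset 4 holds the zero upper half of the first slot, and the second argument is at offset 8 instead.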

Another thing I only just "realized/suspected". If I have 32-bit code where an index is being used, and the index is negative, or resolves to a lower location in memory (expecting a wrap at the 4 GiB mark) - that code will not work, right? As it will instead index up into the 4 GiB to 8 GiB region. I had this issue on the mainframe (and completely forgot about it until now) and that was resolved by using page tables to map the 4-8 region to 0-4 to produce an effective wrap. On the mainframe I run my 32-bit code in AM64. I do this because there is (normally) no AM32 (except on the S360/67 and Hercules/380), only AM31 is available so you only have access to 2 GiB of memory instead of 4 GiB.
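The arithmetic behind that worry can be sketched in C. This only models the address calculation, assuming both registers were zero-extended by earlier 32-bit operations; ea32, ea64 and alias_low_4g are names invented for the example, and alias_low_4g stands in for the page-table trick of mapping the 4-8 GiB region onto 0-4 GiB.

```c
#include <stdint.h>

/* Effective address with 32-bit address arithmetic: wraps modulo 2^32. */
static uint64_t ea32(uint32_t base, uint32_t index)
{
    return (uint32_t)(base + index);
}

/* Effective address with 64-bit address arithmetic on zero-extended
   registers: the carry out of bit 31 is kept, so there is no wrap. */
static uint64_t ea64(uint32_t base, uint32_t index)
{
    return (uint64_t)base + (uint64_t)index;
}

/* Model of aliasing the 4-8 GiB region onto 0-4 GiB via page tables. */
static uint64_t alias_low_4g(uint64_t addr)
{
    return addr & 0xFFFFFFFFu;
}
```

For a base of 0x1000 and an index of -4 (0xFFFFFFFC as a 32-bit value), ea32 gives 0xFFC while ea64 gives 0x100000FFC, which is above the 4 GiB mark. Passing the latter through alias_low_4g brings it back to 0xFFC.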
 
The 32-bit "flat mode" has been around almost as long as the 80386.

Sorry - what is this? Is this for an 80386 or a x64? Note that I can't really (*) exit LM64 because then I would lose access to boot services.

(*) It looks like if I disable interrupts I can switch to CM32 and run code normally before returning to LM64, and this is probably a superior (but quite different) approach to the issue, and I will probably switch to that, as it is good enough for what I want (although staying in LM64 has the advantage that they can't change the rules to stop my 32-bit code from working - e.g. by deleting CM32 entirely because "no-one uses it anymore").
 
I keep hearing that but it is complete bollocks. ;)
 
Compatibility mode is a hardware feature and won't be going away for a long time (you can even still have 16-bit protected mode segments mixed with 32- and 64-bit). Without doing any further research on this, I assume your code is running in ring 0 under UEFI, so even if there is no GDT entry for 32-bit code set up already, you can create your own?
 
Now you've done it! I thought that I'd consigned bad memories of 16/32 bit "thunking" to the garbage heap of memory.
I'm going to have nightmares for the next week... :)
A big linear address space makes things so easy, however. Got a file you need to play with? Just map it into a 64-bit address range and play with it as if it were memory-resident. None of this silly sector/block-oriented I/O.
 
Compatibility mode is a hardware feature and won't be going away for a long time (you can even still have 16-bit protected mode segments mixed with 32- and 64-bit). Without doing any further research on this, I assume your code is running in ring 0 under UEFI, so even if there is no GDT entry for 32-bit code set up already, you can create your own?

Yes, that is my exact plan. The previous plan could have coped even if UEFI had been annoying and run my code in ring 3 until I exit boot services, and also cope with a presumably cheaper processor that didn't have CM32 or CM16 baggage.

However, all plans have been superseded as I just got a good enough public domain C compiler to compile the C code I actually have in PDPCLIB and PDOS-generic (after some minor mods).

Which means I'm about to make a switch to Win64 executables running under UEFI. I think I didn't mention that here because it's not vintage. Basically there's not a lot of "glue" code required to convert the UEFI OS (which is what it is, really), into a Win64 (subset) clone, which is what I had already done as proof of concept, but now that I have the C compiler (and Win64 is the only supported target), I should be able to have a completely public domain programming environment.

It will only work on a modern machine, not a vintage machine, but that has advantages for some people. Like almost everyone.

But yeah, I may well become a "traitor" at this point, as I "follow the compiler".
 
Sorry - what is this? Is this for an 80386 or a x64? Note that I can't really (*) exit LM64 because then I would lose access to boot services.
It's the non-segmented 32-bit mode of a 386+. Basically, you get a nice, unbroken linear address space to work in--hence, the "flat" moniker.
I'm sure that this also applies to x64.
 
I have that also for the 386, but I'm not using some sort of special mode. I just go into PM32 and set code and data segments to point to a flat 4 GiB address space.

I did things the hard way to achieve that, did I?
 
No, that's the simple way. Initially, Win3.1 was a mess of segment-based memory allocation, I suppose to keep some sort of conformity with the 286 mode. You call it PM32, but it's been known as "32 bit flat mode" for decades.
 