
MIPS ARCSystem Magnum 4000PC-50 LSI Logic L1A7385 Prototype

TMM

Member
Joined
Apr 16, 2023
Messages
16
Location
The Netherlands
I recently came into possession of a MIPS ARCsystem Magnum 4000PC-50, but it is acting a little weirdly. It is quite unstable, both in Windows NT4 and in the ARC runtime. For instance, the ARC runtime on my Magnum will not reliably execute MIPS-II based ECOFF files that I produced, whereas both QEMU and MAME execute the programs just fine. I don't know if this is an emulator deficiency or if my board is weird. The reason I think my board may be weird is all of the bodge wires on it:
PXL_20230415_173243082.jpg

I have noticed that in place of an NEC μPD31432 ARC address path chip there's a chip called L1A7385 (the same part number listed on the Jazz development board at the Smithsonian https://www.si.edu/object/microsoft...d-jazz-rev-1-mips-r4000-processor:nmah_742557). There is also a whole extra chip mounted to it "dead bug style".
PXL_20230416_010806060.jpg
The problem I have is that I can't find any pictures of a Magnum 4000 motherboard online anywhere, so I do not know if this is normal or if my board is weird.

On QEMU and MAME, Windows NT4 appears to be quite stable, whereas on my real machine it crashes under high I/O load. I've done quite a bit of troubleshooting (in no particular order):
* Visually inspected all the solder points on the chips.
* Used contact cleaner on the entire front surface of the board and all connectors.
* Tried with era-appropriate SCSI devices instead of modern ones.
* Tried Windows NT 4.0 and 3.51.
* Wrote a memory testing program for the ARC firmware which, in a loop, writes patterns to every memory location not used by firmware and reads them back (no errors detected there). (https://github.com/hpvb/MIPS_ARC_memtest)
* Ported Doom to the ARC firmware (sometimes runs successfully in attract mode for over an hour, then freezes).
* Tried booting with some cards removed, but the system will not function without them so that's not an option.
* Replaced the memory with known good memory.
* Replaced the Dallas clock chip (the one with the built-in battery) with a new-old-stock part.
* Removed the CPU, used contact cleaner in the socket, applied thermal paste instead of a pad.
* Applied light pressure to the board in various places, flexing it slightly while the system was running, to see if perhaps there are some hairline cracks.
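
For readers who want a picture of the memory test mentioned in the list above, here is a minimal sketch of a write-patterns-then-read-back loop. It is not the code from MIPS_ARC_memtest: it is written in Python for brevity, a `bytearray` stands in for the RAM not used by firmware, and the pattern values and the name `check_region` are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Assumed test patterns: all zeros, all ones, and the two alternating-bit words.
PATTERNS = (0x00000000, 0xFFFFFFFF, 0xAAAAAAAA, 0x55555555)


def check_region(mem: bytearray,
                 patterns: Iterable[int] = PATTERNS) -> List[Tuple[int, int, int]]:
    """Fill `mem` with each 32-bit pattern, read it back, and collect mismatches.

    Returns a list of (byte_offset, expected_word, actual_word) tuples.
    """
    errors = []
    words = len(mem) // 4
    for pattern in patterns:
        raw = pattern.to_bytes(4, "big")
        # Write pass: fill the whole region before reading anything back.
        for i in range(words):
            mem[i * 4:i * 4 + 4] = raw
        # Read pass: compare every word against the pattern just written.
        for i in range(words):
            actual = int.from_bytes(mem[i * 4:i * 4 + 4], "big")
            if actual != pattern:
                errors.append((i * 4, pattern, actual))
    return errors
```

A healthy region returns an empty list, for example `check_region(bytearray(1024))`. Each returned tuple gives the byte offset, the word that was written and the word that came back.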

Bootnote

For anyone interested, I have also attached two ISO files to this post: ARCDoom_iso.zip, which is doomgeneric ported to run "natively" on ARC, and ARC_memtest_iso.zip, which is, well, the memory test. The source code for the memory test is also here: https://github.com/hpvb/MIPS_ARC_memtest

Running the ISOs (on emulation, make sure you select "initialize system" from the "Setup" option at least once):

Running ARCDoom: "Run a program" and then "cd:\doom"
Running Memtest: "Run a program" and then "cd:\memtest"
 

Attachments

  • ARCDoom_iso.zip (2 MB)
  • ARC_memtest_iso.zip (5.5 KB)
  • PXL_20230415_174038991.jpg (2.2 MB)
  • PXL_20230415_235710381.jpg (1.7 MB)
Your board is weird. My three Magnum 4000 machines all have L2 cache, for a start. I'll have to write a longer reply a little later.
 
Oh wow, you have three of them? I don't suppose you'd be interested in selling one to me with a normal board? 😄
 
Oh that's epic!!! I can test it on my Deskstation Tyne.

Is the source code for ARC Doom available?
 
Longer reply still pending. Here's a photo in the meantime. I pulled this one out because the machine is very flaky: it crashes all the time in ARC (or the Unix firmware) and won't even boot an OS. I suspect an L2 cache fault but wanted to try swapping the CPU first... but I can't actually figure out how to remove it. The heat sink is mechanically coupled to the ground fence.

Anyway.
 

Attachments

  • DFCBE5EB-5CFB-4FAD-9503-44040C2B91B7_1_105_c.jpeg (498.9 KB)
The Doom source isn't available yet, as there's also a port of newlib that needs to go with it and I haven't cleaned it all up yet. I will though!

The Doom port probably won't work on the Tyne, as ARC doesn't offer graphics support; ARCDoom basically has a "driver" for the G364 framebuffer. I think the Tyne uses an S3 based card? I can perhaps add support though! 😄

EDIT: I had a quick look at what information is available on the Tyne and it is very scarce. Without knowing what address the VGA chipset is mapped to, I don't know how to make ARCDoom run on it. If there's any text-mode program you'd like to see on the Tyne from ARC, I'd be happy to try and port it for you though :) Otherwise I'd need access to a Tyne, but they are a bit thin on the ground here in The Netherlands.
 
I think I can help with the heat sink. I believe that is how the heatsinks are supposed to be attached: if the heatsink were clipped to the CPU itself, the clips could very easily touch some CPU pins.

There should be a little notch in each of the legs of the cooler. Insert a screwdriver in those notches, very carefully bend in the opposite direction, and lift. If you do that on the two clips on one side, the clips should come loose.

After that the clips on the other side should be very easy to remove. Then you can lift the fin stack off and you should be left with just the CPU in the socket.
 
Short update: I installed 128 MB of RAM from a tested source and it seems more stable in NT now, but that might simply be because it now doesn't have to swap nearly as much.

I also wrote a disk check for ARC that writes several patterns to a 10 MB file and reads them back; it didn't find any issues.

The instability of this board remains a mystery.

@bear I think that your boards might be 4000SC-50 models and not 4000PC-50 like mine. The R4000PC doesn't support L2 cache at all.
 
I thought I read (somewhere?) that one of the main design changes that occurred between the Jazz and the Magnum was the addition of L2. It's news to me there were any non-SC Magnum 4000s.

I've removed the spring clip from the heat sink (you can see from my photo it's missing), but it's still stuck fast.

I need to clear some bench space to put the working ARC firmware Magnum (the other is set up for RISC/os) out and try your binaries.

edit: now you mention it I do see other references on the internet to 4000PC Magnums. so there it is, I suppose.
 
This is the badge on the case of my Magnum.
PXL_20230418_181546380.jpg
Do yours say SC50 by any chance?

I'd still be very interested in buying one off of you if you're selling. I'm trying to get some information on this system together online, and get the sound hardware working in QEMU and/or MAME.

Since I'm still not sure whether this board is a prototype or not, this machine isn't super suitable for that kind of research. 😄
 
never mind re: heat sink. just needed some gentle-ish prying. also it shed chrome flakes off the ground fence all over the place. yikes.
 
Ouch! I hope you managed to clean it all off!

As for the stability of my system, the different RAM didn't really help; I still got a new bluescreen.

PXL_20230418_183154741.jpg
 
That exception is probably a clue, the arguments are all zero. The middle two should be instruction/context addresses. So you've got a (probable?) null pointer dereference in the kernel which, given the limited range of available/supported hardware for this platform, one would think should be a moderately strong indicator of a hardware fault.

 

Yeah, I agree. I suspect the SCSI controller chip. It has a weird shiny mark on it that people seem to agree is likely a burn spot.

I bought some Chip Quik, a new FAS216 chip and a PLCC-84 socket. I'm going to try to replace it and see what happens.

Did you manage to replace your CPU? Did it help? :)
 
I've installed Linux on the Magnum now. I'm still seeing some instability issues, but maybe now I can get some more information as to what is actually wrong. I have a serial console now, so perhaps the Linux kernel will panic or oops in a way that's a bit more useful than a BSOD.

PXL_20230419_231324351.jpg
 
I wrote a little program that uses O_DIRECT access to the SCSI drive, and after not too long it begins writing nonsense to the drive. It starts with some extra zeroes in between good data, and after a while it's just zeroes.

I'm pretty confident at this point that the SCSI controller chip is fried. Running the same program on a tmpfs works just fine (without O_DIRECT).

Without O_DIRECT it also works fine on the real drive, but judging by the indicator lights it is all happening in cache at that point. (Which is why I used O_DIRECT in the first place)
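
For anyone wanting to try something similar, a rough sketch of this kind of O_DIRECT write-and-verify check might look like the following. It is not the program described above: it is in Python for brevity, the 4096-byte block size, the block layout and all function names are assumptions, and the sorting into "zeroes mixed in" versus "all zeroes" simply mirrors the two symptoms reported.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; O_DIRECT needs aligned buffers, offsets and lengths


def make_block(index: int) -> bytes:
    """Build a block containing no zero bytes, so stray zeroes are unmistakable."""
    tag = (index % 255) + 1  # 1..255, changes from block to block
    return bytes([tag]) * BLOCK


def classify(expected: bytes, actual: bytes) -> str:
    """Describe how a read-back block differs from what was written."""
    if actual == expected:
        return "ok"
    if actual.count(0) == len(actual):
        return "all zeroes"
    if 0 in actual:
        return "zeroes mixed into good data"
    return "other corruption"


def direct_check(path: str, blocks: int = 256) -> list:
    """Write then re-read `blocks` blocks via O_DIRECT; return (index, verdict) failures."""
    # os.O_DIRECT exists on Linux; elsewhere this quietly falls back to cached I/O.
    flags = os.O_RDWR | os.O_CREAT | getattr(os, "O_DIRECT", 0)
    fd = os.open(path, flags, 0o600)
    buf = mmap.mmap(-1, BLOCK)  # anonymous mmap is page-aligned, which O_DIRECT requires
    failures = []
    try:
        for i in range(blocks):
            buf[:] = make_block(i)
            os.pwrite(fd, buf, i * BLOCK)
        for i in range(blocks):
            os.preadv(fd, [buf], i * BLOCK)
            verdict = classify(make_block(i), bytes(buf))
            if verdict != "ok":
                failures.append((i, verdict))
    finally:
        buf.close()
        os.close(fd)
    return failures
```

It would be run as `direct_check("/mnt/scsi/testfile")`, where the path is a hypothetical file on the drive under test. On a filesystem that does not support O_DIRECT the `os.open` call may fail with EINVAL, in which case the flag has to be dropped.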

I'm going to try to replace the FAS216 chip.
 

Attachments

  • PXL_20230420_160145601.jpg (3 MB)
I can think of some other potential faults giving this sort of result, but the FAS216 is probably the easiest thing to find a spare for, so a reasonable place to start.
 
Yeah, it could be the DMA controller as well, or possibly the CPU. If you have other ideas I'd love to hear them!
 
The DMA controller was along the lines of what I was thinking as possible other sources of trouble. Could also be something like a marginal gate involved in the enable line of a bus buffer - wouldn't enjoy the thought of having to track that down with a logic analyzer.

CPU seems unlikely to me as I'd expect it to show up with memory and other I/O devices in that case. I wasn't sure if the Magnum even had a DMA controller, or if so what other devices on the board would make use of it for testing. Ethernet maybe, floppy disk maybe, but... I wasn't sure, so I didn't say anything specific.
 
The floppy drive uses DMA as well, but the Linux driver for the Jazz floppy controller seems to have bitrotted a bit and some quick hacking didn't fix it.

Trying ethernet might be a good idea though, that's also DMA driven.
 