Multia VX40 troubleshooting

coredump4 · Aug 26, 2023

I was taking a look at an old Multia that's been sitting idle for a long time. It does not boot normally, but I'm able to do a fail-safe boot from the FSL image on floppy. When that boots, it spews a bunch of machine checks on the serial console. Before I write this system off, is anyone able to infer anything from these messages? If there is something I could do to fix it, I'd love to know. There are pretty much no FRUs on the VX40 other than RAM, however, and I swapped in multiple pairs of SIMMs, one bank at a time, to no avail. TIA.

Unexpected Machine Check through vector 00000067

IPRs:
EXC_ADD:000000000013085C ICCSR: 0000000000000000 HIER: 000000001FFFDC70
HIRR: 0000000000001042 MM_CSR: 0000000000005260 DC_STAT:0000000000000007
DC_ADDR:00000007FFFFFFFF ESR: 6FF0D8F800000C15 EAR: 6FF0D8F80000C480
STAT0: 0000000200000002 STAT1: 000003FD000003FD VA: 0000000000000000
EXC_SUM:0000000000000000 BC_TAG: 0000000000000000

Process idle, pcb = 001569E0
pc: 00000000 0013085C ps: 00000000 00000000
r2: 00000000 001306D4 r5: 00000000 00001F04
r3: 00000000 0002A9C8 r6: 00000000 001569E0
r4: 00000000 00000048 r7: 00000000 0013085C

exception context saved starting at 00157140

GPRs:
0: 00000000 0000001F 16: 00000000 00000800
1: 00000000 00000000 17: 00000001 20000000
2: 00000000 0013085C 18: 00000000 00000001
3: 00000000 00000002 19: 00000000 00000001
4: 00000000 00000001 20: 00000000 00027D40
5: 00000000 00142420 21: 00000000 0010F2C0
6: 00000000 001569E0 22: 00000000 00156AF8
7: 00000000 0013085C 23: 00000000 0000001F
8: 00000000 000E2C68 24: 00000000 00000000
9: 00000000 00142B90 25: 00000000 00000001
10: 00000000 00000001 26: 00000000 0005824C
11: 00000000 001295E0 27: 00000000 00130FB0
12: 00000000 00000001 28: 00000000 00128B60
13: 00000000 00000000 29: 00000000 00157B48
14: 00000000 00000040 30: 00000000 00157280
15: 00000000 00000000

dump of active call frames:

PC = 00130858
PD = 0010CED0 (krn$_idle)
FP = 00157B48
SP = 00157280

R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R13 R29 saved starting at 00157B58

R2 = 00001000
R3 = 00130E70
R4 = 00157BC8
R5 = 00157BC8
R6 = 000001E8
R7 = 00000000
R8 = 00000000
R9 = 00157BC8
R10 = 00000000
R11 = 0000001D
R12 = 00000400
R13 = C0000E20
R29 = 00000000
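
For anyone who wants to compare dumps between boot attempts (for example before and after a repair or a SIMM swap), the IPR block above can be turned into numbers mechanically. This is only a minimal sketch: the function name `parse_iprs` and the regular expression are my own, and it assumes every register is printed as `NAME:` followed by exactly 16 hex digits, as in this dump. It does not interpret any of the bits.

```python
import re

# Matches NAME: followed by 16 hex digits, with or without a space after the colon.
# (Assumed layout, based only on the dump posted in this thread.)
IPR_FIELD = re.compile(r"([A-Z][A-Z0-9_]*):\s*([0-9A-Fa-f]{16})\b")

def parse_iprs(text):
    """Return {register_name: int} for every 'NAME: <16 hex digits>' pair in text."""
    return {name: int(value, 16) for name, value in IPR_FIELD.findall(text)}

dump = """\
EXC_ADD:000000000013085C ICCSR: 0000000000000000 HIER: 000000001FFFDC70
HIRR: 0000000000001042 MM_CSR: 0000000000005260 DC_STAT:0000000000000007
DC_ADDR:00000007FFFFFFFF ESR: 6FF0D8F800000C15 EAR: 6FF0D8F80000C480
STAT0: 0000000200000002 STAT1: 000003FD000003FD VA: 0000000000000000
EXC_SUM:0000000000000000 BC_TAG: 0000000000000000
"""

iprs = parse_iprs(dump)
for name, value in iprs.items():
    # Print each register as a zero-padded 64-bit hex value.
    print(f"{name:8s} {value:#018x}")
```

Parsing two captures this way and comparing the resulting dictionaries makes it easy to see which registers change from one machine check to the next and which stay constant.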

Wildfire · Aug 27, 2023

I sent this to someone who is active in comp.os.vms.
Maybe he can help, or can at least say whether this system is a dead brick now.

He has deep knowledge of reading crash dumps...

coredump4 · Aug 27, 2023

Wildfire said:
I sent this to someone who is active in comp.os.vms.
Maybe he can help, or can at least say whether this system is a dead brick now.

He has deep knowledge of reading crash dumps...

Thank you, I appreciate it!

Wildfire · Aug 28, 2023

Hello,

First of all, I want to ask about the RAM you used: are you 100 percent sure it is the right RAM?
I could take a look in my VX40, which I want to get up and running when I have the time...
The RAM must be full-parity 36-bit; I have two 4M x 36 modules in it.

If yours is also correct, the news is not good. I got the following info:

literal EMB$C_MCHECK_670 = 2; ! A Machine Check 670 - Processor UCE

UCE = UnCorrectable Error

What you posted is not a crash dump, because the system was still pre-boot.

But from this line:
Unexpected Machine Check through vector 00000067

he was able to look into some OpenVMS source code he has access to, and that is where the info above comes from.
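
To make the link between the console message and the OpenVMS constant explicit, here is a small sketch. It rests on an assumption the thread does not confirm: that the console prints the vector without its low hex digit, so that vector 0x67 corresponds to the "670" in `EMB$C_MCHECK_670`. The table and the function name `describe_vector` are only illustrative, and the table holds just the one entry mentioned in this thread.

```python
# Assumption (not confirmed in this thread): the console's "vector 00000067"
# is the "670" machine check with its low hex digit dropped.
KNOWN_MCHECKS = {
    0x670: "Machine Check 670 - Processor UCE (UnCorrectable Error)",
}

def describe_vector(console_vector):
    """Shift the console's vector left by one hex digit and look it up."""
    code = console_vector << 4
    return code, KNOWN_MCHECKS.get(code, "not in this small table")

code, meaning = describe_vector(0x67)
print(f"vector {0x67:#x} -> {code:#x}: {meaning}")
```

If the assumption holds, the message points at the processor side (CPU or cache) and not at an I/O device, which fits the advice in the troubleshooting manual.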

There seems to be a troubleshooting manual at https://archive.org/details/udb-man/page/n41/mode/2up

According to it, on the VX40 you can only try new memory modules as a test.
On the other versions, the VX41 or VX42, you could also remove the cache module and replace it as a test.

If all of that fails, replace the system board!

Are you a little bit into electronics repair?
If yes, you could try getting new cache chips and soldering them onto the board.
There is also a small chance that some solder joints are bad.

If that also fails, the system is just a brick... :-(
I think some people here may be interested in it if you don't want to keep it,
at least for the case and power supply.
I would desolder, or maybe even saw off, the CPU with its heatsink to keep it as a display item. Maybe this processor
will also fit the AXPpci33 mainboard...

HTH - at least a little bit...

ajacocks · Aug 28, 2023

Very interesting. Cache seems as though it would be a likely source of machine check exceptions. I have a working VX40 and a VX41 (with the CPU from a VX42), and @coredump4 and I have been working on these Multias for many years.

- Alex

tradde · Aug 28, 2023

I have a VX40. The battery voltage was low so I removed it. So to boot it I have to go into the BIOS (or whatever setup is required) to make it boot. It has NT 4.0 on it. I really don't use it but can look at things for you if needed.

Wildfire · Aug 28, 2023

@coredump4

What happens when you try to boot it normally without the FSL floppy? Is there any output on the serial console that you can also post?

coredump4 · Sep 12, 2023

Re: RAM, I used genuine DEC 8MB that came from other VX40s and tested good last year.

Looking at the machine more closely, E215 had been replaced due to Heat Death, but the repair wasn't clean. I'll redo the repair and see if the machine checks persist.
