RVM

Crash during first major GC on PPC64 tests when running with -X:processors=2

Details

  • Number of attachments :
    1

Description

The stack appended below is a fairly widespread crash symptom on the latest ppc64-aix runs (http://jikesrvm.anu.edu.au/cattrack/results/excalibur.watson.ibm.com/core-ppc64/3750). This particular one is in _213_javac, but very similar stacks are showing up as the cause of most of our failures.

I'm a little surprised that we think we're using addresses as high as 30cc7c57fffffff0, so my initial wild guess is that there is some 32-bit assumption in or around the MMTk code in this stack trace. I'll do a little poking around to see if I can determine what is happening.

mem=30cc7c57fffffff0
fp=000000004000ea68
pr=00000000310a6d60
trap/exception: type=Segmentation fault
ip=00000000343342c8
instr=0x7c64182a
exn_handler=0000000034023af8
lr=0000000034299654
pthread_self=0000000000000001
JikesRVM: internal error trap

Fatal error: Unknown hardware trap within uninterruptible region.
Died in GC:
Exiting virtual machine due to uninterruptibility violation.

– Stack –
at [0x000000004000e960] Lorg/jikesrvm/VM; sysFail(Ljava/lang/String;)V at line 2044
at [0x000000004000e990] Lorg/jikesrvm/runtime/VM_Runtime; deliverHardwareException(II)V at line 773
at [0x000000004000ea50] <hardware trap>
at [0x000000004000ea68] Lorg/jikesrvm/objectmodel/VM_JavaHeader; readAvailableBitsWord(Ljava/lang/Object;)Lorg/vmmagic/unboxed/Word; at line 617
at [0x000000004000eaa0] Lorg/jikesrvm/objectmodel/VM_ObjectModel; readAvailableBitsWord(Ljava/lang/Object;)Lorg/vmmagic/unboxed/Word; at line 504
at [0x000000004000ead0] Lorg/jikesrvm/mm/mmtk/ObjectModel; readAvailableBitsWord(Lorg/vmmagic/unboxed/ObjectReference;)Lorg/vmmagic/unboxed/Word; at line 358
at [0x000000004000eb10] Lorg/mmtk/policy/LargeObjectSpace; isInNursery(Lorg/vmmagic/unboxed/ObjectReference;)Z at line 258
at [0x000000004000eb58] Lorg/mmtk/policy/LargeObjectSpace; traceObject(Lorg/mmtk/plan/TransitiveClosure;Lorg/vmmagic/unboxed/ObjectReference;)Lorg/vmmagic/unboxed/ObjectReference; at line 166
at [0x000000004000ebc8] Lorg/mmtk/plan/generational/GenMatureTraceLocal; traceObject(Lorg/vmmagic/unboxed/ObjectReference;)Lorg/vmmagic/unboxed/ObjectReference; at line 116
at [0x000000004000ec20] Lorg/mmtk/plan/generational/marksweep/GenMSMatureTraceLocal; traceObject(Lorg/vmmagic/unboxed/ObjectReference;)Lorg/vmmagic/unboxed/ObjectReference; at line 55
at [0x000000004000ec70] Lorg/mmtk/plan/TraceLocal; traceObject(Lorg/vmmagic/unboxed/ObjectReference;Z)Lorg/vmmagic/unboxed/ObjectReference; at line 301
at [0x000000004000ecc8] Lorg/mmtk/plan/TraceLocal; processRootEdge(Lorg/vmmagic/unboxed/Address;Z)V at line 123
at [0x000000004000ed48] Lorg/jikesrvm/mm/mmtk/ScanBootImage; processChunk(Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/mmtk/plan/TraceLocal;)V at line 140
at [0x000000004000ee28] Lorg/jikesrvm/mm/mmtk/ScanBootImage; scanBootImage(Lorg/mmtk/plan/TraceLocal;)V at line 79
at [0x000000004000eee8] Lorg/jikesrvm/mm/mmtk/Scanning; computeBootImageRoots(Lorg/mmtk/plan/TraceLocal;)V at line 333
at [0x000000004000ef28] Lorg/mmtk/plan/generational/GenCollector; collectionPhase(SZ)V at line 99
at [0x000000004000ef88] Lorg/mmtk/plan/generational/marksweep/GenMSCollector; collectionPhase(SZ)V at line 143
at [0x000000004000efe8] Lorg/mmtk/plan/Phase; processPhaseStack(Z)Z at line 477
at [0x000000004000f0f8] Lorg/mmtk/plan/Phase; beginNewPhaseStack(I)Z at line 390
at [0x000000004000f140] Lorg/mmtk/plan/StopTheWorldCollector; collect()V at line 39
at [0x000000004000f170] Lorg/jikesrvm/memorymanagers/mminterface/VM_CollectorThread; run()V at line 385
at [0x000000004000f300] Lorg/jikesrvm/scheduler/VM_Thread; startoff()V at line 617

Activity

David Grove added a comment -

I've looked at 10 different failures. They all have the appended portion of the stack in common:

at [0x000000004001a168] Lorg/jikesrvm/mm/mmtk/ScanBootImage; processChunk(Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/mmtk/plan/TraceLocal;)V at line 140
at [0x000000004001a248] Lorg/jikesrvm/mm/mmtk/ScanBootImage; scanBootImage(Lorg/mmtk/plan/TraceLocal;)V at line 79
at [0x000000004001a308] Lorg/jikesrvm/mm/mmtk/Scanning; computeBootImageRoots(Lorg/mmtk/plan/TraceLocal;)V at line 333

So I think there's a very good chance that the code that builds up references from the encoded bootimage map is not correct on 64-bit platforms. There are a couple of suspicious 4's and int/word conversions in org.jikesrvm.mm.mmtk.ScanBootImage.
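
To make the "suspicious 4's and int/word conversions" concrete, here is a minimal sketch in plain Java. It does not use the vmmagic types or any code from ScanBootImage; the class name, the constants and the sample value are all hypothetical, and plain longs stand in for 64-bit words. It shows two ways such code can be fine on 32-bit and wrong on 64-bit: an int widened with sign extension, and a slot offset computed with a hard-coded 4 instead of the address size.

```java
// Illustrative only: plain Java longs stand in for 64-bit machine words.
final class WordWidthPitfalls {
    // Hypothetical constants; on a 64-bit build an address occupies 8 bytes.
    static final int BYTES_IN_INT = 4;
    static final int BYTES_IN_ADDRESS = 8;

    // Sign-extending conversion: what a plain (long) cast of an int does.
    static long signExtend(int value) {
        return (long) value;
    }

    // Zero-extending conversion: keeps the low 32 bits, clears the high 32.
    static long zeroExtend(int value) {
        return value & 0xFFFFFFFFL;
    }

    // Byte offset of the index-th slot for a given slot size.
    static long slotOffset(int index, int bytesPerSlot) {
        return (long) index * bytesPerSlot;
    }

    public static void main(String[] args) {
        int value = 0xC5000000; // negative when held in a Java int
        System.out.println(Long.toHexString(signExtend(value)));
        System.out.println(Long.toHexString(zeroExtend(value)));
        // A hard-coded 4 lands in the middle of the second 8-byte slot.
        System.out.println(slotOffset(3, BYTES_IN_INT));
        System.out.println(slotOffset(3, BYTES_IN_ADDRESS));
    }
}
```

Whether either pattern is actually present in ScanBootImage is not established here; the sketch only shows what to look for.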

Ian Rogers added a comment -

This Perl script parses the root map in the same way as boot image scanning does, and dumps a list of all the references in the root map. It may be informative to run it on the problematic root map.

Ian Rogers added a comment -

BTW: if anyone thinks it's worth distributing the rmap decompressor Perl script, then I'd support this. It's rough and ready at the moment. The uploaded version has constants from an x86_64 build.

David Grove added a comment -

The system appears fairly stable with 1 virtual processor (can reliably run 5 size100 iterations of _213_javac).

With 2 virtual processors, it reliably dies in the first major GC with the above stack trace.

Both tests on a prototype image.

Steve Blackburn added a comment -

I have looked at this a bit further with Dave and Daniel today. No big insights yet, I'm afraid.

Some info:

The LOS references on the failed stacks are probably red herrings given that traceObject will throw high addresses at the PLOS, so if the address were corrupted high it would end up in the LOS code.

It seems that the code is dereferencing a mis-aligned 8 byte address (30cc7c5800000000).

It seems that this is happening during boot image scanning, and therefore is most likely happening due to one of two bugs:
a) the boot image contains a mis-aligned pointer (which the scanning code is correctly passing on)
b) the scanning code is out by four and thus seeing a misaligned value
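
Hypothesis b) can be illustrated with a small sketch in plain Java. Everything in it is an assumption made for illustration: the pointer value 0x0000000030cc7c58 is hypothetical (chosen because it falls inside the boot image data range printed elsewhere in this ticket), the header offset of -16 is simply the difference between the reference 30cc7c5800000000 and the faulting address 30cc7c57fffffff0, and a big-endian ByteBuffer stands in for PPC memory. Reading eight bytes starting four bytes into an 8-byte slot puts the low half of the pointer into the high half of the result.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class OutByFourRead {
    // Assumed distance from an object reference to the header word being read,
    // inferred from the two numbers in this ticket, not from the source code.
    static final long ASSUMED_HEADER_OFFSET = -16;

    // Two adjacent 8-byte slots laid out in big-endian memory.
    static ByteBuffer twoSlots(long first, long second) {
        ByteBuffer memory = ByteBuffer.allocate(16).order(ByteOrder.BIG_ENDIAN);
        memory.putLong(0, first);
        memory.putLong(8, second);
        return memory;
    }

    // Read eight bytes starting at the given byte offset.
    static long loadWord(ByteBuffer memory, int byteOffset) {
        return memory.getLong(byteOffset);
    }

    public static void main(String[] args) {
        // Hypothetical boot image pointer followed by a null slot.
        ByteBuffer memory = twoSlots(0x0000000030cc7c58L, 0L);
        long aligned = loadWord(memory, 0);
        long skewed = loadWord(memory, 4); // out by four
        System.out.println("aligned read: " + Long.toHexString(aligned));
        System.out.println("skewed read:  " + Long.toHexString(skewed));
        System.out.println("header load:  "
            + Long.toHexString(skewed + ASSUMED_HEADER_OFFSET));
    }
}
```

The skewed read yields 30cc7c5800000000 and the header load yields 30cc7c57fffffff0, which matches the mem= value in the original report. The same bit pattern would also result from a correctly read slot that already holds a shifted value, so this does not distinguish a) from b).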

For b) to happen, it seems that the value would have to be misaligned back at line 129 or earlier in ScanBootImage, since after that point it is only incremented by BYTES_IN_ADDRESS. However, we don't fail on line 131, which suggests that if b) is true, we're getting a little lucky (perhaps reading zero at 129?).

It would be a good idea to throw a few assertions into ScanBootImage.processChunk(). When I get the chance I'll do that. I'm still underwater right now though.
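
As a rough idea of what such an assertion could check, here is a sketch in plain Java. It is not the code that was checked in and does not use the vmmagic Address type; the class, the helper names and the start address are hypothetical, and BYTES_IN_ADDRESS is assumed to be 8. It also shows the point made above: a cursor that only ever advances by BYTES_IN_ADDRESS keeps whatever skew it started with, so a check at the start of the chunk tells you as much as a check on every slot.

```java
final class SlotAlignment {
    static final int BYTES_IN_ADDRESS = 8; // assumed: 64-bit build

    // Residue of an address modulo the slot size; zero means aligned.
    static long skew(long address) {
        return address & (BYTES_IN_ADDRESS - 1);
    }

    // Advance a slot cursor by a whole number of address-sized slots.
    static long advance(long slot, int slots) {
        return slot + (long) slots * BYTES_IN_ADDRESS;
    }

    // The kind of check that could guard a slot before it is loaded.
    static void assertAligned(long slot) {
        if (skew(slot) != 0) {
            throw new AssertionError("misaligned slot 0x" + Long.toHexString(slot));
        }
    }

    public static void main(String[] args) {
        long good = 0x30000000L; // hypothetical chunk start
        long bad = good + 4;     // the same start, out by four
        for (int i = 0; i < 4; i++) {
            long slot = advance(bad, i);
            System.out.println(Long.toHexString(slot) + " skew=" + skew(slot));
        }
        assertAligned(advance(good, 3)); // passes silently
    }
}
```

Note that a check like this applies to the slot address, not to the value loaded from it: 30cc7c5800000000 is itself a multiple of 8, so an alignment test on the loaded value would not reject it.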

Steve Blackburn added a comment -

One more thing...

Dave mentioned that the system ran OK with one virtual processor, but would reliably fail with > 1 virtual processor using the same build. This suggests that the boot image and map are correctly built, but who knows.

David Grove added a comment -

I ran with the assertions Steve checked in last night. No assertions tripped before the crash, which with 2 virtual processors happens in the first major collect.

[excalibur:/homes/excalibur/dgrove/SPECjvm98] ../buildit/rvm-trunk/dist/prototype_ppc64-aix/rvm -X:processors=2 -X:verbose -verbose:gc SpecApplication -s100 -m5 -M5 -a _213_javac
JikesRVM: loading from "../buildit/rvm-trunk/dist/prototype_ppc64-aix/RVM.data.image"
JikesRVM: loading from "../buildit/rvm-trunk/dist/prototype_ppc64-aix/RVM.code.image"
JikesRVM: loading from "../buildit/rvm-trunk/dist/prototype_ppc64-aix/RVM.rmap.image"
JikesRVM: setting linkage
JikesRVM: boot record contents:
bootImageDataStart: 0000000030000000
bootImageDataEnd: 0000000031ba1000
bootImageCodeStart: 0000000034000000
bootImageCodeEnd: 00000000343e5000
bootImageRMapStart: 0000000037000000
bootImageRMapEnd: 000000003704f000
initialHeapSize: 0000000001400000
maximumHeapSize: 0000000006400000
tiRegister: 0000000000040000
spRegister: 0000000030c9f148
ipRegister: 0000000034231c90
tocRegister: 0000000030040500
sysConsoleWriteCharIP:0000000110009790
Testing faulting-address location
save->except[0]=00000000dead1234
Done testing faulting-address location
JikesRVM: calling boot thread: jtoc = 0000000030040500 pr = 00000000310a1670 tid = 262144 fp = 0000000030c9f130
JikesRVM: sysPthreadSelf: thread 1
JikesRVM: pthread_create 0x00000102
JikesRVM: sysVirtualProcessorStartup: pr=0x0000000040000040 ip=0x00000000341b78c0 fp=0x000000004148f010
JikesRVM: sysPthreadSelf: thread 258
Using a time-slice of 10 ms
Speed will be 100
Will run each benchmark at least 5 times
Will run each benchmark at most 5 times
[Forced GC][GC 1 Start 1.79 s 10720KB -> 9736KB 880.14 ms]
Caching Off Speed = 100 Auto run mode

======= _213_javac Starting =======
Run 0 start. Total memory=20971520 free memory=10899456
Javac benchmark starting...
[GC 2 Start 4.43 s 15608KB -> 11884KB 312.68 ms]
[GC 3 Start 6.31 s 16288KB -> 13136KB 233.22 ms]
[GC 4 Start 7.89 s 16856KB -> 14868KB 399.03 ms]
[GC 5 Start 9.30 s 17728KB -> 16404KB 316.50 ms]
[GC 6 Start 10.36 s 18496KB -> 17516KB 196.34 ms]
[GC 7 Start 11.10 s 19020KB -> 18324KB 175.39 ms]
[GC 8 Start 11.61 s 19432KB -> 19068KB 116.10 ms]
[GC 9 Start 11.96 s 19788KB -> 19552KB 80.59 ms]
[GC 10 Start 12.20 s 20048KB -> 19916KB 79.64 ms]
[Full heap][GC 11 Start 12.28 s 19916KB save->except[0]=30ca19bffffffff0
mem=30ca19bffffffff0
fp=000000004000ea60
pr=00000000310a1670
trap/exception: type=Segmentation fault
ip=0000000034334570
instr=0x7c64182a
exn_handler=0000000034023af8
lr=0000000034297264
pthread_self=0000000000000001
JikesRVM: internal error trap
JikesRVM: unknown seg fault

Fatal error: Unknown hardware trap within uninterruptible region.
Died in GC:
Exiting virtual machine due to uninterruptibility violation.

– Stack –
at [0x000000004000e958] Lorg/jikesrvm/VM; sysFail(Ljava/lang/String;)V at line 2043
at [0x000000004000e988] Lorg/jikesrvm/runtime/VM_Runtime; deliverHardwareException(II)V at line 773
at [0x000000004000ea48] <hardware trap>
at [0x000000004000ea60] Lorg/jikesrvm/objectmodel/VM_JavaHeader; readAvailableBitsWord(Ljava/lang/Object;)Lorg/vmmagic/unboxed/Word; at line 617
at [0x000000004000ea98] Lorg/jikesrvm/objectmodel/VM_ObjectModel; readAvailableBitsWord(Ljava/lang/Object;)Lorg/vmmagic/unboxed/Word; at line 504
at [0x000000004000eac8] Lorg/jikesrvm/mm/mmtk/ObjectModel; readAvailableBitsWord(Lorg/vmmagic/unboxed/ObjectReference;)Lorg/vmmagic/unboxed/Word; at line 358
at [0x000000004000eb08] Lorg/mmtk/policy/LargeObjectSpace; isInNursery(Lorg/vmmagic/unboxed/ObjectReference;)Z at line 258
at [0x000000004000eb50] Lorg/mmtk/policy/LargeObjectSpace; traceObject(Lorg/mmtk/plan/TransitiveClosure;Lorg/vmmagic/unboxed/ObjectReference;)Lorg/vmmagic/unboxed/ObjectReference; at line 166
at [0x000000004000ebc0] Lorg/mmtk/plan/generational/GenMatureTraceLocal; traceObject(Lorg/vmmagic/unboxed/ObjectReference;)Lorg/vmmagic/unboxed/ObjectReference; at line 116
at [0x000000004000ec18] Lorg/mmtk/plan/generational/marksweep/GenMSMatureTraceLocal; traceObject(Lorg/vmmagic/unboxed/ObjectReference;)Lorg/vmmagic/unboxed/ObjectReference; at line 55
at [0x000000004000ec68] Lorg/mmtk/plan/TraceLocal; traceObject(Lorg/vmmagic/unboxed/ObjectReference;Z)Lorg/vmmagic/unboxed/ObjectReference; at line 301
at [0x000000004000ecc0] Lorg/mmtk/plan/TraceLocal; processRootEdge(Lorg/vmmagic/unboxed/Address;Z)V at line 123
at [0x000000004000ed40] Lorg/jikesrvm/mm/mmtk/ScanBootImage; processChunk(Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/mmtk/plan/TraceLocal;)V at line 142
at [0x000000004000ee20] Lorg/jikesrvm/mm/mmtk/ScanBootImage; scanBootImage(Lorg/mmtk/plan/TraceLocal;)V at line 79
at [0x000000004000eee0] Lorg/jikesrvm/mm/mmtk/Scanning; computeBootImageRoots(Lorg/mmtk/plan/TraceLocal;)V at line 333
at [0x000000004000ef20] Lorg/mmtk/plan/generational/GenCollector; collectionPhase(SZ)V at line 99
at [0x000000004000ef80] Lorg/mmtk/plan/generational/marksweep/GenMSCollector; collectionPhase(SZ)V at line 143
at [0x000000004000efe0] Lorg/mmtk/plan/Phase; processPhaseStack(Z)Z at line 477
at [0x000000004000f0f0] Lorg/mmtk/plan/Phase; beginNewPhaseStack(I)Z at line 390
at [0x000000004000f138] Lorg/mmtk/plan/StopTheWorldCollector; collect()V at line 39
at [0x000000004000f168] Lorg/jikesrvm/memorymanagers/mminterface/VM_CollectorThread; run()V at line 385
at [0x000000004000f2f8] Lorg/jikesrvm/scheduler/VM_Thread; startoff()V at line 617

David Grove added a comment -

Another thing Steve and I noticed last night: sometimes the crash is while scanning the bootimage, other times it isn't. So it seems probable that the bootimage map is actually OK. There's something else going wrong that is mysteriously linked with having multiple virtual processors. It's unlikely to be PPC weak memory problems, since ppc32-aix is running just fine with multiple virtual processors. So perhaps something in the load balancing/work queue aspect of the GC isn't 64-bit safe?

Steve Blackburn added a comment -

Those assertions should give us a very strong clue.

It seems that in the case of the stack trace which includes the boot image, only three things could now have happened (since we know the slot was aligned and yet its contents are misaligned when used a little while later):

a) *slot has bad stuff inside it
b) *slot is not correctly read from memory for some reason (it gets screwed up during slot.loadAddress())
c) the value gets screwed up by the ObjectReference.toObject() conversion.

I can try throwing in a few more assertions to narrow this down further.

--Steve

Steve Blackburn added a comment -

Dave, can you please test against 14230 (just committed) and report what you see? I just added some assertions which should help further narrow things down.

David Grove added a comment -

Using 14230 and a BaseBaseSemiSpace image, with 2 processors it crashes as below on the first GC. With 1 processor it runs 5 iterations to completion. I'm going to try a BaseBaseMarkSweep next to see if it has something to do with the fixup after movement.

[excalibur:/homes/excalibur/dgrove/SPECjvm98] ../buildit/rvm-trunk/dist/BaseBaseSemiSpace_ppc64-aix/rvm -verbose:gc -X:processors=2 SpecApplication -s100 -m5 -M5 -a _213_javac
Speed will be 100
Will run each benchmark at least 5 times
Will run each benchmark at most 5 times
[Forced GC][GC 1 Start 1.67 s 10460KB validRef: REF outside heap, ref = 0x30c8974000000000
Key: (I)mmortal (N)onmoving (D)iscontiguous (E)xtent (F)raction
HEAP_START 0x0000000030000000
AVAILABLE_START 0x0000000035800000
boot IN 0x0000000030000000->0x000000003fffffff E 0x0000000010000000
immortal IND [0x0000000040000000->0x00000000403fffff]
meta ND [0x0000000042000000->0x00000000423fffff]
los ND [0x0000000041400000->0x00000000417fffff]
plos N 0x00000000c5000000->0x00000000cfffffff F 0.07
sanity ND []
non-moving ND [0x0000000041000000->0x00000000413fffff]
sm-code ND [0x0000000040800000->0x0000000040bfffff]
lg-code ND [0x0000000040c00000->0x0000000040ffffff]
ss0 D [0x0000000041800000->0x0000000041bfffff, 0x0000000040400000->0x00000000407fffff]
ss1 D [0x0000000041c00000->0x0000000041ffffff]
AVAILABLE_END 0x00000000d0000000
HEAP_END 0x00000000d0000000
Died in GC:
vm internal error at:

– Stack –
at [0x0000000040019870] Lorg/jikesrvm/VM; sysFail(Ljava/lang/String;)V at line 2043
at [0x00000000400198a0] Lorg/jikesrvm/VM; _assertionFailure(Ljava/lang/String;Ljava/lang/String;)V at line 551
at [0x00000000400198f8] Lorg/jikesrvm/VM; _assert(ZLjava/lang/String;Ljava/lang/String;)V at line 534
at [0x0000000040019950] Lorg/jikesrvm/VM; _assert(Z)V at line 512
at [0x0000000040019990] Lorg/jikesrvm/mm/mmtk/ScanBootImage; processChunk(Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/mmtk/plan/TraceLocal;)V at line 140
at [0x0000000040019a70] Lorg/jikesrvm/mm/mmtk/ScanBootImage; scanBootImage(Lorg/mmtk/plan/TraceLocal;)V at line 80
at [0x0000000040019b30] Lorg/jikesrvm/mm/mmtk/Scanning; computeBootImageRoots(Lorg/mmtk/plan/TraceLocal;)V at line 333
at [0x0000000040019b70] Lorg/mmtk/plan/SimpleCollector; collectionPhase(SZ)V at line 87
at [0x0000000040019bd0] Lorg/mmtk/plan/semispace/SSCollector; collectionPhase(SZ)V at line 140
at [0x0000000040019c30] Lorg/mmtk/plan/Phase; processPhaseStack(Z)Z at line 477
at [0x0000000040019d40] Lorg/mmtk/plan/Phase; beginNewPhaseStack(I)Z at line 390
at [0x0000000040019d88] Lorg/mmtk/plan/StopTheWorldCollector; collect()V at line 39
at [0x0000000040019db8] Lorg/jikesrvm/memorymanagers/mminterface/VM_CollectorThread; run()V at line 385
at [0x0000000040019f48] Lorg/jikesrvm/scheduler/VM_Thread; startoff()V at line 617
Virtual machine state:

– Processors –
Processor 1 running thread4-daemon-collector-RUNNABLE
system thread id 1
transferQueue:
readyQueue:
ioQueue:
processWaitQueue:
idleQueue:Proc 2: Thread 5: VM.sysFail(): We're in a (likely) recursive call to VM.sysFail(), 2 deep
sysFail was called with the message: vm internal error at:
Died in GC:
vm internal error at:

– Stack –
at [0x0000000040019488] Lorg/jikesrvm/VM; sysFail(Ljava/lang/String;)V at line 2043
at [0x00000000400194b8] Lorg/jikesrvm/VM; _assertionFailure(Ljava/lang/String;Ljava/lang/String;)V at line 551
at [0x0000000040019510] Lorg/jikesrvm/VM; _assert(ZLjava/lang/String;Ljava/lang/String;)V at line 534
at [0x0000000040019568] Lorg/jikesrvm/VM; _assert(Z)V at line 512
at [0x00000000400195a8] Lorg/jikesrvm/scheduler/VM_Thread; getIndex()I at line 1143
at [0x00000000400195e0] Lorg/jikesrvm/scheduler/VM_Thread; dump([CI)I at line 1936
at [0x0000000040019660] Lorg/jikesrvm/scheduler/VM_Thread; dump([C)I at line 1978
at [0x00000000400196b0] Lorg/jikesrvm/scheduler/VM_Thread; dump(I)V at line 1904
at [0x0000000040019718] Lorg/jikesrvm/scheduler/VM_Thread; dump()V at line 1869
at [0x0000000040019750] Lorg/jikesrvm/scheduler/greenthreads/VM_GreenThreadQueue; dump()V at line 244
at [0x00000000400197a0] Lorg/jikesrvm/scheduler/greenthreads/VM_GreenProcessor; dumpProcessorState()V at line 603
at [0x00000000400197e8] Lorg/jikesrvm/scheduler/greenthreads/VM_GreenScheduler; dumpVirtualMachineInternal()V at line 457
at [0x0000000040019850] Lorg/jikesrvm/scheduler/VM_Scheduler; dumpVirtualMachine()V at line 239
at [0x0000000040019870] Lorg/jikesrvm/VM; sysFail(Ljava/lang/String;)V at line 2045
at [0x00000000400198a0] Lorg/jikesrvm/VM; _assertionFailure(Ljava/lang/String;Ljava/lang/String;)V at line 551
at [0x00000000400198f8] Lorg/jikesrvm/VM; _assert(ZLjava/lang/String;Ljava/lang/String;)V at line 534
at [0x0000000040019950] Lorg/jikesrvm/VM; _assert(Z)V at line 512
at [0x0000000040019990] Lorg/jikesrvm/mm/mmtk/ScanBootImage; processChunk(Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/mmtk/plan/TraceLocal;)V at line 140
at [0x0000000040019a70] Lorg/jikesrvm/mm/mmtk/ScanBootImage; scanBootImage(Lorg/mmtk/plan/TraceLocal;)V at line 80
at [0x0000000040019b30] Lorg/jikesrvm/mm/mmtk/Scanning; computeBootImageRoots(Lorg/mmtk/plan/TraceLocal;)V at line 333
at [0x0000000040019b70] Lorg/mmtk/plan/SimpleCollector; collectionPhase(SZ)V at line 87
at [0x0000000040019bd0] Lorg/mmtk/plan/semispace/SSCollector; collectionPhase(SZ)V at line 140
at [0x0000000040019c30] Lorg/mmtk/plan/Phase; processPhaseStack(Z)Z at line 477
at [0x0000000040019d40] Lorg/mmtk/plan/Phase; beginNewPhaseStack(I)Z at line 390
at [0x0000000040019d88] Lorg/mmtk/plan/StopTheWorldCollector; collect()V at line 39
at [0x0000000040019db8] Lorg/jikesrvm/memorymanagers/mminterface/VM_CollectorThread; run()V at line 385
at [0x0000000040019f48] Lorg/jikesrvm/scheduler/VM_Thread; startoff()V at line 617
Virtual machine state:
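
The validRef failure in the log above can be checked by hand. The sketch below is plain Java with hypothetical names; the bounds are copied from the space map in that log, and HEAP_END is assumed to be exclusive because the last space ends at 0x00000000cfffffff. It confirms that 0x30c8974000000000 is outside the heap, and it also splits the word into halves: the high 32 bits, taken on their own, fall inside the boot space and the low 32 bits are zero. That is consistent with a 32-bit-sized shift of a real boot image pointer, but it is only suggestive and does not say where the shift happens.

```java
final class ValidRefCheck {
    // Bounds copied from the space map printed in the log.
    static final long HEAP_START = 0x0000000030000000L;
    static final long HEAP_END = 0x00000000d0000000L; // assumed exclusive
    static final long BOOT_START = 0x0000000030000000L;
    static final long BOOT_LAST = 0x000000003fffffffL; // inclusive

    // Unsigned range test, so values with the top bit set compare correctly.
    static boolean inHeap(long ref) {
        return Long.compareUnsigned(ref, HEAP_START) >= 0
            && Long.compareUnsigned(ref, HEAP_END) < 0;
    }

    static boolean inBootSpace(long address) {
        return Long.compareUnsigned(address, BOOT_START) >= 0
            && Long.compareUnsigned(address, BOOT_LAST) <= 0;
    }

    static long highHalf(long word) {
        return word >>> 32;
    }

    static long lowHalf(long word) {
        return word & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        long ref = 0x30c8974000000000L; // value reported by validRef
        System.out.println("in heap:         " + inHeap(ref));
        System.out.println("high half:       " + Long.toHexString(highHalf(ref)));
        System.out.println("low half:        " + Long.toHexString(lowHalf(ref)));
        System.out.println("high half in boot space: " + inBootSpace(highHalf(ref)));
    }
}
```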

David Grove added a comment -

MarkSweep results are the same as SemiSpace. I split the two-part assertion at line 140 across lines 140 and 141. It's the second part that is failing (the check that slot.loadObjectReference is a validRef).

[excalibur:/homes/excalibur/dgrove/SPECjvm98] ../buildit/rvm-trunk/dist/BaseBaseMarkSweep_ppc64-aix/rvm -verbose:gc -X:processors=2 SpecApplication -s100 -m5 -M5 -a _213_javac
Speed will be 100
Will run each benchmark at least 5 times
Will run each benchmark at most 5 times
[Forced GC][GC 1 Start 1.88 s 12008KB validRef: REF outside heap, ref = 0x30c81e6000000000
Key: (I)mmortal (N)onmoving (D)iscontiguous (E)xtent (F)raction
HEAP_START 0x0000000030000000
AVAILABLE_START 0x0000000035800000
boot IN 0x0000000030000000->0x000000003fffffff E 0x0000000010000000
immortal IND [0x0000000040000000->0x00000000403fffff]
meta ND [0x0000000042000000->0x00000000423fffff]
los ND [0x0000000041400000->0x00000000417fffff]
plos N 0x00000000c5000000->0x00000000cfffffff F 0.07
sanity ND []
non-moving ND [0x0000000041000000->0x00000000413fffff]
sm-code ND [0x0000000040800000->0x0000000040bfffff]
lg-code ND [0x0000000040c00000->0x0000000040ffffff]
ms ND [0x0000000041c00000->0x0000000041ffffff, 0x0000000041800000->0x0000000041bfffff, 0x0000000040400000->0x00000000407fffff]
AVAILABLE_END 0x00000000d0000000
HEAP_END 0x00000000d0000000
Died in GC:
vm internal error at:

– Stack –
at [0x0000000040019da0] Lorg/jikesrvm/VM; sysFail(Ljava/lang/String;)V at line 2043
at [0x0000000040019dd0] Lorg/jikesrvm/VM; _assertionFailure(Ljava/lang/String;Ljava/lang/String;)V at line 551
at [0x0000000040019e28] Lorg/jikesrvm/VM; _assert(ZLjava/lang/String;Ljava/lang/String;)V at line 534
at [0x0000000040019e80] Lorg/jikesrvm/VM; _assert(Z)V at line 512
at [0x0000000040019ec0] Lorg/jikesrvm/mm/mmtk/ScanBootImage; processChunk(Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/mmtk/plan/TraceLocal;)V at line 141
at [0x0000000040019fa0] Lorg/jikesrvm/mm/mmtk/ScanBootImage; scanBootImage(Lorg/mmtk/plan/TraceLocal;)V at line 80
at [0x000000004001a060] Lorg/jikesrvm/mm/mmtk/Scanning; computeBootImageRoots(Lorg/mmtk/plan/TraceLocal;)V at line 333
at [0x000000004001a0a0] Lorg/mmtk/plan/SimpleCollector; collectionPhase(SZ)V at line 87
at [0x000000004001a100] Lorg/mmtk/plan/marksweep/MSCollector; collectionPhase(SZ)V at line 96
at [0x000000004001a160] Lorg/mmtk/plan/Phase; processPhaseStack(Z)Z at line 477
at [0x000000004001a270] Lorg/mmtk/plan/Phase; beginNewPhaseStack(I)Z at line 390
at [0x000000004001a2b8] Lorg/mmtk/plan/StopTheWorldCollector; collect()V at line 39
at [0x000000004001a2e8] Lorg/jikesrvm/memorymanagers/mminterface/VM_CollectorThread; run()V at line 385
at [0x000000004001a478] Lorg/jikesrvm/scheduler/VM_Thread; startoff()V at line 617
Virtual machine state:
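
For reference, the "REF outside heap" verdict can be reproduced by hand from the bounds printed in the log. The sketch below is only an illustration: the class and the `inHeap` helper are hypothetical names, and it assumes HEAP_END is an exclusive bound. It is not the MMTk validRef implementation.

```java
public class HeapRangeCheck {
    // Bounds copied from the heap layout printed in the log above.
    static final long HEAP_START = 0x0000000030000000L;
    static final long HEAP_END   = 0x00000000d0000000L; // assumed exclusive

    // Hypothetical helper: true if addr lies in [HEAP_START, HEAP_END),
    // comparing the values as unsigned 64-bit addresses.
    static boolean inHeap(long addr) {
        return Long.compareUnsigned(addr, HEAP_START) >= 0
            && Long.compareUnsigned(addr, HEAP_END) < 0;
    }

    public static void main(String[] args) {
        long badRef = 0x30c81e6000000000L;   // value reported by validRef
        long losStart = 0x0000000041400000L; // start of the "los" space in the log
        System.out.println("bad ref in heap:   " + inHeap(badRef));
        System.out.println("los start in heap: " + inHeap(losStart));
    }
}
```

The reported reference is far above HEAP_END, which is consistent with a corrupted slot rather than a stale but once-valid pointer.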

David Grove added a comment -

It might be a red herring, but according to the bootimage maps, the upper 4 bytes of the bad reference correspond to the address of the same string literal in both images.

in the BaseBaseMarkSweep image:
68166 0x0000000000002918 literal 0x0000000030c81e60 "gc thread transfer"

and in the BaseBaseSemiSpace image:
68170 0x0000000000002928 literal 0x0000000030c89740 "gc thread transfer"
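
The match for the MarkSweep image can be checked mechanically. This is a small sketch with hypothetical helper names (`upperHalf`, `lowerHalf`); the two constants are the bad reference from the validRef message and the literal's address from the MarkSweep map entry.

```java
public class UpperHalfMatch {
    // Upper 32 bits of a 64-bit word.
    static int upperHalf(long word) {
        return (int) (word >>> 32);
    }

    // Lower 32 bits of a 64-bit word.
    static int lowerHalf(long word) {
        return (int) word;
    }

    public static void main(String[] args) {
        long badRef  = 0x30c81e6000000000L; // from "validRef: REF outside heap"
        long literal = 0x0000000030c81e60L; // "gc thread transfer" in the MarkSweep map
        System.out.println(Integer.toHexString(upperHalf(badRef)));
        System.out.println(upperHalf(badRef) == lowerHalf(literal));
    }
}
```

The lower 32 bits of the bad reference are all zero, so the value looks like the low half of a valid address landing in the high half of a slot.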

David Grove added a comment -

Hmmm... and the only place that string literal shows up in our source code base is:

  private void transferThread(VM_GreenThread t) {
    if (t.isGCThread()) {
      collectorThreadMutex.lock("gc thread transfer");
      if (verbose>0) VM.sysWriteln("setting collectorThread to ",t.getIndex()," in GP.transferThread for ",id);
      collectorThread = t;

I wonder whether a magic being used to manipulate the mutex isn't right in 64 bits. It seems really far-fetched, but it might explain why the crash only shows up when we have more than 1 virtual processor in the mix.

David Grove added a comment -

Bang: bogus code found in VM_Processor. The lockCount<<2 offset assumes 4-byte references, which is wrong on 64 bits, where each reference is 8 bytes.

  private final String[] lockReasons = VM.VerifyAssertions ? new String[100] : null;

  public void registerLock(String reason) {
    VM_Magic.setObjectAtOffset(lockReasons, Offset.fromIntSignExtend(lockCount<<2), reason);
    lockCount ++;
  }
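
A rough simulation shows how this code can produce exactly the bad reference seen in the MarkSweep log. It is a sketch under stated assumptions, not Jikes RVM code: the array body is modeled as raw big-endian bytes (PPC64 AIX is big-endian), references are 8 bytes, lockCount starts at 0, and the same "gc thread transfer" literal is registered twice. `LockReasonsOverlap`, `store` and `slot` are hypothetical names.

```java
import java.nio.ByteBuffer;

public class LockReasonsOverlap {
    static final int BYTES_PER_REF = 8; // 64-bit references

    // Emulates registerLock: the i-th 8-byte reference is stored at
    // byte offset (i << shift) in a zero-filled, big-endian array body.
    static ByteBuffer store(long[] refs, int shift) {
        // One spare slot so the last store always fits.
        ByteBuffer mem = ByteBuffer.allocate((refs.length + 1) * BYTES_PER_REF);
        for (int i = 0; i < refs.length; i++) {
            mem.putLong(i << shift, refs[i]);
        }
        return mem;
    }

    // Reads array element `index` the way a 64-bit scanner would see it.
    static long slot(ByteBuffer mem, int index) {
        return mem.getLong(index * BYTES_PER_REF);
    }

    public static void main(String[] args) {
        long literal = 0x0000000030c81e60L; // address from the MarkSweep map
        long[] refs = { literal, literal };

        ByteBuffer buggy = store(refs, 2); // stride of 4 bytes, as in lockCount<<2
        ByteBuffer wide  = store(refs, 3); // stride of 8 bytes

        System.out.println("shift 2, slot 1: 0x" + Long.toHexString(slot(buggy, 1)));
        System.out.println("shift 3, slot 1: 0x" + Long.toHexString(slot(wide, 1)));
    }
}
```

With a 4-byte stride, the second store straddles slots 0 and 1, so slot 1 ends up holding the low half of the address in its high half. Using a stride that matches the reference width (or an ordinary array store) avoids the overlap; this thread does not say which approach r14231 took.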

David Grove added a comment -

Fix committed in r14231.

Thanks for the help, Steve. Once it became clear from the assertions you added that the problem was that a value in the bootimage itself was wrong, I had the idea of looking at RVM.map and then got lucky....

I left the 14230 assertions in place; not sure if you wanted to pull them back out or leave them in.

David Grove added a comment -

Reopening so I can change the fix target to 3.0.

