
Key: RVM-495
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Critical
Assignee: Unassigned
Reporter: David Grove
Votes: 0
Watchers: 0

Crash during first major GC on PPC64 tests when running with -X:processors=2

Created: 14/May/08 07:49 AM   Updated: 04/Jul/08 02:24 PM
Component/s: Instruction Architecture: 64bit
Affects Version/s: 2.9.3
Fix Version/s: 3.0

Time Tracking:
Not Specified

File Attachments: 1. rmap_reader.pl (2 kB)



Description
The stack appended below is a fairly widespread crash symptom on the latest ppc64-aix runs (http://jikesrvm.anu.edu.au/cattrack/results/excalibur.watson.ibm.com/core-ppc64/3750). This particular one is in _213_javac, but very similar stacks are showing up as the cause of most of our failures.

I'm a little surprised that we think we're using addresses as high as 30cc7c57fffffff0, so my initial wild guess is that there is some 32-bit assumption in and around the MMTk code in this stack trace. I'll do a little poking around to see if I can determine what is happening.

mem=30cc7c57fffffff0
fp=000000004000ea68
pr=00000000310a6d60
trap/exception: type=Segmentation fault
ip=00000000343342c8
instr=0x7c64182a
exn_handler=0000000034023af8
lr=0000000034299654
pthread_self=0000000000000001
JikesRVM: internal error trap

Fatal error: Unknown hardware trap within uninterruptible region.
Died in GC:
Exiting virtual machine due to uninterruptibility violation.

– Stack –
at [0x000000004000e960] Lorg/jikesrvm/VM; sysFail(Ljava/lang/String;)V at line 2044
at [0x000000004000e990] Lorg/jikesrvm/runtime/VM_Runtime; deliverHardwareException(II)V at line 773
at [0x000000004000ea50] <hardware trap>
at [0x000000004000ea68] Lorg/jikesrvm/objectmodel/VM_JavaHeader; readAvailableBitsWord(Ljava/lang/Object;)Lorg/vmmagic/unboxed/Word; at line 617
at [0x000000004000eaa0] Lorg/jikesrvm/objectmodel/VM_ObjectModel; readAvailableBitsWord(Ljava/lang/Object;)Lorg/vmmagic/unboxed/Word; at line 504
at [0x000000004000ead0] Lorg/jikesrvm/mm/mmtk/ObjectModel; readAvailableBitsWord(Lorg/vmmagic/unboxed/ObjectReference;)Lorg/vmmagic/unboxed/Word; at line 358
at [0x000000004000eb10] Lorg/mmtk/policy/LargeObjectSpace; isInNursery(Lorg/vmmagic/unboxed/ObjectReference;)Z at line 258
at [0x000000004000eb58] Lorg/mmtk/policy/LargeObjectSpace; traceObject(Lorg/mmtk/plan/TransitiveClosure;Lorg/vmmagic/unboxed/ObjectReference;)Lorg/vmmagic/unboxed/ObjectReference; at line 166
at [0x000000004000ebc8] Lorg/mmtk/plan/generational/GenMatureTraceLocal; traceObject(Lorg/vmmagic/unboxed/ObjectReference;)Lorg/vmmagic/unboxed/ObjectReference; at line 116
at [0x000000004000ec20] Lorg/mmtk/plan/generational/marksweep/GenMSMatureTraceLocal; traceObject(Lorg/vmmagic/unboxed/ObjectReference;)Lorg/vmmagic/unboxed/ObjectReference; at line 55
at [0x000000004000ec70] Lorg/mmtk/plan/TraceLocal; traceObject(Lorg/vmmagic/unboxed/ObjectReference;Z)Lorg/vmmagic/unboxed/ObjectReference; at line 301
at [0x000000004000ecc8] Lorg/mmtk/plan/TraceLocal; processRootEdge(Lorg/vmmagic/unboxed/Address;Z)V at line 123
at [0x000000004000ed48] Lorg/jikesrvm/mm/mmtk/ScanBootImage; processChunk(Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/mmtk/plan/TraceLocal;)V at line 140
at [0x000000004000ee28] Lorg/jikesrvm/mm/mmtk/ScanBootImage; scanBootImage(Lorg/mmtk/plan/TraceLocal;)V at line 79
at [0x000000004000eee8] Lorg/jikesrvm/mm/mmtk/Scanning; computeBootImageRoots(Lorg/mmtk/plan/TraceLocal;)V at line 333
at [0x000000004000ef28] Lorg/mmtk/plan/generational/GenCollector; collectionPhase(SZ)V at line 99
at [0x000000004000ef88] Lorg/mmtk/plan/generational/marksweep/GenMSCollector; collectionPhase(SZ)V at line 143
at [0x000000004000efe8] Lorg/mmtk/plan/Phase; processPhaseStack(Z)Z at line 477
at [0x000000004000f0f8] Lorg/mmtk/plan/Phase; beginNewPhaseStack(I)Z at line 390
at [0x000000004000f140] Lorg/mmtk/plan/StopTheWorldCollector; collect()V at line 39
at [0x000000004000f170] Lorg/jikesrvm/memorymanagers/mminterface/VM_CollectorThread; run()V at line 385
at [0x000000004000f300] Lorg/jikesrvm/scheduler/VM_Thread; startoff()V at line 617



David Grove added a comment - 14/May/08 08:16 AM
I've looked at 10 different failures. They all have the appended portion of the stack in common:

at [0x000000004001a168] Lorg/jikesrvm/mm/mmtk/ScanBootImage; processChunk(Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/mmtk/plan/TraceLocal;)V at line 140
at [0x000000004001a248] Lorg/jikesrvm/mm/mmtk/ScanBootImage; scanBootImage(Lorg/mmtk/plan/TraceLocal;)V at line 79
at [0x000000004001a308] Lorg/jikesrvm/mm/mmtk/Scanning; computeBootImageRoots(Lorg/mmtk/plan/TraceLocal;)V at line 333

So I think there's a very good chance that the code that builds up references from the encoded bootimage map is not correct on 64-bit platforms. There are a couple of suspicious 4's and int/word conversions in org.jikesrvm.mm.mmtk.ScanBootImage.


Ian Rogers added a comment - 14/May/08 08:30 AM
This Perl script parses the root map the same way boot image scanning does and dumps a list of all the references in the root map. It may be informative to run it on the problematic root map.

Ian Rogers added a comment - 14/May/08 10:42 AM
BTW: if anyone thinks it's worth distributing the rmap decompressor Perl script then I'd support this. It's rough and ready at the moment. The uploaded version has constants from an x86_64 build.

David Grove added a comment - 14/May/08 12:24 PM
The system appears fairly stable with 1 virtual processor (can reliably run 5 size100 iterations of _213_javac).

With 2 virtual processors, it reliably dies in the first major GC with the above stack trace.

Both tests on a prototype image.


Steve Blackburn added a comment - 16/May/08 01:07 AM
I have looked at this a bit further with Dave and Daniel today. No big insights yet, I'm afraid.

Some info:

The LOS references on the failed stacks are probably red herrings given that traceObject will throw high addresses at the PLOS, so if the address were corrupted high it would end up in the LOS code.

It seems that the code is dereferencing a mis-aligned 8 byte address (30cc7c5800000000).

It seems that this is happening during boot image scanning, and therefore is most likely happening due to one of two bugs:
a) the boot image contains a mis-aligned pointer (which the scanning code is correctly passing on)
b) the scanning code is out by four and thus seeing a misaligned value

For b) to happen, it seems that the value would have to be misaligned back at line 129 and earlier of ScanBootImage, since after that point it is only incremented by BYTES_IN_ADDRESS. However, we don't fail on line 131, which suggests that if b) is true, we're getting a little lucky (perhaps reading zero at 129?).
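To make hypothesis b) concrete, here is a minimal sketch (plain Java, not Jikes RVM code) of what an out-by-four read looks like on a big-endian 64-bit machine. The class name, helper names and the two pointer values are made up for illustration; the only point is that an 8-byte load starting 4 bytes into a word yields the low half of one pointer in the upper 32 bits, which is the shape of the bad address we see.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class OffByFourRead {
    /** Builds a toy 16-byte big-endian memory holding two 8-byte pointers back to back. */
    public static ByteBuffer twoPointers(long first, long second) {
        ByteBuffer mem = ByteBuffer.allocate(16).order(ByteOrder.BIG_ENDIAN);
        mem.putLong(0, first);
        mem.putLong(8, second);
        return mem;
    }

    /** Loads an 8-byte word starting at the given byte offset. */
    public static long loadWord(ByteBuffer mem, int byteOffset) {
        return mem.getLong(byteOffset);
    }

    public static void main(String[] args) {
        // Illustrative values only: two small pointers inside the boot image range.
        ByteBuffer mem = twoPointers(0x0000000030cc7c58L, 0x0000000030000000L);
        // Correctly aligned load returns the first pointer unchanged.
        System.out.println(Long.toHexString(loadWord(mem, 0)));
        // Load that is out by four: low half of the first pointer lands in the high half.
        System.out.println(Long.toHexString(loadWord(mem, 4)));
    }
}
```

A write that is out by four produces the same pattern, so this shape alone does not tell us whether the reader or the writer is at fault.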

It would be a good idea to throw a few assertions into ScanBootImage.processChunk(). When I get the chance I'll do that. I'm still underwater right now though.


Steve Blackburn added a comment - 16/May/08 01:08 AM
One more thing...

Dave mentioned that the system ran OK with one virtual processor, but would reliably fail with > 1 virtual processor using the same build. This suggests that the boot image and map are correctly built, but who knows.


David Grove added a comment - 16/May/08 02:50 PM
I ran with the assertions Steve checked in last night. No assertions tripped before the crash, which with 2 virtual processors happens in the first major collect.

[excalibur:/homes/excalibur/dgrove/SPECjvm98] ../buildit/rvm-trunk/dist/prototype_ppc64-aix/rvm -X:processors=2 -X:verbose -verbose:gc SpecApplication -s100 -m5 -M5 -a _213_javac
JikesRVM: loading from "../buildit/rvm-trunk/dist/prototype_ppc64-aix/RVM.data.image"
JikesRVM: loading from "../buildit/rvm-trunk/dist/prototype_ppc64-aix/RVM.code.image"
JikesRVM: loading from "../buildit/rvm-trunk/dist/prototype_ppc64-aix/RVM.rmap.image"
JikesRVM: setting linkage
JikesRVM: boot record contents:
bootImageDataStart: 0000000030000000
bootImageDataEnd: 0000000031ba1000
bootImageCodeStart: 0000000034000000
bootImageCodeEnd: 00000000343e5000
bootImageRMapStart: 0000000037000000
bootImageRMapEnd: 000000003704f000
initialHeapSize: 0000000001400000
maximumHeapSize: 0000000006400000
tiRegister: 0000000000040000
spRegister: 0000000030c9f148
ipRegister: 0000000034231c90
tocRegister: 0000000030040500
sysConsoleWriteCharIP:0000000110009790
Testing faulting-address location
save->except[0]=00000000dead1234
Done testing faulting-address location
JikesRVM: calling boot thread: jtoc = 0000000030040500 pr = 00000000310a1670 tid = 262144 fp = 0000000030c9f130
JikesRVM: sysPthreadSelf: thread 1
JikesRVM: pthread_create 0x00000102
JikesRVM: sysVirtualProcessorStartup: pr=0x0000000040000040 ip=0x00000000341b78c0 fp=0x000000004148f010
JikesRVM: sysPthreadSelf: thread 258
Using a time-slice of 10 ms
Speed will be 100
Will run each benchmark at least 5 times
Will run each benchmark at most 5 times
[Forced GC][GC 1 Start 1.79 s 10720KB -> 9736KB 880.14 ms]
Caching Off Speed = 100 Auto run mode

======= _213_javac Starting =======
Run 0 start. Total memory=20971520 free memory=10899456
Javac benchmark starting...
[GC 2 Start 4.43 s 15608KB -> 11884KB 312.68 ms]
[GC 3 Start 6.31 s 16288KB -> 13136KB 233.22 ms]
[GC 4 Start 7.89 s 16856KB -> 14868KB 399.03 ms]
[GC 5 Start 9.30 s 17728KB -> 16404KB 316.50 ms]
[GC 6 Start 10.36 s 18496KB -> 17516KB 196.34 ms]
[GC 7 Start 11.10 s 19020KB -> 18324KB 175.39 ms]
[GC 8 Start 11.61 s 19432KB -> 19068KB 116.10 ms]
[GC 9 Start 11.96 s 19788KB -> 19552KB 80.59 ms]
[GC 10 Start 12.20 s 20048KB -> 19916KB 79.64 ms]
[Full heap][GC 11 Start 12.28 s 19916KB save->except[0]=30ca19bffffffff0
mem=30ca19bffffffff0
fp=000000004000ea60
pr=00000000310a1670
trap/exception: type=Segmentation fault
ip=0000000034334570
instr=0x7c64182a
exn_handler=0000000034023af8
lr=0000000034297264
pthread_self=0000000000000001
JikesRVM: internal error trap
JikesRVM: unknown seg fault

Fatal error: Unknown hardware trap within uninterruptible region.
Died in GC:
Exiting virtual machine due to uninterruptibility violation.

– Stack –
at [0x000000004000e958] Lorg/jikesrvm/VM; sysFail(Ljava/lang/String;)V at line 2043
at [0x000000004000e988] Lorg/jikesrvm/runtime/VM_Runtime; deliverHardwareException(II)V at line 773
at [0x000000004000ea48] <hardware trap>
at [0x000000004000ea60] Lorg/jikesrvm/objectmodel/VM_JavaHeader; readAvailableBitsWord(Ljava/lang/Object;)Lorg/vmmagic/unboxed/Word; at line 617
at [0x000000004000ea98] Lorg/jikesrvm/objectmodel/VM_ObjectModel; readAvailableBitsWord(Ljava/lang/Object;)Lorg/vmmagic/unboxed/Word; at line 504
at [0x000000004000eac8] Lorg/jikesrvm/mm/mmtk/ObjectModel; readAvailableBitsWord(Lorg/vmmagic/unboxed/ObjectReference;)Lorg/vmmagic/unboxed/Word; at line 358
at [0x000000004000eb08] Lorg/mmtk/policy/LargeObjectSpace; isInNursery(Lorg/vmmagic/unboxed/ObjectReference;)Z at line 258
at [0x000000004000eb50] Lorg/mmtk/policy/LargeObjectSpace; traceObject(Lorg/mmtk/plan/TransitiveClosure;Lorg/vmmagic/unboxed/ObjectReference;)Lorg/vmmagic/unboxed/ObjectReference; at line 166
at [0x000000004000ebc0] Lorg/mmtk/plan/generational/GenMatureTraceLocal; traceObject(Lorg/vmmagic/unboxed/ObjectReference;)Lorg/vmmagic/unboxed/ObjectReference; at line 116
at [0x000000004000ec18] Lorg/mmtk/plan/generational/marksweep/GenMSMatureTraceLocal; traceObject(Lorg/vmmagic/unboxed/ObjectReference;)Lorg/vmmagic/unboxed/ObjectReference; at line 55
at [0x000000004000ec68] Lorg/mmtk/plan/TraceLocal; traceObject(Lorg/vmmagic/unboxed/ObjectReference;Z)Lorg/vmmagic/unboxed/ObjectReference; at line 301
at [0x000000004000ecc0] Lorg/mmtk/plan/TraceLocal; processRootEdge(Lorg/vmmagic/unboxed/Address;Z)V at line 123
at [0x000000004000ed40] Lorg/jikesrvm/mm/mmtk/ScanBootImage; processChunk(Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/mmtk/plan/TraceLocal;)V at line 142
at [0x000000004000ee20] Lorg/jikesrvm/mm/mmtk/ScanBootImage; scanBootImage(Lorg/mmtk/plan/TraceLocal;)V at line 79
at [0x000000004000eee0] Lorg/jikesrvm/mm/mmtk/Scanning; computeBootImageRoots(Lorg/mmtk/plan/TraceLocal;)V at line 333
at [0x000000004000ef20] Lorg/mmtk/plan/generational/GenCollector; collectionPhase(SZ)V at line 99
at [0x000000004000ef80] Lorg/mmtk/plan/generational/marksweep/GenMSCollector; collectionPhase(SZ)V at line 143
at [0x000000004000efe0] Lorg/mmtk/plan/Phase; processPhaseStack(Z)Z at line 477
at [0x000000004000f0f0] Lorg/mmtk/plan/Phase; beginNewPhaseStack(I)Z at line 390
at [0x000000004000f138] Lorg/mmtk/plan/StopTheWorldCollector; collect()V at line 39
at [0x000000004000f168] Lorg/jikesrvm/memorymanagers/mminterface/VM_CollectorThread; run()V at line 385
at [0x000000004000f2f8] Lorg/jikesrvm/scheduler/VM_Thread; startoff()V at line 617


David Grove added a comment - 16/May/08 02:54 PM
Another thing Steve and I noticed last night: sometimes the crash happens while scanning the bootimage, other times it doesn't. So it seems probable that the bootimage map is actually OK, and something else is going wrong that is mysteriously linked to having multiple virtual processors. It's unlikely to be a PPC weak memory problem, since ppc32-aix is running just fine with multiple virtual processors. So perhaps something in the load balancing/work queue aspect of the GC isn't 64-bit safe?

Steve Blackburn added a comment - 16/May/08 05:20 PM
Those assertions should give us a very strong clue.

It seems that in the case of the stack trace which includes the boot image, only three things could now have happened (since we know the slot was aligned and yet the contents of it is misaligned when used a little while later):

a) *slot has bad stuff inside it
b) *slot is not correctly read from memory for some reason (it gets screwed up during slot.loadAddress())
c) the value gets screwed up by the ObjectReference.toObject() conversion.

I can try throwing in a few more assertions to narrow this down further.

--Steve


Steve Blackburn added a comment - 16/May/08 05:54 PM
Dave, can you please test against 14230 (just committed) and report what you see? I just added some assertions which should help further narrow things down.

David Grove added a comment - 16/May/08 09:38 PM
Using 14230 and a BaseBaseSemiSpace image, it crashes as below on the first GC with 2 processors. With 1 processor it runs 5 iterations to completion. I'm going to try a BaseBaseMarkSweep image next to see if it has something to do with the fixup after movement.

[excalibur:/homes/excalibur/dgrove/SPECjvm98] ../buildit/rvm-trunk/dist/BaseBaseSemiSpace_ppc64-aix/rvm -verbose:gc -X:processors=2 SpecApplication -s100 -m5 -M5 -a _213_javac
Speed will be 100
Will run each benchmark at least 5 times
Will run each benchmark at most 5 times
[Forced GC][GC 1 Start 1.67 s 10460KB validRef: REF outside heap, ref = 0x30c8974000000000
Key: (I)mmortal (N)onmoving (D)iscontiguous (E)xtent (F)raction
HEAP_START 0x0000000030000000
AVAILABLE_START 0x0000000035800000
boot IN 0x0000000030000000->0x000000003fffffff E 0x0000000010000000
immortal IND [0x0000000040000000->0x00000000403fffff]
meta ND [0x0000000042000000->0x00000000423fffff]
los ND [0x0000000041400000->0x00000000417fffff]
plos N 0x00000000c5000000->0x00000000cfffffff F 0.07
sanity ND []
non-moving ND [0x0000000041000000->0x00000000413fffff]
sm-code ND [0x0000000040800000->0x0000000040bfffff]
lg-code ND [0x0000000040c00000->0x0000000040ffffff]
ss0 D [0x0000000041800000->0x0000000041bfffff, 0x0000000040400000->0x00000000407fffff]
ss1 D [0x0000000041c00000->0x0000000041ffffff]
AVAILABLE_END 0x00000000d0000000
HEAP_END 0x00000000d0000000
Died in GC:
vm internal error at:

– Stack –
at [0x0000000040019870] Lorg/jikesrvm/VM; sysFail(Ljava/lang/String;)V at line 2043
at [0x00000000400198a0] Lorg/jikesrvm/VM; _assertionFailure(Ljava/lang/String;Ljava/lang/String;)V at line 551
at [0x00000000400198f8] Lorg/jikesrvm/VM; _assert(ZLjava/lang/String;Ljava/lang/String;)V at line 534
at [0x0000000040019950] Lorg/jikesrvm/VM; _assert(Z)V at line 512
at [0x0000000040019990] Lorg/jikesrvm/mm/mmtk/ScanBootImage; processChunk(Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/mmtk/plan/TraceLocal;)V at line 140
at [0x0000000040019a70] Lorg/jikesrvm/mm/mmtk/ScanBootImage; scanBootImage(Lorg/mmtk/plan/TraceLocal;)V at line 80
at [0x0000000040019b30] Lorg/jikesrvm/mm/mmtk/Scanning; computeBootImageRoots(Lorg/mmtk/plan/TraceLocal;)V at line 333
at [0x0000000040019b70] Lorg/mmtk/plan/SimpleCollector; collectionPhase(SZ)V at line 87
at [0x0000000040019bd0] Lorg/mmtk/plan/semispace/SSCollector; collectionPhase(SZ)V at line 140
at [0x0000000040019c30] Lorg/mmtk/plan/Phase; processPhaseStack(Z)Z at line 477
at [0x0000000040019d40] Lorg/mmtk/plan/Phase; beginNewPhaseStack(I)Z at line 390
at [0x0000000040019d88] Lorg/mmtk/plan/StopTheWorldCollector; collect()V at line 39
at [0x0000000040019db8] Lorg/jikesrvm/memorymanagers/mminterface/VM_CollectorThread; run()V at line 385
at [0x0000000040019f48] Lorg/jikesrvm/scheduler/VM_Thread; startoff()V at line 617
Virtual machine state:

– Processors –
Processor 1 running thread4-daemon-collector-RUNNABLE
system thread id 1
transferQueue:
readyQueue:
ioQueue:
processWaitQueue:
idleQueue:Proc 2: Thread 5: VM.sysFail(): We're in a (likely) recursive call to VM.sysFail(), 2 deep
sysFail was called with the message: vm internal error at:
Died in GC:
vm internal error at:

– Stack –
at [0x0000000040019488] Lorg/jikesrvm/VM; sysFail(Ljava/lang/String;)V at line 2043
at [0x00000000400194b8] Lorg/jikesrvm/VM; _assertionFailure(Ljava/lang/String;Ljava/lang/String;)V at line 551
at [0x0000000040019510] Lorg/jikesrvm/VM; _assert(ZLjava/lang/String;Ljava/lang/String;)V at line 534
at [0x0000000040019568] Lorg/jikesrvm/VM; _assert(Z)V at line 512
at [0x00000000400195a8] Lorg/jikesrvm/scheduler/VM_Thread; getIndex()I at line 1143
at [0x00000000400195e0] Lorg/jikesrvm/scheduler/VM_Thread; dump([CI)I at line 1936
at [0x0000000040019660] Lorg/jikesrvm/scheduler/VM_Thread; dump([C)I at line 1978
at [0x00000000400196b0] Lorg/jikesrvm/scheduler/VM_Thread; dump(I)V at line 1904
at [0x0000000040019718] Lorg/jikesrvm/scheduler/VM_Thread; dump()V at line 1869
at [0x0000000040019750] Lorg/jikesrvm/scheduler/greenthreads/VM_GreenThreadQueue; dump()V at line 244
at [0x00000000400197a0] Lorg/jikesrvm/scheduler/greenthreads/VM_GreenProcessor; dumpProcessorState()V at line 603
at [0x00000000400197e8] Lorg/jikesrvm/scheduler/greenthreads/VM_GreenScheduler; dumpVirtualMachineInternal()V at line 457
at [0x0000000040019850] Lorg/jikesrvm/scheduler/VM_Scheduler; dumpVirtualMachine()V at line 239
at [0x0000000040019870] Lorg/jikesrvm/VM; sysFail(Ljava/lang/String;)V at line 2045
at [0x00000000400198a0] Lorg/jikesrvm/VM; _assertionFailure(Ljava/lang/String;Ljava/lang/String;)V at line 551
at [0x00000000400198f8] Lorg/jikesrvm/VM; _assert(ZLjava/lang/String;Ljava/lang/String;)V at line 534
at [0x0000000040019950] Lorg/jikesrvm/VM; _assert(Z)V at line 512
at [0x0000000040019990] Lorg/jikesrvm/mm/mmtk/ScanBootImage; processChunk(Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/mmtk/plan/TraceLocal;)V at line 140
at [0x0000000040019a70] Lorg/jikesrvm/mm/mmtk/ScanBootImage; scanBootImage(Lorg/mmtk/plan/TraceLocal;)V at line 80
at [0x0000000040019b30] Lorg/jikesrvm/mm/mmtk/Scanning; computeBootImageRoots(Lorg/mmtk/plan/TraceLocal;)V at line 333
at [0x0000000040019b70] Lorg/mmtk/plan/SimpleCollector; collectionPhase(SZ)V at line 87
at [0x0000000040019bd0] Lorg/mmtk/plan/semispace/SSCollector; collectionPhase(SZ)V at line 140
at [0x0000000040019c30] Lorg/mmtk/plan/Phase; processPhaseStack(Z)Z at line 477
at [0x0000000040019d40] Lorg/mmtk/plan/Phase; beginNewPhaseStack(I)Z at line 390
at [0x0000000040019d88] Lorg/mmtk/plan/StopTheWorldCollector; collect()V at line 39
at [0x0000000040019db8] Lorg/jikesrvm/memorymanagers/mminterface/VM_CollectorThread; run()V at line 385
at [0x0000000040019f48] Lorg/jikesrvm/scheduler/VM_Thread; startoff()V at line 617
Virtual machine state:


David Grove added a comment - 16/May/08 09:56 PM
MarkSweep results are the same as SemiSpace. I broke the 2 parts of the assertion at line 140 into line 140 and 141. It's the second part that is failing (slot.loadObjectReference is a validRef).
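For anyone following along, the split can be pictured roughly as below. This is a simplified stand-in, not the code in ScanBootImage: the class and method names are hypothetical, the first check is assumed to be an alignment test on the slot, and the second check reduces validRef to a plain range test against the HEAP_START and HEAP_END values printed in the logs.

```java
public class ValidRefCheck {
    // Heap bounds copied from the "HEAP_START" / "HEAP_END" lines in the logs.
    static final long HEAP_START = 0x0000000030000000L;
    static final long HEAP_END = 0x00000000d0000000L;
    static final int BYTES_IN_ADDRESS = 8; // 64-bit build

    /** First half of the assertion (assumed): the address is word aligned. */
    public static boolean isAligned(long address) {
        return (address & (BYTES_IN_ADDRESS - 1)) == 0;
    }

    /** Second half, reduced to a range test: the reference lies inside the heap. */
    public static boolean inHeap(long ref) {
        return Long.compareUnsigned(ref, HEAP_START) >= 0
            && Long.compareUnsigned(ref, HEAP_END) < 0;
    }

    public static void main(String[] args) {
        long badRef = 0x30c81e6000000000L; // value reported by validRef in the MarkSweep run
        System.out.println("aligned: " + isAligned(badRef));
        System.out.println("in heap: " + inHeap(badRef));
    }
}
```

Checking the two conditions on separate lines means the line number in the stack trace tells us directly which one failed.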

[excalibur:/homes/excalibur/dgrove/SPECjvm98] ../buildit/rvm-trunk/dist/BaseBaseMarkSweep_ppc64-aix/rvm -verbose:gc -X:processors=2 SpecApplication -s100 -m5 -M5 -a _213_javac
Speed will be 100
Will run each benchmark at least 5 times
Will run each benchmark at most 5 times
[Forced GC][GC 1 Start 1.88 s 12008KB validRef: REF outside heap, ref = 0x30c81e6000000000
Key: (I)mmortal (N)onmoving (D)iscontiguous (E)xtent (F)raction
HEAP_START 0x0000000030000000
AVAILABLE_START 0x0000000035800000
boot IN 0x0000000030000000->0x000000003fffffff E 0x0000000010000000
immortal IND [0x0000000040000000->0x00000000403fffff]
meta ND [0x0000000042000000->0x00000000423fffff]
los ND [0x0000000041400000->0x00000000417fffff]
plos N 0x00000000c5000000->0x00000000cfffffff F 0.07
sanity ND []
non-moving ND [0x0000000041000000->0x00000000413fffff]
sm-code ND [0x0000000040800000->0x0000000040bfffff]
lg-code ND [0x0000000040c00000->0x0000000040ffffff]
ms ND [0x0000000041c00000->0x0000000041ffffff, 0x0000000041800000->0x0000000041bfffff, 0x0000000040400000->0x00000000407fffff]
AVAILABLE_END 0x00000000d0000000
HEAP_END 0x00000000d0000000
Died in GC:
vm internal error at:

– Stack –
at [0x0000000040019da0] Lorg/jikesrvm/VM; sysFail(Ljava/lang/String;)V at line 2043
at [0x0000000040019dd0] Lorg/jikesrvm/VM; _assertionFailure(Ljava/lang/String;Ljava/lang/String;)V at line 551
at [0x0000000040019e28] Lorg/jikesrvm/VM; _assert(ZLjava/lang/String;Ljava/lang/String;)V at line 534
at [0x0000000040019e80] Lorg/jikesrvm/VM; _assert(Z)V at line 512
at [0x0000000040019ec0] Lorg/jikesrvm/mm/mmtk/ScanBootImage; processChunk(Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;Lorg/mmtk/plan/TraceLocal;)V at line 141
at [0x0000000040019fa0] Lorg/jikesrvm/mm/mmtk/ScanBootImage; scanBootImage(Lorg/mmtk/plan/TraceLocal;)V at line 80
at [0x000000004001a060] Lorg/jikesrvm/mm/mmtk/Scanning; computeBootImageRoots(Lorg/mmtk/plan/TraceLocal;)V at line 333
at [0x000000004001a0a0] Lorg/mmtk/plan/SimpleCollector; collectionPhase(SZ)V at line 87
at [0x000000004001a100] Lorg/mmtk/plan/marksweep/MSCollector; collectionPhase(SZ)V at line 96
at [0x000000004001a160] Lorg/mmtk/plan/Phase; processPhaseStack(Z)Z at line 477
at [0x000000004001a270] Lorg/mmtk/plan/Phase; beginNewPhaseStack(I)Z at line 390
at [0x000000004001a2b8] Lorg/mmtk/plan/StopTheWorldCollector; collect()V at line 39
at [0x000000004001a2e8] Lorg/jikesrvm/memorymanagers/mminterface/VM_CollectorThread; run()V at line 385
at [0x000000004001a478] Lorg/jikesrvm/scheduler/VM_Thread; startoff()V at line 617
Virtual machine state:


David Grove added a comment - 16/May/08 10:04 PM
It might be a red herring, but according to the bootimage maps the upper 4 bytes of the bad reference correspond to the same string literal.

in the BaseBaseMarkSweep image:
68166 0x0000000000002918 literal 0x0000000030c81e60 "gc thread transfer"

and in the BaseBaseSemiSpace image:
68170 0x0000000000002928 literal 0x0000000030c89740 "gc thread transfer"
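The comparison is just a matter of splitting the 64-bit word into halves. A small sketch (class and method names are made up) using the two bad references from the runs above and the literal addresses from the maps:

```java
public class UpperHalf {
    /** Upper 32 bits of a 64-bit word, as an unsigned value. */
    public static long upperHalf(long word) {
        return word >>> 32;
    }

    /** Lower 32 bits of a 64-bit word, as an unsigned value. */
    public static long lowerHalf(long word) {
        return word & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        long markSweepBadRef = 0x30c81e6000000000L;
        long semiSpaceBadRef = 0x30c8974000000000L;
        // Compare against the "gc thread transfer" literal addresses in each map.
        System.out.println(upperHalf(markSweepBadRef) == 0x30c81e60L);
        System.out.println(upperHalf(semiSpaceBadRef) == 0x30c89740L);
        // The lower halves are both zero.
        System.out.println(lowerHalf(markSweepBadRef) == 0 && lowerHalf(semiSpaceBadRef) == 0);
    }
}
```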


David Grove added a comment - 16/May/08 10:07 PM
Hmmm... and the only place that string literal shows up in our source code base is:

private void transferThread(VM_GreenThread t) {
  if (t.isGCThread()) {
    collectorThreadMutex.lock("gc thread transfer");
    if (verbose>0) VM.sysWriteln("setting collectorThread to ",t.getIndex()," in GP.transferThread for ",id);
    collectorThread = t;

I wonder whether a magic being used to manipulate the mutex isn't right in 64 bits. It seems really far-fetched, but it might explain why the crash only shows up when we have more than 1 virtual processor in the mix.


David Grove added a comment - 16/May/08 10:13 PM
Bang: bogus code found in VM_Processor

private final String[] lockReasons = VM.VerifyAssertions ? new String[100] : null;

public void registerLock(String reason) {
  VM_Magic.setObjectAtOffset(lockReasons, Offset.fromIntSignExtend(lockCount<<2), reason);
  lockCount++;
}
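The problem is the lockCount<<2: it strides through the array 4 bytes at a time, while references are 8 bytes on a 64-bit build. The sketch below simulates that in plain Java on a big-endian byte buffer standing in for the array body. It is not Jikes RVM code: the class, the helper names, LOG_BYTES_IN_ADDRESS and the first reference value are assumptions for illustration, and the shift-by-3 variant only shows what a correct 64-bit stride looks like, not necessarily how the fix is written.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LockReasonsStride {
    static final int LOG_BYTES_IN_ADDRESS = 3; // hypothetical constant: 8-byte references

    /** Toy reference array body: `slots` elements of 8 bytes each, big-endian like PPC64. */
    public static ByteBuffer newArray(int slots) {
        return ByteBuffer.allocate(slots * 8).order(ByteOrder.BIG_ENDIAN);
    }

    /** Stores an 8-byte reference at byte offset (index << logStride). */
    public static void store(ByteBuffer array, int index, int logStride, long ref) {
        array.putLong(index << logStride, ref);
    }

    /** Reads element `index` the way a 64-bit scanner would, at an 8-byte stride. */
    public static long element(ByteBuffer array, int index) {
        return array.getLong(index * 8);
    }

    public static void main(String[] args) {
        long first = 0x0000000030c81000L;  // made-up address of some earlier lock reason
        long second = 0x0000000030c81e60L; // "gc thread transfer" in the MarkSweep map

        ByteBuffer buggy = newArray(4);
        store(buggy, 0, 2, first);
        store(buggy, 1, 2, second); // lands at byte 4 and straddles elements 0 and 1
        System.out.println(Long.toHexString(element(buggy, 1)));

        ByteBuffer fixed = newArray(4);
        store(fixed, 0, LOG_BYTES_IN_ADDRESS, first);
        store(fixed, 1, LOG_BYTES_IN_ADDRESS, second);
        System.out.println(Long.toHexString(element(fixed, 1)));
    }
}
```

With the 4-byte stride, element 1 ends up holding the literal's address in its upper half and zeros below, which matches the bad reference reported by validRef. It would also fit the single-processor runs being clean if transferThread is only reached with more than one virtual processor, though that part is a guess.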


David Grove added a comment - 16/May/08 10:48 PM
Fix committed in r14231.

Thanks for the help, Steve. Once it became clear from the assertions you added that the value in the bootimage itself was wrong, I had the idea of looking at RVM.map and then got lucky...

I left the 14230 assertions in place; not sure if you wanted to pull them back out or leave them in.


David Grove added a comment - 04/Jul/08 02:22 PM
Reopening so I can change the fix target to 3.0.