Key: RVM-482
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Blocker
Assignee: David Grove
Reporter: David Grove
Votes: 0
Watchers: 0

Crash when running Production_Opt0_perf portion of compiler-dna test run

Created: 09/May/08 08:25 AM   Updated: 09/Jul/08 12:32 PM
Component/s: None
Affects Version/s: None
Fix Version/s: 3.0

Description
Running this portion of the compiler-dna test run on linux-ia32 fairly reliably results in a crash a few iterations into _213_javac.

This is a non-adaptive configuration that compiles everything at O0 and runs 20 iterations of every SPECjvm98 benchmark in a single JVM instance. Since the crash is not in the first iteration, it's probably not an optimization/codegen bug. It smells more like a GC map problem.

This is not a configuration/test we normally run and my recollection is that I saw the same crash in July 2007 (last time I gathered the DNA). In 2007, I worked around it by manually running the O0 perf runs 1 benchmark at a time.



Comments
David Grove added a comment - 07/Jul/08 09:42 AM
The crash occurs here:

– Stack –
at [0x7000fa00] Lorg/jikesrvm/VM; sysFail(Ljava/lang/String;)V at line 2116
at [0x7000fa38] Lorg/jikesrvm/runtime/RuntimeEntrypoints; deliverHardwareException(II)V at line 682
at [0x7000fa4c] <hardware trap>
at [0x70015e04] Lorg/jikesrvm/mm/mmtk/ObjectModel; copy(Lorg/vmmagic/unboxed/ObjectReference;I)Lorg/vmmagic/unboxed/ObjectReference; at line 50
at [0x70015e38] Lorg/mmtk/policy/CopySpace; traceObject(Lorg/mmtk/plan/TransitiveClosure;Lorg/vmmagic/unboxed/ObjectReference;I)Lorg/vmmagic/unboxed/ObjectReference; at line 187
at [0x70015e60] Lorg/mmtk/plan/generational/GenNurseryTraceLocal; traceObject(Lorg/vmmagic/unboxed/ObjectReference;)Lorg/vmmagic/unboxed/ObjectReference; at line 87
at [0x70015e80] Lorg/mmtk/plan/TraceLocal; retainForFinalize(Lorg/vmmagic/unboxed/ObjectReference;)Lorg/vmmagic/unboxed/ObjectReference; at line 416
at [0x70015ec0] Lorg/mmtk/utility/Finalizer; moveToFinalizable(Lorg/mmtk/plan/TraceLocal;)I at line 236
at [0x70015ef0] Lorg/mmtk/plan/SimpleCollector; collectionPhase(SZ)V at line 110
at [0x70015f1c] Lorg/mmtk/plan/generational/GenCollector; collectionPhase(SZ)V at line 120
at [0x70015f50] Lorg/mmtk/plan/generational/marksweep/GenMSCollector; collectionPhase(SZ)V at line 144
at [0x70015fa8] Lorg/mmtk/plan/Phase; processPhaseStack(Z)Z at line 477
at [0x70015fc8] Lorg/mmtk/plan/Phase; beginNewPhaseStack(I)Z at line 390
at [0x70015fdc] Lorg/mmtk/plan/StopTheWorldCollector; collect()V at line 39
at [0x70016018] Lorg/jikesrvm/memorymanagers/mminterface/CollectorThread; run()V at line 385
at [0x70016040] Lorg/jikesrvm/scheduler/RVMThread; startoff()V at line 620

and happens even if -X:aos:enable_recompilation=false is given on the command line. Therefore, if this is an optimizing compiler problem, it is due to Opt0 compilation of the bootimage (presumably code related to finalization), not to O0 compilation at runtime.

The crash tends to occur 15-25 iterations into _213_javac.

../rvm-trunk/dist/production_Opt_0_x86_64-linux/rvm -X:aos:enable_recompilation=false SpecApplication -s100 -m100 -M100 -a _213_javac


David Grove added a comment - 07/Jul/08 12:04 PM
I can run hundreds of iterations of _213_javac using baseline, Opt1, or Opt2 compiled bootimages, so it looks very likely that the problem is in O0 compilation of the bootimage.

David Grove added a comment - 07/Jul/08 01:58 PM
Using OptTestHarness to force org.mmtk.utility.Finalizer.addCandidate to be recompiled at O1 instead of O0 before we start running _213_javac appears to be sufficient to avoid the crash.

David Grove added a comment - 07/Jul/08 02:23 PM - edited
Specifically, this command line results in a successful execution:

../rvm-trunk/dist/production_Opt_0_x86_64-linux/rvm -X:aos:enable_recompilation=false org.jikesrvm.tools.oth.OptTestHarness -oc:O1 -oc:verbose=true -oc:O1 -method org.mmtk.utility.Finalizer addCandidate - -er SpecApplication main - -s100 -m100 -M100 -a _213_javac

While this command line results in the crash:

../rvm-trunk/dist/production_Opt_0_x86_64-linux/rvm -X:aos:enable_recompilation=false org.jikesrvm.tools.oth.OptTestHarness -oc:O0 -oc:verbose=true -oc:O0 -method org.mmtk.utility.Finalizer addCandidate - -er SpecApplication main - -s100 -m100 -M100 -a _213_javac

Both behaviors are 100% reproducible.


David Grove added a comment - 07/Jul/08 03:39 PM
The crash still occurs even with -oc:inline=false.

David Grove added a comment - 07/Jul/08 04:44 PM
Disabling local_copy_prop makes the crash go away. So either the bug is in this optimization pass or it is tickling some other downstream bug in the opt compiler.

That is, this command line results in a successful execution:

../rvm-trunk/dist/production_Opt_0_x86_64-linux/rvm -X:aos:enable_recompilation=false org.jikesrvm.tools.oth.OptTestHarness -oc:O1 -oc:verbose=true -oc:O0 -oc:inline=false -oc:local_copy_prop=false -oc:phases=true -method org.mmtk.utility.Finalizer addCandidate - -er SpecApplication main - -s100 -m100 -M100 -a _213_java

while this one results in a crash:
../rvm-trunk/dist/production_Opt_0_x86_64-linux/rvm -X:aos:enable_recompilation=false org.jikesrvm.tools.oth.OptTestHarness -oc:O1 -oc:verbose=true -oc:O0 -oc:inline=false -oc:phases=true -method org.mmtk.utility.Finalizer addCandidate - -er SpecApplication main - -s100 -m100 -M100 -a _213_java


Ian Rogers added a comment - 08/Jul/08 08:03 AM
Looking at the inline reports, -oc:inline=false doesn't appear to work.

David Grove added a comment - 08/Jul/08 08:52 AM
Hmm, -oc:inline=false works for me. Be sure to give it after the -oc:O0 argument (the command-line arguments are applied in order, and O0 enables inlining).
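
The ordering effect described above can be illustrated with a minimal sketch. This is not the Jikes RVM option parser: the class SketchOptions, its fields, and its parse method are hypothetical, and it assumes only what the comment states, namely that arguments are applied left to right and that O0 turns inlining on.

```java
import java.util.List;

/** Hypothetical stand-in for a compiler option set; not the Jikes RVM class. */
final class SketchOptions {
    int optLevel = 0;
    boolean inline = false;

    /** Applies arguments strictly left to right, so a later argument overrides an earlier one. */
    static SketchOptions parse(List<String> args) {
        SketchOptions o = new SketchOptions();
        for (String arg : args) {
            if (arg.equals("-oc:O0")) {
                // Assumption taken from the comment above: selecting O0 enables inlining.
                o.optLevel = 0;
                o.inline = true;
            } else if (arg.equals("-oc:inline=false")) {
                o.inline = false;
            } else if (arg.equals("-oc:inline=true")) {
                o.inline = true;
            }
            // Any other argument is ignored in this sketch.
        }
        return o;
    }
}
```

With this model, parsing "-oc:O0 -oc:inline=false" leaves inlining disabled, while parsing "-oc:inline=false -oc:O0" leaves it enabled, because the O0 argument is applied last and resets the flag.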

David Grove added a comment - 09/Jul/08 09:27 AM
It will take a few more hours of testing to be positive, but it looks almost certain that the problem was that local constant propagation was propagating through the move instruction introduced by a toAddress() call on an ObjectReference.

I'm testing a fix to local constant prop that prevents it from GENing based on move instructions where one operand is a reference type and the other is not.
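
The kind of guard described above can be sketched on a toy IR. This is only an illustration under stated assumptions, not the Jikes RVM pass or the actual fix: the classes Reg, Instr, and LocalCopyPropSketch are hypothetical, registers carry a single isReference flag, and the pass works on one basic block. A move generates a copy fact only when both sides agree on reference-ness.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical register: a name plus whether the GC should treat it as a reference. */
final class Reg {
    final String name;
    final boolean isReference;
    Reg(String name, boolean isReference) { this.name = name; this.isReference = isReference; }
}

/** Hypothetical instruction: an opcode, an optional defined register, and used registers. */
final class Instr {
    final String op;
    final Reg def;          // null when the instruction defines nothing
    final List<Reg> uses;
    Instr(String op, Reg def, List<Reg> uses) { this.op = op; this.def = def; this.uses = uses; }
}

final class LocalCopyPropSketch {
    /** Local copy propagation over one basic block, returning the rewritten block. */
    static List<Instr> run(List<Instr> block) {
        Map<Reg, Reg> copies = new HashMap<>();   // dst -> src for currently valid copies
        List<Instr> out = new ArrayList<>();
        for (Instr in : block) {
            // Rewrite each use with its known copy source, if any.
            List<Reg> newUses = new ArrayList<>();
            for (Reg u : in.uses) {
                newUses.add(copies.getOrDefault(u, u));
            }
            out.add(new Instr(in.op, in.def, newUses));

            if (in.def != null) {
                final Reg d = in.def;
                // KILL: any fact that mentions the redefined register is stale.
                copies.remove(d);
                copies.values().removeIf(src -> src == d);
                // GEN: only for moves whose two sides agree on reference-ness.
                if (in.op.equals("move")) {
                    Reg src = newUses.get(0);
                    if (src != d && src.isReference == d.isReference) {
                        copies.put(d, src);
                    }
                }
            }
        }
        return out;
    }
}
```

In this sketch, a move from a reference register into a non-reference register (the shape the comment attributes to toAddress()) generates no copy fact, so later uses of the non-reference register are left alone, while a move between two reference registers is still propagated.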


David Grove added a comment - 09/Jul/08 12:32 PM
This was in fact resolved by the local copy prop bug fix in r14688.