Issue Details (XML | Word | Printable)

Key: RVM-550
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Blocker Blocker
Assignee: Daniel Frampton
Reporter: David Grove
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
RVM

Implementation of isZero magic in PPC baseline compiler can overflow expression stack and corrupt stack frame

Created: 01/Jul/08 10:15 AM   Updated: 06/Aug/08 07:37 AM
Component/s: Compiler: Baseline, Instruction Architecture: PowerPC
Affects Version/s: None
Fix Version/s: 3.0

Time Tracking:
Issue & Sub-Tasks
Issue Only
Not Specified

Issue Links:
Related
 

Sub-Tasks  All   Open   

 Description  « Hide
The least few ppc32-linux regression tests on rvmppclnx64.anu.edu.au have failed all prototype/prototype-opt tests because of a failure during bootimage writing.

In the bootimagewriter output, we see this exception:
Warning unable to find Java class for RVM type
java.lang.ClassNotFoundException: java.util.Collections$2
at java.lang.Class.forNameImpl(Native Method)
at java.lang.Class.forName(Class.java:169)
at org.jikesrvm.classloader.RVMType.createClassForType(RVMType.java:548)
at org.jikesrvm.classloader.RVMType.<init>(RVMType.java:254)
at org.jikesrvm.classloader.RVMClass.<init>(RVMClass.java:1190)
at org.jikesrvm.classloader.RVMClass.readClass(RVMClass.java:1560)
at org.jikesrvm.classloader.RVMClassLoader.defineClassInternal(RVMClassLoader.java:336)
at org.jikesrvm.classloader.BootstrapClassLoader.loadVMClass(BootstrapClassLoader.java:120)
at org.jikesrvm.classloader.TypeReference.resolveInternal(TypeReference.java:789)
at org.jikesrvm.classloader.TypeReference.resolve(TypeReference.java:763)
at org.jikesrvm.tools.bootImageWriter.BootImageWriter.createBootImageObjects(BootImageWriter.java:1043)
at org.jikesrvm.tools.bootImageWriter.BootImageWriter.main(BootImageWriter.java:648)

Which results in a bootimage that when executing crashes during startup with the following.

regression@rvmppclnx64:~/jikesrvm/jikesrvm/dist$ ./prototype-opt_ppc32-linux/rvm
WARNING: attempt to get compiled method #0
WARNING: attempt to get compiled method #0
JikesRVM: internal error: recursive use of hardware exception registers (exiting)

– Stack –
at [0x31e5d4b4] Lorg/jikesrvm/runtime/RuntimeEntrypoints; deliverHardwareException(II)V at line 718
at [0x31e5d514] <hardware trap>
at [0x31e5d520] Lorg/jikesrvm/runtime/DynamicLinker$DL_Helper; resolveDynamicInvocation()Lorg/jikesrvm/runtime/DynamicLink; at line 97
at [0x31e5d568] Lorg/jikesrvm/runtime/DynamicLinker; lazyMethodInvoker()V at line 43
WARNING: attempt to get compiled method #0
at [0x31e5d6f8] <unprintable normal Java frame: CompiledMethods.getCompiledMethod(0) returned null>
at [0x31e5d718] Lorg/jikesrvm/runtime/DynamicLibrary; <init>(Ljava/lang/String;)V at line 98
at [0x31e5d768] Lorg/jikesrvm/runtime/DynamicLibrary; load(Ljava/lang/String;)I at line 225
at [0x31e5d798] Ljava/lang/VMRuntime; nativeLoad(Ljava/lang/String;Ljava/lang/ClassLoader;)I at line 87
at [0x31e5d7b8] Ljava/lang/Runtime; loadLib(Ljava/lang/String;Ljava/lang/ClassLoader;)I at line 698
at [0x31e5d7e8] Ljava/lang/Runtime; loadLibrary(Ljava/lang/String;Ljava/lang/ClassLoader;)V at line 760
at [0x31e5d838] Ljava/lang/System; loadLibrary(Ljava/lang/String;)V at line 662
at [0x31e5d858] Lorg/jikesrvm/VM; finishBooting()V at line 349
at [0x31e5d880] Lorg/jikesrvm/VM; boot()V at line 157



 All   Comments   Work Log   Change History      Sort Order: Ascending order - Click to sort in descending order
Ian Rogers added a comment - 01/Jul/08 11:18 AM
There were a few new Classpath patches that may have filled rvmppclnx64's disk, and so Collections$2 may not have made it into classpath.jar.

David Grove added a comment - 01/Jul/08 02:31 PM
I think that's unlikely because development images are working and the disk isn't that full. But, to be sure I just blew away the classpath build from components, so the next run will rebuild it from scratch.

David Grove added a comment - 02/Jul/08 02:03 PM
clearing the components directory didn't help; still seeing same problem and there is plenty of space on the disk.

David Grove added a comment - 04/Jul/08 02:58 PM
release blocker.

Ian Rogers added a comment - 14/Jul/08 03:11 PM
Prototype builds work fine on PPC32 OSX with HotSpot 1.5.0.

David Grove added a comment - 15/Jul/08 05:01 PM
i'm trying a revert to using the IBM Java5 SR6 which I suspect is more stable than the 6.0 that was put in /tmp.

I build with Java 5 SR6b and SR7 on AIX. (this is what excalibur and piccolo have installed respectively). If SR6 doesn't work, it might be worth updating to SR7 on rvmppclnx64.


David Grove added a comment - 16/Jul/08 07:17 AM
bootimages built, but were busted due to assertion about identidy hashcode != 0, which has already been fixed. continue to monitor...

Ian Rogers added a comment - 19/Jul/08 04:02 AM
I believe switching the JDK version has broken the build of Classpath:
http://jikesrvm.anu.edu.au/cattrack/results/rvmppclnx64.anu.edu.au/core/4431/prototype/Output.txt

David Grove added a comment - 21/Jul/08 01:15 PM
I believe the problem is that the GNU classpath build is not copying vm/reference/sun/misc/Unsafe.java into the build. The version of sun.misc.Unsafe found in the linux-ppc IBM JVM does not define the putOrderedObject method and because the reference implementation is not being copied to a place where the compile will find it, this results in a compilation error.

David Grove added a comment - 01/Aug/08 10:50 PM
Moving to 3.0.1 because I'm pretty sure this is just a problem with the ANU machine, not a general breakage of ppc32-linux. I will verify by testing the release candidate on piano, building ppc32-linux.

David Grove added a comment - 04/Aug/08 09:30 AM
Sigh. It isn't just a problem on rvmppc32-linux. It's broken on piano as well. Looks to me like a problem with primordials. I'll try to poke at it today, but we're going to have to fix this before the 3.0 release.

Daniel Frampton added a comment - 05/Aug/08 04:16 AM
I spent my afternoon performing a binary search to find when this issue was introduced.

r14597 seems to reliably be the cause of the bug (applying a reverse patch of this against head solves the issue I was seeing).

There is nothing that appears too evil in r14597 (which only changes DynamicLibrary.java) prompting further investigation:

It is possible to make the error go away by simply manually inlining the code for callOnLoad into the constructor .

It is also possible to correct the issue by simply including a line that calls getSymbol("JNI_OnLoad") and ignores the result at the start of the callOnLoad method.

The failure occurs when an unresolved call to the native run_JNIOnLoad method is run – the dynamic linker does not correctly find the method Id and crashes. I believe it sees the caller as <init> (which called it) rather than callOnLoad as expected. I assume all this points to the baseline compiler not handling all the corner cases for keeping frame pointers up to date, but I don't know the details.

I can look again at this tomorrow but want to share my status since it took me quite a while to hunt things down.

For a quick release of 3.0 we can simply do the manual inlining or back out r14597 (but I assume this is required for Harmony?).


Ian Rogers added a comment - 05/Aug/08 04:42 AM
We need r14597 for Harmony. The problem fixed is that in Harmony the libraries are more modular so there are more JNI_OnLoad routines than previously, but what really broke things were libraries with no JNI_OnLoad. When getSymbol is called it returns the address of the JNI_OnLoad routine, but it needn't be of the most recently dlopened library. r14597 changes DynamicLibrary so that it remembers the JNI_OnLoad address and will only invoke it if that address is unique.

Ian Rogers added a comment - 05/Aug/08 04:49 AM
btw: nice work Daniel!

I believe we need to fix the stack walk, but we can postpone this issue by just doing the manual inline.


David Grove added a comment - 05/Aug/08 04:28 PM
Nice Daniel. This matches what I was seeing as well.

I keep hoping to have a couple solid hours to try to nail the root bug, but didn't make it today. I propose that I try again tomorrow, but that if I don't get it by mid-day I will commit the workaround on the head and 3.0.0 release branches and we'll go ahead with a release. I still think it's worth waiting for a little bit to try to nail the real bug because it's lurking in there and could bit people in other circumstances that are less repeatable....


Daniel Frampton added a comment - 05/Aug/08 11:04 PM
The debugging trail continues...

We implement isZero on PPC by pushing a zero and then performing a comparison. It is the 'push' (li/stw) that is scribbling over the compiled method id.

My guess is that due to the small method size, the PPC stack is growing larger than we expect/calculate based on the Java bytecode. Address.isZero() is a virtual call with only one parameter (this) so the java stack does not have the two values that we are actually using in PPC.


Daniel Frampton added a comment - 05/Aug/08 11:22 PM
Bingo. Fix checked in r14838. Leaving open for Dave to double check the changed assembly code.

David Grove added a comment - 06/Aug/08 07:37 AM
will respin 3.0.0 release branch to pick up fix.