Ian Rogers added a comment - 01/Jul/08 11:18 AM
There were a few new Classpath patches that may have filled rvmppclnx64's disk, and so Collections$2 may not have made it into classpath.jar.
I think that's unlikely because development images are working and the disk isn't that full. But, to be sure I just blew away the classpath build from components, so the next run will rebuild it from scratch.
Clearing the components directory didn't help; I'm still seeing the same problem, and there is plenty of space on the disk.
Prototype builds work fine on PPC32 OSX with HotSpot 1.5.0.
I'm trying a revert to IBM Java 5 SR6, which I suspect is more stable than the 6.0 that was put in /tmp.
I build with Java 5 SR6b and SR7 on AIX (this is what excalibur and piccolo have installed, respectively). If SR6 doesn't work, it might be worth updating to SR7 on rvmppclnx64.
Bootimages built, but were busted due to the assertion about identity hashcode != 0, which has already been fixed. Continuing to monitor...
I believe switching the JDK version has broken the build of Classpath:
http://jikesrvm.anu.edu.au/cattrack/results/rvmppclnx64.anu.edu.au/core/4431/prototype/Output.txt
I believe the problem is that the GNU Classpath build is not copying vm/reference/sun/misc/Unsafe.java into the build. The version of sun.misc.Unsafe found in the linux-ppc IBM JVM does not define the putOrderedObject method, and because the reference implementation is not being copied to a place where the compile will find it, this results in a compilation error.
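One way to confirm which JDKs are missing the method is a small reflective probe run on each candidate JVM. This is only a sketch: the MethodProbe class and declaresMethod helper are hypothetical names, not part of Jikes RVM or Classpath, and the probe only looks at method names, not signatures.

```java
import java.lang.reflect.Method;

/** Hypothetical diagnostic helper; not part of Jikes RVM or GNU Classpath. */
final class MethodProbe {
  /**
   * Returns true if the named class can be loaded and declares at least one
   * method with the given name (parameter types are not checked).
   */
  static boolean declaresMethod(String className, String methodName) {
    try {
      // Load without initializing, so probing has no side effects.
      Class<?> c = Class.forName(className, false, MethodProbe.class.getClassLoader());
      for (Method m : c.getDeclaredMethods()) {
        if (m.getName().equals(methodName)) {
          return true;
        }
      }
      return false;
    } catch (ClassNotFoundException | LinkageError e) {
      // Class absent or unloadable on this JVM: treat as "method not available".
      return false;
    }
  }

  public static void main(String[] args) {
    // The result depends on the JVM this is run with.
    System.out.println("sun.misc.Unsafe.putOrderedObject present: "
        + declaresMethod("sun.misc.Unsafe", "putOrderedObject"));
  }
}
```

Running it under each installed JDK (SR6, SR7, the 6.0 in /tmp) would show whether the host JVM's sun.misc.Unsafe is the one the compile is picking up.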
Moving to 3.0.1 because I'm pretty sure this is just a problem with the ANU machine, not a general breakage of ppc32-linux. I will verify by testing the release candidate on piano, building ppc32-linux.
Sigh. It isn't just a problem on rvmppc32-linux. It's broken on piano as well. Looks to me like a problem with primordials. I'll try to poke at it today, but we're going to have to fix this before the 3.0 release.
I spent my afternoon performing a binary search to find when this issue was introduced.
r14597 seems to reliably be the cause of the bug (applying a reverse patch of this against head solves the issue I was seeing). There is nothing that appears too evil in r14597 (which only changes DynamicLibrary.java), prompting further investigation:
It is possible to make the error go away by simply manually inlining the code for callOnLoad into the constructor.
It is also possible to correct the issue by simply including a line that calls getSymbol("JNI_OnLoad") and ignores the result at the start of the callOnLoad method.
The failure occurs when an unresolved call to the native run_JNIOnLoad method is run: the dynamic linker does not correctly find the method ID and crashes. I believe it sees the caller as <init> (which called it) rather than callOnLoad as expected.
I assume all this points to the baseline compiler not handling all the corner cases for keeping frame pointers up to date, but I don't know the details. I can look at this again tomorrow, but I wanted to share my status since it took me quite a while to hunt things down. For a quick release of 3.0 we can simply do the manual inlining or back out r14597 (but I assume this is required for Harmony?).
We need r14597 for Harmony. The problem it fixes is that in Harmony the libraries are more modular, so there are more JNI_OnLoad routines than previously, but what really broke things were libraries with no JNI_OnLoad. When getSymbol is called it returns the address of a JNI_OnLoad routine, but it needn't be that of the most recently dlopened library. r14597 changes DynamicLibrary so that it remembers the JNI_OnLoad address and will only invoke it if that address is unique.
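For anyone following along, the "only invoke it if that address is unique" idea can be sketched roughly as below. This is not the actual DynamicLibrary code from r14597: OnLoadTracker and shouldInvoke are hypothetical names, and treating 0 as "symbol not found" is an assumption made for the sketch.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the bookkeeping described above; not the real DynamicLibrary. */
final class OnLoadTracker {
  /** Addresses of JNI_OnLoad routines that have already been handed out for invocation. */
  private final Set<Long> seenOnLoadAddresses = new HashSet<>();

  /**
   * @param onLoadAddress address a symbol lookup returned for "JNI_OnLoad";
   *                      0 is assumed here to mean "no symbol found"
   * @return true if this address has not been seen before and should be invoked
   */
  boolean shouldInvoke(long onLoadAddress) {
    if (onLoadAddress == 0L) {
      return false;
    }
    // Set.add returns false when the address is already present, which covers
    // a library with no JNI_OnLoad resolving to an earlier library's routine.
    return seenOnLoadAddresses.add(onLoadAddress);
  }

  public static void main(String[] args) {
    OnLoadTracker tracker = new OnLoadTracker();
    System.out.println(tracker.shouldInvoke(0x1000L)); // first library: new address
    System.out.println(tracker.shouldInvoke(0x1000L)); // later library resolving to the same routine
  }
}
```

The point of the sketch is just that a lookup by name can hand back a routine belonging to an earlier library, so the address has to be compared against what was seen before.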
btw: nice work Daniel!
I believe we need to fix the stack walk, but we can postpone this issue by just doing the manual inline. Nice Daniel. This matches what I was seeing as well.
I keep hoping to have a couple of solid hours to try to nail the root bug, but didn't make it today. I propose that I try again tomorrow, but if I don't get it by mid-day I will commit the workaround on the head and 3.0.0 release branches and we'll go ahead with a release. I still think it's worth waiting a little bit to try to nail the real bug, because it's lurking in there and could bite people in other circumstances that are less repeatable...
The debugging trail continues...
We implement isZero on PPC by pushing a zero and then performing a comparison. It is the 'push' (li/stw) that is scribbling over the compiled method id. My guess is that due to the small method size, the PPC stack is growing larger than we expect/calculate based on the Java bytecode. Address.isZero() is a virtual call with only one parameter (this), so the Java stack does not have the two values that we are actually using in PPC.
Bingo. Fix checked in r14838. Leaving open for Dave to double check the changed assembly code.
Will respin 3.0.0 release branch to pick up fix.
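To make the isZero overflow concrete, here is a toy model of the failure. It is a sketch under loud assumptions: ToyFrame, its one-word header, and the layout (operand stack growing toward the compiled method id word) are invented for illustration and are not the real PPC baseline frame layout.

```java
/** Toy model of a frame; the layout and all names are hypothetical. */
final class ToyFrame {
  // words[0] holds the compiled method id; words[1..maxStack] is the operand stack,
  // which grows downward toward words[0].
  private final int[] words;
  private int sp; // index of the current top of stack

  ToyFrame(int compiledMethodId, int maxStack) {
    words = new int[1 + maxStack];
    words[0] = compiledMethodId;
    sp = words.length; // empty operand stack
  }

  /** Unchecked push, standing in for a raw store that trusts the computed stack size. */
  void push(int value) {
    words[--sp] = value;
  }

  int compiledMethodId() {
    return words[0];
  }

  public static void main(String[] args) {
    // Bytecode-derived depth is 1 (just 'this'), but the expansion pushes two values.
    ToyFrame tight = new ToyFrame(42, 1);
    tight.push(7); // 'this'
    tight.push(0); // the extra zero pushed for the comparison
    System.out.println(tight.compiledMethodId()); // id slot has been overwritten

    // With room for two values the id survives.
    ToyFrame roomy = new ToyFrame(42, 2);
    roomy.push(7);
    roomy.push(0);
    System.out.println(roomy.compiledMethodId());
  }
}
```

In the model, sizing the frame from the bytecode's view of the stack leaves no room for the second value, so the second push lands on the id word.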