Details
Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 3.1.0
Fix Version/s: None
Component/s: None
Labels: None
Environment: Jikes RVM 3.0.1 (r15707), ia32-linux, prototype-opt
Number of attachments: 2
Description
This is not yet a bug, but it certainly looks like a bug waiting to happen: if one removes the call to MemoryManager.getAllocationSite(true) in BaselineCompilerImpl (the site ID is currently not used anyway) and instead passes a constant to the corresponding entrypoint method (we tested this with resolvedNewScalar), the RVM hangs upon the first allocation while still in finishBooting, after throwing an OutOfMemoryError:
Exception in thread "Jikes_RBoot_Thread": java.lang.OutOfMemoryError
at java.lang.String.toCharArray(String.java:1742)
at java.lang.String.replace(String.java:1307)
at org.jikesrvm.VM.runClassInitializer(VM.java:527)
at org.jikesrvm.VM.finishBooting(VM.java:403)
at org.jikesrvm.VM.boot(VM.java:145)
The attached patch is a two-line change that makes this bug occur.
My guess is that the method call which the patch removes implicitly encodes a dependency on Plan (the class that ultimately does the work for getAllocationSite). If so, that dependency should be made explicit, as sooner or later it will bite you, just as it bit us.
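To make the suspected mechanism concrete, here is a minimal sketch in plain Java. It relies on ordinary JVM class-initialization rules (JLS 12.4.1) and is an analogy, not Jikes RVM code: Plan, MemoryManager, initLog and ensurePlanInitialized are hypothetical stand-ins that only mimic the call shape in the report, and Jikes RVM's boot-image writer and VM.runClassInitializer manage initialization in their own way. The point it illustrates is that invoking a static method declared in a class initializes that class even if the result is discarded, so replacing the call with a constant silently drops that side effect.

```java
// Plain-Java analogy (ordinary JVM, JLS 12.4.1), not Jikes RVM source.
// Plan and MemoryManager are hypothetical stand-ins for the real classes.
import java.util.ArrayList;
import java.util.List;

public class ImplicitInitDemo {
    // Records which stand-in classes have run their static initializers.
    static final List<String> initLog = new ArrayList<>();

    static class Plan {
        // Runs exactly once, the first time Plan is initialized.
        static { initLog.add("Plan"); }

        static int getAllocationSite() { return 0; }  // placeholder value
    }

    static class MemoryManager {
        // Invoking a static method declared in Plan initializes Plan,
        // even when the caller throws the result away.
        static int getAllocationSite() { return Plan.getAllocationSite(); }
    }

    // Explicit alternative: name the dependency instead of relying on a call
    // whose result is unused. A class literal alone does not initialize Plan;
    // Class.forName with initialize == true does.
    static void ensurePlanInitialized() {
        try {
            Class.forName(Plan.class.getName(), true, Plan.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int site = 42;  // like the patch: a constant, so Plan is never touched
        System.out.println("site = " + site + ", initialized: " + initLog);
        // prints: site = 42, initialized: []
        MemoryManager.getAllocationSite();  // the implicit dependency fires here
        System.out.println("initialized: " + initLog);
        // prints: initialized: [Plan]
    }
}
```

Under this analogy, an empty static method on Plan (as in the Plan.nullFunc() workaround discussed further down) would be enough to initialize Plan as well, which is why an explicit, named initialization step is easier to keep correct than an incidental call.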
Attachments:
- BaselineCompilerImpl.java.patch (1 kB), Andreas Sewe, 28/Aug/09 10:33 AM
- rvm.patch (2 kB), Dingwen Yuan, 30/Aug/09 3:49 AM
Activity
First of all, here's what SVN has to say about the modified RVM I am using:
sewe@rossini:~/pretenuring/jikesrvm-3.1.0$ svn info
Path: .
URL: https://jikesrvm.svn.sourceforge.net/svnroot/jikesrvm/rvmroot/tags/3.1.0
Repository Root: https://jikesrvm.svn.sourceforge.net/svnroot/jikesrvm
Repository UUID: dce3f540-011b-0410-8642-b5df2b1aa704
Revision: 15768
Node Kind: directory
Schedule: normal
Last Changed Author: dgrove-oss
Last Changed Rev: 15707
Last Changed Date: 2009-06-10 16:20:23 +0200 (Wed, 10 Jun 2009)
sewe@rossini:~/pretenuring/jikesrvm-3.1.0$ svn status
M rvm/src/org/jikesrvm/compilers/baseline/ia32/BaselineCompilerImpl.java
sewe@rossini:~/pretenuring/jikesrvm-3.1.0$ svn diff
Index: rvm/src/org/jikesrvm/compilers/baseline/ia32/BaselineCompilerImpl.java
===================================================================
--- rvm/src/org/jikesrvm/compilers/baseline/ia32/BaselineCompilerImpl.java (revision 15768)
+++ rvm/src/org/jikesrvm/compilers/baseline/ia32/BaselineCompilerImpl.java (working copy)
@@ -3240,14 +3240,14 @@
int whichAllocator = MemoryManager.pickAllocator(typeRef, method);
int align = ObjectModel.getAlignment(typeRef, false);
int offset = ObjectModel.getOffsetForAlignment(typeRef, false);
- int site = MemoryManager.getAllocationSite(true);
+ // MemoryManager.getAllocationSite(true);
asm.emitPUSH_Imm(instanceSize);
asm.emitPUSH_Abs(Magic.getTocPointer().plus(tibOffset)); // put tib on stack
asm.emitPUSH_Imm(typeRef.hasFinalizer() ? 1 : 0); // does the class have a finalizer?
asm.emitPUSH_Imm(whichAllocator);
asm.emitPUSH_Imm(align);
asm.emitPUSH_Imm(offset);
- asm.emitPUSH_Imm(site);
+ asm.emitPUSH_Imm(42);
genParameterRegisterLoad(asm, 7); // pass 7 parameter words
asm.emitCALL_Abs(Magic.getTocPointer().plus(Entrypoints.resolvedNewScalarMethod.getOffset()));
asm.emitPUSH_Reg(T0);
So, as you can see, it is indeed this small change that provokes the bug: on ia32-linux, both ant -f test.xml -Dtest-run.name=tiny -Dtest.configs=prototype and ant -f test.xml -Dtest-run.name=tiny -Dtest.configs=prototype-opt run just fine, unless I make the above change.
In that case, the first test of tiny reproducibly causes an OVERTIME error, leaving the following result:
GCWarning: Live ratio greater than 1: 7.77
GCWarning: Live ratio greater than 1: 1.02
GCWarning: Live ratio greater than 1: 1.02
Exception in thread "Jikes_RBoot_Thread": java.lang.OutOfMemoryError
at java.lang.String.toCharArray(String.java:1742)
at java.lang.String.replace(String.java:1307)
at org.jikesrvm.VM.runClassInitializer(VM.java:527)
at org.jikesrvm.VM.finishBooting(VM.java:403)
at org.jikesrvm.VM.boot(VM.java:145)
In other words, the RVM does not terminate after this; it idles for a long while at 0% CPU until the test harness shuts it down.
At any rate, here's some more information that may be helpful to you:
sewe@rossini:~/pretenuring/jikesrvm-3.1.0$ uname -a
Linux rossini 2.6.20-16-386 #2 Tue Feb 12 05:38:06 UTC 2008 i686 GNU/Linux
sewe@rossini:~/pretenuring/jikesrvm-3.1.0$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 6
model name : Intel(R) Pentium(R) D CPU 3.00GHz
stepping : 2
cpu MHz : 2992.790
cache size : 2048 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 6
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl vmx cid cx16 xtpr lahf_lm
bogomips : 5990.08
clflush size : 64
I hope this helps. But believe me, I am just as baffled by this behavior as you are.
Hi Andreas,
I've just tried once again to reproduce the problem, this time from an entirely clean environment. Once again, I cannot reproduce the behavior you see (see below). Further, as I said earlier, my inspection of the code reveals no reason why this would be a bug.
I suggest the following action:
First, I encourage others to try to reproduce this behavior in their own environment (an environment distinct from Andreas'). If others are able to reproduce the behavior, it will be essential that they describe their environment fully so that I may be able to reproduce the problem too.
Second, if anyone can provide an explanation as to why this could be a bug (I've looked through the code and I don't believe it can be a bug), then I'm happy to follow up.
If we can't do either of the above within a reasonable timeframe then I'm afraid I'll have to close this issue and put it down to some inexplicable quirk in Andreas' environment.
As to the code in question: you are correct in that it is not actually used by the existing collectors. However, it exists as a very useful hook that has been used in published research and continues to be used. As it happens I am running experiments at this very minute that depend on that code (the allocation site ID). So I'm not inclined to remove it unless there is a very clear case for doing so.
--Steve
My third attempt to reproduce the behavior (on a Core 2 Quad Q6600 Kentsfield, running Ubuntu Hardy):
teveb$ ssh coyote
Linux coyote 2.6.27-9-perfctr #1 SMP Thu Apr 2 11:59:38 EST 2009 i686
steveb@coyote:~$ cd /tmp
steveb@coyote:/tmp$ svn co https://jikesrvm.svn.sourceforge.net/svnroot/jikesrvm/rvmroot/trunk jikesrvm
[...snip...]
Checked out revision 15772.
steveb@coyote:/tmp$ cd jikesrvm/
steveb@coyote:/tmp/jikesrvm$ patch -p0 < ../BaselineCompilerImpl.java.patch
patching file rvm/src/org/jikesrvm/compilers/baseline/ia32/BaselineCompilerImpl.java
Hunk #1 succeeded at 3241 (offset 1 line).
steveb@coyote:/tmp/jikesrvm$ bin/buildit localhost -t dacapo prototype-opt
[...snip...]
test:
[echo] Test Result for [prototype-opt|dacapo] antlr (default) : SUCCESS
[echo] Test Result for [prototype-opt|dacapo] bloat (default) : SUCCESS
[echo] Test Result for [prototype-opt|dacapo] chart (default) : SUCCESS
[echo] Test Result for [prototype-opt|dacapo] eclipse (default) : SUCCESS
[echo] Test Result for [prototype-opt|dacapo] fop (default) : SUCCESS
[echo] Test Result for [prototype-opt|dacapo] hsqldb (default) : SUCCESS
[echo] Test Result for [prototype-opt|dacapo] jython (default) : SUCCESS
[echo] Test Result for [prototype-opt|dacapo] luindex (default) : SUCCESS
[echo] Test Result for [prototype-opt|dacapo] lusearch (default) : SUCCESS
[echo] Test Result for [prototype-opt|dacapo] pmd (default) : SUCCESS
[echo] Test Result for [prototype-opt|dacapo] xalan (default) : SUCCESS
[...snip...]
===================== Results =====================
Total Success Rate 11/11
Subversion Revision: 15772M
===================================================
steveb@coyote:/tmp/jikesrvm$
Hi Steve and Andreas,
I can reproduce this problem on my laptop. After downloading the 3.1.0 release and applying the modification Andreas described, I built Jikes RVM with the "prototype" configuration.
Here is some information about the commands I ran and my computer:
pdv@pdv-laptop:~/workspace/test$ /work/jikesrvm-3.1.0/dist/prototype_ia32-linux/rvm
GCWarning: Live ratio greater than 1: 7.77
GCWarning: Live ratio greater than 1: 1.54
GCWarning: Live ratio greater than 1: 1.54
Exception in thread "Jikes_RBoot_Thread": java.lang.OutOfMemoryError
at java.lang.String.toCharArray(String.java:1742)
at java.lang.String.replace(String.java:1307)
at org.jikesrvm.VM.runClassInitializer(VM.java:527)
at org.jikesrvm.VM.finishBooting(VM.java:403)
at org.jikesrvm.VM.boot(VM.java:145)
pdv@pdv-laptop:~/workspace/test$ /work/jikesrvm-3.1.0/dist/prototype_ia32-linux/rvm -jar lib/dacapo-2006-10-MR2.jar -s default fop
GCWarning: Live ratio greater than 1: 7.77
GCWarning: Live ratio greater than 1: 1.54
GCWarning: Live ratio greater than 1: 1.54
Exception in thread "Jikes_RBoot_Thread": java.lang.OutOfMemoryError
at java.lang.String.toCharArray(String.java:1742)
at java.lang.String.replace(String.java:1307)
at org.jikesrvm.VM.runClassInitializer(VM.java:527)
at org.jikesrvm.VM.finishBooting(VM.java:403)
at org.jikesrvm.VM.boot(VM.java:145)
pdv@pdv-laptop:~/workspace/test$ uname -a
Linux pdv-laptop 2.6.27-7-generic #1 SMP Fri Oct 24 06:42:44 UTC 2008 i686 GNU/Linux
pdv@pdv-laptop:~/workspace/test$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 14
model name : Genuine Intel(R) CPU T1300 @ 1.66GHz
stepping : 8
cpu MHz : 1000.000
cache size : 2048 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss tm pbe nx constant_tsc up arch_perfmon bts pni monitor vmx est tm2 xtpr
bogomips : 3324.98
clflush size : 64
power management:
pdv@pdv-laptop:~/workspace/test$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=8.10
DISTRIB_CODENAME=intrepid
DISTRIB_DESCRIPTION="Ubuntu 8.10"
In another experiment, I replaced the commented-out line "MemoryManager.getAllocationSite(true);" with a call to an empty function in Plan; with that change, the benchmark passes. (See my attached patch.)
pdv@pdv-laptop:~/workspace/test$ /work/jikesrvm-3.1.0/dist/prototype_ia32-linux/rvm -jar lib/dacapo-2006-10-MR2.jar -s default fop
===== DaCapo fop starting =====
===== DaCapo fop PASSED in 32593 msec =====
(The attached patch replaces "MemoryManager.getAllocationSite(true)" with "Plan.nullFunc()".)
I haven't looked at this yet. The workaround suggests that this method may be relied on for implicit classloading either during the build or initial boot stages.
I will try to make some time later this week to check the status with respect to head.
Dingwen Yuan: Thanks, that is very helpful. Nice work.
Daniel: Please see my posts above. I had already tested against head (and 3.0.1) and could not reproduce.
Given Dingwen Yuan's experience, I have just gone and tested against 3.1.0 and I CAN reproduce the problem on 3.1.0.
I then went and tested against the specific revision Andreas cited in his second post (15768). I do not see the problem.
I looked further to narrow the window between the head and 3.1.0 and established that the problem disappears from r15717 onward.
This is no coincidence, since r15717 was Robin fixing a static initialization loop (see the commit message):
http://jikesrvm.svn.sourceforge.net/viewvc/jikesrvm?view=rev&revision=15717
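Robin's commit message is not reproduced here, so the following is only a generic illustration of why a static initialization loop is fragile. On a standard JVM, when two classes read each other's static fields during initialization, the class whose initialization is triggered from inside the other's initializer sees the other's not-yet-assigned fields at their default values (JLS 12.4.2), so the outcome depends on which class happens to be touched first. A1/B1 and A2/B2 are hypothetical classes with identical shape; only the order of first use differs. This is one plausible way that adding or removing an unrelated static call could shift initialization order and expose such a loop.

```java
// Generic JVM illustration of a static initialization cycle; the classes
// below are hypothetical and unrelated to the actual r15717 change.
public class InitCycleDemo {
    // Pair 1: first used through A1.
    static class A1 {
        static final Integer X = B1.Y;   // reading B1.Y starts B1's initialization
        static final Integer Z = 7;      // not yet assigned while B1 initializes
    }
    static class B1 {
        // A1 is mid-initialization on this thread, so A1.Z is still null here.
        static final Integer Y = (A1.Z == null) ? -1 : A1.Z;
    }

    // Pair 2: identical shape, but first used through B2.
    static class A2 {
        static final Integer X = B2.Y;   // B2 is mid-initialization: B2.Y is still null
        static final Integer Z = 7;
    }
    static class B2 {
        static final Integer Y = (A2.Z == null) ? -1 : A2.Z;
    }

    // Touch A first: B1 sees A1.Z == null, so A1.X ends up as -1.
    static Integer viaAFirst() {
        return A1.X;
    }

    // Touch B first: A2 initializes inside B2's initializer and copies B2.Y
    // while it is still null; B2.Y itself ends up as 7.
    static Integer viaBFirst() {
        Integer y = B2.Y;  // forces B2 (and, transitively, A2) to initialize
        return A2.X;
    }

    public static void main(String[] args) {
        System.out.println("A touched first: A.X = " + viaAFirst()); // A touched first: A.X = -1
        System.out.println("B touched first: A.X = " + viaBFirst()); // B touched first: A.X = null
    }
}
```

Neither order throws an exception; each simply yields a different, silently "wrong" value, which is what makes such loops hard to spot until some unrelated change reorders initialization.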
So, Andreas, you should either use Dingwen's workaround or use revision 15717 or later. (I don't understand what you saw with 15768; I can't reproduce that, see below.)
I am now closing the issue on account of Robin's pre-existing fix in 15717.
===================== Results =====================
Total Success Rate 11/11
Subversion Revision: 15768M
===================================================
steveb@coyote:/tmp/jikesrvm$ svn diff
Index: rvm/src/org/jikesrvm/compilers/baseline/ia32/BaselineCompilerImpl.java
===================================================================
--- rvm/src/org/jikesrvm/compilers/baseline/ia32/BaselineCompilerImpl.java (revision 15768)
+++ rvm/src/org/jikesrvm/compilers/baseline/ia32/BaselineCompilerImpl.java (working copy)
@@ -3241,14 +3241,14 @@
int whichAllocator = MemoryManager.pickAllocator(typeRef, method);
int align = ObjectModel.getAlignment(typeRef, false);
int offset = ObjectModel.getOffsetForAlignment(typeRef, false);
- int site = MemoryManager.getAllocationSite(true);
+ // int site = MemoryManager.getAllocationSite(true);
asm.emitPUSH_Imm(instanceSize);
asm.emitPUSH_Abs(Magic.getTocPointer().plus(tibOffset)); // put tib on stack
asm.emitPUSH_Imm(typeRef.hasFinalizer() ? 1 : 0); // does the class have a finalizer?
asm.emitPUSH_Imm(whichAllocator);
asm.emitPUSH_Imm(align);
asm.emitPUSH_Imm(offset);
- asm.emitPUSH_Imm(site);
+ asm.emitPUSH_Imm(42);
genParameterRegisterLoad(asm, 7); // pass 7 parameter words
asm.emitCALL_Abs(Magic.getTocPointer().plus(Entrypoints.resolvedNewScalarMethod.getOffset()));
asm.emitPUSH_Reg(T0);
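Narrowing the window as described above (the problem is present at the 3.1.0 tag, r15707, and disappears from r15717 onward) can be done systematically as a bisection over Subversion revisions. The sketch below assumes the outcome is monotone in the revision number and that each probe is expensive (check out, apply the patch, build, run the tiny test), which is why halving the range matters. firstGoodRevision and the predicate are hypothetical; in practice isGood would wrap those build-and-test steps, whereas the demo predicate simply encodes the known answer.

```java
import java.util.function.LongPredicate;

public class RevisionBisect {
    /**
     * Returns the smallest revision r in (knownBad, knownGood] with isGood(r) true.
     * Assumes isGood(knownBad) is false, isGood(knownGood) is true, and the
     * predicate is monotone (once a revision is good, every later one is too).
     */
    static long firstGoodRevision(long knownBad, long knownGood, LongPredicate isGood) {
        long bad = knownBad;    // invariant: bad is known to show the problem
        long good = knownGood;  // invariant: good is known not to show it
        while (good - bad > 1) {
            long mid = bad + (good - bad) / 2;
            if (isGood.test(mid)) {
                good = mid;
            } else {
                bad = mid;
            }
        }
        return good;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for "check out, patch, build, run tiny":
        // here we simply pretend the fix landed in r15717.
        LongPredicate fixedIn15717 = rev -> rev >= 15717;
        System.out.println(firstGoodRevision(15707, 15768, fixedIn15717)); // 15717
    }
}
```

For the 61-revision window between r15707 and r15768 this needs six probes rather than one build per revision.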
All of the above confusion seems to have stemmed from two minor glitches in the reporting:
1. Note that the original bug report lists the environment as "Jikes RVM 3.0.1 (r15707), ia32-linux, prototype-opt". This is a typo; I think Andreas meant "3.1.0". Anyway, this is why I ran my original tests against 3.0.1 (which does not in fact have this problem).
2. Note also that Andreas' svn info output is strange: it lists 15768 as the Revision but 15707 as the Last Changed Rev. 15768 does not have this problem, but 15707 does, so when I tested against 15768 I could not see the problem.
In short, I don't think there's any mystery at all and Robin's commit message for 15717 is entirely clear.
Yes, I did indeed mean 3.1.0; I mixed up the minor and micro version numbers.
Just for the record and commenting on Steve:
As to the code in question: you are correct in that it is not actually used by the existing collectors. However, it exists as a very useful hook that has been used in published research and continues to be used. As it happens I am running experiments at this very minute that depend on that code (the allocation site ID). So I'm not inclined to remove it unless there is a very clear case for doing so.
I did not suggest removing the site parameter. In fact, if you look closely, you'll see that my prompt reads sewe@rossini:~/pretenuring/jikesrvm-3.1.0$; we (Dingwen is a student of mine) have a use for site, too. We just happened to stumble upon this weird behavior while modifying resolvedNewScalar.
Bulk close of all resolved issues in preparation for the 3.1.3 release.
I'm confused by this. This is not (as far as I can tell) a bug, and it should never cause a problem. When I read your description, I could not fathom how this could be a bug.
Indeed, when I take your patch and apply it to the svn head and to 3.0.1, in both cases the system performs perfectly. I tested BaseBase, production and prototype-opt. I'm testing on ia32-linux.
So, I think there's something missing from this picture. Perhaps you made some other changes? Can you please try to repeat the bug using only the patch you provided and a fresh, clean 3.0.1 source?
Thanks.