Having stared at the IR for this for quite a while, I can't see what's wrong. It looks like something unrelated is clobbering a register, but I don't know why this happens when the input values are negative, or why it occurs in phases.
This bug affects both SSE and x87. Interestingly, I'm seeing this error:
checkRemainders: long Failed: 10 % 8 = 2 (should be: 3
From the code, the test should only be checking the remainder for 7. As the code looks good, I'm wondering whether the stack logic is broken. The following command line on a prototype-opt image is sufficient to recreate the failure with r14159:
rvm -X:aos:enable_recompilation=false org.jikesrvm.tools.oth.OptTestHarness -oc:O0 -oc:inline=false -method spec.benchmarks._200_check.PepTest checkRemD - -er SpecApplication main - -s100 -m10 -M10 -a _200_check
The LIR looks right to me, so the bug must be somewhere in the IA32 code gen and/or reg allocation. My feeling is that the code is right, but we're missing something from the bigger picture. It could be that ucomisd and its x87 equivalent aren't doing what we expect, but why would this be time dependent? What I see is that we pass the test initially, then after a few iterations we fail for a while, then we pass again for a while. I thought this could be SSE state being clobbered, but that wouldn't explain the bug also affecting the x87 code. I'm suspicious of the threading mechanism, but iirc there are no yield points between the compare and the branch. Given the failure is in such a small region of code, it should be within our brain power to fix, but currently I'm somewhat stumped.
Tracking through the code, there is a definite bug with long literals in the JTOC: we're marking slots as literals that shouldn't be. One example is the slot holding the current number of allocated threads, which we mark as a literal. It holds the value 7 during an OTH compilation but holds the value 8 when run. This causes the result "10 % 8 = 2 (should be: 3", as the test should have been 10 % 7.
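To make the literal problem concrete, here is a minimal sketch of how a mutable slot that is wrongly marked as a literal could turn 10 % 7 into 10 % 8. This is not the JTOC code: the class, the `allocate` and `literal` helpers, and the idea that a literal lookup reuses an existing slot with the same value are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a slot table with literal sharing; not Jikes RVM code.
public class LiteralSlotSketch {
    private final long[] slots = new long[8];
    private final Map<Long, Integer> literalSlots = new HashMap<>();
    private int nextSlot = 0;

    // Allocate a slot; if isLiteral is set, later literal lookups may reuse it.
    int allocate(long value, boolean isLiteral) {
        int slot = nextSlot++;
        slots[slot] = value;
        if (isLiteral) {
            literalSlots.putIfAbsent(value, slot);
        }
        return slot;
    }

    // Find a slot already marked as holding this literal, or create one.
    int literal(long value) {
        Integer existing = literalSlots.get(value);
        return existing != null ? existing : allocate(value, true);
    }

    // Returns what "10 % 7" evaluates to at run time in this model.
    static long run(boolean counterMarkedLiteral) {
        LiteralSlotSketch table = new LiteralSlotSketch();
        // Mutable thread counter: 7 at compile time.
        int threadCount = table.allocate(7, counterMarkedLiteral);
        // Compiled code asks for a slot holding the constant 7.
        int divisor = table.literal(7);
        // At run time one more thread exists, so the counter becomes 8.
        table.slots[threadCount] = 8;
        return 10 % table.slots[divisor];
    }

    public static void main(String[] args) {
        System.out.println("counter marked literal: 10 % 7 -> " + run(true));
        System.out.println("counter not literal:    10 % 7 -> " + run(false));
    }
}
```

In this model the wrongly marked counter makes the compiled code share its slot, so the divisor silently changes from 7 to 8 once the thread count changes.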
So using the test program below:
public class test {
we fail for the last 3 iterations. If I copy the value of the ftag register across from the last 3 iterations to the 1st iteration, we fail immediately. The problem is that the fld is placing a NaN on the stack. Ian tracked it down to the fact that in the opt codegen for FPREM we push 2 values but only pop one. Thus, after 7 fprems the next fld gets a stack overflow exception.
We're optimistic that putting in an ffree to balance the fp stack operations in the implementation of FPREM will take care of the problem. I'm testing a fix... Fixed via a combination of r14160 and r14161.
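The stack imbalance can be sketched with a small model of the x87 register stack: 8 slots, a TOP pointer, and a tag per slot. The sketch assumes the FPREM sequence is two loads followed by one pop, and that the fix frees ST(0) afterwards; the actual instruction sequence emitted by the opt compiler, and where the ffree goes, are assumptions here. On real hardware with the invalid-operation exception masked, an fld into a non-empty slot stores a NaN, which this model reports as an overflow.

```java
// Simplified model of the x87 register stack tags; not the opt compiler's code.
public class X87StackSketch {
    private final boolean[] occupied = new boolean[8]; // tag word: true = valid
    private int top = 0;                               // TOP field

    // fld: decrement TOP, then fill the slot. Returns false on stack overflow.
    boolean fld() {
        top = (top - 1) & 7;
        if (occupied[top]) {
            return false; // slot not empty: the hardware would load a NaN
        }
        occupied[top] = true;
        return true;
    }

    // fstp: mark ST(0) empty and increment TOP.
    void fstp() {
        occupied[top] = false;
        top = (top + 1) & 7;
    }

    // ffree st(i): mark the slot empty without moving TOP.
    void ffree(int i) {
        occupied[(top + i) & 7] = false;
    }

    // Runs the assumed FPREM sequence repeatedly and returns the 1-based
    // iteration in which an fld first overflows, or -1 if none does.
    static int firstOverflow(boolean balanced, int iterations) {
        X87StackSketch fpu = new X87StackSketch();
        for (int n = 1; n <= iterations; n++) {
            if (!fpu.fld() || !fpu.fld()) { // push divisor, push dividend
                return n;
            }
            fpu.fstp();                     // pop the remainder only
            if (balanced) {
                fpu.ffree(0);               // release the leftover divisor
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("unbalanced: first overflow in iteration "
                + firstOverflow(false, 100));
        System.out.println("balanced:   first overflow in iteration "
                + firstOverflow(true, 100));
    }
}
```

In this model each unbalanced FPREM leaks one slot, so seven of them leave seven slots occupied and an fld during the eighth wraps around onto a live slot. With the ffree in place no overflow occurs.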
-13 LABEL0 Frequency: 1.0
-2 EG ir_prologue l0sa(Lspec/benchmarks/_200_check/PepTest;,x,d), l2psd(D,d), l3psd(D,d), l4psd(D,d) =
0 G yieldpoint_prologue
4 double_rem t6sd(D) = l2psd(D,d), l3psd(D,d)
6 double_ifcmp t8sv(GUARD) = l4psd(D,d), t6sd(D), ==F, LABEL2, Probability: 0.5
-1 bbend BB0 (ENTRY)
9 LABEL1 Frequency: 0.5
This is odd, as it leaves no scope for clobbering the x87 values between the fprem and the compare. Another oddity is that it will pass and then fail in phases. Also, it only fails when the test is -10.5 % -7.0 == -3.5; it passes in the cases where one of the arguments is positive. The IR itself looks sane. It's possible this could be an OPT_SaveVolatile quirk.
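For reference, the expected results of the double remainder under Java semantics (the result takes the sign of the dividend) can be checked with a sketch like the one below. It is not the PepTest source; the class and method names are made up, and only the -10.5 % -7.0 == -3.5 case is taken from the report.

```java
// Hypothetical stand-alone check of Java's double remainder; not PepTest itself.
public class RemainderCheckSketch {
    // Mirrors the shape of the IR above: a double_rem followed by an equality compare.
    static boolean remainderMatches(double a, double b, double expected) {
        return a % b == expected;
    }

    public static void main(String[] args) {
        // All operands are exactly representable in binary floating point,
        // so comparing with == is safe here.
        System.out.println(remainderMatches(10.5, 7.0, 3.5));
        System.out.println(remainderMatches(10.5, -7.0, 3.5));
        System.out.println(remainderMatches(-10.5, 7.0, -3.5));
        System.out.println(remainderMatches(-10.5, -7.0, -3.5)); // the failing case
    }
}
```

On a correct JVM all four comparisons hold, so a failure limited to one sign combination points at the generated code or machine state rather than at the arithmetic.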