RVM-213

jbb2005 performance regression

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.9.2
    • Fix Version/s: 2.9.2
    • Component/s: None
    • Labels:
      None
    • Environment:
      x86lnx
    • Number of attachments:
      0

      Description

      jbb2005 seems to have taken two discrete hits over the past couple of months which leave us at 82% of our most recent best.

      The following graph tells the story:

      http://jikesrvm.anu.edu.au/cattrack/results/rvmx86lnx32.anu.edu.au/commit.1012/production/Performance/SPECjbb2005/SPECjbb2005/SPECjbb2005.large.png

      The first big hit was apparently due to the nano timer commit (r13209), from which jbb2005 never properly recovered.

      The second hit is a little less obvious (between r14403 and r14425).

      Unless anyone has better ideas immediately, I'm prepared to take an exhaustive look at the revisions between 14403 and 14425 to try to get a clearer picture.
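
      For reference, that kind of revision sweep is easy to script. Below is a minimal sketch (assuming an svn working copy of the sources and a hypothetical run_jbb.sh wrapper that builds the checked-out revision and prints the jbb2005 score line; both the wrapper and its commands would need to be adapted to the local setup):

        import subprocess

        FIRST, LAST = 14403, 14425

        for rev in range(FIRST, LAST + 1):
            # Move the working copy to the candidate revision.
            subprocess.check_call(["svn", "update", "-r", str(rev)])
            # run_jbb.sh is a hypothetical wrapper: build the production image
            # and run SPECjbb2005, printing the "Valid run, Score is ..." line.
            out = subprocess.check_output(["./run_jbb.sh"]).decode()
            print("r%d: %s" % (rev, out.strip()))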

        Activity

        Steve Blackburn added a comment -

        I have further refined things by running on a quad Core 2 with r13208, r13209, and r13214, with 1, 2 and 4 VPs. The results are below.

        It is clear now that:

        • The problem is evident on both dual-core and quad-core Core 2.
        • The problem is only really evident with 2 virtual processors.
        • r13209 appears to be the big culprit (r13214 is also, but much less so).

        1 Virtual processor:

        13208 360M 1 02
        Valid run, Score is 5455
        13209 360M 1 02
        Valid run, Score is 5451
        13214 360M 1 02
        Valid run, Score is 5557

        2 Virtual processors:

        13208 360M 2 02
        Valid run, Score is 8528
        13209 360M 2 02
        Valid run, Score is 7315
        13214 360M 2 02
        Valid run, Score is 6924

        4 Virtual processors:

        13208 360M 4 01
        Valid run, Score is 10847
        13209 360M 4 01
        Valid run, Score is 10557
        13214 360M 4 01
        Valid run, Score is 10766

        Steve Blackburn added a comment -

        OK. Another data point: I've reproduced the result on a P4D (below). We now see a substantial degradation from 13208->13209 on Core 2 Quad, Core 2 Duo, and P4D, of around 20% in each case, so the problem is not unique to the Core 2 architecture. I don't have access to a dual-core AMD box; it would be interesting to try one.

        P4D, 2 Virtual Processors:

        13208 360M 2 01
        Valid run, Score is 5228
        13209 360M 2 01
        Valid run, Score is 4269
        13214 360M 2 01
        Valid run, Score is 4053

        Steve Blackburn added a comment -

        It seems possible that r13745 has fixed this problem, but I don't yet properly understand why.

        Here's my analysis based on one set of numbers I just ran using 13208, 13209, 13744 and 13745 on a Core 2 Quad with 1 and 2 VPs:

        The 13208->13209 transition cost 1% on 1 core and 13% on 2 cores (it degraded 2 core by 12% more).
        The 13744->13745 transition gained 5.3% on 1 core and 22% on 2 cores (it improved 2 core by 17% more).

        So it seems that we recovered the ground lost in 13209.

        So much for correlation. What's the causation? Dave? Obviously AOS is going to be sensitive to changes in timing; I can figure out that much.

        Here are the raw numbers FWIW (revision, heap, VPs, iteration):

        1 VP:
        13208 360M 1 01
        Valid run, Score is 5460
        13209 360M 1 01
        Valid run, Score is 5421
        13744 360M 1 01
        Valid run, Score is 5854
        13745 360M 1 01
        Valid run, Score is 6183

        2 VP:
        13208 360M 2 01
        Valid run, Score is 8550
        13209 360M 2 01
        Valid run, Score is 7432
        13744 360M 2 01
        Valid run, Score is 6917
        13745 360M 2 01
        Valid run, Score is 8840
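
        To make the arithmetic behind the percentages above explicit, here is a small illustrative sketch (not part of any harness) that reproduces them from the raw scores; each delta appears to be expressed relative to the higher of the two scores:

          # Raw SPECjbb2005 scores from above, keyed by (revision, VPs).
          scores = {
              (13208, 1): 5460, (13209, 1): 5421, (13744, 1): 5854, (13745, 1): 6183,
              (13208, 2): 8550, (13209, 2): 7432, (13744, 2): 6917, (13745, 2): 8840,
          }

          def delta(old, new):
              # Percentage change, expressed relative to the higher score,
              # which matches the quoted 1%/13% and 5.3%/22% figures.
              return 100.0 * (new - old) / max(old, new)

          for old_rev, new_rev in ((13208, 13209), (13744, 13745)):
              for vps in (1, 2):
                  d = delta(scores[(old_rev, vps)], scores[(new_rev, vps)])
                  print("%d -> %d, %d VP: %+.1f%%" % (old_rev, new_rev, vps, d))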

        David Grove added a comment -

        My theory is that changing the compiler DNA caused us to do more of the optimizing recompilation early in the run (before we had built up a good enough dynamic call graph to make the right inlining decisions). One way to verify that theory would be to compare 13208 and 13209 with -X:aos:adaptive_inlining=false. If the theory is correct, then both revisions should get pretty much identical scores (which are likely to be a little lower than the normal 13209 score with adaptive inlining).
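
        If someone wants to run that check, a rough driver along these lines would do it (purely a sketch: the boot-image paths and the benchmark invocation below are placeholders to adapt to the local jbb2005 setup; only the -X:aos:adaptive_inlining=false flag is taken from the suggestion above):

          import re, subprocess

          # Hypothetical locations of the r13208 and r13209 images.
          IMAGES = {13208: "/path/to/13208/rvm", 13209: "/path/to/13209/rvm"}
          # Placeholder benchmark invocation; adapt heap size, classpath and
          # main class to the local SPECjbb2005 setup.
          JBB_ARGS = ["-Xms360M", "-Xmx360M", "spec.jbb.JBBmain"]

          def score(rev, adaptive_inlining):
              cmd = [IMAGES[rev]]
              if not adaptive_inlining:
                  cmd.append("-X:aos:adaptive_inlining=false")
              out = subprocess.check_output(cmd + JBB_ARGS).decode()
              m = re.search(r"Valid run, Score is (\d+)", out)
              return int(m.group(1)) if m else None

          for rev in (13208, 13209):
              for ai in (True, False):
                  print(rev, "adaptive_inlining=%s" % ai, "->", score(rev, ai))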

        The 1 core vs. 2 core thing might be because we are using the spare core to opt compile early in the run (before the second warehouse is started), so things are happening even faster, but that's a wild guess.

        David Grove added a comment -

        Although it would be nice to verify that we actually understand what happened, this is pretty much resolved (at least in the sense of not blocking a 2.9.2 release).


          People

          • Assignee:
            David Grove
          • Reporter:
            Steve Blackburn
          • Votes:
            0
          • Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved: