RVM-755

Tweaks to optimizing compilation to improve the performance of write barriers

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 3.2
    • Component/s: Compiler: Optimizing
    • Labels: None
    • Number of attachments: 3

      Description

      We don't allow copy propagation of physical registers; however, in the case of the processor register (PR) this appears overly cautious. In the current generational putfield write barrier we get:

      t1 = PR // define t1 to PR
      if ....
      return // likely
      else
      ... = t1 // unlikely - use of t1

      As t1 is live in the slow path, its definition must occur at the head of the method, so we copy PR redundantly even though t1 is only used if we enter the write barrier slow path. We should just copy propagate PR and save the allocation of t1.

      Similarly, the test at the head of the putfield write barrier is often:

      ...
      t2 = l1 + constant_offset
      if t2 < constant_start_of_nursery
      ...

      which can be folded in BURS to:

      if l1 < constant_start_of_nursery - constant_offset

      This fold is already implemented in expression folding, but is currently disabled.
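The fold of the nursery test can be illustrated with a minimal Java sketch. Everything in it is hypothetical: the class name, the method names, and both constant values are chosen only for illustration and are not taken from Jikes RVM. It shows the two forms agreeing on ordinary inputs, and it assumes that `l1 + constant_offset` does not wrap around; if the addition can overflow, the two forms may disagree.

```java
// Hypothetical sketch: none of these names or values come from Jikes RVM.
public class NurseryTestFold {
    // Assumed constants, chosen only for illustration.
    static final long CONSTANT_OFFSET = 16;
    static final long START_OF_NURSERY = 0x6000_0000L;

    // Original form: compute t2 = l1 + constant_offset, then compare.
    static boolean unfolded(long l1) {
        long t2 = l1 + CONSTANT_OFFSET;
        return t2 < START_OF_NURSERY;
    }

    // Folded form: the difference of the two constants is itself a constant,
    // so the add disappears from the generated code.
    static boolean folded(long l1) {
        return l1 < START_OF_NURSERY - CONSTANT_OFFSET;
    }

    public static void main(String[] args) {
        long[] samples = {0L, 0x5FFF_FFEFL, 0x5FFF_FFF0L, 0x6000_0000L, 0x7000_0000L};
        for (long l1 : samples) {
            System.out.println(Long.toHexString(l1) + ": unfolded=" + unfolded(l1)
                    + " folded=" + folded(l1));
        }
    }
}
```

The sample values sit on either side of the boundary, which is where an off-by-one in the folded constant would show up.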

      1. all_ir_copy_prop.out
        263 kB
        Ian Rogers
      2. all_ir_no_copy_prop.out
        328 kB
        Ian Rogers
      3. copy-prop-pr.patch
        3 kB
        Ian Rogers

        Activity

        David Grove added a comment -

        Duplicating comment from core list discussion:

        I'm objecting to letting the opt compiler copy propagate physical registers as I think it has some potential for introducing bugs in code that is using Magic to manipulate the processor register. The problem is that MMTk gets hold of the PR register through a series of function calls. By the time we inline all of those, we are left with an assignment of PR to a temporary variable (the return value from the inlined function). Ian is right that if we allow copy propagation to be applied to this IR, then we will get better code.

        My reluctance to do this optimization is that the opt compiler will also copy propagate in cases where the programmer has explicitly assigned PR to a local variable because they want to hold on to a particular value in that register and use it later.

        It is awkward for the opt compiler to be able to distinguish between code where the programmer is explicitly putting PR in a local variable vs. where it is only in a local variable as a result of inlining.

        The invasive change is that I'd prefer to force the Magic programmer to invoke the magic to get the PR everywhere they want it in the inline sequence instead of wrapping layers of function calls.

        To be clear, I'm not saying this change is a good idea (or one that we should do), but having been burned badly in the past by the opt compiler trying to be too smart and optimize code sequences that arise from VM magic, I would like us to not enable any such optimization unless we really understand and document what implications it has for the correct and bug-free use of magic.

        I'm also putting this mail in the tracker...

        --dave

        Ian Rogers added a comment -

        Can we resolve this by documenting that copy propagation of the processor register relies on Processor being @NonMoving, and also documenting in Processor (soon to be RVMThread, but the point still holds) that @NonMoving allows, among other things, copy propagation of the processor register? Looking through the history, Processor was made NonMoving during the merge of read barriers.

        David Grove added a comment -

        With green threads, Non-Moving doesn't help. The opt compiler has to understand that all potential thread switch points can cause the value of PR to change.

        With native threads, things are simpler and we can probably document usage and do something like this.

        Ian Rogers added a comment -

        Then we're back to the issue that it would be invalid to have fewer reads of the PR, but having more reads is fine. As r15297 causes more reads of PR (not fewer), it is valid. The only time it'd be unexpected would be for code like (NB method names not strictly adhered to):

        Processor p = Magic.getProcessor();
        RemSet r = p.remset;
        ... // possible yield
        assert (r == p.remset);

        But this code would be broken at other levels.

        David Grove added a comment -
        Processor p = Magic.getProcessor();
        ....code that might have a yieldpoint...
        if (p != Magic.getProcessor()) {
         ....code that wants to do something if there has been a switch of virtual processor...
        }
        

        I agree it is contrived, but if the opt compiler copy propagated PR, there would be no reliable way to write this code.
        This is why I'm conservative about enabling this optimization.
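The hazard discussed in this thread can be simulated in plain Java. This is only a sketch under stated assumptions: the static field `pr` stands in for the physical processor register, `yieldAndMigrate` stands in for a yieldpoint that moves the thread to another virtual processor, and none of these names are Jikes RVM APIs. The second method hand-applies what copy propagation of PR would do, replacing each use of the snapshot `p` with a fresh read.

```java
// Hypothetical simulation: these names are illustrative, not Jikes RVM APIs.
public class ProcessorSnapshot {
    // Stand-in for the physical processor register (PR).
    static int pr = 0;

    // Stand-in for the magic that reads PR.
    static int getProcessor() {
        return pr;
    }

    // Stand-in for a yieldpoint that migrates the thread to another virtual processor.
    static void yieldAndMigrate() {
        pr = 1;
    }

    // The code as written: snapshot PR, yield, then compare against a fresh read.
    static boolean switchDetectedAsWritten() {
        pr = 0;
        int p = getProcessor();
        yieldAndMigrate();
        return p != getProcessor();
    }

    // The same code after copy propagating PR: the use of p becomes another
    // read of PR, so both sides of the comparison see the post-switch value.
    static boolean switchDetectedAfterCopyProp() {
        pr = 0;
        yieldAndMigrate();
        return getProcessor() != getProcessor();
    }

    public static void main(String[] args) {
        System.out.println("as written:      " + switchDetectedAsWritten());
        System.out.println("after copy prop: " + switchDetectedAfterCopyProp());
    }
}
```

In the simulation the snapshot version notices the switch, while the copy-propagated version compares PR with itself and so can never report one.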


          People

          • Assignee: Unassigned
          • Reporter: Ian Rogers
          • Votes: 0
          • Watchers: 0
