RVM-148

Reduce cost of inlining runtime services

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1000
    • Component/s: Compiler: Optimizing
    • Labels:
      None
    • Number of attachments:
      0

      Description

      Currently we use inlining both in BC2IR and to expand runtime services. The runtime-service expansion often expands the same operations over and over. For example, expanding runtime services for the method Object.toString in the production compiler makes 134 inlining decisions, whereas BC2IR makes 43 inlining decisions for the same method. Part of the reason for this cost is the improved modularity of the runtime and GC systems. It would be nice if we could reuse cached translations of the runtime services to reduce the cost of expanding them into the IR. Problems with exception handling and with parameter passing/constant propagation would need to be dealt with. Possibly the cached translations would have different versions for particular constant arguments.
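      As a rough illustration of the idea in the description, the Java sketch below caches the expanded IR of a runtime service keyed by the service plus its constant arguments, so that each distinct set of constant arguments gets its own cached version. Everything in it is hypothetical: ServiceKey, CachedTranslation, RuntimeServiceIRCache, the string-based "instructions" and the %0/%1 placeholder convention are assumptions made for the sketch, not Jikes RVM classes. Patching actual operands into placeholders stands in for the parameter-passing problem; exception handling and renaming of temporaries are not modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical cache key: a runtime service plus the constant arguments it was specialised for. */
final class ServiceKey {
    final String service;
    final List<Object> constantArgs;

    ServiceKey(String service, List<?> constantArgs) {
        this.service = service;
        this.constantArgs = Collections.unmodifiableList(new ArrayList<Object>(constantArgs));
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof ServiceKey)) return false;
        ServiceKey k = (ServiceKey) o;
        return service.equals(k.service) && constantArgs.equals(k.constantArgs);
    }

    @Override public int hashCode() { return Objects.hash(service, constantArgs); }
}

/** A cached expansion whose operands are placeholders %0, %1, ... patched at each use. */
final class CachedTranslation {
    private final List<String> template;

    CachedTranslation(List<String> template) { this.template = List.copyOf(template); }

    List<String> instantiate(List<String> actuals) {
        List<String> out = new ArrayList<>(template.size());
        for (String insn : template) {
            String patched = insn;
            // Highest index first so that replacing %1 never clobbers the prefix of %10.
            for (int i = actuals.size() - 1; i >= 0; i--) {
                patched = patched.replace("%" + i, actuals.get(i));
            }
            out.add(patched);
        }
        return out;
    }
}

/** Hypothetical cache: runs the expensive expander only on a miss. */
final class RuntimeServiceIRCache {
    private final Map<ServiceKey, CachedTranslation> cache = new HashMap<>();
    private int expansions = 0; // number of times the expensive expansion actually ran

    List<String> expand(ServiceKey key, List<String> actuals,
                        Function<ServiceKey, List<String>> expander) {
        CachedTranslation t = cache.computeIfAbsent(key, k -> {
            expansions++;
            return new CachedTranslation(expander.apply(k));
        });
        return t.instantiate(actuals);
    }

    int expansions() { return expansions; }
}

class CacheDemo {
    public static void main(String[] args) {
        RuntimeServiceIRCache cache = new RuntimeServiceIRCache();
        // Stand-in for the real (slow) inline expansion of an allocation service;
        // the constant size is folded into the cached version.
        Function<ServiceKey, List<String>> expander = k -> List.of(
                "%0 = bump_alloc(" + k.constantArgs.get(0) + ")",
                "init_header(%0, %1)");
        ServiceKey size16 = new ServiceKey("allocate", List.of(16));
        System.out.println(cache.expand(size16, List.of("t1", "tibA"), expander));
        System.out.println(cache.expand(size16, List.of("t7", "tibA"), expander));
        System.out.println(cache.expand(new ServiceKey("allocate", List.of(32)),
                List.of("t9", "tibB"), expander));
        System.out.println("expansions = " + cache.expansions());
    }
}
```

      The second call with the same key pays only for the copy-and-patch step, while a different constant size triggers a new expansion. Whether copy-and-patch is actually cheaper than re-running the inliner on real opt IR is the engineering question raised in the comments below.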

        Activity

        David Grove added a comment -

        To put this in context, I did a quick measurement of opt compilation time on ia32-linux. This feature would attempt to reduce the compilation time spent in OPT_ExpandRuntimeServices, at the cost of some fairly significant engineering to cache and patch in opt compiler IR. We have considered this several times in the past, but have never been motivated enough by the potential reduction in compilation time to do the work. Things change as the system evolves, but I just wanted to note that this has been considered before and not deemed worth the engineering it would take.

        With the current system, forcing compilation of every method in _213_javac, we get:
        • at O0, expand runtime services is 2.7% of compilation time (we don't inline allocations at O0);
        • at O1, it is 19.4% of compilation time;
        • at O2, it is 18.7% of compilation time (currently O1 is almost the same as O2).

        Using the default (O1) optimization level, the command:
        ../rvmRoot/dist/production_ia32-linux/rvm -X:vm:measureCompilation=true -X:aos:enable_recompilation=false -X:aos:initial_compiler=opt SpecApplication -s100 -m1 -M1 -a _213_javac

        yields:

        Compilation Subsystem Report
        Comp     #Meths     Time   bcb/ms  mcb/bcb     MCKB     BCKB
        JNI          19     0.39       NA       NA      4.5       NA
        Base        115     7.09  1083.66     7.64     79.1     10.4
        Opt         684  8015.46    11.91     7.76    596.2     76.8

        Baseline Compiler SubSystem
        Phase                                 Time (ms) (%ofTotal)
        Compute GC Maps                            1.78      28.28
        OSR setup                                  0.02       0.24
        Code generation                            3.78      59.90
        Encode GC/MC maps                          0.73      11.57
        TOTAL                                      6.31
        Optimizing Compiler SubSystem
        Phase                                 Time (ms) (%ofTotal)
        Convert Bytecodes to HIR
          Generate HIR                              447      5.61%
          AdjustBytecodeIndexes                       0      0.00%
          OSR_OsrPointConstructor                    82      1.04%
          Branch Optimizations                       34      0.43%
          Adjust Branch Probabilities                 5      0.06%
          TOTAL                                     570      7.14%
        CFG Transformations
          Tail Recursion Elimination                  2      0.04%
          Basic Block Frequency Estima
            Build LST                                46      0.58%
            Estimate Block Frequenci                 23      0.30%  25.57% Infrequent BBs
            TOTAL                                    70      0.88%
          Static Splitting                           12      0.16%
          Branch Optimizations                       42      0.54%
          TOTAL                                     128      1.61%
        CFG Structural Analysis
          Build LST                                  25      0.33%
          Yield Point Insertion                       2      0.04%
          Estimate Block Frequencies                 15      0.19%  25.21% Infrequent BBs
          TOTAL                                      44      0.55%
        Simple Opts                                  91      1.14%
        Escape Transformations                       56      0.71%
        Branch Optimizations                         29      0.37%
        Local CopyProp                               12      0.15%  25.18% Infrequent BBs
        Local ConstantProp                            7      0.09%  25.18% Infrequent BBs
        Local CSE                                    39      0.50%  25.18% Infrequent BBs
        Field Analysis                                3      0.04%
        Convert HIR to LIR
          Expand Runtime Services                  1552     19.44%  13.37% Infrequent RS calls
          Branch Optimizations                       88      1.11%
          Local Cast Optimizations                    9      0.12%  10.27% Infrequent BBs
          HIR Operator Expansion                     65      0.82%
          Branch Optimizations                      136      1.70%
          Adjust Branch Probabilities                10      0.13%
          TOTAL                                    1863     23.33%
        Local CopyProp                               32      0.41%  8.84% Infrequent BBs
        Local ConstantProp                           65      0.82%  8.84% Infrequent BBs
        Local CSE                                    43      0.54%  8.87% Infrequent BBs
        Simple Opts                                 203      2.55%
        Basic Block Frequency Estimation
          Build LST                                  73      0.92%
          Estimate Block Frequencies                 45      0.57%  30.13% Infrequent BBs
          TOTAL                                     118      1.49%
        Code Reordering                             183      2.30%
        Branch Optimizations                        121      1.52%
        Convert LIR to MIR
          SplitBasicBlock                             4      0.06%
          Instruction Selection
            Reduce Operators                          8      0.11%
            ConvertALUOps                           153      1.93%
            Normalize Constants                      37      0.47%
            Live Handlers                             0      0.00%
            DepGraph & BURS                        1065     13.35%  30.77% Infrequent BBs
            Complex Operators                        47      0.59%
            NullCheckCombining                       29      0.37%
            TOTAL                                  1342     16.82%
          TOTAL                                    1347     16.88%
        Register Mapping
          MIR Range Splitting                        21      0.27%
          Expand Calling Convention                 109      1.37%
          Expand Calling Convention                   0      0.00%
          Live Analysis                             352      4.42%
          Register Allocation
            Register Allocation Prep                 90      1.13%
            Linear Scan Composite Ph
              Interval Analysis                     205      2.57%
              Register Restriction                  171      2.15%
              Linear Scan                           969     12.14%
              Update GCMaps 1                         8      0.11%
              Spill Code                            468      5.87%
              Update GCMaps 2                        50      0.63%
              Update OSRMaps                          2      0.04%
              TOTAL                                1877     23.51%
            TOTAL                                  1967     24.63%
          Insert Prologue/Epilogue                   82      1.03%
          TOTAL                                    2533     31.73%
        Branch Optimizations                         61      0.77%
        Generate Machine Code
          Final MIR Expansion                        38      0.49%
          Assembler Driver                          388      4.87%
          TOTAL                                     427      5.35%

        TOTAL COMPILATION TIME 7985
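        To put the O1 numbers in perspective, the sketch below does Amdahl's-law-style arithmetic on the report: Expand Runtime Services takes 1552 of 7985 ms (19.44%), so even removing all of that time would shorten compilation by at most about 1.24x. The fractions of expansion time a cache might remove (25%, 50%, 100%) are hypothetical inputs rather than measurements, and a real cache would still pay for lookup and patching, so the achievable saving would be lower.

```java
class ExpandRuntimeServicesShare {
    /** Fraction of total compilation time spent in one phase. */
    static double share(double phaseMs, double totalMs) {
        return phaseMs / totalMs;
    }

    /** Total time if a hypothetical cache removed {@code fractionRemoved} of the phase's time. */
    static double projectedTotalMs(double totalMs, double phaseMs, double fractionRemoved) {
        return totalMs - phaseMs * fractionRemoved;
    }

    public static void main(String[] args) {
        double totalMs = 7985;  // TOTAL COMPILATION TIME (O1, _213_javac, from the report)
        double expandMs = 1552; // Expand Runtime Services (from the report)
        System.out.printf("share = %.2f%%%n", 100 * share(expandMs, totalMs)); // share = 19.44%
        // Hypothetical fractions of the expansion time that caching might eliminate.
        for (double f : new double[] {0.25, 0.5, 1.0}) {
            double t = projectedTotalMs(totalMs, expandMs, f);
            System.out.printf("remove %3.0f%% of expansion -> %.0f ms (%.2fx faster)%n",
                    100 * f, t, totalMs / t);
        }
    }
}
```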

        Peter Donald added a comment -

        It may be that the reduction in compilation time is not worth the engineering effort.
        I had assumed that reducing the cost of inlining certain sequences (i.e. barriers and allocation) would mean they could always be inlined. We would then see an increase in application performance with little or no increase in compilation overhead.

        It would be nice to be able to annotate future runtime services with @CacheIR to select which particular sequences are cached.
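        One way the @CacheIR idea might look is sketched below in Java. The annotation's name comes from the comment above, but its specializeOn element, the SketchRuntimeServices methods, and the CacheIROracle inlining rule are all hypothetical; in particular, this is not how the Jikes RVM inliner currently decides what to inline. The rule simply encodes the assumption from the comment that a service whose IR is cached is cheap enough to expand that it can always be inlined, while other calls still go through an ordinary size budget.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

/** Hypothetical marker: the expanded IR of this runtime service may be cached and reused. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface CacheIR {
    /** Hypothetical: parameter indices whose constant values get their own cached version. */
    int[] specializeOn() default {};
}

/** Illustrative stand-ins for runtime services; the names are not taken from Jikes RVM. */
class SketchRuntimeServices {
    @CacheIR(specializeOn = {0})
    static Object allocate(int size) { return new byte[size]; }

    @CacheIR
    static void writeBarrier(Object[] array, int index, Object value) { array[index] = value; }

    static void rareSlowPath(Object o) { /* not annotated: expanded as today */ }
}

class CacheIROracle {
    /** Reflective lookup that turns the checked exception into an unchecked one. */
    static Method find(Class<?> owner, String name, Class<?>... params) {
        try {
            return owner.getDeclaredMethod(name, params);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static boolean isCacheable(Method m) {
        return m.isAnnotationPresent(CacheIR.class);
    }

    /** Always inline cacheable services; otherwise fall back to an ordinary size budget. */
    static boolean shouldInline(Method m, int estimatedSize, int sizeBudget) {
        return isCacheable(m) || estimatedSize <= sizeBudget;
    }

    static int[] specializationIndices(Method m) {
        CacheIR a = m.getAnnotation(CacheIR.class);
        return a == null ? new int[0] : a.specializeOn();
    }

    public static void main(String[] args) {
        Method alloc = find(SketchRuntimeServices.class, "allocate", int.class);
        Method slow = find(SketchRuntimeServices.class, "rareSlowPath", Object.class);
        System.out.println(shouldInline(alloc, 500, 100)); // true: cached, so assumed cheap
        System.out.println(shouldInline(slow, 500, 100));  // false: over the size budget
        System.out.println(specializationIndices(alloc).length); // 1
    }
}
```

        Reading the annotation reflectively keeps the sketch runnable on a stock JVM; an opt compiler would presumably consult its own method metadata at compile time instead.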


          People

          • Assignee:
            Unassigned
          • Reporter:
            Ian Rogers
          • Votes:
            0
          • Watchers:
            0

            Dates

            • Created:
              Updated: