
Key: RVM-148
Type: Improvement
Status: Open
Priority: Minor
Assignee: Unassigned
Reporter: Ian Rogers
Votes: 0
Watchers: 0
RVM

Reduce cost of inlining runtime services

Created: 18/Jul/07 03:54 PM   Updated: 11/Apr/08 09:24 AM
Component/s: Compiler: Optimizing
Affects Version/s: None
Fix Version/s: 1000

Time Tracking:
Not Specified


Description
Currently we use inlining both in BC2IR and to expand runtime services. The runtime services expansion often expands the same operations over and over. For example, for the method Object.toString in the production compiler, expanding runtime services makes 134 inlining decisions, whereas BC2IR makes only 43 inlining decisions for the same method. Part of the reason for this cost is the improved modularity of the runtime and GC systems. It would be nice if we could reuse cached translations of the runtime services to reduce the cost of expanding them into the IR. Problems with exception handling and with parameter passing/constant propagation would need to be dealt with. The cache might also need to hold different versions of a translation for particular constant arguments.
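The caching idea can be pictured as a memo table keyed by the runtime service and the pattern of constant arguments at the call site, so each constant-specialized version is expanded once and then copied into later callers. This is a minimal, hypothetical sketch: `ExpansionCache`, `Key`, and the expander callback are illustrative names, not Jikes RVM classes, and IR is modeled as a list of instruction strings. It deliberately ignores the exception-handling and operand-patching problems described above, which a real implementation would have to solve when splicing a copy into the caller.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

/**
 * Hypothetical sketch: memoize runtime-service expansions per (service, constant-argument pattern).
 * IR is modeled as a list of instruction strings.
 */
public final class ExpansionCache {

  /** Cache key: service name plus each argument's constant value (null = not a compile-time constant). */
  public static final class Key {
    public final String service;
    public final List<Object> constantArgs;

    public Key(String service, List<Object> constantArgs) {
      this.service = service;
      // Defensive copy; ArrayList (unlike List.of) permits the null "non-constant" markers.
      this.constantArgs = Collections.unmodifiableList(new ArrayList<>(constantArgs));
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Key)) {
        return false;
      }
      Key k = (Key) o;
      return service.equals(k.service) && constantArgs.equals(k.constantArgs);
    }

    @Override
    public int hashCode() {
      return Objects.hash(service, constantArgs);
    }
  }

  private final Map<Key, List<String>> cache = new HashMap<>();
  private int expansions = 0;

  /** Returns a private copy of the expansion, running the (expensive) expander only on a cache miss. */
  public List<String> expand(String service, List<Object> constantArgs,
                             Function<Key, List<String>> expander) {
    Key key = new Key(service, constantArgs);
    List<String> template = cache.get(key);
    if (template == null) {
      template = new ArrayList<>(expander.apply(key));
      cache.put(key, template);
      expansions++;
    }
    // Each call site gets its own copy, since the caller would patch operands and handlers into it.
    return new ArrayList<>(template);
  }

  /** Number of times the expander actually ran (i.e. cache misses). */
  public int expansions() {
    return expansions;
  }
}
```

With this keying, two call sites that pass the same constant (say, the same object size) to a hypothetical `newScalar` service share one expansion, while a site with a different constant gets its own specialized version.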

Comments
David Grove - 19/Jul/07 08:19 AM
To put this in context, I did a quick measurement of opt compilation time on ia32-linux. This feature would attempt to reduce the compilation time spent in OPT_ExpandRuntimeServices, at the cost of some fairly significant engineering to cache and patch in opt compiler IR. We have considered this several times before, but have never been motivated enough by the potential reduction in compilation time to do the work. Things change as the system evolves, but I wanted to note that this has been considered in the past and was not deemed worth the engineering it would take.

With the current system, forcing opt compilation of every method in _213_javac, we get:
At O0, Expand Runtime Services is 2.7% of compilation time (we don't inline allocations at O0).
At O1, Expand Runtime Services is 19.4% of compilation time.
At O2, Expand Runtime Services is 18.7% of compilation time (currently O1 is almost the same as O2).

Using the default (O1) optimization level, the command:
../rvmRoot/dist/production_ia32-linux/rvm -X:vm:measureCompilation=true -X:aos:enable_recompilation=false -X:aos:initial_compiler=opt SpecApplication -s100 -m1 -M1 -a _213_javac

yields:

Compilation Subsystem Report
Comp  #Meths  Time (ms)    bcb/ms  mcb/bcb    MCKB   BCKB
JNI       19       0.39        NA       NA     4.5     NA
Base     115       7.09   1083.66     7.64    79.1   10.4
Opt      684    8015.46     11.91     7.76   596.2   76.8

Baseline Compiler SubSystem
Phase             Time (ms)  %ofTotal
Compute GC Maps        1.78     28.28
OSR setup              0.02      0.24
Code generation        3.78     59.90
Encode GC/MC maps      0.73     11.57
TOTAL                  6.31

Optimizing Compiler SubSystem
Phase                               Time (ms)  %ofTotal
Convert Bytecodes to HIR
  Generate HIR                            447     5.61%
  AdjustBytecodeIndexes                     0     0.00%
  OSR_OsrPointConstructor                  82     1.04%
  Branch Optimizations                     34     0.43%
  Adjust Branch Probabilities               5     0.06%
  TOTAL                                   570     7.14%
CFG Transformations
  Tail Recursion Elimination                2     0.04%
  Basic Block Frequency Estimation
    Build LST                              46     0.58%
    Estimate Block Frequencies             23     0.30%  25.57% Infrequent BBs
    TOTAL                                  70     0.88%
  Static Splitting                         12     0.16%
  Branch Optimizations                     42     0.54%
  TOTAL                                   128     1.61%
CFG Structural Analysis
  Build LST                                25     0.33%
  Yield Point Insertion                     2     0.04%
  Estimate Block Frequencies               15     0.19%  25.21% Infrequent BBs
  TOTAL                                    44     0.55%
Simple Opts                                91     1.14%
Escape Transformations                     56     0.71%
Branch Optimizations                       29     0.37%
Local CopyProp                             12     0.15%  25.18% Infrequent BBs
Local ConstantProp                          7     0.09%  25.18% Infrequent BBs
Local CSE                                  39     0.50%  25.18% Infrequent BBs
Field Analysis                              3     0.04%
Convert HIR to LIR
  Expand Runtime Services                1552    19.44%  13.37% Infrequent RS calls
  Branch Optimizations                     88     1.11%
  Local Cast Optimizations                  9     0.12%  10.27% Infrequent BBs
  HIR Operator Expansion                   65     0.82%
  Branch Optimizations                    136     1.70%
  Adjust Branch Probabilities              10     0.13%
  TOTAL                                  1863    23.33%
Local CopyProp                             32     0.41%  8.84% Infrequent BBs
Local ConstantProp                         65     0.82%  8.84% Infrequent BBs
Local CSE                                  43     0.54%  8.87% Infrequent BBs
Simple Opts                               203     2.55%
Basic Block Frequency Estimation
  Build LST                                73     0.92%
  Estimate Block Frequencies               45     0.57%  30.13% Infrequent BBs
  TOTAL                                   118     1.49%
Code Reordering                           183     2.30%
Branch Optimizations                      121     1.52%
Convert LIR to MIR
  SplitBasicBlock                           4     0.06%
  Instruction Selection
    Reduce Operators                        8     0.11%
    ConvertALUOps                         153     1.93%
    Normalize Constants                    37     0.47%
    Live Handlers                           0     0.00%
    DepGraph & BURS                      1065    13.35%  30.77% Infrequent BBs
    Complex Operators                      47     0.59%
    NullCheckCombining                     29     0.37%
    TOTAL                                1342    16.82%
  TOTAL                                  1347    16.88%
Register Mapping
  MIR Range Splitting                      21     0.27%
  Expand Calling Convention               109     1.37%
  Expand Calling Convention                 0     0.00%
  Live Analysis                           352     4.42%
  Register Allocation
    Register Allocation Prep               90     1.13%
    Linear Scan Composite Phase
      Interval Analysis                   205     2.57%
      Register Restriction                171     2.15%
      Linear Scan                         969    12.14%
      Update GCMaps 1                       8     0.11%
      Spill Code                          468     5.87%
      Update GCMaps 2                      50     0.63%
      Update OSRMaps                        2     0.04%
      TOTAL                              1877    23.51%
    TOTAL                                1967    24.63%
  Insert Prologue/Epilogue                 82     1.03%
  TOTAL                                  2533    31.73%
Branch Optimizations                       61     0.77%
Generate Machine Code
  Final MIR Expansion                      38     0.49%
  Assembler Driver                        388     4.87%
  TOTAL                                   427     5.35%

TOTAL COMPILATION TIME 7985
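These numbers also bound what caching could buy. Expand Runtime Services takes 1552 ms of the 7985 ms total at O1, so by Amdahl's law even eliminating that phase entirely speeds up compilation by at most 7985 / (7985 - 1552), roughly 1.24x. The sketch below just does that arithmetic; `SavingsBound` is an illustrative name, and the `reduction` argument (the fraction of the phase a cache would actually save) is a hypothetical input, not a measured number.

```java
/** Amdahl-style bound on compile-time savings from cheaper runtime-service expansion (illustrative). */
public final class SavingsBound {

  /** Fraction of total compile time spent in one phase. */
  static double share(double phaseMs, double totalMs) {
    return phaseMs / totalMs;
  }

  /** Overall compile-time speedup if the phase's cost shrinks by the given fraction (0..1). */
  static double speedup(double phaseShare, double reduction) {
    if (reduction < 0.0 || reduction > 1.0) {
      throw new IllegalArgumentException("reduction must be in [0, 1]");
    }
    return 1.0 / (1.0 - phaseShare * reduction);
  }

  public static void main(String[] args) {
    // Expand Runtime Services vs. TOTAL COMPILATION TIME from the O1 report.
    double s = share(1552, 7985);
    System.out.printf("share = %.2f%%%n", 100 * s);
    System.out.printf("speedup if eliminated = %.2fx%n", speedup(s, 1.0));
    System.out.printf("speedup if halved     = %.2fx%n", speedup(s, 0.5));
  }
}
```

Running `main` prints a share of 19.44%, a 1.24x upper bound if the phase disappeared, and 1.11x if a cache halved its cost.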


Peter Donald - 09/Aug/07 03:04 AM
It may be the case that the reduction in compilation time is not worth the engineering effort.
I had assumed that the point of reducing the cost of inlining certain sequences (i.e. barriers and allocation) was that they could then always be inlined, giving an increase in application performance with little or no increase in compilation overhead.

It would be nice to be able to annotate future runtime services with @CacheIR to select which particular sequences are cached.
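One possible shape for such an annotation, as a hypothetical sketch: `@CacheIR` does not exist in Jikes RVM, and its `specializeOn` element (indices of parameters whose constant values would select a specialized cached version) is an assumption tying it to the per-constant versions mentioned in the description. Standard Java reflection is used only so the sketch runs on its own; the opt compiler would presumably check the annotation through its own method representation instead.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

/** Hypothetical marker: the opt compiler may reuse a cached IR translation of this service. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface CacheIR {
  /** Hypothetical: parameter indices whose constant values select a specialized cached version. */
  int[] specializeOn() default {};
}

public final class CacheIRDemo {
  // Illustrative runtime services; names and bodies are placeholders.
  @CacheIR(specializeOn = {0})
  static Object newScalar(int size, Object type) {
    return null;
  }

  static void writeBarrier(Object ref, Object value) {}

  /** Finds a declared method by name (overloads are not handled in this sketch). */
  private static Method find(String name) {
    for (Method m : CacheIRDemo.class.getDeclaredMethods()) {
      if (m.getName().equals(name)) {
        return m;
      }
    }
    throw new IllegalArgumentException("no method " + name);
  }

  /** Would expansion of this service be served from the IR cache? */
  static boolean isCacheable(String name) {
    return find(name).isAnnotationPresent(CacheIR.class);
  }

  /** Parameter indices to specialize on, or an empty array if the service is not cached. */
  static int[] specializedParams(String name) {
    CacheIR a = find(name).getAnnotation(CacheIR.class);
    return a == null ? new int[0] : a.specializeOn();
  }

  public static void main(String[] args) {
    System.out.println("newScalar cached: " + isCacheable("newScalar"));
    System.out.println("writeBarrier cached: " + isCacheable("writeBarrier"));
  }
}
```

Making the annotation opt-in keeps the cache limited to sequences someone has judged safe to reuse, rather than caching every runtime service by default.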