Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: 1000
Component/s: Compiler: Optimizing
Labels: None
Description
The magic call to store a primitive at an offset is sub-optimal on IA32. For example, from MMTk's Log:
VM.barriers.setArrayNoBarrier(buffer, bufferIndex++, c);
becomes:
9 int_shl t127i(I) = l124i(I), 1
10 int_2addrze.ext t128a(Lorg/vmmagic/unboxed/Offset;) = t127i(I)
14 short_store 10, l123a([C), t128a(Lorg/vmmagic/unboxed/Offset;), <unused>, <unused>
which becomes:
14 ia32_mov EAX([C) = <[EAX(Lorg/mmtk/utility/Log;)]+-4>DW (<mem loc: Lorg/mmtk/utility/Log;.buffer>, t450sv(GUARD))
10 ia32_lea EDX(Lorg/vmmagic/unboxed/Offset;) = <0+[EDX(I)*2]>DW
14 ia32_mov <[EAX([C)]+[EDX(Lorg/vmmagic/unboxed/Offset;)]>W = 10
which could more optimally be:
14 ia32_mov EAX([C) = <[EAX(Lorg/mmtk/utility/Log;)]+-4>DW (<mem loc: Lorg/mmtk/utility/Log;.buffer>, t450sv(GUARD))
14 ia32_mov <[EAX([C)]+[EDX(Lorg/vmmagic/unboxed/Offset;)*2]>W = 10
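To make the intended folding concrete, here is a minimal sketch in Java. It is only a model of the idea, not Jikes RVM code: the class ScaledIndexSketch, the Mem operand type, and foldLeaIntoStore are hypothetical names invented for this illustration. It assumes the LEA result has no other uses after the store, which a real compiler would have to check.

```java
import java.util.Map;

// Hypothetical model for illustration only; none of these names exist in Jikes RVM.
final class ScaledIndexSketch {
    /** An IA32-style memory operand: [base + index*scale + disp]. */
    static final class Mem {
        final String base;   // base register name, or null if absent
        final String index;  // index register name, or null if absent
        final int scale;     // 1, 2, 4 or 8 on IA32
        final int disp;      // constant displacement

        Mem(String base, String index, int scale, int disp) {
            this.base = base;
            this.index = index;
            this.scale = scale;
            this.disp = disp;
        }

        /** Effective address in wrapping 32-bit arithmetic, given register contents. */
        int address(Map<String, Integer> regs) {
            int b = (base == null) ? 0 : regs.get(base);
            int i = (index == null) ? 0 : regs.get(index);
            return b + i * scale + disp;
        }

        @Override
        public String toString() {
            return "[" + base + "+" + index + "*" + scale + "+" + disp + "]";
        }
    }

    /**
     * Folds "lea leaDest = [0 + idx*scale]" followed by a store to
     * "[base + leaDest]" into a single operand "[base + idx*scale]".
     * Returns null when the pattern does not match.
     * Assumption: leaDest is dead after the store (not checked here).
     */
    static Mem foldLeaIntoStore(String leaDest, Mem leaSrc, Mem store) {
        boolean leaIsPureScale =
            leaSrc.base == null && leaSrc.disp == 0 && leaSrc.index != null;
        boolean storeUsesLeaResult =
            leaDest.equals(store.index) && store.scale == 1;
        if (!leaIsPureScale || !storeUsesLeaResult) {
            return null;
        }
        return new Mem(store.base, leaSrc.index, leaSrc.scale, store.disp);
    }

    public static void main(String[] args) {
        Mem lea = new Mem(null, "EDX", 2, 0);     // ia32_lea EDX = <0+[EDX*2]>
        Mem store = new Mem("EAX", "EDX", 1, 0);  // ia32_mov <[EAX]+[EDX]> = 10
        System.out.println(foldLeaIntoStore("EDX", lea, store));
    }
}
```

The folded operand addresses the same location as the two-instruction sequence because the scale-by-2 is absorbed into the addressing mode, so the separate LEA and its register write disappear.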
I believe the easiest way to implement this would be to add magic array stores that directly generate array-access operators (ASTOREs, the store counterpart of ALOADs) in OPT_GenerateMagic.
Makes sense.
We intentionally decided to keep Java-level array loads as ALOADs in the IA32 LIR. That way instruction selection does not have to recognize the address-calculation expression trees, and the trees cannot be "optimized" into forms we couldn't put back together. A magic API that allows the same thing to happen for runtime code seems like the right way to handle this. Recognizing these as array loads would add baggage to the instruction selection and LIR optimizations, which is what we decided to avoid in the non-magic case.
Need to wait to see what MMTk people think before proceeding with an implementation though.