Affects Version/s: None
Fix Version/s: None
Component/s: Instruction Architecture: Intel
Our current register usage for Intel in baseline compiled code is:
EAX, ECX, EDX = temporaries
EBX = Edge counters array
ESI = Processor register
EDI = JTOC register
EBP = unused
ESP = operand stack pointer and pointer to next available scratch area of memory
Using the stack pointer to address the operand stack means we emit instructions such as:
add ESP, 8
The add is encoded as a 1 byte opcode (0x83) with the extended opcode in the reg field of the ModRM byte, followed by the 8bit immediate. In this register-direct form ESP fits in the ModRM byte, so the instruction is 3 bytes long, against 2 bytes for 2 pop instructions. Memory operands are worse: when ESP is the base of a memory operand, its ModRM encoding is a break out case that indicates a following SIB byte (within the SIB byte the same encoding in the index field means "no index", because it's expected no-one would ever use ESP to hold an array index). So every access such as [ESP+8] needs an extra SIB byte compared with the same access through another base register. Intel's (not AMD's) optimization guide also advises not to access memory via ESP in a sequence of push and pops.
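To make the byte counts concrete, here is a minimal encoding sketch written for this note. It is not the Jikes RVM assembler: the helper names (modrm, add_reg_imm8, load_disp8, pop_reg) are hypothetical, and only the three instruction forms discussed above are covered. The register numbers and opcodes are the standard IA-32 ones.

```python
# Standard IA-32 register numbers as used in ModRM/SIB fields.
REG = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
       "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}


def modrm(mod, reg, rm):
    """Pack the three ModRM fields into one byte."""
    return (mod << 6) | (reg << 3) | rm


def add_reg_imm8(reg, imm):
    """ADD r32, imm8: opcode 0x83, extended opcode /0, register-direct (mod=11)."""
    return bytes([0x83, modrm(0b11, 0, REG[reg]), imm & 0xFF])


def load_disp8(dst, base, disp):
    """MOV r32, [base + disp8]: opcode 0x8B /r with mod=01."""
    out = [0x8B, modrm(0b01, REG[dst], REG[base])]
    if base == "ESP":
        # r/m=100 with mod != 11 means "SIB follows".
        # SIB: scale=00, index=100 (no index), base=ESP.
        out.append((0b00 << 6) | (0b100 << 3) | REG["ESP"])
    out.append(disp & 0xFF)
    return bytes(out)


def pop_reg(reg):
    """POP r32: single-byte opcode 0x58 + register number."""
    return bytes([0x58 + REG[reg]])


if __name__ == "__main__":
    print(add_reg_imm8("ESP", 8).hex())               # add ESP, 8
    print((pop_reg("EAX") + pop_reg("ECX")).hex())    # pop EAX; pop ECX
    print(load_disp8("EAX", "ESP", 4).hex())          # mov EAX, [ESP+4]
    print(load_disp8("EAX", "EBP", 4).hex())          # mov EAX, [EBP+4]
```

Under these assumptions the register-direct add ESP, 8 comes to 3 bytes, two pops to 2 bytes, and the SIB byte only shows up on the ESP-based load, which is one byte longer than the same load through EBP.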
The frame base pointer is held in a field of the VM_Processor. To support stack back traces in GDB we either need to support unwind tables or to hold the frame base pointer in EBP.
In the PowerPC baseline compiler we hold the frame base pointer and don't adjust it after each bytecode. This may benefit performance on Intel, as might holding, say, just the first local in a register.
The JTOC register is unnecessary as the JTOC doesn't move; the optimizing compiler already does without a JTOC register.
There is a research archive for an x86-64 port of the baseline compiler. For x86-64 we need to recognize that registers become 64 bits wide (as we do in the PowerPC baseline compiler) and support emitting the REX prefix for registers r8 to r15 in the assembler. We can likely incorporate the REX prefix enhancements to the assembler now. I believe we should rewrite the baseline compiler using the PowerPC baseline compiler as a template, and revise the use of registers as part of that.
This has to be done with an eye on performance, as selecting less than optimal code sequences on Intel can be costly in both code size and performance.
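As a rough illustration of what the REX support involves, the sketch below computes the prefix for a register-to-register move. It is a hypothetical helper written for this note, not the existing assembler API: the function names and the choice of MOV r/m, r (opcode 0x89) are assumptions. The prefix layout 0100WRXB is the standard x86-64 one: W selects 64-bit operand size, and R, X and B carry the fourth bit of the ModRM reg field, the SIB index field and the ModRM r/m field.

```python
def rex(w, reg, rm, index=0):
    """Build a REX prefix byte (0100WRXB) from full 4-bit register numbers."""
    return (0x40
            | (w << 3)               # W: 64-bit operand size
            | ((reg >> 3) << 2)      # R: high bit of ModRM.reg
            | ((index >> 3) << 1)    # X: high bit of SIB.index
            | (rm >> 3))             # B: high bit of ModRM.rm


def mov_reg_reg(dst, src, wide):
    """MOV dst, src using opcode 0x89 /r (reg field = src, r/m field = dst).

    Registers are numbered 0..15 (0 = rax/eax, 8 = r8, 15 = r15).
    The prefix is omitted when no REX bit is needed, which gives
    the same bytes as the 32-bit encoding.
    """
    prefix = rex(1 if wide else 0, src, dst)
    modrm = 0xC0 | ((src & 7) << 3) | (dst & 7)   # mod=11, register-direct
    body = bytes([0x89, modrm])
    return body if prefix == 0x40 else bytes([prefix]) + body


if __name__ == "__main__":
    print(mov_reg_reg(0, 3, wide=False).hex())   # mov eax, ebx
    print(mov_reg_reg(0, 3, wide=True).hex())    # mov rax, rbx
    print(mov_reg_reg(0, 8, wide=True).hex())    # mov rax, r8
    print(mov_reg_reg(8, 0, wide=False).hex())   # mov r8d, eax
```

The point of the sketch is that the prefix costs one byte per instruction whenever a 64-bit operand or one of r8 to r15 is involved, so register assignment in a revised baseline compiler also affects code size.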