|
With the adaptive system, we take yieldpoints more frequently than I suspect other JVMs do, in order to support online profiling (i.e., not just to initiate a GC cycle). It's an interesting idea, but I'd be a little surprised if it actually turned out to be a performance win for us. Nothing wrong with prototyping it if someone is interested; I'm just observing that one should first measure how frequently we take yieldpoints and then do a back-of-the-envelope calculation of trap costs vs. saving an L1-cache-hitting load and a conditional branch at every yieldpoint.
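For what it's worth, the back-of-the-envelope calculation could look something like the sketch below. The class and method names are made up for illustration, and the cycle counts in main are placeholder assumptions, not measurements; the real inputs would have to come from counting executed vs. taken yieldpoints on an actual run.

```java
// Rough break-even model for trap-based vs. polled yieldpoints.
// All names here are hypothetical and the numbers in main() are
// assumed placeholders, not measured values.
final class YieldpointCost {

    /**
     * Net cycles saved by the trap scheme per yieldpoint that is actually
     * taken. Positive means the trap scheme wins under this model.
     *
     * @param executedPerTaken   yieldpoints executed per yieldpoint taken
     * @param savedPerYieldpoint cycles saved at each executed yieldpoint
     *                           (the avoided load and conditional branch)
     * @param extraTrapCost      extra cycles a trap costs over a polled
     *                           yieldpoint when the yieldpoint is taken
     */
    static double netSavingCycles(double executedPerTaken,
                                  double savedPerYieldpoint,
                                  double extraTrapCost) {
        return executedPerTaken * savedPerYieldpoint - extraTrapCost;
    }

    /**
     * Number of executed yieldpoints per taken yieldpoint at which the
     * two schemes cost the same. Above this ratio the trap scheme wins.
     */
    static double breakEvenExecutedPerTaken(double savedPerYieldpoint,
                                            double extraTrapCost) {
        return extraTrapCost / savedPerYieldpoint;
    }

    public static void main(String[] args) {
        double savedPerYieldpoint = 1.0; // assumed: about one cycle saved
        double extraTrapCost = 5000.0;   // assumed: trap plus signal handling
        double executedPerTaken = 1000.0; // assumed: to be replaced by a measurement

        System.out.println("break-even ratio: "
            + breakEvenExecutedPerTaken(savedPerYieldpoint, extraTrapCost));
        System.out.println("net saving per taken yieldpoint: "
            + netSavingCycles(executedPerTaken, savedPerYieldpoint, extraTrapCost));
    }
}
```

The point of the model is only that the more often yieldpoints are actually taken (as with online profiling), the smaller executedPerTaken gets, and the harder it is for the per-yieldpoint saving to pay for the traps.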
|
||||||||||||||||||||||||||||||||||||||||||
To get an idea of how we deal with the same situation, look at:
http://rvm.codehaus.org/docs/api/org/jikesrvm/compilers/opt/ia32/OPT_FinalMIRExpansion-source.html#line.537
Rather than a single opcode with one reg/mem operand, we compare memory
against a constant and then branch. The important case (I believe) will
be loop back edges. Using the trick saves you one operation per loop; it
also avoids having too many branches within 16 bytes of each other
(Intel's branch predictor can only handle 2, iirc). On Intel we don't
set a yieldpoint to be taken on a timer tick, so CBS and GC are the two
places where the takeYieldpoint value in VM_Processor gets modified. It
seems reasonable that we could refactor the code in green threads to
map/unmap a page rather than use takeYieldpoint, but we'd probably want
both implementations for flexibility's sake. One final thing to note is
that the tweak could use a non-temporal load on Intel to avoid any cache
pollution problems.