RVM

Fix processor JNI entry/collector rendezvous race

Details

  • Type: Improvement
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: 2.9.2
  • Fix Version/s: 2.9.3
  • Component/s: MMTk
  • Labels: None
  • Patch Submitted: Yes
  • Number of attachments: 2

Description

Presently, a VM_Processor that is about to enter native code while the collector is performing the initial rendezvous may "disappear" - the collector will think it is still IN_JAVA and thus not remove it from the collection, while in the meantime the processor will go to IN_NATIVE. The collector will then proceed to wait for the processor to emerge from native code before proceeding further, but the processor might never do that (assuming arbitrarily nasty native code).

This patch fixes the problem by making the initial rendezvous wait for processors to arrive while simultaneously checking whether any of them has gone to IN_NATIVE.
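
The idea can be illustrated with a minimal sketch. It is not the code from fixjni.patch: the class, the Proc type, the ARRIVED and BLOCKED_IN_NATIVE states, and the method name initialRendezvous are all hypothetical, chosen only to show a rendezvous loop that counts arrivals and, in the same pass, excludes any processor it observes in IN_NATIVE.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class RendezvousSketch {
    // Hypothetical state constants for illustration; not taken from the patch.
    static final int IN_JAVA = 0;
    static final int IN_NATIVE = 1;
    static final int BLOCKED_IN_NATIVE = 2; // excluded from this collection
    static final int ARRIVED = 3;           // reached the rendezvous

    static final class Proc {
        final AtomicInteger state;
        Proc(int initial) { state = new AtomicInteger(initial); }
    }

    /**
     * Spins until every processor has either arrived or been excluded
     * because it was seen in native code. Returns the number of
     * processors that take part in the collection.
     */
    static int initialRendezvous(List<Proc> procs) {
        boolean[] accounted = new boolean[procs.size()];
        int remaining = procs.size();
        int participants = 0;
        while (remaining > 0) {
            for (int i = 0; i < procs.size(); i++) {
                if (accounted[i]) continue;
                Proc p = procs.get(i);
                int s = p.state.get();
                if (s == ARRIVED) {
                    accounted[i] = true;
                    participants++;
                    remaining--;
                } else if (s == IN_NATIVE
                        && p.state.compareAndSet(IN_NATIVE, BLOCKED_IN_NATIVE)) {
                    // The processor slipped into native code: stop waiting for it.
                    accounted[i] = true;
                    remaining--;
                }
            }
            Thread.yield();
        }
        return participants;
    }

    public static void main(String[] args) throws InterruptedException {
        Proc arrived = new Proc(ARRIVED);
        Proc late = new Proc(IN_JAVA);
        // Simulates a processor entering native code during the rendezvous.
        Thread mutator = new Thread(() -> late.state.compareAndSet(IN_JAVA, IN_NATIVE));
        mutator.start();
        int participants = initialRendezvous(Arrays.asList(arrived, late));
        mutator.join();
        System.out.println("participants = " + participants);
    }
}
```

Because the check for IN_NATIVE happens inside the waiting loop rather than once before it, a processor that changes state after the first pass is still noticed on a later pass instead of being waited on indefinitely.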

This is a work in progress. DaCapo tests seem to be passing, but a proper study of whether or not this fix addresses known JBB issues has not yet been performed.

  1. fixjni.patch
    10/Apr/08 11:48 AM
    11 kB
    Filip Pizlo
  2. soo.txt
    07/Apr/08 4:12 PM
    0.5 kB
    Filip Pizlo

Activity

Ian Rogers added a comment -

Nice. Is there an example of this bug occurring in a regression report or is it something that shows up when running with many processors? Can you also complete a statement of origin [1]. Thanks!

[1] http://jikesrvm.org/Contributions

Filip Pizlo added a comment -

I have seen the issue occur when running JBB with large numbers of processors. I've yet to run JBB stress tests, though, since I'm still working on getting all of DaCapo working on Purdue's setup (chart is not running because of GNU classpath link issues that are unrelated to this patch).

Filip Pizlo added a comment -

This has been tested and seems to work (DaCapo passes on my available config, and JBB works as well). The patch now includes another bugfix: previously, a call to SynchronizationBarrier.removeProcessor() could either trip an assertion (under a dev build) or corrupt the collector queue (under a production build). In the latter case, the next GC attempt would deadlock. The fix is to have removeProcessor() wait until the collector thread for the given processor has been dequeued.
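
A rough sketch of that waiting behaviour follows. It is an illustration under assumptions, not the patched SynchronizationBarrier: the class name, the queue of processor IDs standing in for the collector queue, the expected counter, and dequeueCollectorThread() are all hypothetical.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

class RemoveProcessorSketch {
    // Hypothetical stand-in for the collector queue: one entry per processor
    // whose collector thread has not yet been dequeued.
    final Queue<Integer> collectorQueue = new ConcurrentLinkedQueue<>();
    // Number of processors the barrier still expects at the rendezvous.
    final AtomicInteger expected;

    RemoveProcessorSketch(int processors) {
        expected = new AtomicInteger(processors);
        for (int id = 0; id < processors; id++) {
            collectorQueue.add(id);
        }
    }

    /** Models the scheduler taking the next collector thread off the queue. */
    Integer dequeueCollectorThread() {
        return collectorQueue.poll();
    }

    /**
     * Removes a processor from the barrier, but only after its collector
     * thread is no longer sitting in the queue.
     */
    void removeProcessor(int id) {
        while (collectorQueue.contains(id)) {
            Thread.yield(); // wait for the dequeue instead of racing with it
        }
        expected.decrementAndGet();
    }

    public static void main(String[] args) throws InterruptedException {
        RemoveProcessorSketch barrier = new RemoveProcessorSketch(2);
        Thread scheduler = new Thread(() -> {
            while (barrier.dequeueCollectorThread() != null) {
                // drain the queue, as the scheduler would when starting a GC
            }
        });
        scheduler.start();
        barrier.removeProcessor(1); // returns only once entry 1 has been dequeued
        scheduler.join();
        System.out.println("expected = " + barrier.expected.get());
    }
}
```

The point of the wait is ordering: the barrier's bookkeeping changes only after the queue no longer holds the entry for that processor, so the two structures are never updated against each other.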

David Grove added a comment -

Patch applied in r14111.

