Key: RVM-431
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: David Grove
Reporter: Filip Pizlo
Votes: 0
Watchers: 0

Fix processor JNI entry/collector rendezvous race

Created: 07/Apr/08 03:39 PM   Updated: 11/Apr/08 05:31 AM
Component/s: MMTk
Affects Version/s: 2.9.2
Fix Version/s: 2.9.3

Time Tracking:
Not Specified

File Attachments: 1. fixjni.patch (11 kB)
2. soo.txt (0.5 kB)


Patch Submitted: Yes


 Description
Presently, a VM_Processor that is about to enter native code while the collector is performing the initial rendezvous may "disappear": the collector still sees it as IN_JAVA and therefore does not remove it from the collection, while the processor meanwhile transitions to IN_NATIVE. The collector then waits for the processor to emerge from native code before proceeding, but the processor might never do so (assuming arbitrarily nasty native code).

This patch fixes the problem by making the initial rendezvous wait for processors to arrive while simultaneously checking that they have not gone to IN_NATIVE.

This is a work in progress. DaCapo tests seem to be passing, but a proper study of whether or not this fix addresses known JBB issues has not yet been performed.
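
For readers without the patch at hand, the idea described above can be sketched roughly as follows. This is not the code in fixjni.patch. The class RendezvousSketch, the Processor fields, the BLOCKED_IN_NATIVE value and the use of a compare-and-set are all assumptions made for illustration; only the IN_JAVA and IN_NATIVE names come from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the Jikes RVM implementation.
class RendezvousSketch {
    static final int IN_JAVA = 0;
    static final int IN_NATIVE = 1;
    // Assumption: a status the collector sets to exclude a processor from this GC.
    static final int BLOCKED_IN_NATIVE = 2;

    static final class Processor {
        final AtomicInteger status = new AtomicInteger(IN_JAVA);
        // Assumption: set once the processor's collector thread reaches the rendezvous.
        volatile boolean arrived = false;
    }

    // Waits for every processor to either arrive or be observed IN_NATIVE.
    // Returns the number of processors that take part in the collection.
    static int initialRendezvous(List<Processor> processors) {
        List<Processor> pending = new ArrayList<>(processors);
        int participants = 0;
        while (!pending.isEmpty()) {
            for (int i = pending.size() - 1; i >= 0; i--) {
                Processor p = pending.get(i);
                if (p.arrived) {
                    participants++;
                    pending.remove(i);
                } else if (p.status.compareAndSet(IN_NATIVE, BLOCKED_IN_NATIVE)) {
                    // Re-checked on every pass, so a processor that slipped into
                    // native code after the first look is still excluded.
                    pending.remove(i);
                }
            }
            Thread.yield();
        }
        return participants;
    }

    public static void main(String[] args) throws InterruptedException {
        Processor inJava = new Processor();
        Processor inNative = new Processor();
        inNative.status.set(IN_NATIVE);
        Thread mutator = new Thread(() -> inJava.arrived = true);
        mutator.start();
        int participants = initialRendezvous(Arrays.asList(inJava, inNative));
        mutator.join();
        System.out.println("participants = " + participants);
    }
}
```

The point of the sketch is that the status check sits inside the wait loop rather than before it, so the collector never commits to waiting on a processor based on a single, possibly stale, IN_JAVA observation.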



 Comments
Ian Rogers added a comment - 07/Apr/08 03:59 PM
Nice. Is there an example of this bug occurring in a regression report, or is it something that shows up when running with many processors? Can you also complete a statement of origin [1]? Thanks!

[1] http://jikesrvm.org/Contributions


Filip Pizlo added a comment - 07/Apr/08 04:14 PM
I have seen the issue occur when running JBB with large numbers of processors. I've yet to run JBB stress tests, though, since I'm still working on getting all of DaCapo working on Purdue's setup (chart is not running because of GNU classpath link issues that are unrelated to this patch).

Filip Pizlo added a comment - 10/Apr/08 11:51 AM
This has been tested and seems to work (DaCapo passes on my available config, and JBB works as well). The patch now includes another bugfix: previously, if SynchronizationBarrier.removeProcessor() was called, it could either fail an assertion (under a development build) or corrupt the collector queue (under a production build). In the latter case, the next GC attempt would deadlock. The fix is to have removeProcessor() wait until the collector thread for the given processor has already been dequeued.
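
As a rough illustration of the second fix, the wait-until-dequeued rule might look something like the sketch below. It is not taken from the patch. CollectorQueueSketch, its integer processor ids, the removedCount counter and the monitor-based wait are hypothetical stand-ins; the real SynchronizationBarrier and collector queue may be structured quite differently (for example, by spinning rather than blocking).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not the Jikes RVM SynchronizationBarrier.
class CollectorQueueSketch {
    // Assumption: ids of processors whose collector threads are still queued.
    private final Deque<Integer> queued = new ArrayDeque<>();
    private int removedCount = 0;

    synchronized void enqueue(int processorId) {
        queued.addLast(processorId);
    }

    // Takes the next collector thread off the queue and wakes any waiting remover.
    synchronized Integer dequeue() {
        Integer id = queued.pollFirst();
        notifyAll();
        return id;
    }

    // Blocks until the collector thread for processorId is no longer queued,
    // so removal never touches a queue entry that is still in use.
    synchronized void removeProcessor(int processorId) throws InterruptedException {
        while (queued.contains(processorId)) {
            wait();
        }
        removedCount++;
    }

    synchronized int removedCount() {
        return removedCount;
    }

    public static void main(String[] args) throws InterruptedException {
        CollectorQueueSketch q = new CollectorQueueSketch();
        q.enqueue(7);
        Thread remover = new Thread(() -> {
            try {
                q.removeProcessor(7);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        remover.start();
        q.dequeue();   // the collector thread leaves the queue
        remover.join(); // only now can removeProcessor() complete
        System.out.println("removed = " + q.removedCount());
    }
}
```

Ordering removal strictly after the dequeue is what prevents both failure modes mentioned above: there is no queued entry left to trip an assertion on, and nothing left in the queue for the removal to corrupt.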

David Grove added a comment - 11/Apr/08 05:31 AM
Patch applied in r14111.