Description
Daniel recently fixed a problem caused by this issue that surfaced after the scheduler refactoring. His fix avoided executing the write barrier. The original tracker item (1147477) read:
I actually looked at a couple of stack dumps from the failing generational images today and convinced myself that the root cause of the failures is that we are executing write barriers in uninterruptible code. This is bad because in JMTk a write barrier overflow can trigger a GC, which is exactly what uninterruptible code is trying to avoid in many cases. This is very similar to the problem of stack-overflow checks in uninterruptible code, which we found and killed years ago. Write barrier overflow was never an issue with the Watson collectors because they cheated and kept their write buffers in the C heap, so the buffers could grow without triggering a GC. Note that a few of the problem write barriers are in JMTk itself; most are in other parts of the VM.
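The hazard can be modeled in a few lines. The sketch below is a hypothetical, simplified model, not JMTk code: `BoundedWriteBuffer`, `recordWrite`, and the `uninterruptibleDepth` counter are invented names, and the "collection" merely empties the buffer. It shows how a barrier whose overflow path requests a GC ends up doing so inside a region that promised not to allow one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not JMTk code: a fixed-capacity write buffer whose
// overflow path requests a collection, plus a depth counter that stands in
// for "currently executing uninterruptible code".
final class BoundedWriteBuffer {
    private final Object[] slots;
    private int next = 0;
    private int uninterruptibleDepth = 0;
    final List<String> violations = new ArrayList<>();
    int gcCount = 0;

    BoundedWriteBuffer(int capacity) {
        slots = new Object[capacity];
    }

    void enterUninterruptible() { uninterruptibleDepth++; }

    void leaveUninterruptible() { uninterruptibleDepth--; }

    // The write barrier: remember the written slot, collecting first if full.
    void recordWrite(Object slot) {
        if (next == slots.length) {
            overflow();
        }
        slots[next++] = slot;
    }

    private void overflow() {
        if (uninterruptibleDepth > 0) {
            // The bug being modeled: a GC request where none is allowed.
            violations.add("GC requested inside uninterruptible code");
        }
        gcCount++;
        // Stand-in for a collection that processes and then empties the buffer.
        Arrays.fill(slots, null);
        next = 0;
    }
}
```

With a capacity of 2, a third reference write inside the uninterruptible region records one violation; the same overflow outside the region is harmless.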
The particular crashes I was looking at looked like a write barrier overflow from a write in VM_Processor.dispatch. I generated a list of the offenders (attached) in a prototype (BaseBaseGenMS) image by tweaking the code in VM_BaselineCompiler to treat a putfield of a reference type as a violation of uninterruptibility.
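A check in the spirit of the `VM_BaselineCompiler` tweak might look like the following. This is a hypothetical stand-alone sketch, not the actual compiler code: it scans a pre-decoded instruction list (the `Insn` type and `offenders` method are invented for illustration) and flags each `putfield` (JVM opcode 0xb5) whose field descriptor denotes a reference type, that is, starts with `L` or `[`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the check, not VM_BaselineCompiler code.
final class UninterruptibleCheck {
    static final int PUTFIELD = 0xb5; // JVM opcode 181

    // A pre-decoded instruction: its opcode and, for field instructions,
    // the field's type descriptor (e.g. "I", "Ljava/lang/Object;", "[I").
    static final class Insn {
        final int opcode;
        final String fieldDescriptor;

        Insn(int opcode, String fieldDescriptor) {
            this.opcode = opcode;
            this.fieldDescriptor = fieldDescriptor;
        }
    }

    // Object and array field types are the stores that get a write barrier.
    static boolean isReferenceDescriptor(String d) {
        return d != null && (d.startsWith("L") || d.startsWith("["));
    }

    // Returns the list indices of reference-typed putfields, treated here as
    // violations when the method is uninterruptible.
    static List<Integer> offenders(boolean uninterruptible, List<Insn> code) {
        List<Integer> out = new ArrayList<>();
        if (!uninterruptible) {
            return out;
        }
        for (int i = 0; i < code.size(); i++) {
            Insn insn = code.get(i);
            if (insn.opcode == PUTFIELD && isReferenceDescriptor(insn.fieldDescriptor)) {
                out.add(i);
            }
        }
        return out;
    }
}
```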
It seems to me that the possible fixes are:
(1) declare that putfields of reference type in uninterruptible code are programming errors, and rewrite the code to avoid them.
(2) allow these putfields, but don't write-barrier them.
(3) allow them, but call a different write barrier routine that either uses slack in the barrier or grows the buffer without triggering a GC by stealing space from an emergency slack space.
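Option (3) could be sketched as a second barrier entry point for uninterruptible code. In this hypothetical model (`SlackWriteBuffer` and both method names are assumptions, not JMTk API), the buffer reserves an emergency region beyond its normal capacity; the ordinary barrier collects when the normal region is full, while the uninterruptible variant never collects and consumes slack instead.

```java
import java.util.Arrays;

// Hypothetical model of option (3), not JMTk code.
final class SlackWriteBuffer {
    private final Object[] slots;
    private final int normalCapacity;
    private int next = 0;
    int gcCount = 0;

    SlackWriteBuffer(int normalCapacity, int emergencySlack) {
        this.normalCapacity = normalCapacity;
        this.slots = new Object[normalCapacity + emergencySlack];
    }

    // Ordinary barrier: may collect once the normal region is full.
    void recordWrite(Object slot) {
        if (next >= normalCapacity) {
            collect();
        }
        slots[next++] = slot;
    }

    // Barrier for uninterruptible code: never collects, dips into the slack.
    void recordWriteUninterruptible(Object slot) {
        if (next >= slots.length) {
            // In a real VM this would be a fatal error: the slack was too small.
            throw new IllegalStateException("emergency slack exhausted");
        }
        slots[next++] = slot;
    }

    int used() { return next; }

    // Stand-in for a collection that processes and then empties the buffer.
    private void collect() {
        gcCount++;
        Arrays.fill(slots, null);
        next = 0;
    }
}
```

Because the uninterruptible variant can leave the buffer above its normal capacity, the next ordinary barrier performs the deferred collection. The size of the slack then bounds how many reference writes uninterruptible code may perform between ordinary barriers.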
Comment copied from original tracker:
Steve B suggested that the fix is to trigger a GC asynchronously when the write buffer reaches "near-overflow".
Perry later observed that this will probably force us to modify the compilers to inject yieldpoints, ensuring that no more than a fixed number of pointer writes occur before the next yieldpoint.
On the plus side, Perry also observed that if we can make write barriers non-GC points, then we could eliminate a large number of points that currently have GC maps, thus reducing the space impact of generational GC on machine code maps.
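Combining the asynchronous trigger with the yieldpoint bound gives an invariant that can be checked in a small model. In the hypothetical sketch below (class and method names are invented), the barrier never collects; it only sets a flag once occupancy reaches `capacity - K`, where `K` is the assumed maximum number of reference writes the compilers allow between yieldpoints, and a collection starts only at a yieldpoint. A yieldpoint either collects or leaves occupancy below `capacity - K`, so the at most `K` writes that follow cannot overflow the buffer.

```java
import java.util.Arrays;

// Hypothetical model of near-overflow triggering, not JMTk code.
final class AsyncTriggerBuffer {
    private final Object[] slots;
    private final int threshold;
    private int next = 0;
    private boolean gcRequested = false;
    int gcCount = 0;

    // maxWritesBetweenYieldpoints (K) is the bound the compilers would enforce.
    AsyncTriggerBuffer(int capacity, int maxWritesBetweenYieldpoints) {
        if (maxWritesBetweenYieldpoints >= capacity) {
            throw new IllegalArgumentException("K must be smaller than capacity");
        }
        slots = new Object[capacity];
        threshold = capacity - maxWritesBetweenYieldpoints;
    }

    // Write barrier: never collects, so it need not be a GC point.
    void recordWrite(Object slot) {
        if (next == slots.length) {
            throw new IllegalStateException("overflow: yieldpoint bound violated");
        }
        slots[next++] = slot;
        if (next >= threshold) {
            gcRequested = true; // near-overflow: ask for a GC at the next yieldpoint
        }
    }

    // Yieldpoint: the only place where a collection actually starts.
    void yieldpoint() {
        if (gcRequested) {
            gcCount++;
            Arrays.fill(slots, null); // stand-in for processing the buffer
            next = 0;
            gcRequested = false;
        }
    }
}
```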