I'd like to get the single word object header available again too. I've heard that it may no longer be as promising a model as it used to be, in particular as more modern libraries are trying to have more locks that lock data structures at a finer granularity.
I've started a refactor that hopefully by the end of today (GMT) ... So far this has allowed me to separate out a lot of the constants and variables so that, for example, those relevant to maintaining a counter-based hash code are held only in their appropriate class. Where I'm going with this is to make detection of the TIB offset in a bidirectional object layout more configurable than what I've seen in previous implementations. One way of determining the TIB offset in a bidirectional model is to set a low bit in the status word. As pointers from the TIB are aligned, their low bit will never be set, so finding a word in the object with its low bit set shows we've reached the header. There are, however, many other possible solutions, for example placing a count of the number of references at the start of an object. I'll post a link to the branch when I've uploaded the code.
Thanks, Ian.
Bit packing the status word is subtle. Optimal packing requires global knowledge of how the bits are going to be used (especially bits that are mutable).
For example, the 2 bits of hash code state in the address-based hashing model are intentionally not in the same byte as the bits we give out to GC. The reason for this is that one of the possible generational write barriers wants to set a bit in the object header to indicate that the object has been remembered (this reduces duplicate remset entries). It wants to do this with a normal byte store (it's ok to be racy and have multiple threads set this bit...). However, we can't allow a race between the remembered set bit being set and the hash code bits being changed. There are linearizations that cause us to lose a change in hash code state if the remembered set bit and the hash code bits are in the same byte, unless we use atomic operations in the GC barrier, which is generally not the right tradeoff. It sounds like a really unlikely bug, but it happens in practice (nothing like a 3GHz SMP machine to make a one in a million bug bite you in short order). The GC barrier is far more frequent than hash code checks, so the GC gets the bottom byte. In the non-address-based hashing model, it's ok for the hash code bits to be in the GC byte, because the hash code bits are immutable (set during construction).
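To make the lost-update scenario concrete, here is a rough sketch that replays one bad interleaving by hand instead of relying on real threads. The class, the constant names and the exact bit positions are made up for illustration and are not the actual Jikes RVM header constants; the only thing taken from the discussion above is that the GC owns the bottom byte and the hash state lives in a different byte.

```java
final class StatusWordLayout {
    // Hypothetical layout of a 32-bit status word (not the real RVM constants).
    static final int GC_BYTE_MASK = 0x000000FF;  // byte 0: handed to the GC
    static final int REMEMBERED_BIT = 0x01;      // a GC bit inside byte 0
    static final int HASH_STATE_SHIFT = 8;       // hash state in byte 1, bits 8..9
    static final int HASH_STATE_MASK = 0x3 << HASH_STATE_SHIFT;

    /** True if the two masks never touch the same byte of the word. */
    static boolean inDisjointBytes(int maskA, int maskB) {
        for (int b = 0; b < 4; b++) {
            int byteMask = 0xFF << (8 * b);
            if ((maskA & byteMask) != 0 && (maskB & byteMask) != 0) {
                return false;
            }
        }
        return true;
    }

    /**
     * Replays one interleaving: the barrier loads byte 0, the hash code path
     * changes the hash state, then the barrier stores its stale byte back
     * with the remembered bit set. Returns the resulting status word.
     */
    static int replayByteStoreRace(int initial, int hashStateMask, int newHashState) {
        int barrierLoaded = initial & 0xFF;                        // barrier: load byte 0
        int word = (initial & ~hashStateMask) | newHashState;      // hasher: update state
        word = (word & ~0xFF) | (barrierLoaded | REMEMBERED_BIT);  // barrier: plain byte store
        return word;
    }

    public static void main(String[] args) {
        // Hash state packed into the GC byte (bits 1..2): the update is overwritten.
        int shared = replayByteStoreRace(0, 0x06, 0x02);
        System.out.printf("shared byte:   0x%x, hash state kept = %b%n",
                shared, (shared & 0x06) == 0x02);

        // Hash state in its own byte: the byte store cannot touch it.
        int split = replayByteStoreRace(0, HASH_STATE_MASK, 1 << HASH_STATE_SHIFT);
        System.out.printf("separate byte: 0x%x, hash state kept = %b%n",
                split, (split & HASH_STATE_MASK) == (1 << HASH_STATE_SHIFT));

        System.out.println("masks in disjoint bytes: "
                + inDisjointBytes(GC_BYTE_MASK, HASH_STATE_MASK));
    }
}
```

A check like `inDisjointBytes` could be run once at boot image build time as a sanity test on whatever layout gets chosen, which is cheaper than tracking the race down after the fact.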
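For reference, the low-bit marker idea from earlier in the thread (scan the object until a word with its low bit set turns up) might look roughly like the sketch below. It models memory as an `int[]` of words, and it assumes the reference fields come first, followed by the status word and then the TIB pointer. That ordering, the class name and the sample values are all assumptions for illustration, not the layout in the branch.

```java
final class BidirectionalScan {
    // Assumed: the status word always has this bit set, aligned references never do.
    static final int HEADER_MARKER_BIT = 0x1;

    /**
     * Scans forward from the first word of an object and returns the index of
     * the first word whose low bit is set, or -1 if no such word is found.
     */
    static int findStatusWordIndex(int[] words, int objectStart) {
        for (int i = objectStart; i < words.length; i++) {
            if ((words[i] & HEADER_MARKER_BIT) != 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] object = {
            0x1000,  // reference field (aligned, low bit clear)
            0x2008,  // reference field
            0,       // null reference, low bit also clear
            0x41,    // status word with the marker bit set
            0x3000,  // TIB pointer (assumed to sit right after the status word)
            7        // scalar field; never reached by the scan
        };
        int status = findStatusWordIndex(object, 0);
        System.out.println("status word index: " + status);
        System.out.println("reference count:   " + status);
        System.out.println("TIB word index:    " + (status + 1));
    }
}
```

The alternative mentioned above, a reference count stored at the start of the object, would replace the loop with a single load at the cost of an extra word per object.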
First, a high priority is to re-implement/de-cruft the previous pluggable object model, so that we can once again experiment with a single word object model, for example, while the interface from the rest of the VM remains abstract. My feeling is that a single word object model may ultimately be what we want as our default. We need to (re)implement it first, though.
Second, we've had automatic/optimal GC bit assignment as a long-standing goal. More recently a few of us MMTk people have started leaning toward the idea of having simply a single byte for GC (or, in the case of RefCount etc, a single byte plus some number of words). The motivation for using a single byte for all GCs is three-fold, a) simplicity, b) the potential for efficiency gains, and c) because of the difficulty of identifying and using an optimal number of bits.
The hard to understand bit is c). The problem of establishing the optimal number of bits is not as straightforward as it may seem at first blush. Here's a quick sketch of why:
. The bits required by any particular Plan are a function of the union of the requirements of the set of spaces it draws on.
. The properties associated with bits required by any given space have multiple dimensions:
. Having established the above for each of the spaces, one then has the bin-packing problem.
Having pondered the above on and off over the years, I think the MMTk people are leaning away from a fully automatic/optimal scheme. Instead we could simply allocate a byte in each object header to MMTk (which wastes a few bits in some cases, but simplifies and improves performance). The MMTk authors are then left to do the mental arithmetic of how the bits are shared among spaces. This seems tractable, simple and efficient, although it places the burden on the MMTk implementors as each new plan is defined, which is not as nice as a fully automated scheme. If someone wants to try to implement an optimal scheme, all power to them. It seems quite do-able, just not trivial (or necessarily very rewarding) :-D
Note that this relates directly to the recently created RVM-287.
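As a rough illustration of the single-byte approach, the sketch below hands out consecutive bit ranges of one GC byte to the spaces a plan uses and fails loudly if they do not fit. This is deliberately the naive scheme, not the optimal one: no bits are shared between spaces. The space names and bit counts are invented for the example and do not reflect what any real MMTk space needs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class GcByteBudget {
    static final int GC_BYTE_BITS = 8;

    /**
     * Assigns each space a contiguous range of bits in the GC byte, in
     * iteration order, and returns a map from space name to bit mask.
     * Throws if the combined request does not fit in one byte.
     */
    static Map<String, Integer> assign(Map<String, Integer> bitsNeeded) {
        Map<String, Integer> masks = new LinkedHashMap<>();
        int next = 0;
        for (Map.Entry<String, Integer> e : bitsNeeded.entrySet()) {
            int n = e.getValue();
            if (next + n > GC_BYTE_BITS) {
                throw new IllegalStateException(
                        "GC byte exhausted while placing " + e.getKey());
            }
            masks.put(e.getKey(), ((1 << n) - 1) << next);
            next += n;
        }
        return masks;
    }

    public static void main(String[] args) {
        // Hypothetical plan drawing on three spaces (names and sizes are made up).
        Map<String, Integer> plan = new LinkedHashMap<>();
        plan.put("markSweep", 2);
        plan.put("immortal", 1);
        plan.put("largeObject", 2);

        for (Map.Entry<String, Integer> e : assign(plan).entrySet()) {
            System.out.printf("%-12s mask 0x%02x%n", e.getKey(), e.getValue());
        }
    }
}
```

In this example five of the eight bits are used and three are wasted, which is the kind of slack the single-byte proposal accepts in exchange for simplicity.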