Details
-
Type:
Wish
-
Status:
Open
-
Priority:
Major
-
Resolution: Unresolved
-
Affects Version/s: None
-
Fix Version/s: 1000
-
Component/s: MMTk, Runtime: Object Model
-
Labels:None
-
Number of attachments :
Description
Hopefully the following is a statement of fact:
Currently the status word for an object looks like either:
TTTT TTTT TTTT TTTT TTTT TTHH AAAA AAAA (normal)
or
TTTT TTTT TTTT TTTT TTTT HHHH HHHH HHAA (GC trace)
where:
T = thin lock bits
H = hash code state bits
A = available for use by GCHeader and/or MiscHeader.
the thin lock bits hold:
1 bit for the thin lock header, for lock reservation one could imagine a 2nd bit being used
14 bits to hold the thread ID of the thread holding the lock for the object
any remaining bits to hold a recursion count for how many times a thread holds the lock for an object
if the thin lock doesn't work the lock header bit says the contents of the bits is the heavy lock ID.
The 2bit hashcode state has one of the following values:
0 - unhashed - no one has tried to get the hash code of this object
1 - hashed - some one has tried to get the hash code of the object
2 - hashed & moved - we've moved the object and therefore had to make a copy of the hash code to enable us to compute the same hash code value in the future
The 10bit hashcode is just a counter that wraps around.
The available bits are used by MMTk and the number of bits used varies. Typically 2 bits are used for tracking the space an object is in and for marking. There is support for 4 bits for marking and 7bits may be used for reference counting. The number of bits used can vary due to properties loaded during boot image writing.
The object model supports either bit or byte accesses to the status word.
Here are my feelings about the way we do this layout (and may have been warped by me misunderstanding the code):
1) When we use available bits we nearly always do it at the granularity of a word.
2) One can imagine that reading and writing a byte is more efficient than masking... a word. The only place this seems to happen is MarkSweepSpace. We may well want some optimization so that byte values can be read/written, for example iterating over every status word setting the available byte to 0. This doesn't happen currently.
3) If bits are unused in the header then we are reducing the number of threads we support, the recursion count for holding a thread, the bits used in the counter variant of the hash code.
4) the GC trace header that uses a counter doesn't support garbage collectors requiring >2 bits such as reference counting
5) our current "available bits" implementation in the object model is based around the worst case expectations of MMTk and as such is conservative
6) as a lot of the code that uses the object model is trying as much as possible to use constants, there are boot strap issues.
I think we should fix the object model so that each GC can release the optimal number of bits for other uses in the status word. We need to make this configurable and fault when errors occur.
We eventually can play games with the object layout to see if there are architectural sweet spots. For example, the hash code bits could be tested by an 8 bit immediate mask if they weren't at bits 9 and 10.
This general issue is one we've been thinking about for a long time. Opening a JIRA to discuss the issues and options is a good idea (see RVM-287 for a recent, related issue).
First, a high priority is to re-implement/de-cruft the previous pluggable object model, so that we can once again experiment with a single word object model, for example, while the interface from the rest of the VM remains abstract. My feeling is that a single word object model may ultimately be what we want as our default. We need to (re)implement it first, though.
Second, we've had automatic/optimal GC bit assignment as a long-standing goal. More recently a few of us MMTk people have started leaning toward the idea of having simply a single byte for GC (or, in the case of RefCount etc, a single byte plus some number of words). The motivation for using a single byte for all GCs is three-fold, a) simplicity, b) the potential for efficiency gains, and c) because of the difficulty of identifying and using an optimal number of bits.
The hard to understand bit is c). The problem of establishing the optimal number of bits is not as straightforward as it may seem at first blush. Here's a quick sketch of why:
. The bits required by any particular Plan are a function of the union of the requirements of the set of spaces it draws on.
. The properties associated with bits required by any give space have multiple dimensions:
. Having established the above for each of the spaces, one then has the bin-packing problem.
Having pondered the above on and off over the years, I think the MMTk people are leaning away from a fully automatic/optimal scheme. Instead we could simple allocate a byte in each object header to MMTk (which wastes a few bits in some cases, but simplifies and improves performance). The MMTk authors are then left to do the mental arithmetic of how the bits are shared among spaces. This seems tractable, simple and efficient although places the burden on the MMTk implementors as each new plan is defined, which is not as nice as a fully automated scheme. If someone wants to try to implement an optimal scheme, all power to them. It seems quite do-able, just not trivial (or necessarily very rewarding) :-D
Note that this relates directly to the recently created RVM-287.