RVM / RVM-290

Refactor object model so that status word (int) bits are used more intelligently

    Details

    • Type: Wish
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1000
    • Component/s: MMTk, Runtime: Object Model
    • Labels: None
    • Number of attachments: 0

      Description

      Hopefully the following is a statement of fact:

      Currently the status word for an object looks like either:

      TTTT TTTT TTTT TTTT TTTT TTHH AAAA AAAA (normal)

      or

      TTTT TTTT TTTT TTTT TTTT HHHH HHHH HHAA (GC trace)

      where:
      T = thin lock bits
      H = hash code state bits
      A = available for use by GCHeader and/or MiscHeader.
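The normal layout above can be captured as mask/shift constants. A minimal sketch assuming the 22/2/8 split shown; class and method names are illustrative, not the RVM's actual object model code:

```java
// Sketch of the "normal" status word layout described above:
// TTTT TTTT TTTT TTTT TTTT TTHH AAAA AAAA
final class StatusWordLayout {
    static final int AVAILABLE_BITS  = 8;   // A: GCHeader/MiscHeader
    static final int HASH_STATE_BITS = 2;   // H: hash code state
    static final int THIN_LOCK_BITS  = 32 - AVAILABLE_BITS - HASH_STATE_BITS; // T: 22

    static final int AVAILABLE_MASK  = (1 << AVAILABLE_BITS) - 1;                      // 0x000000FF
    static final int HASH_STATE_MASK = ((1 << HASH_STATE_BITS) - 1) << AVAILABLE_BITS; // 0x00000300
    static final int THIN_LOCK_MASK  = ~(AVAILABLE_MASK | HASH_STATE_MASK);            // 0xFFFFFC00

    static int availableBits(int status) { return status & AVAILABLE_MASK; }
    static int hashState(int status)     { return (status & HASH_STATE_MASK) >>> AVAILABLE_BITS; }
    static int thinLockBits(int status)  { return status >>> (AVAILABLE_BITS + HASH_STATE_BITS); }
}
```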

      The thin lock bits hold:
      1 bit for the thin lock header (for lock reservation one could imagine a 2nd bit being used)
      14 bits to hold the ID of the thread holding the lock on the object
      any remaining bits to hold a recursion count for how many times a thread holds the lock on an object

      If the thin lock doesn't work, the lock header bit is set to indicate that the remaining bits hold the heavy lock ID.
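The thin-lock fields described above might decode as follows; the exact field positions and the placement of the fat-lock flag are assumptions made for this sketch, not the RVM's real encoding:

```java
// Illustrative decoding of the 22 thin-lock bits: 1 fat-lock flag,
// 14 thread-ID bits, and the remaining 7 bits as a recursion count.
final class ThinLock {
    static final int LOCK_SHIFT     = 10;      // thin-lock bits sit above the H and A bits
    static final int FAT_LOCK_BIT   = 1 << 31; // set => remaining bits hold the heavy lock ID
    static final int THREAD_ID_BITS = 14;
    static final int COUNT_BITS     = 22 - 1 - THREAD_ID_BITS; // 7

    static boolean isFat(int status)      { return (status & FAT_LOCK_BIT) != 0; }
    static int heavyLockId(int status)    { return (status & ~FAT_LOCK_BIT) >>> LOCK_SHIFT; }
    static int threadId(int status)       { return (status >>> (LOCK_SHIFT + COUNT_BITS)) & ((1 << THREAD_ID_BITS) - 1); }
    static int recursionCount(int status) { return (status >>> LOCK_SHIFT) & ((1 << COUNT_BITS) - 1); }
}
```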

      The 2-bit hashcode state has one of the following values:
      0 - unhashed - no one has tried to get the hash code of this object
      1 - hashed - someone has tried to get the hash code of the object
      2 - hashed & moved - we've moved the object and therefore had to make a copy of the hash code to enable us to compute the same hash code value in the future

      The 10-bit hashcode is just a counter that wraps around.
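The 2-bit state machine above can be sketched as follows (state values as listed; the helper names are hypothetical):

```java
// Hash-state transitions for address-based hashing, as described above.
final class HashStateMachine {
    static final int UNHASHED = 0, HASHED = 1, HASHED_AND_MOVED = 2;

    // Asking for a hash code transitions UNHASHED -> HASHED.
    static int onHashRequest(int state) { return state == UNHASHED ? HASHED : state; }

    // Moving a hashed object forces us to preserve a copy of the old hash.
    static int onMove(int state) { return state == HASHED ? HASHED_AND_MOVED : state; }
}
```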

      The available bits are used by MMTk and the number of bits used varies. Typically 2 bits are used for tracking the space an object is in and for marking. There is support for 4 bits for marking, and 7 bits may be used for reference counting. The number of bits used can vary due to properties loaded during boot image writing.

      The object model supports either bit or byte accesses to the status word.

      Here are my feelings about the way we do this layout (which may have been warped by my misunderstanding the code):

      1) When we use available bits we nearly always do it at the granularity of a word.
      2) One can imagine that reading and writing a byte is more efficient than masking... a word. The only place this seems to happen is MarkSweepSpace. We may well want some optimization so that byte values can be read/written, for example iterating over every status word and setting the available byte to 0. This doesn't happen currently.
      3) If bits go unused in the header then we are reducing the number of threads we support, the recursion count for lock holding, or the bits used in the counter variant of the hash code.
      4) The GC trace header that uses a counter doesn't support garbage collectors requiring >2 bits, such as reference counting.
      5) Our current "available bits" implementation in the object model is based around the worst-case expectations of MMTk and as such is conservative.
      6) As a lot of the code that uses the object model tries to use constants as much as possible, there are bootstrap issues.
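Point 2 in the list above, byte-granularity access versus masking a whole word, can be illustrated like this; the little-endian placement of the available byte at the lowest address is an assumption of the sketch:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Two ways to clear the 8 available bits of a status word.
final class AvailableByte {
    // Word granularity: load, mask, store.
    static int clearByWordMask(int status) { return status & ~0xFF; }

    // Byte granularity: a single byte store, no load/mask sequence needed
    // (assuming little-endian layout, the available byte is byte 0).
    static int clearByByteStore(int status) {
        ByteBuffer word = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        word.putInt(0, status);
        word.put(0, (byte) 0);   // one store touches only the available byte
        return word.getInt(0);
    }
}
```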

      I think we should fix the object model so that each GC can release the optimal number of bits for other uses in the status word. We need to make this configurable and fault when errors occur.

      We can eventually play games with the object layout to see if there are architectural sweet spots. For example, the hash code bits could be tested with an 8-bit immediate mask if they weren't at bits 9 and 10.
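To illustrate the "sweet spot" point: on x86, a test against a mask confined to the low byte can use a short 8-bit immediate encoding, whereas a mask spanning bits 8-9 needs a wider immediate. A hypothetical pair of predicates:

```java
// Two equivalent hash-state checks; only the mask placement differs.
final class MaskWidth {
    // Hash bits placed in the low byte: the mask fits an 8-bit immediate.
    static boolean hashedLowByte(int status)   { return (status & 0x03) != 0; }

    // Hash bits at positions 8-9 (the current layout): needs a wider immediate.
    static boolean hashedBits8and9(int status) { return (status & 0x300) != 0; }
}
```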

        Issue Links

          Activity

          Steve Blackburn added a comment -

          This general issue is one we've been thinking about for a long time. Opening a JIRA to discuss the issues and options is a good idea (see RVM-287 for a recent, related issue).

          First, a high priority is to re-implement/de-cruft the previous pluggable object model, so that we can once again experiment with a single word object model, for example, while the interface from the rest of the VM remains abstract. My feeling is that a single word object model may ultimately be what we want as our default. We need to (re)implement it first, though.

          Second, we've had automatic/optimal GC bit assignment as a long-standing goal. More recently a few of us MMTk people have started leaning toward the idea of having simply a single byte for GC (or, in the case of RefCount etc, a single byte plus some number of words). The motivation for using a single byte for all GCs is three-fold, a) simplicity, b) the potential for efficiency gains, and c) because of the difficulty of identifying and using an optimal number of bits.

          The hard to understand bit is c). The problem of establishing the optimal number of bits is not as straightforward as it may seem at first blush. Here's a quick sketch of why:

          The bits required by any particular Plan are a function of the union of the requirements of the set of spaces it draws on:

          • What needs to happen is for each plan to establish the requirements of each of its spaces and then optimally assign bits to each of them (largely bin packing).
          • This needs to happen statically so that the various constants etc. can be assigned at build time.

          The properties associated with the bits required by any given space have multiple dimensions:

          • number of bits required
          • number of bits required exclusively at mutator time (e.g. for a write barrier)
          • number of bits required exclusively at GC time
          • number of bits which are private to this space
          • number of bits which may be shared among spaces

          Having established the above for each of the spaces, one then has the bin-packing problem:

          • Much of the above can be packed very efficiently: many spaces have private requirements, so those bits can be reused by all spaces (since any given object instance exists in strictly one space, only one space's requirements matter at a time).

          Having pondered the above on and off over the years, I think the MMTk people are leaning away from a fully automatic/optimal scheme. Instead we could simply allocate a byte in each object header to MMTk (which wastes a few bits in some cases, but simplifies and improves performance). The MMTk authors are then left to do the mental arithmetic of how the bits are shared among spaces. This seems tractable, simple and efficient, although it places the burden on the MMTk implementors as each new plan is defined, which is not as nice as a fully automated scheme. If someone wants to try to implement an optimal scheme, all power to them. It seems quite do-able, just not trivial (or necessarily very rewarding) :-D
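The private-versus-shared observation behind the bin-packing sketch reduces to a toy calculation: private bits can occupy the same positions across all spaces, while shared bits need their own. This is not MMTk code, just an illustration under those assumptions:

```java
// Toy static bit-assignment: since an object lives in exactly one space,
// each space's private bits may reuse the same positions; bits shared among
// spaces need dedicated positions on top of that.
final class GcBitPacker {
    /** Header bits a plan needs, given each space's private-bit count
     *  and the plan-wide shared-bit count. */
    static int bitsRequired(int[] privateBitsPerSpace, int sharedBits) {
        int maxPrivate = 0;
        for (int p : privateBitsPerSpace) maxPrivate = Math.max(maxPrivate, p);
        return maxPrivate + sharedBits;   // private bits overlap across spaces
    }
}
```

Under this model a plan with spaces needing 2, 4 and 1 private bits plus 1 shared barrier bit fits in 5 bits, not 8, which is exactly the kind of saving a fully optimal scheme would buy over a flat byte.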

          Note that this relates directly to the recently created RVM-287.

          Ian Rogers added a comment -

          I'd like to get the single word object header available again too. I've heard that it may no longer be as promising a model as it used to be, in particular as more modern libraries are trying to have more locks that lock data structures at a finer granularity.

          I've started a refactor that hopefully by the end of today (GMT) I can get some feedback on. My aim is to make all of the object model header and status word manipulations an abstract interface with varying implementations that we then make concrete in a singleton object held in org.jikesrvm.objectmodel.ObjectModel.

          So far this has allowed me to separate out a lot of the constants and variables so, for example, those relevant to maintaining a counter based hash code are only held in their appropriate class.

          Where I'm going with this is to enable the detection of the TIB offset in a bidirectional object layout in a manner that is more configurable than what I've seen in previous implementations. One way of determining the TIB offset in a bidirectional model is to set a low bit in the status word. As pointers from the TIB are aligned, the low bit will never be set, so finding a word in the object with a set low bit shows we've reached the header. There are, however, many other possible solutions - for example, placing a count of the number of references at the start of an object.
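The low-bit trick for finding the header in a bidirectional layout can be sketched over a plain int array standing in for the object's words (a hypothetical helper, not the actual branch code):

```java
// In a bidirectional layout, scan from the object start: TIB-relative
// pointers are aligned (low bit clear), so the first word with its low
// bit set marks the status word / header.
final class BidirectionalHeader {
    /** Index of the status word: the first word whose low bit is set. */
    static int findHeader(int[] objectWords) {
        for (int i = 0; i < objectWords.length; i++) {
            if ((objectWords[i] & 1) != 0) return i; // aligned pointers never set bit 0
        }
        return -1; // no header marker found: malformed object
    }
}
```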

          I'll stick a link to the branch when I've uploaded the code. Thanks, Ian.

          David Grove added a comment - edited

          Bit packing the status word is subtle. Optimal packing requires a global knowledge of how the bits are going to be used (especially bits that are mutable).

          For example, the 2 bits of hash code state in the address based hashing model are intentionally not in the same byte as the bits we give out to GC. The reason for this is that one of the possible generational write barriers wants to set a bit in the object header to indicate that the object has been remembered (reduces duplicate remset entries). It wants to do this with a normal byte store (it's ok to be racy and have multiple threads set this bit...). However, we can't allow a race between the remembered set bit being set and the hashcode bits being changed. There are lineraizations that cause us to lose a change in hashcode state if the remembered set bit and the hash code bits are in the same byte, unless we use atomic operations in the GC barrier, which is generally not the right tradeoff. Sounds like a really unlikely bug, but it happens in practice (nothing like a 3Ghz SMP machine to make a one in a million bug bit you in short order .

          The GC barrier is far more frequent than hash code checks, so the GC gets the bottom byte.

          In the non-address-based-hashing model, it's ok for the hashcode bits to be in the GC byte, because the hashcode bits are immutable (set during construction).
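The lost-update interleaving David describes can be replayed deterministically; the bit positions here are illustrative, not the real layout:

```java
// Replaying the barrier/hash race: with the remembered-set bit and the
// hash-state bits in the SAME byte, a plain byte store in the barrier can
// overwrite a concurrent hash-state change. With them in SEPARATE bytes,
// the byte store cannot touch the hash bits.
final class LostUpdateDemo {
    static final int REMEMBERED_BIT = 1 << 0;  // in the GC (low) byte
    static final int HASH_STATE_BIT = 1 << 1;  // same byte: the problematic layout

    // One linearization of the race: the hash update is lost.
    static int raceSameByte(int status) {
        int barrierRead = status & 0xFF;            // barrier loads the byte...
        status |= HASH_STATE_BIT;                   // ...another thread hashes the object...
        int newByte = barrierRead | REMEMBERED_BIT; // ...barrier sets its bit on the stale value
        return (status & ~0xFF) | newByte;          // the byte store clobbers the hash update
    }

    // Same interleaving, but the hash bit lives in byte 1: no bits are lost.
    static int noRaceSeparateBytes(int status) {
        int hashBitInByte1 = 1 << 9;                // hash state kept out of the GC byte
        int barrierRead = status & 0xFF;
        status |= hashBitInByte1;                   // concurrent hash transition
        return (status & ~0xFF) | barrierRead | REMEMBERED_BIT; // byte store misses byte 1
    }
}
```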


            People

            • Assignee: Unassigned
            • Reporter: Ian Rogers
            • Votes: 0
            • Watchers: 0
