JRuby (please use github issues at http://bugs.jruby.org)
  1. JRuby (please use github issues at http://bugs.jruby.org)
  2. JRUBY-5813

FFI::AutoPointer occasionally calls releaser proc when GC'd, even if autorelease is set to false

    Details

    • Type: Bug Bug
    • Status: Closed Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: JRuby 1.6.1
    • Fix Version/s: JRuby 1.6.3, JRuby 1.7.0.pre1
    • Component/s: Extensions
    • Labels:
      None
    • Environment:
      Snow Leopard 10.6.7 with Java 1.6.0_24-b07-334
    • Patch Submitted:
      Yes
    • Number of attachments :
      0

      Description

      The summary is that FFI::AutoPointer will occasionally call the release proc even after autorelease=false has been executed. When it happens, this can cause serious issues including segfaults for clients that rely on AutoPointer to direct their memory management. This is due to an assumption made by AutoPointer about the operation of the garbage collector, and it seems that assumption is not always true.

      I found this while working with the ffi-geos gem (https://github.com/dark-panda/ffi-geos), which exercises AutoPointer quite a bit.

      FFI::AutoPointer provides an "autorelease" mechanism by which a proc can be called when the AutoPointer is garbage collected. (ffi-geos uses this mechanism to release the memory held by the pointer.) AutoPointer also provides a mechanism whereby the "autorelease" can be enabled and disabled. (ffi-geos uses this mechanism because it has to handle certain use cases where another object "takes ownership" of the pointer and so the AutoPointer is no longer responsible for releasing the memory. That is, it sometimes calls AutoPointer#autorelease=false, and expects that, if the AutoPointer becomes unreachable after that point, the release proc will not be called.)

      AutoPointer implements this by using a PhantomReference called the "Reaper". When the AutoPointer becomes phantom-reachable, the Reaper, if present and reachable, runs the release proc. The autorelease enable/disable switch is implemented by adding/removing the Reaper to a global collection. When the Reaper is in the global collection, it is strongly referenced and thus queued for execution when the AutoPointer becomes phantom-reachable. When the Reaper is not in the global collection, it becomes unreachable along with its associated AutoPointer and therefore is not queued for execution.

      At least, this seems to have been the intent. In actuality, I have observed rare but definite instances when the Reaper gets executed anyway even though it had (long before) been removed from the global collection. This manifests as a spurious call to release proc even though autorelease had been set to false. In the case of ffi-geos, this results in a segmentation fault because the associated memory gets released twice.

      My belief is that this occurs because we cannot fully predict (and it is not specified) in which order the garbage collector computes object reachability and queues references. AutoPointer is depending on the Reaper being marked as unreachable before its corresponding AutoPointer becomes phantom-reachable and populates its phantom reference queue. This seems to be true most but not all the time.

      I've provided a small patch that appears to fix the issue:
      https://github.com/dazuma/jruby/commit/19f0dcde0570e21e4c7772208a5489093e5502f3

      It basically puts an additional guard around the running of the release proc. In the case where autorelease had been set to false (e.g. the Reaper was not in the global collection), but the Reaper ran anyway, it checks to make sure the Reaper actually was present in the global collection before running the release proc. Because the Reaper already attempts to remove itself from the global collection when it is executed, that check was already being computed, and so the overhead added by actually using its result is vanishingly small.

      This is a highly intermittent issue so I have no test case to submit. I can reproduce this a fraction of the time (maybe about 20-30% of the time) by running the test suite for the rgeo gem (the github head at https://github.com/dazuma/rgeo) in the presence of the ffi-geos gem (0.0.1.beta3). I am confident that the above patch fixes the issue because I have run the test several dozen times with the patch and have not observed any failures, whereas, without the patch, I can get the test suite to segfault around 20-30% of the time.

        Activity

        Hide
        Daniel Azuma added a comment -

        Got feedback from wmeissner, who suggested creating an explicit "autorelease" ivar in the Reaper to make it more obvious what is going on.

        Posted an updated patch: https://github.com/dazuma/jruby/commit/0cb5bccdb750750caba4301a3cf5ffa23a02cd43
        And issued a pull request: https://github.com/jruby/jruby/pull/36

        Show
        Daniel Azuma added a comment - Got feedback from wmeissner, who suggested creating an explicit "autorelease" ivar in the Reaper to make it more obvious what is going on. Posted an updated patch: https://github.com/dazuma/jruby/commit/0cb5bccdb750750caba4301a3cf5ffa23a02cd43 And issued a pull request: https://github.com/jruby/jruby/pull/36
        Hide
        Wayne Meissner added a comment -

        This should definitely be pulled into 1.6.3.

        I had a look at the pull request, looks fine to me, so anyone can merge it, as I probably won't have time for a while.

        Show
        Wayne Meissner added a comment - This should definitely be pulled into 1.6.3. I had a look at the pull request, looks fine to me, so anyone can merge it, as I probably won't have time for a while.
        Hide
        Hiro Asari added a comment -

        I pulled Daniel's pull request into master, and cherry-picked it into the 1.6 branch (https://github.com/jruby/jruby/commit/aec80f2d511ff66beb7746f8daed79a88f4a8310)

        Show
        Hiro Asari added a comment - I pulled Daniel's pull request into master, and cherry-picked it into the 1.6 branch ( https://github.com/jruby/jruby/commit/aec80f2d511ff66beb7746f8daed79a88f4a8310 )

          People

          • Assignee:
            Wayne Meissner
            Reporter:
            Daniel Azuma
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: