GeoTools / GEOT-2811

Nondeterministic bug loading reprojected shapefile

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.5.7
    • Fix Version/s: None
    • Component/s: data
    • Labels:
      None
    • Environment:
      Ubuntu, java version "1.6.0_0"
      IcedTea6 1.3.1 (6b12-0ubuntu6.4) Runtime Environment (build 1.6.0_0-b12)
      OpenJDK 64-Bit Server VM (build 1.6.0_0-b12, mixed mode)
      x86_64

      Also tested on MacOS 10.5.8 with Java 1.6.0_15.
    • Testcase included:
      yes

      Description

      Our test generates 77,000 random line strings and stores them in a shapefile in EPSG:2263. It then loads them back out of the shapefile, reprojecting to EPSG:4326 as it goes, and puts each coordinate of each line string into a HashSet. Then it loads the coordinates a second time and checks whether each one is in the set. Sometimes they are not, in which case the program prints "missing" followed by the coordinate.

      For a real fun time, try 2,605 line strings instead of 77,000: sometimes it will work and sometimes it won't. 1,000 always succeeds (never prints "missing") for us.

      We believe that the problem is a sporadic loss of precision (we believe this based on tests with our own data, a modified version of New York City's LION data).
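
      The failure mode described above can be reproduced without GeoTools. The following is a minimal sketch, not the attached test case: the class name `ExactMatchDemo` and its method are hypothetical, and `Math.nextUp` is only an assumed stand-in for whatever makes the second load produce a slightly different value.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; not part of the attached test case.
class ExactMatchDemo {

    // Returns true if 'second' is found in a set that contains only 'first'.
    static boolean foundOnSecondPass(double first, double second) {
        Set<Double> seen = new HashSet<Double>();
        seen.add(first);               // first load: remember the ordinate
        return seen.contains(second);  // second load: exact lookup
    }

    public static void main(String[] args) {
        double first = 40.70480057251743;
        // Assumption: the second load differs from the first in the last bit only.
        double second = Math.nextUp(first);

        System.out.println(foundOnSecondPass(first, first));   // identical bits: found
        System.out.println(foundOnSecondPass(first, second));  // one bit off: reported "missing"
    }
}
```

      A `HashSet<Double>` compares values bit for bit, so a change in the last bit is enough to make the lookup fail.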

        Activity

        Andrea Aime added a comment -
        The issue is unfortunately outside of my control.

        If you want the problem to never trigger you can disable the JIT compiler by adding:
         -Djava.compiler=NONE
        to your VM startup params. Of course you'll pay a severe slowdown.

        What is happening is that after a certain number of iterations the JIT determines that the reprojection code is a hot spot and compiles it into native code. The slightly different set of CPU instructions then executed produces tiny differences in the reprojection results, e.g.:

        {code}
        3c3
        < 2 - MULTILINESTRING ((40.704800572517435 -74.01887014647102, 40.706097856736946 -74.01921052492746))
        ---
        > 2 - MULTILINESTRING ((40.70480057251743 -74.01887014647102, 40.706097856736946 -74.01921052492746))
        8c8
        < 7 - MULTILINESTRING ((40.70617211882219 -74.01791596175205, 40.706345981824306 -74.0178546931773, 40.70651595008431 -74.01777434330114, 40.7066802119895 -74.01767566865388, 40.70683712029383 -74.01755975085365))
        ---
        > 7 - MULTILINESTRING ((40.70617211882219 -74.01791596175205, 40.7063459818243 -74.01785469317728, 40.70651595008431 -74.01777434330114, 40.7066802119895 -74.01767566865388, 40.70683712029383 -74.01755975085365))
        {code}
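
        To put the size of these differences in perspective: the first pair of values in the diff differs by about 5e-15 degrees as printed, while adjacent doubles near 40.7 are 2^-47 (about 7.1e-15) apart. The sketch below measures such a gap in units in the last place (ULPs). The class `UlpDistance` and its method are hypothetical helpers; the two literals are copied from the diff.

```java
// Hypothetical helper; the two literals in main are copied from the diff.
class UlpDistance {

    // Absolute difference between a and b, expressed in units of the
    // spacing between adjacent doubles at a's magnitude.
    static double ulpsApart(double a, double b) {
        return Math.abs(a - b) / Math.ulp(a);
    }

    public static void main(String[] args) {
        double left = 40.704800572517435;   // "<" side of the diff
        double right = 40.70480057251743;   // ">" side of the diff
        System.out.println("difference: " + Math.abs(left - right));
        System.out.println("ulps apart: " + ulpsApart(left, right));
    }
}
```

        At roughly 111 km per degree of latitude, a difference of this size is well under a nanometer on the ground.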

        GIS software should never assume that coordinates match exactly, for several reasons:
        * the data you're reading always has a survey precision, usually on the order of meters or tens of centimeters. So you cannot expect, in general, two road endpoints to actually coincide (whether they do or not is an accident of how the data was digitized)
        * computations (such as JTS overlays) may shift point coordinates by tiny amounts. JTS, for example, has the concept of a "precision model" to deal with that
        * JUnit itself does not let you compare two doubles for equality without specifying a tolerance

        Long story short, you should round the coordinates to a known precision before comparing them for equality (or otherwise treat two coordinates as the same if they differ by less than a certain "epsilon" value).
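
        Both suggestions, rounding to a known precision and comparing within an epsilon, can be sketched in plain Java. This is an illustration under stated assumptions, not GeoTools or JTS code: the class `TolerantCoordinates`, its methods, and the 1e-9 degree tolerance (about 0.1 mm on the ground) are all hypothetical choices, and `Math.nextUp` stands in for a JIT-induced difference.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; the names and the tolerance are assumptions.
class TolerantCoordinates {

    // Illustrative tolerance in degrees, far below any survey precision.
    static final double EPSILON = 1e-9;

    // Epsilon comparison: equal if the ordinates differ by less than EPSILON.
    static boolean nearlyEqual(double a, double b) {
        return Math.abs(a - b) < EPSILON;
    }

    // Rounding: snap an ordinate to a grid of EPSILON-sized cells so that
    // the result can be used as part of a hash key.
    static long gridIndex(double ordinate) {
        return Math.round(ordinate / EPSILON);
    }

    // Set key for an (x, y) pair, built from the snapped ordinates.
    static String key(double x, double y) {
        return gridIndex(x) + "," + gridIndex(y);
    }

    public static void main(String[] args) {
        double x = 40.70480057251743;
        double y = -74.01887014647102;
        double x2 = Math.nextUp(x); // stand-in for a JIT-induced difference

        Set<String> seen = new HashSet<String>();
        seen.add(key(x, y));

        System.out.println(nearlyEqual(x, x2));          // true
        System.out.println(seen.contains(key(x2, y)));   // true
    }
}
```

        Rounded keys fit naturally into a HashSet, but two values that straddle a cell boundary still get different keys, so the epsilon comparison (or a lookup that also checks neighboring cells) is the more robust option when that matters.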

          People

          • Assignee:
            Andrea Aime
            Reporter:
            David Turner
          • Votes:
            1
            Watchers:
            2
