GeoServer
  1. GeoServer
  2. GEOS-4329

ConcurrentModificationException in GetFeature under heavy load

    Details

    • Type: Bug Bug
    • Status: Closed Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.1-RC1
    • Fix Version/s: 2.0.3, 2.1-RC2
    • Component/s: None
    • Labels:
      None
    • Number of attachments :
      4

      Description

      Under heavy load of both GetFeature and DescribeFeatureType we get failures in GetFeature in roughly 10% of the GetFeature requests with the following stack trace:

      2011-01-26 08:50:22,030 ERROR [geoserver.ows] - 
      java.util.ConcurrentModificationException
      	at org.eclipse.emf.common.util.BasicEList$EIterator.checkModCount(BasicEList.java:1327)
      	at org.eclipse.emf.common.util.BasicEList$EIterator.next(BasicEList.java:1275)
      	at org.geotools.xml.Encoder.encode(Encoder.java:726)
      	at org.geotools.xml.Encoder.encode(Encoder.java:563)
      	at org.geoserver.wfs.xml.GML3OutputFormat.encode(GML3OutputFormat.java:253)
      	at org.geoserver.wfs.xml.GML3OutputFormat.write(GML3OutputFormat.java:241)
      	at org.geoserver.wfs.WFSGetFeatureOutputFormat.write(WFSGetFeatureOutputFormat.java:141)
      	at org.geoserver.ows.Dispatcher.response(Dispatcher.java:751)
      	at org.geoserver.ows.Dispatcher.handleRequestInternal(Dispatcher.java:233)
      	at org.springframework.web.servlet.mvc.AbstractController.handleRequest(AbstractController.java:153)
      	at org.springframework.web.servlet.mvc.SimpleControllerHandlerAdapter.handle(SimpleControllerHandlerAdapter.java:48)
      	at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:875)
      	at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:809)
      	at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:571)
      	at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:501)
      

      This is probably related to the substitution group handling in gt-xsd as the problem occurrs while visiting substitution groups for _Feature.
      The patches committed on GeoTools at revision 35934,35936,35937 have been beneficial in this respect, as the same test against GS 2.0.2
      results in well over 90% failures instead (which do start with the same stack trace, but eventually lead to NPE as some null finds its way into
      the substitution group list), but I guess it's not enough since it's still possible to reproduce failures.

      I'm going to attach a JMeter script that makes the problem evident: it creates two groups of 40 threads that do run concurrently, one makes
      a lot of DFT (8000, as they are quick to process), the other one some GetFeature (800).

      1. layers.csv
        0.2 kB
        Andrea Aime
      2. substitution.patch
        3 kB
        Andrea Aime
      3. substitution.patch
        1 kB
        Andrea Aime
      4. wfs.jmx
        15 kB
        Andrea Aime

        Issue Links

          Activity

          Hide
          Andrea Aime added a comment -

          Here is the JMeter script and the csv file to drive it. It works against the standard GeoServer release directory

          Show
          Andrea Aime added a comment - Here is the JMeter script and the csv file to drive it. It works against the standard GeoServer release directory
          Hide
          Andrea Aime added a comment -

          Btw, running pure GetFeature load I had the impression GetFeature is a lot slower on trunk. Did not measure, but though I'd report it anyways

          Show
          Andrea Aime added a comment - Btw, running pure GetFeature load I had the impression GetFeature is a lot slower on trunk. Did not measure, but though I'd report it anyways
          Hide
          Andrea Aime added a comment -

          Hi Justin,
          this simple GeoTools patch seems to do the trick for me, I don't get exceptions anymore.
          Rationale of the patch:

          • first off, make a defensive copy of the substitution group so that iteration is safe (the case does not happen frequently and the inefficiencies in the Encoder are elsewhere)
          • check for null elements in the copied array: I guess they pop up when doing the internal array copy if something is removing items from the list in parallel, which likely nullifies the element to be removed and only after re-packs the list or updates the actual size field

          Btw, verified that GetFeature is a lot slower on trunk, something like 10 times. If the attached benchmark is changed to do only getfeature I get 164r/s on 2.0.x and 20 r/s on trunk. Going to open a new issue for that one

          Show
          Andrea Aime added a comment - Hi Justin, this simple GeoTools patch seems to do the trick for me, I don't get exceptions anymore. Rationale of the patch: first off, make a defensive copy of the substitution group so that iteration is safe (the case does not happen frequently and the inefficiencies in the Encoder are elsewhere) check for null elements in the copied array: I guess they pop up when doing the internal array copy if something is removing items from the list in parallel, which likely nullifies the element to be removed and only after re-packs the list or updates the actual size field Btw, verified that GetFeature is a lot slower on trunk, something like 10 times. If the attached benchmark is changed to do only getfeature I get 164r/s on 2.0.x and 20 r/s on trunk. Going to open a new issue for that one
          Hide
          Justin Deoliveira added a comment -

          Nice! Big +1 on the patch. I had reports from Jukka as well that the fixes for this last time around helped a lot but did not totally alleviate the issue. Thanks Andrea!

          Show
          Justin Deoliveira added a comment - Nice! Big +1 on the patch. I had reports from Jukka as well that the fixes for this last time around helped a lot but did not totally alleviate the issue. Thanks Andrea!
          Hide
          Andrea Aime added a comment -

          Mumble... was too good to be true... I still have the occasional exception on 2.0.x:

          
          java.lang.ArrayIndexOutOfBoundsException
                  at org.eclipse.emf.common.util.BasicEList.toArray(BasicEList.java:427)
                  at org.eclipse.emf.ecore.util.EcoreEList.toArray(EcoreEList.java:204)
                  at java.util.ArrayList.<init>(ArrayList.java:131)
                  at org.geotools.xml.Encoder.encode(Encoder.java:718)
                  at org.geotools.xml.Encoder.encode(Encoder.java:561)
          

          This happens as the EMF list is being copied into the array.... is there any way to avoid this? Some sort of atomic copy for EMF lists?

          Show
          Andrea Aime added a comment - Mumble... was too good to be true... I still have the occasional exception on 2.0.x: java.lang.ArrayIndexOutOfBoundsException at org.eclipse.emf.common.util.BasicEList.toArray(BasicEList.java:427) at org.eclipse.emf.ecore.util.EcoreEList.toArray(EcoreEList.java:204) at java.util.ArrayList.<init>(ArrayList.java:131) at org.geotools.xml.Encoder.encode(Encoder.java:718) at org.geotools.xml.Encoder.encode(Encoder.java:561) This happens as the EMF list is being copied into the array.... is there any way to avoid this? Some sort of atomic copy for EMF lists?
          Hide
          Andrea Aime added a comment -

          And here is a "too ugly to be true" patch that actually seems to work. In case the defensive copy fails due to an array index out of bound exception, it will just try to make the copy again. Working? Yes. Ugly? Don't get me started...

          Show
          Andrea Aime added a comment - And here is a "too ugly to be true" patch that actually seems to work. In case the defensive copy fails due to an array index out of bound exception, it will just try to make the copy again. Working? Yes. Ugly? Don't get me started...
          Hide
          Justin Deoliveira added a comment -

          Hmmm... well ugly problems usually require ugly solutions I guess

          I don't know of any sort of atomic copy for EMF lists. Long story short they are not really designed to be used in the way we are using them. I think it is time to do the EMF upgrade I have been putting off and see if any of these issues are solved.

          Another thing that might help the situation is to update SubstitutionGroupLeakPreventer to do an copy of the list, then update it and then setting it back in one transaction.

          That said +1 on this patch if it works.

          Show
          Justin Deoliveira added a comment - Hmmm... well ugly problems usually require ugly solutions I guess I don't know of any sort of atomic copy for EMF lists. Long story short they are not really designed to be used in the way we are using them. I think it is time to do the EMF upgrade I have been putting off and see if any of these issues are solved. Another thing that might help the situation is to update SubstitutionGroupLeakPreventer to do an copy of the list, then update it and then setting it back in one transaction. That said +1 on this patch if it works.
          Hide
          Andrea Aime added a comment -

          /me closes eyes and presses commit

          Show
          Andrea Aime added a comment - /me closes eyes and presses commit
          Hide
          Andrea Aime added a comment -

          Fixed in GeoTools on 2.6.x and trunk.
          Btw, yeah, would be nice to see if the new EMF fares better. Once trunk is open there should be a good window of opportunity to try out those modifications

          Show
          Andrea Aime added a comment - Fixed in GeoTools on 2.6.x and trunk. Btw, yeah, would be nice to see if the new EMF fares better. Once trunk is open there should be a good window of opportunity to try out those modifications
          Hide
          Andrea Aime added a comment -

          Mass closing all issues that have been in the resolved state for the last month without anyone commenting or reopening them

          Show
          Andrea Aime added a comment - Mass closing all issues that have been in the resolved state for the last month without anyone commenting or reopening them

            People

            • Assignee:
              Justin Deoliveira
              Reporter:
              Andrea Aime
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: