GeoServer

JVM crashes in a multithreaded load test against an ECW coverage

Details

  • Type: Bug
  • Status: Closed
  • Priority: Critical
  • Resolution: Fixed
  • Affects Version/s: 2.0-RC1
  • Fix Version/s: 1.7.7, 2.0-RC2
  • Component/s: WMS
  • Labels: None
  • Number of attachments: 4

Description

GeoServer trunk crashes badly and repeatably in a multithreaded test using 10 threads to access the usual Bluemarble ECW files.
With JDK 1.6.0_14 and imageio-ext 1.0.3 not even a .hprof file is dumped; it just crashes, stating:

terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

I've tried running the same test with GeoServer 1.7.x and it lasts a little longer, but eventually crashes as well.
I'm about to attach the GeoServer logs, the JMeter script, and the .hprof generated by 1.7.x crashing.

I've also tried to run the same benchmark against 1.7.5 to make sure it was not due to the rendering rotation patch. It failed the same way, with a hard JVM crash
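
The attached bluemarble_ecw.jmx is the actual test plan. For readers without JMeter at hand, the load pattern described above (10 threads hitting the same coverage concurrently) can be sketched roughly as follows. This is only an illustration: the class name LoadSketch, the method run, and the placeholder task are assumptions, and a real run would issue a WMS GetMap request against the ECW layer where the placeholder returns true.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of GeoServer or of the attached JMeter script.
class LoadSketch {

    // Runs `task` callsPerThread times on each of `threads` concurrent workers
    // and returns how many calls reported success.
    static int run(int threads, int callsPerThread, Callable<Boolean> task) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(() -> {
                    int ok = 0;
                    for (int i = 0; i < callsPerThread; i++) {
                        if (task.call()) {
                            ok++;
                        }
                    }
                    return ok;
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get(); // waits for the worker and rethrows its failure, if any
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder task (assumption): replace with a real GetMap request.
        int ok = run(10, 5, () -> true);
        System.out.println("successful calls: " + ok);
    }
}
```

Note that a hard JVM crash on the server side would show up here only as failed requests; the crash itself has to be observed in the server's console output or hs_err log.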

  1. bluemarble_ecw.jmx
    24/Aug/09 4:56 AM
    28 kB
    Andrea Aime
  2. bluemarble.csv
    24/Aug/09 4:56 AM
    5 kB
    Andrea Aime
  3. geoserver.log
    24/Aug/09 4:56 AM
    40 kB
    Andrea Aime
  4. hs_err_pid32021.log
    24/Aug/09 4:56 AM
    54 kB
    Andrea Aime

Activity

Andrea Aime added a comment -

The file that led to this crash is the usual world-topo-bathy-200408-3x86400x43200.ecw, renamed to bluemarble_ecw.ecw to avoid problems with the URLs requesting it.
Simone, do you have that file? It's around 300MB, if you need it I can upload it somewhere

Andrea Aime added a comment -

Simone, it seems the above is happening only if the OS is in a "low" memory situation, that is, if the test is started with less than 1GB of free memory, with GS already up and whatnot. It seems the extra GB is needed to allow native memory allocations? The JVM usually crashes at load factor 10. It seems very odd that it's not possible to handle 10 concurrent requests with still a few hundred megabytes of memory available...

Simone Giannecchini added a comment -

Can you tell me a bit about the OS and the java options you are using?

Andrea Aime added a comment -

Ubuntu 9.04, 32 bit, 2 cores, 3GB memory, no java options, thus it configures itself to run in server mode, with parallel collector, and access to ... half of the machine memory? Not sure.
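
Since the defaults are uncertain here, the JVM can be asked what it actually picked. A minimal sketch, not part of the original report (the class name JvmDefaults is hypothetical); it only uses standard java.lang and java.lang.management calls and reports the Java heap limit, not native memory:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

// Hypothetical helper: prints the settings the JVM chose for itself.
class JvmDefaults {

    static String describe() {
        Runtime rt = Runtime.getRuntime();
        RuntimeMXBean mx = ManagementFactory.getRuntimeMXBean();
        // Maximum heap the JVM will attempt to use, in megabytes.
        long maxHeapMb = rt.maxMemory() / (1024 * 1024);
        return "vm=" + mx.getVmName()
                + " cores=" + rt.availableProcessors()
                + " maxHeapMB=" + maxHeapMb
                + " args=" + mx.getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Run with the same JDK and no options, the VM name shows whether the server VM was selected, and maxHeapMB shows the heap ceiling. Whatever the heap does not claim is what remains for native allocations such as those made by the ECW reader.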

Andrea Aime added a comment -

Mind, the problem can be reproduced only if the machine memory usage is high, close to a swap storm.

Andrea Aime added a comment -

This is apparently related to http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6463535
Sun bug... this is bad, they are notoriously slow at fixing bugs...

Andrea Aime added a comment -

This issue is reproducible on other machines and breaks the benchmarking tests we're doing for FOSS4G. Simone and Daniele are working on a patch (thanks a ton guys).

Andrea Aime added a comment -

Tests show that imageio-ext 1.0.4 solves at least the part of the problem found in the imageio-ext component (to actually survive a few benchmark runs one also has to disable JAI and the JAI ImageIO JPEG writer).
When is imageio-ext 1.0.4 coming out?

Andrea Aime added a comment -

Upped the dependency to 1.0.4. Some issues are still there but they do seem to be out of our control.
Further investigations are needed in the future

