Details
-
Type:
Bug
-
Status:
Open
-
Priority:
Major
-
Resolution: Unresolved
-
Affects Version/s: UDIG 1.2.0
-
Fix Version/s: None
-
Component/s: application
-
Labels:None
-
Environment:eclipse=3.6
java.version=1.6.0_20
java.vendor=Sun Microsystems Inc.
BootLoader constants: OS=linux, ARCH=x86_64, WS=gtk, NL=nb_NO
Framework arguments: -product net.refractions.udig.product
-
Patch attached:Yes
Description
Do the following to reproduce deadlock:
1. Run udig.product
2. Add any layer to a map
3. Ensure that the preference "Use Tiled Rendering System" is checked
4. Press "Full Extent" command rapidly until uDig freezes
The root cause of this bug is the same as the UDIG-1726 issue; the tiled rendering system does not always honor the SWT multi-threading policy. Tile.getSWTImage() and Tile.disposeSWTImage() methods in the net.refractions.udig.project.render.Tile class access the SWT subsystem by invoking methods on the SWT image, that in turn access methods in the org.eclipse.swt.internal.gtk.OS class. If the (main) user-interface thread also concurrently access a method in the OS class that requires a lock, a deadlock will occur once a synchronized statement is invoked in the Tile class. Yikes! This is hard to explain concisely. Just reproduce the bug and see for your self by inspecting the stack trace in debug mode. You will see that the threads that are deadlocked both have method(s) that locks in the OS class in the call stack. One of the two threads should be the user-interface (main) thread.
I believe that the root cause described above has been confirmed by applying the patch described next. This patch ensures that the SWT image instance in the Tile instance is only accessed asynchronously from the user-interface thread by wrapping access with the Display.asyncExec() method. However, this required that a Display instance is readily available, which it is not. Since the Tile instance does not have any information about the user-interface thread, the following "hack" is used to get it:
iterate over all threads until Display.findDisplay(Thread) returns a non-null reference to a Display instance
This method has some shortcomings though. I believe that the tiled rendering system should always honor the SWT multi-threading policy from the top down, by ensuring that all calls that might result in invoking methods in the OS class that requires a lock, as the Title.getSWTImage() and Title.disposeSWTImage() methods do, is run on the user-interface thread using the Display.asyncExec() method. The hack I have used, is based on the following assumptions: 1. only one user-interface thread exists the thread-group of the invoking thread (reasonable I think) 2. if found, it is assumed that the thread is alive as long as the application is running, thus once found, the instance is returned directly thereafter 3. if not found, it is safe to invoke Title.getSWTImage() and Title.disposeSWTImage() methods from the calling thread I hope that somebody can confirm that this issue exist on other platforms than Linux, and that my hack actually solves the problem.
The hack above is attached as a patch.
Note that the patch is not tested on the win32, carbon or cocoa platforms as I currently only have time to do uDig development on the GTK platform. Before the patch is committed to branches and trunk, members in the uDig team currently developing on win32 and cocoa should confirm that the bug does not exist, and that the patch does not introduce unwanted behavior on these platforms. I have tested the patch on linux and it works without any problems. I'll run through the project test procedures now to confirm that the patch is OK on the GTK platform.