Sorry for the delay in replying; I didn't realise you had replied so quickly.
I was in two minds about caching Unsafe. While looking into the serialisation concern, I noticed that the Javadocs for the API now specify readObject/writeObject, so I checked the OpenJDK code and, sure enough, they have also switched to an AtomicLong and cache the Unsafe instance needed to change it. I think caching is the safest bet across all VMs, although, going by the reference implementation, simply referencing the class is probably enough to do most of the work (IIRC it happens in the class initialisation).
With random(), the method is already synchronised, which avoids the race condition. Such a lock on the Class object's monitor is costly, I agree, but my concern was more that invoking e.g. Math.sin() or Math.floor() would instantiate a Random object which may never be used. Flicking through the JSR166 book again today, I see they provide what may be a desirable intermediate solution (if a bit of a hack): a nested class holds the reference, and that class is only initialised the first time the method touches it. This avoids the explicit synchronisation (the VM takes a lock on the new class during initialisation anyway) and also means that the Random isn't created until it is needed, e.g.
private static final class RandomHelper
{
  // Created once, when RandomHelper is first initialised.
  static final Random randomInstance = new Random();
}
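For what it's worth, here is a slightly fuller sketch of how the idiom might sit inside a Math-like class. All the names (LazyMath, RandomHolder, randomCreated) are hypothetical, and the randomCreated flag exists only to make the laziness observable; it would not be part of a real patch.

```java
import java.util.Random;

// Hypothetical stand-in for java.lang.Math, for illustration only.
final class LazyMath {
    // Hypothetical flag, present only so the lazy creation can be observed.
    static volatile boolean randomCreated = false;

    private LazyMath() {}

    // The holder class is not initialised until INSTANCE is first read,
    // and the VM serialises that initialisation for us.
    private static final class RandomHolder {
        static final Random INSTANCE = create();

        private static Random create() {
            randomCreated = true;
            return new Random();
        }
    }

    // An unrelated method: calling it never touches RandomHolder.
    static double floor(double a) {
        return StrictMath.floor(a);
    }

    // First call triggers RandomHolder's initialisation; no synchronized needed here.
    static double random() {
        return RandomHolder.INSTANCE.nextDouble();
    }
}
```

Calling LazyMath.floor() alone leaves randomCreated false; the Random only appears on the first call to LazyMath.random().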
Let me know what you think.
I've been looking into these issues further:
1. Making rand final would also mean losing the lazy initialisation of this object. For most classes this wouldn't be much of an issue, but most of Math's methods are independent of one another and never touch rand, so the change would mean creating (and possibly later garbage-collecting) a Random object that is never used. Is that cost worth the gain of making this field final?
2. I agree this can be patched to use AtomicLongs, though I'm not sure how large the performance gain will be. I'd be interested to know whether splitting the lock in the Gaussian computation would be worthwhile: does the computation take long enough to compensate for the additional overhead and complexity of doing this?
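To make point 2 concrete, a lock-free seed update could look roughly like the sketch below. CasRandom is a hypothetical name, not the actual patch; the constants and the initial scrambling are the ones documented for java.util.Random's linear congruential generator. It covers only the seed update; the Gaussian method caches a second value between calls, so it would still need its own protection.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical class, for illustration only.
final class CasRandom {
    // Constants as documented for java.util.Random.
    private static final long MULTIPLIER = 0x5DEECE66DL;
    private static final long ADDEND = 0xBL;
    private static final long MASK = (1L << 48) - 1;

    private final AtomicLong seed;

    CasRandom(long initialSeed) {
        // Same initial scrambling as documented for Random.setSeed.
        seed = new AtomicLong((initialSeed ^ MULTIPLIER) & MASK);
    }

    // Advance the seed with a compare-and-set retry loop instead of a lock.
    int next(int bits) {
        long oldSeed;
        long nextSeed;
        do {
            oldSeed = seed.get();
            nextSeed = (oldSeed * MULTIPLIER + ADDEND) & MASK;
        } while (!seed.compareAndSet(oldSeed, nextSeed));
        return (int) (nextSeed >>> (48 - bits));
    }
}
```

Under contention a thread simply retries with the fresh seed, so no caller ever blocks on a monitor; whether that beats a synchronised method in practice would need measuring.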