I dug into OpenJDK code and found that the contention is actually happening after the JCA subsystem has a Provider in hand, when requesting a Service from which it can create MD5 digest objects. The search for the correct Service happens under lock in the getService method shown in the bug description. That tells me the simplest way to avoid the contention (and all the searching) is to do what I did originally: clone an unused instance of MessageDigest for each new use.
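A minimal sketch of the cloning approach follows. It assumes one cached prototype per algorithm that is never used for hashing and is only ever cloned. The names `DigestCloner` and `newDigest` are hypothetical; they are not part of the JDK or of any existing patch. With this sketch, `MessageDigest.getInstance` (and so the locked Service lookup) runs roughly once per algorithm, and every later call is a `clone()`. Not every provider's digest supports cloning, so the sketch falls back to `getInstance` on `CloneNotSupportedException`. It also assumes that concurrent `clone()` calls on an untouched prototype are safe because they only read its state. `MessageDigest` does not document thread safety, so treat that as an assumption.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: hands out fresh MessageDigest instances by cloning
// a cached, never-updated prototype instead of calling getInstance each time.
final class DigestCloner {
    // One pristine prototype per algorithm name. ASSUMPTION: prototypes are
    // never updated, so concurrent clone() calls only read their state.
    private static final Map<String, MessageDigest> PROTOTYPES = new ConcurrentHashMap<>();

    private DigestCloner() {}

    static MessageDigest newDigest(String algorithm) throws NoSuchAlgorithmException {
        MessageDigest proto = PROTOTYPES.get(algorithm);
        if (proto == null) {
            // First use of this algorithm: pay for the provider/Service lookup once.
            MessageDigest created = MessageDigest.getInstance(algorithm);
            MessageDigest existing = PROTOTYPES.putIfAbsent(algorithm, created);
            proto = (existing != null) ? existing : created; // another thread may have won the race
        }
        try {
            return (MessageDigest) proto.clone();
        } catch (CloneNotSupportedException e) {
            // Some provider implementations are not cloneable; fall back to the lookup.
            return MessageDigest.getInstance(algorithm);
        }
    }
}
```

Callers would replace `MessageDigest.getInstance("MD5")` with `DigestCloner.newDigest("MD5")`. Each returned instance belongs to its caller, so no locking is needed while hashing.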
Revisiting that approach yields mixed results. It definitely speeds up the uncontended case, but the numbers for the contended cases vary wildly, probably because the benchmark spins up a bunch of short-lived threads and joins them. That thread overhead may be clouding the measurements.
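If thread startup is what's clouding the contended numbers, one way to check is to create and park the worker threads before the clock starts. Below is a rough sketch of such a harness. It assumes the benchmark can be restructured this way. `ContendedDigestBench` and `timeNanos` are hypothetical names. The digest factory is passed in, so `MessageDigest.getInstance` and a cloning variant can be timed with the same code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical harness: times only the hashing phase, not thread creation/joining.
final class ContendedDigestBench {
    private ContendedDigestBench() {}

    static long timeNanos(int threads, int opsPerThread, Callable<MessageDigest> factory)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch ready = new CountDownLatch(threads); // workers started and parked
        CountDownLatch start = new CountDownLatch(1);       // releases all workers at once
        CountDownLatch done = new CountDownLatch(threads);  // all workers finished
        AtomicReference<Exception> failure = new AtomicReference<>();
        byte[] input = "hello".getBytes(StandardCharsets.UTF_8);
        try {
            for (int t = 0; t < threads; t++) {
                pool.execute(() -> {
                    ready.countDown();
                    try {
                        start.await();
                        for (int i = 0; i < opsPerThread; i++) {
                            // Obtain a digest the way the code under test would, then use it.
                            factory.call().digest(input);
                        }
                    } catch (Exception e) {
                        failure.compareAndSet(null, e); // keep the first failure
                    } finally {
                        done.countDown();
                    }
                });
            }
            ready.await();              // thread startup happens before timing begins
            long t0 = System.nanoTime();
            start.countDown();
            done.await();
            long elapsed = System.nanoTime() - t0;
            if (failure.get() != null) throw failure.get();
            return elapsed;
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, `timeNanos(8, 100_000, () -> MessageDigest.getInstance("MD5"))` times the lookup path under contention. Running it several times and discarding the first runs would help account for JIT warm-up. A harness such as JMH handles warm-up and thread setup more rigorously.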
Are you at liberty to test this? I could provide a patch that replaces MessageDigest.getInstance calls with cloning for the common digests, and you could see how it performs on your system.