Octavian is correct that the shutdown hook is a source of a heap leak whenever Log4j may be initialized multiple times within one JVM lifetime, such as when a servlet context reloads. We register a shutdown hook thread that references the custom appender and its attached appenders, preventing them from being garbage-collected until JVM shutdown is underway (too late). Each Log4j reconfigure registers a further shutdown hook, growing the heap.
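To make the retention chain concrete, here is a minimal, self-contained sketch of the pattern described above. The class names `LeakyAppender` and `LeakDemo` are hypothetical stand-ins for illustration, not actual perf4j classes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the custom appender; not a real perf4j class.
class LeakyAppender {
    private Thread shutdownHook;

    // Called on every Log4j (re)configure. Each call registers a NEW hook
    // thread that strongly references this appender instance, so the
    // appender (and anything it references) cannot be collected until
    // JVM shutdown actually begins.
    void activateOptions() {
        shutdownHook = new Thread(this::close);
        Runtime.getRuntime().addShutdownHook(shutdownHook);
    }

    void close() {
        // flush and stop the draining thread (elided)
    }

    // Exposed only so the sketch can demonstrate the registered hook.
    Thread hook() {
        return shutdownHook;
    }
}

public class LeakDemo {
    public static void main(String[] args) {
        // Simulate three context reloads: three hooks registered, three
        // appender instances pinned on the heap until the JVM exits.
        List<LeakyAppender> pinned = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            LeakyAppender a = new LeakyAppender();
            a.activateOptions();
            pinned.add(a);
        }
        System.out.println("appenders pinned by hooks: " + pinned.size());
    }
}
```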
Rather than making shutdown hook registration conditional (agreed, this fixes the webapp case, but adds configuration to remember), could we just ensure we remove the hook (Runtime.removeShutdownHook(Thread)) during close() of the Appender (the lifecycle actually defined in AppenderSkeleton)? The shutdown hook is superfluous after that point, since it exists only to call close() if it has not already been called. I realize that in a properly-managed servlet application the shutdown hook is going to be irrelevant, but provided it is removed it is harmless. The unregister-on-close approach means webapp users don't need to remember additional configuration, and non-webapp users who initialize Log4j repeatedly won't suffer the same heap growth.
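A sketch of the unregister-on-close idea, assuming a hypothetical `SelfCleaningAppender` class (the real appender extends AppenderSkeleton, whose lifecycle defines close()):

```java
// Hypothetical sketch of the proposed fix, not the actual perf4j code.
class SelfCleaningAppender {
    private Thread shutdownHook;

    void activateOptions() {
        shutdownHook = new Thread(this::doClose);
        Runtime.getRuntime().addShutdownHook(shutdownHook);
    }

    public void close() {
        doClose();
        Thread hook = shutdownHook;
        shutdownHook = null;
        if (hook != null) {
            try {
                // The hook exists only to call close() if it was never
                // called; once close() has run it is superfluous, so
                // unregister it and let the appender become collectable.
                Runtime.getRuntime().removeShutdownHook(hook);
            } catch (IllegalStateException ignored) {
                // JVM shutdown already in progress; nothing to clean up.
            }
        }
    }

    // True while the hook is still registered (for illustration only).
    boolean hookRegistered() {
        return shutdownHook != null;
    }

    private void doClose() {
        // flush and stop the draining thread (elided)
    }
}
```

Note the IllegalStateException guard: removeShutdownHook throws if shutdown has already begun, which is exactly the case where close() is being driven by the hook itself and no cleanup is needed.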
Pull request: https://github.com/perf4j/perf4j/pull/11
The description and some of the observations in the original logging of this issue may be a separate, unrelated consideration. My recollection of Tomcat (and this goes way back to the 4.x/5.x days, so may have changed) is that it has a hard-coded, non-configurable, non-ignorable timer it sets when tearing down a context: context listener destroy methods and any self-managed thread cleanup must complete within that time, otherwise undefined behaviours occur. After the timeout, Tomcat invalidates the context's classloader, allowing it and everything under it to be garbage-collected (unless we hold a reference, e.g. in a shutdown hook per the above). My recollection is that this timeout was 10 seconds, which, perhaps coincidentally, is the same as the (also hard-coded) wait time for shutdown of the queue-draining thread in GenericAsyncCoalescingStatisticsAppender stopDrainingThread(). So for a full or large queue, the queue drain timeout may expire only after Tomcat has already started removing the context. The stack trace above is indicative of this sort of problem.
So for that part, we should raise a new issue to make the timeout configurable and see if that improves things.
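For illustration, a configurable timeout might look like the following sketch. `setQueueDrainTimeoutMillis` is a hypothetical property name, not an existing perf4j option; the real logic lives in GenericAsyncCoalescingStatisticsAppender stopDrainingThread():

```java
// Sketch only: replaces the hard-coded 10-second wait with a settable one.
class DrainTimeoutSketch {
    private long queueDrainTimeoutMillis = 10_000L; // current hard-coded value
    Thread drainingThread; // package-visible purely for this sketch

    // Hypothetical bean-style property so the timeout could be set from
    // log4j configuration, e.g. to stay under Tomcat's teardown wait.
    public void setQueueDrainTimeoutMillis(long millis) {
        this.queueDrainTimeoutMillis = millis;
    }

    void stopDrainingThread() throws InterruptedException {
        if (drainingThread != null) {
            drainingThread.interrupt();
            // Wait at most the configured time for the queue to drain,
            // rather than a fixed 10 seconds.
            drainingThread.join(queueDrainTimeoutMillis);
        }
    }
}
```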