Detecting GC gone bad
Hi, we have a system with several hundred Java processes running 24/7. Every now and then one of the processes gets into a bad state where it is mostly doing GC. Sometimes the process quickly hits an OutOfMemoryError, and sometimes it just stays in this state, which prevents it from doing any useful work (i.e., it's hosed at that point).
What I'd like to do is auto-detect the latter situation (we can already detect when an OutOfMemoryError is hit), i.e., when the process is spending too much time doing GC and not enough time doing the work it's supposed to be doing.
We are using JDK 1.5 and have been looking at some of its new features. E.g., we can use GarbageCollectorMXBean (in java.lang.management), which reports the approximate accumulated time spent doing GC. We can then compare GC time against elapsed real time over an interval, and alert if the percentage exceeds some threshold.
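Roughly, the idea could be sketched like this. This is only a minimal sketch, not production code: the class name GcOverheadMonitor, its methods, and the threshold value are made up for illustration. The only real JDK calls are ManagementFactory.getGarbageCollectorMXBeans() and GarbageCollectorMXBean.getCollectionTime().

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper (not part of the JDK): samples the cumulative GC time
// and reports whether the share of the last interval spent in GC is too high.
class GcOverheadMonitor {
    private final double threshold;  // assumed alert threshold, e.g. 0.9 = 90%
    private long lastGcMillis;
    private long lastWallMillis;

    GcOverheadMonitor(double threshold) {
        this.threshold = threshold;
        this.lastGcMillis = totalGcTimeMillis();
        this.lastWallMillis = System.currentTimeMillis();
    }

    // Sums the approximate accumulated collection time over all collectors.
    static long totalGcTimeMillis() {
        long total = 0;
        List<GarbageCollectorMXBean> beans =
                ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean bean : beans) {
            long t = bean.getCollectionTime();  // -1 if undefined for this collector
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // GC time as a fraction of elapsed real time; 0.0 for an empty interval.
    static double fraction(long gcDeltaMillis, long wallDeltaMillis) {
        if (wallDeltaMillis <= 0) {
            return 0.0;
        }
        return (double) gcDeltaMillis / (double) wallDeltaMillis;
    }

    // Compares GC time vs. real time since the previous sample, then resets
    // the baseline. Returns true if the fraction is above the threshold.
    boolean sampleExceedsThreshold() {
        long gcNow = totalGcTimeMillis();
        long wallNow = System.currentTimeMillis();
        double f = fraction(gcNow - lastGcMillis, wallNow - lastWallMillis);
        lastGcMillis = gcNow;
        lastWallMillis = wallNow;
        return f > threshold;
    }
}
```

A background thread or timer in each process would call sampleExceedsThreshold() every so often (say once a minute) and raise an alert after a few consecutive true results. The interval, the threshold, and the number of consecutive hits are all guesses that would need tuning.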
However, I was wondering if anybody had any other ideas or thoughts in this area.