Posted by kcpeppe
on January 9, 2012 at 2:51 AM PST
I mentioned the idea to use Wordle as an execution profiler while presenting the profiling section of my performance tuning course in Paris last December. The idea was seeded by presentation that Neal Ford did a few years ago in which he used Wordle to expose the vocabulary of a Java application. Instead of vocabulary, I wanted to see if Wordle could be used to visualize an applications dominate behavior.
I mentioned the idea to use Wordle as an execution profiler while presenting the profiling section of my performance tuning course in Paris last December. The idea was seeded by presentation that Neal Ford did a few years ago in which he used Wordle to expose the vocabulary of a Java application. Neal took a pile of source code and fed it into Wordle. The result was a word cloud that visualized the dominate words in the source code. What I was interested in was exploring how I could use Wordle to visualize an applications dominate activity (i.e., the bottleneck).
In profiling we setup a mechanism to repeatedly ask, “What are you doing now? What are you doing now, What are you doing now?” . Profilers aggregate and count the answers to this question. It is the treatment of the answers that allows us to identify hotspots (dominate activities) in our application. Wordle aggregates and counts words. It uses those counts to build a word cloud. So it seems as though it should work as long as we feed is the right set of words. In this case, the words should come from the answers to the repeated question, “What are you doing now?”. To answer that question, we can use thread dumps.
At this point you should be thinking, “Thread dumps to profile? Are you daft? The impact on performance will be horrendous!”. I completely agree, done haphazardly, continuously taking thread dumps from a running application would inflict a huge performance penalty on that application. But what if we didn’t need all that data to figure out what was going on? Instead, what if we switched to “slow profiling” where we took a thread dump every 5 minutes minimizing it’s impact on performance? I recently used this idea to create a Java agent called ConditionalThreadDumper which I then launched into a production system. The impact on performance was horrendously unnoticeable.
The ConditionalThreadDumper polled the system on a regular basis looking to see if it was in a particular (bad) condition. For example, we might look at CPU utilization and another performance metric to see if they have exceeded some level of utilization. If so, it would turn on a mechanism to created a thread dump every x minutes. If the system worked it’s way out of the condition, the thread dumper would be turned off. Launching in this agent into production resulted in several thousand data points over a period of a couple of days. Not a lot when you consider an execution profiler can collect that many samples in a few minutes. However, this was running in production and it was only running when the system was in trouble. So while there wasn’t a lot of data, it was significantly more relevant to the problem at hand, as in aggregate, these thread dumps tell a story of the dominate activity was in the badly behaving application.
Blasting thousands of raw thread dumps into Wordle simply won’t work so I hacked together a scanner that harvested stack traces of interest from the thread dumps. What I was looking for were application threads (in this case Tomcat threads, which are easily identifiable). After collecting and cleaning all of the stack trace entries for the Tomcat threads I ended up with more than 350,000 lines of data. The user interface for Wordle made inserting all of this data impractical so I modified the code to aggregate and count identical entries. This reduced that dataset to just under 90,000 lines.
I fed the top 100 or so items into Wordle and it came back with a clear but useless answer. The dominate activity was java.lang.Object.wait. Since I already knew that the thread pool was oversized it wasn’t a surprise to see that the dominate activity were threads blocked on a lock (see Figure 1).
- waiting on (a org.apache.tomcat.util.net.AprEndpoint$Worker):44586
Figure 1. Dominating Activities
I filtred java.lang.Object.wait and five other related and uninteresting entries from the data and then fed the top 100 or so back into Wordle. Not surprisingly, the output from Wordle demonstrated what we knew to be true from initial work with "real" performance tooling. The dominate activity was interactions with the JDO product, Kodo (kodo.kernel.BrokerImpl.find) r. Not far behind, was a proxy using reflection (java.lang.reflect.Method.invoke). This is quite visible in Figure 2.is
Figure 2. Wordle output showing dominate activities
So there you have it, how to use Wordle as a wall clock time, method exclusive execution profiler.
If you are interested in learning more about performance tuning using more traditional tooling and techniques, please join me in one of my open enrollment performance tuning seminars. I have a few scheduled for both Europe and North America. I'm also very excited to announce that I'm working with Exceed in Australia. I'll be there for two weeks. My first will be presenint the workshop at an inhouse offering. The second week, Feb 6th is for my first full open enrollment performance tuning workshop down under. This will also be a rare opportunity for me to meetup with people in Oz.