Charlie Hunt on Java Performance Monitoring and Tuning
Steven Haines: What are the most common Java performance issues that you've seen?
Charlie Hunt: The issues generally fall into a few categories:
- At the application source code level, a poor choice of algorithms or data structures. Better algorithms and data structures almost always offer the biggest performance return on investment.
- Unnecessary object allocation and, more importantly, unnecessary object retention. In short, high object retention is challenging for any JVM's GC to handle well while still offering the application good performance.
- Use of unbuffered I/O (illustrated in the sketch below).
- A poorly tuned JVM.
- High lock contention, which leads to scalability bottlenecks.
- Re-sizing of data structures, particularly those that use arrays as a backing store.
Essentially, there's content in some form or another in Java Performance that speaks to all of the above issues and offers approaches to addressing them.
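To make a few of these concrete, here is a minimal, hypothetical sketch (the class, method, and cache names are illustrative, not from the book) of an object-retention pitfall, unbuffered versus buffered I/O, and pre-sizing an array-backed collection:

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class CommonIssues {

        // Unnecessary object retention: an unbounded, statically reachable
        // cache keeps every entry alive forever, forcing the GC to trace
        // (and eventually promote) objects the application no longer needs.
        static final Map<String, byte[]> CACHE = new HashMap<String, byte[]>();

        static void remember(String key, byte[] value) {
            CACHE.put(key, value); // never evicted: a retention problem
        }

        // Unbuffered I/O: each read() may cost a system call. Wrapping the
        // stream in a BufferedInputStream serves most reads from memory.
        static long countBytes(String path, boolean buffered) throws IOException {
            InputStream in = new FileInputStream(path);
            if (buffered) {
                in = new BufferedInputStream(in);
            }
            try {
                long count = 0;
                while (in.read() != -1) {
                    count++;
                }
                return count;
            } finally {
                in.close();
            }
        }

        // Array-backed data structure re-sizing: pre-sizing avoids the
        // repeated grow-and-copy work ArrayList does as it fills up.
        static List<Integer> buildList(int expectedSize) {
            List<Integer> list = new ArrayList<Integer>(expectedSize);
            for (int i = 0; i < expectedSize; i++) {
                list.add(i);
            }
            return list;
        }
    }

The buffered variant typically performs far better for byte-at-a-time reads, and the pre-sized list avoids several intermediate array copies as it grows.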
SH: Can you describe the various techniques for monitoring application performance? Is monitoring different from profiling?
CH: First, yes, monitoring is quite different from profiling. In Java Performance I offer a rather detailed description of each. In short, monitoring tends to be non-intrusive on application performance, tends to be broader than profiling, is often done in a production environment, and is generally used to identify potential problems and present the symptoms of a potential performance issue. Profiling, in contrast, can be more intrusive on application performance, tends not to be done in production, and also tends to be more focused than monitoring; prior to profiling, you usually have some idea of what you are looking for based on what you've observed via monitoring.
In the context of a Java application, it's common practice to monitor statistics at the operating system level, the JVM level, and even at the application level, so you need tools to collect those statistics at each level. In addition, when stakeholders complain about application performance, it's common to monitor an application in both online and offline modes. It's also common, and recommended, to collect performance statistics and evaluate them offline even when stakeholders are not complaining of performance issues. Evaluating those performance statistics, in either an online or offline mode, offers clues or symptoms as to the source of the performance issue.
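As one concrete, hedged example of JVM-level statistics collection, the standard java.lang.management MXBeans let an application sample its own heap usage and GC activity non-intrusively; the class name below is a hypothetical sketch, not a tool from the book:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;

    public class JvmStatsSampler {
        public static void main(String[] args) {
            // Heap usage as seen by the JVM.
            MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
            MemoryUsage heap = memory.getHeapMemoryUsage();
            System.out.printf("heap used=%d committed=%d max=%d%n",
                    heap.getUsed(), heap.getCommitted(), heap.getMax());

            // Cumulative collection counts and times, per collector.
            for (GarbageCollectorMXBean gc :
                    ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s: count=%d time=%dms%n",
                        gc.getName(), gc.getCollectionCount(),
                        gc.getCollectionTime());
            }
        }
    }

Sampling these periodically and logging them alongside operating system data (from vmstat, for example) is one way to support the kind of offline evaluation described above.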
SH: What has Oracle done in the Hotspot JVM to increase its performance?
CH: One way to think about this is to compare and contrast the content of Wilson and Kesselman's Java Platform Performance (Prentice Hall, 2000) with the content found in the just-published Java Performance. I think you'll find that there's quite a bit that has changed (and improved) between the two publications. I also think it's interesting to consider what the most common Java performance issues were "then versus now."
SH: Sometimes people use the terms "performance" and "scalability" interchangeably. What is the difference? And how do I measure performance and how do I measure scalability?
CH: I interpret performance to be a more abstract term than scalability. For instance, performance can mean any one of the following (and could take on additional meanings in different contexts):
- performance throughput
- performance latency or responsiveness
- memory footprint
- start-up time
- scalability
- In the context of a Java application, performance could also capture the notion of how much time elapses until the application reaches its peak performance.
Each of the above is measured in a different way.
The recent popularity of multi-core processors and systems with multiple CPU sockets has brought scalability to the forefront as one of the most common performance issues.
Scalability is all about an application's ability to take on (or service) additional load while maintaining the same throughput and/or latency.
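As a rough illustration of why throughput and latency are measured differently, here is a minimal sketch (the class and doWork method are hypothetical placeholders, and a real measurement would need JVM warm-up and a proper benchmark harness): throughput counts completed operations per unit of time, while latency times each individual operation.

    import java.util.concurrent.TimeUnit;

    public class ThroughputVsLatency {

        static double blackhole; // keeps the JIT from eliminating doWork()

        // Placeholder for the operation being measured.
        static void doWork() {
            blackhole += Math.sqrt(blackhole + 1.5);
        }

        public static void main(String[] args) {
            final int ops = 1000000;

            // Throughput: completed operations divided by elapsed time.
            long start = System.nanoTime();
            for (int i = 0; i < ops; i++) {
                doWork();
            }
            long elapsedNanos = System.nanoTime() - start;
            System.out.printf("throughput: %.0f ops/sec%n",
                    ops / (elapsedNanos / 1e9));

            // Latency: time each operation individually and track the worst case.
            long worstNanos = 0;
            for (int i = 0; i < ops; i++) {
                long t0 = System.nanoTime();
                doWork();
                worstNanos = Math.max(worstNanos, System.nanoTime() - t0);
            }
            System.out.printf("worst-case latency: %d us%n",
                    TimeUnit.NANOSECONDS.toMicros(worstNanos));
        }
    }

Scalability, in turn, would be assessed by repeating such measurements while increasing the offered load (more threads or more clients) and watching whether throughput and latency hold up.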
SH: Are there any common performance-tuning options (low hanging fruit) that most applications can benefit from?
CH: I wouldn't necessarily say there are common performance-tuning options, but rather there are some common principles that usually help a Java application realize better performance. Several of these, from a JVM tuning perspective, are presented in the "Tuning the JVM, Step-by-Step" chapter of the book.
In the way of JVM command line options, my recommendation to folks has always been to justify why you want to use a given JVM command line tuning option. If the only justification you can offer is that you saw the option being used on some other application and it appeared to help that application, that doesn't fly with me. Every application is different. Command line options that work well for one application will not necessarily work well for a different application.
I would also say that choosing a given tuning option usually has both advantages and consequences. For example, when you look at throughput, latency, and footprint, you generally sacrifice something in one of those in favor of one or both of the others. Which ones are most important depends on the application and on what the application stakeholders believe is most important.
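As a hedged illustration of that trade-off, the command lines below show two hypothetical HotSpot configurations of that era; MyApp and the heap sizes are placeholders, and, per the point above, each flag should be justified by measurement against your own application.

    # Throughput-oriented: the parallel (throughput) collector, with GC
    # logging enabled so the effect of each flag can be measured.
    java -Xms2g -Xmx2g -Xmn512m \
         -XX:+UseParallelGC \
         -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
         MyApp

    # Latency-oriented alternative: the concurrent collector, typically
    # trading some throughput and footprint for shorter GC pauses.
    java -Xms2g -Xmx2g \
         -XX:+UseConcMarkSweepGC \
         -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
         MyApp

The point is not these particular values but that each flag's effect should be verified, for example via the GC log output the logging flags above enable.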
SH: How is performance management different now, with large-scale cloud-based applications running on thousands of machines, than it was a few years ago with standard n-tier applications? How is such an environment monitored?
CH: Actually, I find "cloud" to be such an overloaded term. But, I think I understand what you're asking.
As software and application deployments become more complex, so do the tools that monitor those deployments. However, the same needs exist: monitoring operating system statistics, JVM statistics, and application-level statistics. The area impacted the most is application-level monitoring. In general, though, application-level statistics still measure the same types of things, such as end-to-end response times; it's just that a given operation may now span multiple machines, or multiple "somethings" in a cloud environment, so it's a bit more abstract. You might also see additional instrumentation to identify critical transition points or phases of a given operation.
In addition, it's desirable to correlate operating system statistics, JVM statistics, and application-level statistics to see whether one impacts the other(s). As applications move to a cloud-based environment, that need to correlate statistics remains. In other words, the statistics of interest stay the same, but collecting and correlating them usually becomes a little more complex and difficult.
Charlie Hunt is the JVM performance lead engineer at Oracle. He is responsible for improving the performance of the HotSpot JVM and the Java SE class libraries. He has also been involved in improving the performance of Oracle GlassFish and Oracle WebLogic Server. A regular JavaOne speaker on Java performance, he also co-authored Java Performance and NetBeans™ IDE Field Guide.