Posted by mlam
on November 22, 2006 at 5:52 PM PST
For a JavaME VM implementation, how much performance is enough performance? This article explains why some optimizations don't make sense.
This article continues with esoteric knowledge about the phoneME Advanced VM and the JavaME space that developers will need.
If you've looked at the phoneME Advanced VM source code, you'll see that a lot of the names of functions and data structures are prefixed with CVM. CVM is the informal name of Sun's CDC VM, and prefixing labels (especially for global functions and data structures) with CVM is a standard coding convention in this VM code base. This is probably common knowledge to most people who already work with Sun's CDC technology, but I thought I'd mention it anyway just in case. Plus, now I can simply refer to CVM directly instead of having to say phoneME Advanced VM.
So, on to this entry's topic ...
Usually, no user will ever complain if you offer them more performance in their software. However, performance comes at a price. Usually, it means more complex code that makes better use of the hardware, and that in turn can mean a higher memory footprint is needed to run the software. For a JavaME VM, which is targeted at resource-constrained embedded devices, this is definitely a great concern. Hence, any performance work needs to be justified against its cost. What this means is that platform developers can't just go wild with every optimization trick in the book that they know.
Having said that, I want you to know that I am not saying this because CVM's performance is anything to be embarrassed about. As far as we know, CVM is one of the fastest VMs in this space, if not the fastest. To give you an idea of CVM's performance: a few years back, we benchmarked it against the JavaSE 1.3 client VM on a subset of SPEC JVM98. We had to use a subset because SPEC JVM98 uses deprecated APIs which have been removed from CDC. Hence, we had to do an internal "port" of the benchmark for this comparison. The comparison was done on a PowerPC PowerMac and a Solaris SPARC machine. CVM delivered around 80-90% of the JavaSE VM's performance with only 10% of its static footprint. You should know that this is old data; JavaSE has improved significantly since, and so has CVM. Note: I'm only sharing this comparison to give you an idea of the level of performance that can be achieved in JavaME. I'm not saying anything about which VM is better. That would be like comparing apples and oranges. More on that later.
So, when we talk about performance, one of the VM components that people think of first is the dynamic adaptive compiler, also commonly known as the JIT. Below, I will talk about some performance issues around compilation. I will also touch on other areas / topics that are not JIT-related but are important as well.

Static Compilers vs JITs
One classic mistake that engineers make is to start implementing optimizations from their compiler textbook in the JIT without due consideration. While some of those optimization techniques are meaningful, some are not. For one thing, the textbooks usually teach about static compilers. JITs, on the other hand, are dynamic compilers. Factor in JavaME's resource-constrained requirements, and you'll have an even greater contrast in their attributes. Here's a comparison of the two technologies:
| Static compiler | JavaME JIT compiler |
| --- | --- |
| Can afford a lot of working memory to do the compilation work. | Must minimize / limit the amount of working memory used. |
| Can afford more time / CPU cycles to do compilation work. | Must minimize / limit consumption of CPU cycles. |
| Typically assumes all methods are available to the compiler. | Works with only a subset. |
| Typically compiles all methods. | Compiles only hot methods. |
| Must be able to compile all types of code. | Only needs / wants to compile commonly used types. Lets the interpreter handle the uncommon cases. |
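The "compiles only hot methods" policy is often driven by something as simple as a per-method invocation counter maintained by the interpreter. Here is a minimal sketch of that idea; the class name, method names, and threshold value are all invented for illustration and are not taken from the CVM source:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of counter-based hot-method detection.
// Real VMs use cheaper per-method counters; the threshold here is assumed.
public class HotMethodCounter {
    static final int COMPILE_THRESHOLD = 1000; // implementation-specific guess

    private final Map<String, Integer> invocationCounts =
            new HashMap<String, Integer>();

    /**
     * Called by the interpreter on every method entry. Returns true exactly
     * once: when the method crosses the threshold and should be handed to
     * the JIT. Everything below the threshold stays interpreted.
     */
    public boolean recordInvocation(String method) {
        Integer n = invocationCounts.get(method);
        int count = (n == null) ? 1 : n.intValue() + 1;
        invocationCounts.put(method, Integer.valueOf(count));
        return count == COMPILE_THRESHOLD;
    }
}
```

The payoff is that the compiler's working memory and the compiled-code footprint are only ever spent on the small fraction of methods that actually dominate execution time.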
Note how static compilers may assume the availability of more memory and CPU resources. Hence, their compilation techniques may have similar assumptions. Obviously, that means that some of those techniques may not be suitable for JavaME.
Note also that the Java platform is a dynamic environment where it is normal to expect some code to be downloaded at runtime. Very late binding of code is expected in the Java VM and language specifications. This doesn't match the static compiler's assumption that the entire application code base will be available as input to the compilation process.
The JIT's ability to let the interpreter handle execution of uncommon cases also reduces resource consumption (in both compiled code and compiler footprint), since code that is not critical to performance is never compiled.
Critics may say that the claims in my table above are based on broad generalizations that may not be true of some state-of-the-art static compilers today. Why, yes, I am generalizing. For one, I am assuming that static compilation also comes with static linking. But bear in mind that your compiler textbooks will probably not cover the state of the art either. I am also using a strict definition of static compilation, i.e., I expect it to compile static code. Granted, real-world implementations may have added capabilities that deal with dynamic code (which may be downloaded), but those aren't strictly static compilers anymore. The point here is that you should not apply classic compiler techniques blindly. Those techniques are usually targeted and optimized for a different kind of system (one that does not necessarily resemble the Java platform), and hence may not be suitable here.
Another misconception that people may have is that code generated by static compilers will be faster than JITted code. This is not always true. In some cases, JITted code will actually out-perform statically compiled code. The key reason for this lies in the fact that the Java platform is dynamic and that late binding occurs. I'll leave the details of that discussion for another day (it's not a short discussion either).
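To make the late-binding point concrete, here is a hypothetical sketch (all class names invented) of a call site that a static compiler cannot safely inline, because another subclass could be downloaded and loaded at runtime, whereas a profiling JIT that observes only one receiver type can inline the call and deoptimize later if needed:

```java
// Invented example types; not from any real code base.
abstract class Shape {
    abstract double area();
}

class Square extends Shape {
    final double side;
    Square(double side) { this.side = side; }
    double area() { return side * side; }
}

public class LateBindingDemo {
    static double total(Shape[] shapes) {
        double sum = 0;
        for (int i = 0; i < shapes.length; i++) {
            // A static compiler must emit a full virtual dispatch here:
            // a new Shape subclass may be loaded after compilation.
            // A JIT that has only ever seen Square at this call site can
            // inline area() speculatively, guarded by a type check.
            sum += shapes[i].area();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(total(new Shape[] { new Square(2), new Square(3) }));
    }
}
```

This is one of the cases where runtime information gives the JIT an edge that no amount of static analysis can fully reproduce.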
Hence, there are many reasons why static compilation techniques may not be suitable, even when you discount the resource constraint issue and performance is all that you care about.
JavaSE Hotspot vs CVM
OK, so we can't just pick tricks from a compiler book. How about tricks from the JavaSE VM then? The answer is also "maybe". First of all, there is the resource constraint issue. JavaSE is tuned for significantly larger systems with a lot more resources. It is entirely reasonable and expected that they will make use of those resources to give you the best performance for your money. But when the resources are not available on your device, those techniques may be a no-go.
Another point that may not be obvious to the average developer is that a JavaME implementation (like CVM) is not just a smaller JavaSE. The types of devices that JavaSE targets are different beasts than those of JavaME. CVM is not smaller than JavaSE's Hotspot only because it has less functionality. CVM was architected with different design goals in mind to enable it to work well on embedded devices. At each level of its design, a different choice was made in the speed-space tradeoff. For this reason, techniques used in JavaSE may not apply to CVM because they are tuned for a different tradeoff.
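To illustrate the kind of speed-space tradeoff I mean, consider a toy example (invented for this post, not from CVM): counting the set bits in a byte. A design tuned for speed pays a 256-entry table of footprint; a design tuned for footprint computes the answer bit by bit. Neither is "right"; the right choice depends on the device:

```java
// Toy illustration of a speed-space tradeoff; not taken from any VM source.
public class PopCount {
    // Speed-biased choice: precomputed 256-entry lookup table.
    // Costs static footprint, answers in one array access.
    private static final int[] TABLE = new int[256];
    static {
        for (int i = 0; i < 256; i++) {
            TABLE[i] = Integer.bitCount(i);
        }
    }

    static int viaTable(int b) {
        return TABLE[b & 0xFF];
    }

    // Footprint-biased choice: no table, compute bit by bit.
    // Smaller, but takes up to 8 iterations per call.
    static int viaLoop(int b) {
        int n = 0;
        for (int v = b & 0xFF; v != 0; v >>>= 1) {
            n += v & 1;
        }
        return n;
    }
}
```

Multiply a decision like this across every subsystem of a VM and you get two very differently shaped implementations, which is why a technique tuned for one tradeoff point may not transplant cleanly to the other.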
To give you a concrete example of how JavaME devices differ from JavaSE, some time back, a colleague of mine from the JavaSE side discovered that when he applied a certain technique to improve cache locality, he was able to get a performance gain of about 20% in one benchmark. This caught my attention. 20% is nothing to sneeze at. So, I applied the technique in CVM. To my surprise, that same benchmark yielded a jaw-dropping 70% gain in performance. What happened? The difference was that the JavaSE run was on a server class machine where the amount of cache was huge (possibly in the hundreds of KBs or maybe even a few MBs). I was running mine on an ARM device with only a 32KB cache. The improved cache locality had a greater impact here.
Hey, but doesn't that demonstrate the exact opposite, that it's good to import JavaSE techniques into CVM? Well, in this case, it worked out. But what if the JavaSE technique were one that relied on the target device having a large cache, and optimized code to take advantage of that? Such a technique applied to CVM may actually cause a significant degradation in performance, because JavaME devices don't have that expected amount of cache.
Hence, the point is that we should not import techniques from JavaSE blindly either. Note that the above illustration also shows that a JavaSE VM may not actually run faster than CVM if it was run on a JavaME device, even if you give it enough RAM (but not system cache) to fit. It would be like trying to power your car with a rocket engine. It sounds like a good idea, but your fuel system won't be able to handle it. And the result is not a faster car, but something that may not move at all.
JavaME is not just a smaller JavaSE. It is a different beast. This is why comparing JavaSE and JavaME may be like comparing apples and oranges.
But, in the end, the best way to know if a certain optimization will work is to try it out. We have often done that ourselves in the past. There have been ideas that failed, and were not incorporated into the code base. One important criterion for determining whether an optimization will be incorporated is, of course, how much it costs in terms of resource consumption. Another is how much performance it buys you.
To measure performance gains, you will need to run benchmarks of some sort. One common mistake that people make is to run micro-benchmarks that only test the one area that is improved by the optimization. The issue here is that real-world applications would probably not just sit in a tight loop and exercise that one area of code all day. Hence, benchmarks that are based on real-world applications are more reliable as performance indicators. For JavaME, we like SPEC JVM98, but as indicated earlier, it won't run on CDC without modification due to deprecated methods. Another one that we like is GrinderBench by EEMBC.
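For the curious, a naive micro-benchmark of the kind warned about above tends to look like the following sketch (all names invented): it times one tiny code path in a tight loop, which is exactly the pattern real applications rarely resemble.

```java
// Sketch of a misleading micro-benchmark; 'work' stands in for the one
// code path touched by the optimization being evaluated.
public class NaiveMicrobench {
    static int work(int x) {
        return x * 31 + 7; // the single operation under test
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long sink = 0;
        for (int i = 0; i < 1000000; i++) {
            sink += work(i); // exercises only this one area, all day long
        }
        long elapsed = System.nanoTime() - start;
        // 'sink' is printed so the JIT cannot discard the loop as dead code,
        // but the score still says little about a whole application.
        System.out.println("elapsed ns = " + elapsed + ", sink = " + sink);
    }
}
```

A score from a loop like this mostly measures how well the VM handles that one loop, so a large "win" here can translate into no measurable gain on an application-level benchmark.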
If possible, try to run your benchmark on a JavaME-type device. As indicated above, JavaME is different from JavaSE. Benchmarking your changes on a JavaSE-type desktop / server machine will give you an indication of the results your changes will yield, but not necessarily the same results you will get on a JavaME device. Exercise proper engineering discretion.
In general, the performance of a Java platform is not only dictated by its execution engine (the interpreter, the JIT). The quality of the VM runtime and class libraries also play a big part. Sometimes, they are even more important than the VM. We have seen this typically in graphics/GUI applications that spend a majority of their time in native code (as opposed to Java code). However, this doesn't mean that there's no work for the VM to do here. Strictly speaking, the VM runtime libraries are part of the VM implementation, and will need to be optimal as well.
Also, the VM can provide mechanisms which help the class libraries perform better. They need to cooperate. It's not a case of one or the other.
Lastly, there's the thing about native code. Some people think that their code will always run faster if they implement all the major pieces as native code. This is actually a fallacy. For various reasons, using native code can actually result in worse performance than implementing some or all functionality in Java bytecodes.
All these will be discussed in detail at a later date.
What does this mean to you?
Performance is a complex topic. We are only scratching the surface here. And as I tried to point out above, things aren't always what they seem. When trying to implement performance enhancements for a JavaME system, it is prudent to always think in the frame of mind of an embedded system developer. Each optimization technique needs to be evaluated individually for its viability in this space.
Have a nice day. :-)