
under the hood

11 replies
iwadasn
Offline
Joined: 2004-11-09

Personally, I can't stand language changes. The language should be as simple as possible, certainly no more complicated than it already is.

API changes are somewhat better; if you want an API for everything under the sun, fine.

Now let's talk about under the hood. Here are a few things I'd like to see.

1) Escape analysis. Use escape analysis to stack-allocate non-escaping objects and eliminate contention on them.

2) Object inlining. Use object inlining to combine a wholly contained object into its container.

3) Class cast elimination; this is closely related to generics. Either javac can tag a class file with additional information stating the type of the generic, or the VM can try to infer it at runtime; I'm not sure which way is better. In either case, try to eliminate as many casts as possible.

4) Full use of auto-vectorization. The VM needs to fully utilize the vector units available in modern processors. This is difficult to do, but it would give Java a performance advantage that is VERY difficult to achieve with native code. (Raise your hand if you think MS Word can use your SSE units; NetBeans, of course, would get that ability just by virtue of being Java.)

5) Isolation API. The VMs should share all executable code, including loaded bytecode and compiled code, but not share static variables. This is already about halfway done with 1.5, and just needs to be finished.
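To make item 1 concrete, here is a minimal sketch (class and method names are mine, purely illustrative) of the pattern escape analysis looks for: a temporary object that never leaves its method, which a VM could in principle stack-allocate or scalar-replace instead of heap-allocating.

```java
// Sketch of an escape-analysis candidate (names are mine, purely
// illustrative). The temporary Point in distance() is never stored or
// returned, so in principle the JIT could allocate it on the stack, or
// scalar-replace it into two doubles in registers: no heap allocation,
// no object header, no GC work.
public class EscapeDemo {
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    public static double distance(double x1, double y1, double x2, double y2) {
        Point d = new Point(x2 - x1, y2 - y1); // never escapes this method
        return Math.sqrt(d.x * d.x + d.y * d.y);
    }

    public static void main(String[] args) {
        System.out.println(distance(0.0, 0.0, 3.0, 4.0)); // prints 5.0
    }
}
```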

If these things were done, they would have a few impacts.

1) Substantially increased performance, perhaps 10-20% or more across the board.

2) Decreased memory footprint. Sharing more data and cleaning up memory more efficiently (including having fewer object headers due to inlining) would help the memory footprint, especially when multiple VMs are running.

3) Increased scalability. Stack allocation and object inlining would let the GC manage fewer objects and reduce the need for "stop the world" garbage collections, which quickly become a bottleneck on multi-processor systems with multiple GB of RAM.

peterkessler
Offline
Joined: 2004-10-27

Of course, it depends on your application. We have some data on startup times for some applications on some operating systems on some hardware platforms. We'd love to have a representative sample of applications, but we have limited resources to collect them, run them with every build, etc. What kinds of applications do you write? Command line applications, Swing applications, Java web start applications, browser applets, or what?

What you should do is take your application(s) and try them with both JDK-1.4.2 and JDK-1.5.0. Those are the only numbers that matter. I'd be curious what you find.

It's sometimes tricky, though, to figure out when an application is finished starting (are the bits in the graphics pipeline, or are they on the screen yet?). We have some hooks for that, if you are interested. It's even trickier to figure out the memory footprint of an application, and some operating systems are trickier than others.

fdlcruz
Offline
Joined: 2004-08-23

I'm quite interested in learning what these graphical and memory (hopefully non-native) hooks are.

coxcu
Offline
Joined: 2003-06-11

"Pre-compilation also doesn't work with network installs" seems like a bit of an overstatement. I grant that it makes the problem more complicated, but isn't it essentially just a caching problem? Between runs of a Java program running on Windows, the processor and amount of memory could change. That just means that you only use the data in the database if it is appropriate. Keeping a database of profiling data and precompiled/preoptimized code definitely increases complexity, but HotSpot isn't exactly simple now.

Imagine a database containing a snapshot of each program right before it accessed something outside of the JVM. Wouldn't most applications load far more quickly if they were simply read from the database pre-initialized?

I know that there are lots of tricky implementation issues with this approach, but Java will remain out of favor for applications that require quick start-up until more is done. A database might even reduce the need to run programs with the client switch to such an extent that the switch could be eliminated, thus simplifying the JVM. More and more sophisticated optimizations could be implemented because their costs would be distributed across many runs of an application.

Of course the beauty of Java is that all of these changes can happen under-the-hood, without application programmers needing to do anything.

crnflke
Offline
Joined: 2003-11-14

Have you looked into pre-compiling a number of classes?

There is a fair amount of work that has to be done each and every run: parsing, and then either interpreting or compiling, a fair amount of framework code. It would be good if you could point a good static compiler at a big stack of framework code and, if necessary, patch over that with HotSpot-compiled code (if you wanted to tune it better).

I don't know how this would fit in with the structure of Java, but from what I see, one of the big downsides is definitely the time it can take a client application to "warm up". Server applications don't matter so much comparatively, as they can amortise these costs, whereas the delay is very visible in Java UI applications.

tsinger
Offline
Joined: 2003-06-10

How about precompiling the JRE at installation time? Especially for Swing applications, which load a fair number of JRE classes (vs. application classes), I guess this would reduce start-up time significantly.

Tom

peterkessler
Offline
Joined: 2004-10-27

You can see us edging towards that slippery slope with the work we put into JDK-1.5.0. At installation time we take some of the classes from the bootclasspath and tease their class files into read-only and read-write portions as they would exist when loaded into the JVM, and build what's called a "shared archive". The archive can be memory mapped into subsequent JVM's (and the read-only portion shared between JVM's), saving class loading time. See http://java.sun.com/j2se/1.5.0/docs/guide/vm/class-data-sharing.html for details.

We'll probably do more interesting things with these archives in the future (but I'm not pre-announcing anything).

Compilation time is not the limiting factor. Compilation happens only for the small number of methods that are deemed worth compiling, and that varies from application to application. Precompiling all of something like Swing would probably increase the overall footprint of your application, since bytecodes were designed to be compact, and we only compile to native code the methods you use.

Pre-compilation also doesn't work with network installs, where a single installation is used by a number of clients on machines of different types (not just SPARC versus x86, but Pentium versus Pentium 4). The world is not just stand-alone PC's, fortunately.

asjf
Offline
Joined: 2003-06-10

Do you have any data on how much start-up time has decreased from 1.4.2 to 1.5?

linuxhippy
Offline
Joined: 2004-01-07

More work on startup time would be great, not only on Windows but also on Linux, which tends to need even more time to get the JVM up and running. (Why are the JVM shared objects 3-4x as big as on Windows?)

Overall Java performance has been great since 1.3, but startup is still somewhat slow.
We often see this with our customers, who are quite angry about the 15s startup of the JVM blocking the browser until the plugin part and JVM are loaded.

lg Clemens

peterkessler
Offline
Joined: 2004-10-27

Nice list. Thanks for posting it. Some comments on some of your suggestions:

1) Escape analysis is already being worked on. The easy cases are (sort of) easy, but the harder cases are hard. It's an engineering tradeoff how much work to do when compiling (at runtime) to get what kinds of performance benefits.

2) Object inlining goes hand-in-hand with escape analysis. It might get tricky, as you'd have to have two versions of the methods on the class: one for first-class objects and one for objects that have been inlined. Maybe inlining only applies if all the methods can be inlined, too.

3) Eliminating class casts: I'm not a language designer so I'll pass on this. One of the restrictions is that we can't depend on any particular javac compiler to supply annotations: the JVM has to be able to run old class files.

4) Vector units are used by the JVM for some of its internal operations, and to compile a few constructs (I'd have to check which ones, but I know we compile differently at runtime depending on which instructions are available). Again, it's a tradeoff of how long it would take to analyze the code for possible instruction-set use versus the benefit one would get from using those instructions. We tend to go for the greatest good for the greatest number of users, but there are also some speed demons lurking in the corners.

5) Isolation API. This is JSR-121, and is about providing API's around isolation for applications. Contrast that with sharing bytecodes and generated code between applications, which is about performance, footprint, and other JVM qualities of service. The Tiger release had some steps towards improved performance (mostly startup performance) and reduced footprint (based on sharing some VM data structures between VM instances), and we'll keep working on those. But don't confuse that work with JSR-121.
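For what it's worth, here is a minimal illustration of the casts discussed in point 3 (class and method names are mine, purely a sketch): with erasure, javac inserts a hidden checkcast on every read from a generic collection, even when the program can never have stored anything but the expected type.

```java
import java.util.ArrayList;
import java.util.List;

// With erasure, xs.get(0) returns Object and javac silently inserts a
// checkcast to String at the call site, even though nothing but Strings
// can ever be stored in xs. That checkcast is the kind of provably
// redundant cast an annotation or inference scheme could let the VM drop.
public class CastDemo {
    public static String first(List<String> xs) {
        return xs.get(0); // bytecode: invokeinterface get; checkcast java/lang/String
    }

    public static void main(String[] args) {
        List<String> xs = new ArrayList<String>();
        xs.add("hello");
        System.out.println(first(xs)); // prints hello
    }
}
```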

iwadasn
Offline
Joined: 2004-11-09

It's good to get such a thoughtful reply. Let me just toss a few more thoughts in here. It seems like the current client/server breakdown would serve us well here: the client need not bother with a lot of the more demanding steps, but the server should try to include everything, even the kitchen sink.

Basically, it seems to me that computers have been growing exponentially in performance for a while now. Code size is growing somewhat more slowly, it seems to me only linearly, though perhaps this is arguable. The hotspots of code have been growing even slower than that (again, from personal experience); in fact, I'd go so far as to say that the number of lines of code that together consume 90% of the processing time has probably decreased, as a few really intensive hotspots have become totally dominant. So we have exponentially increasing resources to devote to a shrinking volume of code, and therefore no punches should be pulled when it comes to compilation.

This seems even more significant once things like isolation and VM sharing come into the picture. I can easily see my G5 at home (I know this is apple's domain, not yours, but bear with me) starting up a few java programs and then never exiting the VM. The VM should become part of the OS, and idle time might as well be used compiling and optimizing as just burned off with an idle thread.

So now specifically......

1) Our programs are restarted about once a month, if that. The system takes a long time to start as it is: about 10 minutes to compile (javac on about 3,000 Java files), and then an hour to start up. I'd take an extra hour of startup time in exchange for 10% better performance while it runs. It seems very unlikely that even the most intense compilation system would add an hour to startup.

2) Same as number 1. I'm not sure you'd need two copies of the method most of the time, though. For instance, if a string were inlined within another object, then the pointer to the outer object would be X and the pointer to the string would be X + dx, where dx is some offset. With that little caveat it should work identically, right? Any method that operates on a string could operate on the X + dx pointer and have the same effect, as that is actually a string.

3) It seems like a reasonable compromise could be made here. A new compiler could annotate the class file with additional information that would allow casts to be eliminated. If the file doesn't have the annotations, it must keep its casts; if it is annotated, the VM might be able to optimize it a little. That allows interoperability in both directions: new VMs can run old files (no annotation data, so no elimination), and old VMs can run new files (they just ignore the annotations, so no cast elimination). Only when a new VM runs a new file would you get any advantage.

4) Once again, I'm not sure exactly what is done now. It seems to me that the compiler could be mindful of vector units when unrolling loops: a loop full of integer or floating-point code should be unrolled in multiples of the vector size so that the operands can be packed and vectorized. I'm pretty sure this sort of thing isn't done now. It gets more important in newer chips that might start doing crazy things to get performance. (Once again, I'm not an expert, but...) it seems likely that the operations needed to perform well on an IA-64 processor are closely related to those needed to fully utilize AltiVec or SSE: they're both just wide pipelines with especially restrictive scheduling rules, and a compiler that was smart about creating its executable code could use both of them fully.

5) It just seems that VMs will essentially become the OS sooner rather than later. It would be wise to get as much of the isolation/sharing infrastructure in place as early as possible. I'm not confusing them, but I think they're very closely related. For instance, it seems there would essentially never be any need for more than one VM process on a computer. Assuming the VM cannot segfault, any additional Java apps should just be bundled into the currently running process to maximize sharing of all sharable resources. This only works if the Java VM is inherently safe enough to keep the programs from messing each other up, but that seems to be almost there already, with just a little weirdness around things like System.exit(). That oddity can be fully fixed by breaking the static data for each "process" into a separate space within the VM and sharing everything else. Code and compiled code wouldn't be duplicated (further reducing the compilation penalty for #1 and #2 above), and yet the programs would be as safe from each other as if they were running in separate processes.
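The unrolling idea in point 4 might look something like this by hand (purely a sketch of the transformation, not what any particular VM actually emits): process four floats per iteration so the loop body lines up with a 128-bit SSE/AltiVec register, with a scalar loop for the leftover tail.

```java
// A hand-unrolled loop as a sketch of point 4 (an illustration, not
// what any particular VM emits). The main loop handles four floats per
// iteration, matching a 128-bit vector register; the scalar tail loop
// picks up any leftover elements.
public class UnrollDemo {
    public static float sum(float[] a) {
        float s0 = 0f, s1 = 0f, s2 = 0f, s3 = 0f;
        int i = 0;
        for (; i + 4 <= a.length; i += 4) {   // stride = vector width
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        float s = s0 + s1 + s2 + s3;
        for (; i < a.length; i++) {           // scalar tail
            s += a[i];
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(sum(new float[]{1, 2, 3, 4, 5})); // prints 15.0
    }
}
```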

peterkessler
Offline
Joined: 2004-10-27

I'm prodding one of our compiler guys to comment on the compiler issues (escape analysis, compilation time versus runtime performance, vector units), but I'll comment on number 5: the VM becoming the OS, and sharing versus isolation.

It's not always a win to throw more applications into a JVM. Consider the heap: there is a certain amount of overhead to garbage collection. Some of the phases are proportional in time to the size of the heap, so if I'm a small application I might not want to be run in the same JVM as a really big application. The overall GC overhead across the machine might be the same, but if I have separately scheduled processes, that big guy can spend cycles on GC without cutting into my time slices. Or think about code generation: the JVM makes certain optimizations around virtual method calls based on the fact that there is only one implementation of an abstract method loaded in the JVM, so that's the only implementation that can be called, so the call is faster because it's not really a virtual call any more. In fact, the method might be inlined, so there is no call at all. If you throw a lot of unrelated code into one big JVM, fewer call sites will be able to take advantage of those optimizations. There are also certain operations that are essentially serial across the JVM, e.g., class loading. If you throw lots of applications into one JVM, the serial operations for one application will start to interfere with the serial operations of the other applications.
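The virtual-call optimization described above can be illustrated like this (class names are mine, purely a sketch): while only one subclass is loaded, the call site is monomorphic, so the JIT can compile it as a direct, even inlined, call; loading a second subclass later forces deoptimization back to true virtual dispatch.

```java
// Sketch of the monomorphic-call situation described above. While
// Circle is the only loaded subclass of Shape, the JIT can compile
// s.area() as a direct call and even inline it; loading a second
// subclass would force deoptimization back to a real virtual dispatch.
public class DevirtDemo {
    public static abstract class Shape {
        public abstract double area();
    }

    public static final class Circle extends Shape {
        private final double r;
        public Circle(double r) { this.r = r; }
        public double area() { return Math.PI * r * r; }
    }

    // While only Circle exists, this call site is monomorphic.
    public static double totalArea(Shape[] shapes) {
        double total = 0.0;
        for (Shape s : shapes) total += s.area();
        return total;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Circle(1.0), new Circle(2.0) };
        System.out.println(totalArea(shapes)); // 5 * pi, about 15.708
    }
}
```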

Then, there's always the possibility of bugs. Not in the JVM itself (of course!), but in the native libraries needed by the applications, etc. Debugging would also be a problem: stopping the world and getting stack traces for all the threads would be confusing if you were really only interested in the threads for your application. Profiling has the same problem. Those could be worked around by better tools. Or by using separate operating system processes.

You rightly point out that there are benefits to be had by merging parts of multiple JVM's running on the same hardware. I've tried to point out why it's not a slam-dunk. We are working on sharing between JVM's, but it's a delicate engineering balance.