
Chat on the HotSpot virtual machine: March 15, 2005

peterkessler
Offline
Joined: 2004-10-27
Points: 0

On March 15, 2005 11:00 A.M. PST (19:00 UTC),
Ross Knippel and I will be in a chat room at
http://java.sun.com/developer/community/chat/
answering questions about Sun's HotSpot virtual machine.
We'll also field any questions you have about the
structure of the virtual machine code, how things work,
and all that stuff that goes on below the level of
Java code.

We invite you to send in questions. That's what will make
the chat interesting, both for you and for us. If you
want to seed the questioning -- and give us a chance to
think of cogent answers -- you can add them to this forum
topic.

sjasja
Offline
Joined: 2004-08-15
Points: 0

The benchmark in the javagaming.org chat is pretty badly broken. It mistakenly uses the same variable ("count") for two purposes, giving wildly incorrect results.

When you fix that, JDK 5.0 gives 21000 rounds/10s. GCC 3.3.3 with -O3 on the same hardware is slower, at 17000 rounds/10s.

I have the assembly produced by JDK. It's 70+ lines long, due to what seems to me like aggressive loop unrolling.

mayhem
Offline
Joined: 2003-06-11
Points: 0

The first benchmark (phazer's code) is not broken. On a 1.4 GHz Athlon using Java 5 -server, HotSpot is much slower than JRockit and GCC.

sjasja
Offline
Joined: 2004-08-15
Points: 0

The original benchmark may be timing Hotspot compilation time, not runtime. Or maybe it's an Athlon thing?

Here is a Hotspot-friendly variant and a C variant. JDK 5.0 -server beats gcc 3.3.3 -O3 on my Intel-based PC (2100 vs. 1800 iter/s, best of five). How about on your Athlon? Maybe there is an Athlon-specific optimization Hotspot doesn't do? If so, be sure to file an RFE! And don't look at the first line of timing from the Java version; that's compilation, not execution.
[code]
public class javagaming
{
    public static void main(String args[])
    {
        for (int n = 0; n < 5; n++)
            doit();
    }

    private static float x = 0.7456f;
    private static float y = 0.97543f;
    private static int count = 100000;
    private static float f1;

    public static void doit()
    {
        long time = System.currentTimeMillis();

        int times = 0;
        while (System.currentTimeMillis() - time < 10000) {
            float a = x;
            float b = y;

            for (int i = 0; i < count; i++)
                b = a * b + a;

            f1 = b;   // store the result so the inner loop can't be eliminated
            times++;
        }

        System.out.println(times * 1000 / (System.currentTimeMillis() - time) +
                           " iterations / s");
    }
}

#include <stdio.h>
#include <time.h>

static float x = 0.7456f;
static float y = 0.97543f;
static int count = 100000;
static float f1;
static float f2;
static double f3;

int main()
{
    int n;

    for (n = 0; n < 5; n++) {
        int times = 0;
        long start = time(NULL);

        while (time(NULL) - start < 10) {
            float a = x;
            float b = y;
            int i;

            for (i = 0; i < count; i++)
                b = a * b + a;

            f1 = b;   /* store the result so the inner loop can't be eliminated */
            times++;
        }

        printf("%d iterations / s\n", times / (int)(time(NULL) - start));
        fflush(stdout);
    }
    return 0;
}
[/code]

gandhipurav
Offline
Joined: 2004-11-24
Points: 0

Hi,

We have a web application running on JDK 1.5 and Tomcat 5. I want my application to take no more than 48 MB of memory on my machine. I used the -Xmx48m option, but that only controls the heap memory. What about the non-heap memory? Here are a few things I have noticed:
-> When I use JConsole to monitor the application, I see that the application's non-heap memory Max is set to 96M.
-> When I use the "top" command to monitor the memory footprint of my application, it shows an alarming 125 MB of usage (with the -server option it sometimes goes up to 250M), which may harm my other memory-sensitive applications. Even when the GC runs, Java does not free any memory for the other applications.
-> I have tried many options, such as
-XX:+UseConcMarkSweepGC
-XX:NewSize=8m -XX:MaxNewSize=8m -XX:SurvivorRatio=2 -Xms48m -Xmx48m -Xss64k
-XX:MaxHeapFreeRatio=20 -XX:MinHeapFreeRatio=10 -XX:NewSize=32m -XX:SurvivorRatio=32 -Xss256k -Xms48m -Xmx48m
-XX:+UseParallelGC -XX:GCTimeRatio=20 -Xms30m -Xmx30m -Xss2048k
but failed to restrict it to 48M.

Before moving to Sun JDK 5 we were using IBM JDK 1.3 with Tomcat 3 and green threads, and we were able to configure it with just one option, -mx48m. I am really stuck. I cannot believe that Sun has not given us any control over the maximum memory usage. Please help me.

carfield
Offline
Joined: 2003-06-10
Points: 0

Yes... I personally think Java by default uses more memory than it needs; in particular, -XX:MaxHeapFreeRatio defaults to 70.

http://forum.java.sun.com/thread.jspa?threadID=5250587
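
For example (a sketch; the values are illustrative, MyWebApp is a placeholder, and how much memory is actually returned to the OS depends on the collector), the free ratios can be lowered so the committed heap is shrunk more eagerly:
[code]
# shrink the committed heap when more than 30% (instead of 70%) of it is
# free after a GC, and only grow it when less than 10% is free
java -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=30 -Xmx48m MyWebApp
[/code]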

mayhem
Offline
Joined: 2003-06-11
Points: 0

Does anyone from the HotSpot team have any comments about this benchmark:

http://www.javagaming.org/cgi-bin/JGNetForums/YaBB.cgi?board=Tuning;acti...

Scroll down to see the source code (phazer's comment). Basically, JRockit and the C compilers are leaving HotSpot in the dust in this test.

tackline
Offline
Joined: 2003-06-19
Points: 0

I'm not a member of the HotSpot team*, but:

Does it matter in the slightest how fast pointless micro-benchmarks run? Micro-benchmarks are useful for understanding why real code performs as it does; is there some real code you are worried about?

As for the linked microbenchmark: you can easily improve its performance by checking for convergence. Settling at about a/(1-a), or hitting +INF/-INF/NaN, is my bet.

*Nor Sun, or any other company.
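
For illustration, a minimal sketch of that convergence check applied to the benchmark's inner loop (with a = 0.7456f the recurrence settles near a/(1-a)):
[code]
public class Converge {
    public static void main(String[] args) {
        float a = 0.7456f;
        float b = 0.97543f;
        int i = 0;
        while (i < 100000) {
            float next = a * b + a;
            // once the float value stops changing (or blows up to
            // +/-INF or NaN), further iterations are wasted work
            if (next == b || Float.isInfinite(next) || Float.isNaN(next))
                break;
            b = next;
            i++;
        }
        System.out.println("settled after " + i + " iterations at " + b +
                           " (a/(1-a) = " + a / (1 - a) + ")");
    }
}
[/code]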

mayhem
Offline
Joined: 2003-06-11
Points: 0

First of all, I wouldn't call any benchmark pointless. They may be incorrect, but not pointless. If HotSpot can't optimize simple code like this correctly, how can I trust it to optimize anything correctly? Too bad there is no easy way to see the assembler code generated by HotSpot; otherwise I could point out the error in it.

tackline
Offline
Joined: 2003-06-19
Points: 0

If you really want to see the generated code, I'm told it's straightforward using a machine-code-level debugger.

mayhem
Offline
Joined: 2003-06-11
Points: 0

If someone has some info about how to do this, I would be grateful.

murphee
Offline
Joined: 2003-06-10
Points: 0

NIO Buffers:

What is the current state of performance with Buffers and DirectBuffers? When NIO came out, it was maintained that HotSpot would make accesses to them as fast as (if not faster than) array accesses.
Is this true now, and are there any special ways to access Buffers to take advantage of any optimizations?
What about DirectBuffers: are there any ideas for making access to (storing and reading) data in DirectBuffers faster?

(Note: I did a little benchmark, basically a loop storing 1000000 ints in an array and in a DirectBuffer; the DirectBuffer took about 3x as much time. Is this a representative result in your experience? Yes, the array and the DirectBuffer were allocated outside the loop, and the benchmark was run about 200 times in the same VM.)
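
For reference, a minimal sketch of that kind of comparison (not the original benchmark; the class name and loop sizes are illustrative):
[code]
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

public class BufferVsArray {
    static final int N = 1000000;

    public static void main(String[] args) {
        // allocate both outside the timing loop
        int[] array = new int[N];
        IntBuffer direct = ByteBuffer.allocateDirect(N * 4).asIntBuffer();

        // repeat many times in the same VM so HotSpot gets a chance to
        // compile and inline the buffer accessors
        for (int run = 0; run < 200; run++) {
            long t0 = System.currentTimeMillis();
            for (int i = 0; i < N; i++)
                array[i] = i;
            long t1 = System.currentTimeMillis();
            for (int i = 0; i < N; i++)
                direct.put(i, i);
            long t2 = System.currentTimeMillis();
            System.out.println("array " + (t1 - t0) + " ms, direct " + (t2 - t1) + " ms");
        }
    }
}
[/code]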

tackline
Offline
Joined: 2003-06-19
Points: 0

Cameron Purdy did some non-IO NIO benchmarks. There's an awful lot of inlining to be done, so you've got to go easy on HotSpot.

http://www.jroller.com/page/cpurdy/20040406#more_on_nio_performance

It's also worth noting that Sun's NIO implementation assumes 64-bit addresses, so 32-bit machines will take a hit from that.

jdolphin
Offline
Joined: 2005-03-09
Points: 0

I understand you worked with AMD to optimize HotSpot for Opteron. What kind of performance improvements are possible with an x64/AMD64 VM vs. the 32-bit x86 VM? I am specifically interested in the Solaris 10 implementation.

jdolphin
Offline
Joined: 2005-03-09
Points: 0

Based on independent tests (http://www.shudo.net/jit/perf/) HotSpot has a performance lead over the .NET CLR in almost every area. Nice work! Do you see yourselves extending this lead?


jdolphin
Offline
Joined: 2005-03-09
Points: 0

Are there any major improvements/optimizations in the works for the client VM?
I am particularly interested in any optimization which would benefit Swing/JDNC.

slohmeier
Offline
Joined: 2005-02-19
Points: 0

I would like to be able to redefine classes at runtime. I know that it is currently possible to redefine the bodies of methods, but I think I can't add or remove methods and fields.

The use case I have in mind is RMI-based systems and Java-based systems that run for a long time. These systems currently need to be restarted when new versions of a class become available.

Are there any plans to add unrestricted class redefinition to the VM?

Sebastian

peterkessler
Offline
Joined: 2004-10-27
Points: 0

The way you change implementations for long-running applications is to use interfaces and class loaders. The technique is described in

"Dynamic class loading in the Java virtual machine"
by Sheng Liang and Gilad Bracha
ACM Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA)
1998

It sounds like you are thinking of using JVMTI to replace method bodies. There are several problems with trying to make comprehensive changes that way. E.g., you can't atomically change a collection of methods to implement some new feature. JVMTI is fine for doing things that don't change the semantics of the methods -- for example, adding performance counters, or fix-and-continue development in a debugger, where you understand the risks of what you are doing -- but we don't recommend it for upgrading a running system.
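
For illustration, a minimal sketch of the interface-plus-class-loader technique (Service, example.ServiceImpl, and the plugins/v1, plugins/v2 directories are made-up names):
[code]
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

// The caller codes against this interface, which is loaded once by the
// application class loader and never changes.
interface Service {
    String describe();
}

public class Reloader {
    // Each call loads a fresh copy of the implementation class through a
    // brand-new class loader; once all references to the old loader and its
    // instances are dropped, the old classes can be unloaded.
    static Service loadService(File pluginDir) throws Exception {
        URLClassLoader loader =
            new URLClassLoader(new URL[] { pluginDir.toURI().toURL() },
                               Reloader.class.getClassLoader());
        Class<?> impl = Class.forName("example.ServiceImpl", true, loader);
        return (Service) impl.newInstance();
    }

    public static void main(String[] args) throws Exception {
        Service s = loadService(new File("plugins/v1"));
        System.out.println(s.describe());
        s = loadService(new File("plugins/v2"));   // pick up the new version
        System.out.println(s.describe());
    }
}
[/code]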

murphee
Offline
Joined: 2003-06-10
Points: 0

Future of -Xmx?

Are there any ideas for getting rid of the dreaded -Xmx (max heap) limit? And if getting rid of it is not possible, could a way be added to increase it at runtime (even if this had some cost and/or blocked the VM for a bit)? The JMX memory management MBeans could be used to do this.
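
For reference, a minimal sketch (the class name is made up; only the existing java.lang.management API is used) of what those MBeans expose today -- the heap limits are visible, but the max is read-only:
[code]
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapLimitProbe {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        System.out.println("used      = " + heap.getUsed());
        System.out.println("committed = " + heap.getCommitted());
        // reflects -Xmx (or the default); there is no setter to raise it
        System.out.println("max       = " + heap.getMax());
    }
}
[/code]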

peterkessler
Offline
Joined: 2004-10-27
Points: 0

We have a design to get rid of the dreaded -Xmx maximum heap limit. The problem is that it has a performance cost, and we are loath to impose that penalty on people who aren't using it. We will probably make the solution available first in the 64-bit JVM, since those applications tend not to know their memory requirements as closely as 32-bit applications do.

There's an intermediate position, which is to specify a large -Xmx and depend on the garbage collector to keep the heap space actually used down to near what's needed for the live data of the application. The problem with a larger -Xmx is that it reserves (but doesn't commit) the full -Xmx space in your swap area (depending on the operating system you use), which could be a problem if you run a lot of JVMs at the same time.

Also, a lot of people like the ability to limit the size of their Java object heaps as a way of controlling resources on their machines and detecting bugs. So we will probably keep -Xmx even if we give you a way to say that you want it to be "unlimited".

murphee
Offline
Joined: 2003-06-10
Points: 0

Optional Optimizations:
There are a lot of options for tuning the GC, but not so many for tuning the code generator or optimizer of Hotspot (except for -client/-server and the compile threshold).

Are there any plans to make more fine-grained configuration of Hotspot code generation/optimization possible?
This would allow including more aggressive optimizations that might yield benefits for some apps but be very detrimental to others, and are thus deemed unfit for inclusion in Hotspot.
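
For example (the values and MyApp are illustrative), the compile threshold mentioned above is exposed as a -XX flag:
[code]
# compile methods after 5000 invocations instead of the usual -server
# default of 10000, and log what gets compiled
java -server -XX:CompileThreshold=5000 -XX:+PrintCompilation MyApp
[/code]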

rossknippel
Offline
Joined: 2005-03-15
Points: 0

The priority is to add optimizations to Hotspot that are beneficial to the general set of applications. These optimizations would be triggered by the opportunity to apply them. So there are no plans to add a lot of optimization-tuning options to the Hotspot runtime compilers.

Could you elaborate on what aggressive optimizations you would
like to see implemented? And perhaps why they may be
detrimental to some applications, if it's not obvious.

murphee
Offline
Joined: 2003-06-10
Points: 0

> Could you elaborate on what aggressive optimizations
> you would like to see implemented? And perhaps why they
> may be detrimental to some applications, if it's not
> obvious.

Hmm... off the top of my head: low-overhead objects, i.e., objects without the two-word object header. This might be useful for some apps, but not for all, and it might cost a bit of performance elsewhere.

murphee
Offline
Joined: 2003-06-10
Points: 0

On 64-bit machines, all pointers/references are 64 bits long, even though an address space that large won't be available for decades.
Are there any ideas for using parts of the pointers to store data (think immediate values, tagged values, ...)? A 64-bit pointer could store any primitive (except long and double) and still leave at least 32 bits for tagging information. This could be used to get rid of object allocation for autoboxing (except for Long and Double, of course).
There are problems, of course (synchronization, finding out the TIB/class of an object, polymorphism); for these things, immediate values would have to be treated differently, which might mean a slight overhead for all classes when accessing these properties (I suppose this could be reduced to a mask + compare, and a mostly-taken branch for non-wrapped objects), but maybe there are solutions around for those.
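
For illustration, a minimal sketch (a made-up tag layout, not anything HotSpot actually does) of the mask-and-compare check described above:
[code]
public class TaggedWord {
    // low bit set means "immediate int", payload in the upper 32 bits
    static final long TAG_IMMEDIATE = 1L;

    static boolean isImmediate(long word) {
        // one mask + compare; a mostly-taken branch for ordinary references
        return (word & TAG_IMMEDIATE) != 0;
    }

    static long boxInt(int value) {
        return ((long) value << 32) | TAG_IMMEDIATE;
    }

    static int unboxInt(long word) {
        return (int) (word >> 32);
    }

    public static void main(String[] args) {
        long w = boxInt(-42);
        System.out.println(isImmediate(w) + " " + unboxInt(w));   // true -42
    }
}
[/code]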

peterkessler
Offline
Joined: 2004-10-27
Points: 0

We have thought about it, but we aren't willing to spend the time to mask off the pointer bits at each use of the reference. That's an overhead for everyone, whether they are using autoboxing, etc., or not. If we can figure out how to make it fast, yes, we are always looking for a few extra bits that we can use to make the JVM more efficient.

elizarov
Offline
Joined: 2005-02-08
Points: 0

Are there any plans to teach HotSpot to perform some basic reference escape analysis and allocate short-lived objects (ones that live only in the context of the method being compiled) on the stack, and/or perform other memory-allocation optimizations?

It would be extremely useful for Iterator and StringBuffer/StringBuilder objects. Also, the Object.clone method is (ab)used in too many places, with a detrimental effect on performance. It was a pity to learn that even newer additions to Java, like the values() method of enum classes, use the clone technique instead of returning a read-only Collection, which is more flexible and generally better for performance. A simple escape analysis could let HotSpot optimize away many typical usages of Object.clone, because the result is often used only in a single method context and is not modified.
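
For illustration, a minimal sketch (the class and method names are made up) of the kind of non-escaping allocations such an analysis could target:
[code]
import java.util.Arrays;
import java.util.List;

public class EscapeCandidates {
    // the StringBuilder never escapes this method, so with escape analysis
    // its allocation could go on the stack (or be eliminated entirely)
    static String label(int id) {
        StringBuilder sb = new StringBuilder();
        sb.append("item-").append(id);
        return sb.toString();
    }

    // the Iterator created by the for-each loop never escapes either;
    // it is another classic candidate for stack allocation
    static int total(List<Integer> values) {
        int sum = 0;
        for (int v : values)
            sum += v;
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(label(7));
        System.out.println(total(Arrays.asList(1, 2, 3)));
    }
}
[/code]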

rossknippel
Offline
Joined: 2005-03-15
Points: 0

There are plans to add escape analysis to the Hotspot runtime compilers. This is a priority for the next major release, J2SE 6.0.