
HotSpot performance parity with IBM and JRockit

20 replies
jdolphin
Offline
Joined: 2005-03-09
Points: 0

HotSpot fares very well in a benchmark like SciMark, but in the test below (Fibonacci, commonly used in benchmarks) I notice that HotSpot server (and, obviously, client) in JDK 5.0 is considerably slower than either JRockit or IBM. I am not sure whether it is because these VMs are better with recursion, because they have lower method call overheads, or something else. Does anyone on the HotSpot team have any comments on this? BTW, HotSpot server is still faster than C# and J# on this test, but the client VM is slightly slower.

class Class1
{
    static int fib(int n)
    {
        if (n == 1 || n == 2) return 1;
        else
            return fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args)
    {
        for (int i = 0; i < 10; i++)
        {
            long start = System.currentTimeMillis();
            System.out.println(fib(44));
            System.out.println("millis " + (System.currentTimeMillis() - start));
        }
    }
}

yuriymikhaylovskiy
Offline
Joined: 2010-12-19
Points: 0

http://opt.sourceforge.net/ Java Micro Benchmark: a set of control tasks for determining the comparative performance characteristics of a computer system on different platforms. It can be used to guide optimization decisions and to compare different Java implementations.

sjasja
Offline
Joined: 2004-08-15
Points: 0

> Several benchmarks on the net use this as one of their tests.

And for this very reason, some compilers are bound to special-case it, producing hand-crafted optimized code for exactly this pattern. This is one reason why an individual microbenchmark can tell very little about real performance.
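
A quick way to check for that kind of special-casing is to perturb the benchmark slightly. The variant below is only a sketch of that idea (the extra parameter and the class name are made up for illustration); it does the same work as the textbook fib, but in a shape a hard-coded pattern match would be unlikely to recognise:

[code]
class FibVariant {
    // Same cost profile as the textbook fib, but with an extra parameter and a
    // merged base case, so a compiler special case keyed to the exact fib shape
    // should not fire. With bias == 0 it returns the same values as fib.
    static int fibPlus(int n, int bias) {
        if (n <= 2) return 1 + bias;
        return fibPlus(n - 1, bias) + fibPlus(n - 2, bias);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            long start = System.currentTimeMillis();
            System.out.println(fibPlus(44, 0));
            System.out.println("millis " + (System.currentTimeMillis() - start));
        }
    }
}
[/code]

If a VM's timings for this variant track its timings for the original closely, it probably is not special-casing the benchmark.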

jdolphin
Offline
Joined: 2005-03-09
Points: 0

> Could you post a machine description (cpu, memory, OS),
> the versions of the jvms (Sun, IBM, and JRockit),
> and any flags/options that were used?
>
> On a 2x AMD 1.8GHz Opteron "244" running XP SP2,
> the following three jvms appear to perform similarly
> on the Fibonacci benchmark:
>
> HotSpot 150b64 -> 7531 msecs (-server, no other flags)
> JRockit 1.5.0 -> 8172
> IBM142 "sov" -> 7410
>
> With more detailed information, we should be able to reproduce
> the situation you're seeing and provide a better answer.
>
> Thanks.

I have now rerun the tests on the 2GHz Celeron using JRockit 1.5.0. Interestingly, JRockit 1.5.0 is slower than both IBM and HotSpot, while JRockit 1.4.2_04 is twice as fast as HotSpot server on this machine.

I get the following (approximate) times:
JRockit 1.4.2_04 (-server): 5.4 seconds
JRockit 1.5.0 (-server): 12 seconds
HotSpot 1.5.0 (server): 10.6 seconds
IBM 1.4.2: 7.6 seconds (approx)
HotSpot 1.5.0 (client): 14 seconds

The J# and C# times are slightly better than the 1.5 client VM times.

hlovatt
Offline
Joined: 2003-11-18
Points: 0

I think optimizing a compiler to do well with a bad benchmark is a poor idea. You want the compiler to do well with at least a half-decent benchmark, and encourage people not to use bad benchmarks.

See below for the performance improvement from a half-decent implementation of the Fibonacci sequence:

[code]
package benchmarks.fib;

class Fib {
    final static double sqrt5 = Math.sqrt( 5 );
    final static double goldenRatio = ( 1 + sqrt5 ) / 2;

    static int fibFaster( final int n ) { // from http://www.everything2.com/index.pl?node_id=1328208
        return (int) ( Math.pow( goldenRatio, n ) / sqrt5 );
    }

    static int fibOriginal( final int n ) {
        if ( ( n == 1 ) || ( n == 2 ) ) {
            return 1;
        } else {
            return fibOriginal( n - 1 ) + fibOriginal( n - 2 );
        }
    }

    public static void main( final String[] notUsed ) {
        for ( int i = 0; i < 2; i++ ) {
            System.gc();
            System.gc();

            final long start = System.currentTimeMillis();
            final int fib = fibOriginal( 44 );
            final long end = System.currentTimeMillis();
            System.out.println( "fibOriginal = " + fib + " and took " +
                                ( end - start ) + " ms" );
        }

        for ( int i = 0; i < 2; i++ ) {
            System.gc();
            System.gc();

            final long start = System.currentTimeMillis();
            final int fib = fibFaster( 44 );
            final long end = System.currentTimeMillis();
            System.out.println( "fibFaster = " + fib + " and took " +
                                ( end - start ) + " ms" );
        }
    }
}
[/code]

When run with the client VM I get the following:

C:\Personal\Java>java benchmarks.fib.Fib
fibOriginal = 701408733 and took 13940 ms
fibOriginal = 701408733 and took 13950 ms
fibFaster = 701408733 and took 0 ms
fibFaster = 701408733 and took 0 ms

As you can see, a better algorithm makes a much larger improvement than any compiler could :)

hlovatt
Offline
Joined: 2003-11-18
Points: 0

For some reason my post above got messed up; it seems to be something to do with using the code tag and having a URL inside it. Second try, without the code tag :(

You want the compiler to do well with a benchmark that is at least half decent; it isn't important how well the compiler does with bad code. For example, a half-decent version of the Fibonacci sequence is:

package benchmarks.fib;

class Fib {
    final static double sqrt5 = Math.sqrt( 5 );
    final static double goldenRatio = ( 1 + sqrt5 ) / 2;

    static int fibFaster( final int n ) { // from http://www.everything2.com/index.pl?node_id=1328208
        return (int) ( Math.pow( goldenRatio, n ) / sqrt5 );
    }

    public static void main( final String[] notUsed ) {
        for ( int i = 0; i < 2; i++ ) {
            System.gc();
            System.gc();

            final long start = System.currentTimeMillis();
            final int fib = fibFaster( 44 );
            final long end = System.currentTimeMillis();
            System.out.println( "fibFaster = " + fib + " and took " +
                                ( end - start ) + " ms" );
        }
    }
}

The results are on my machine using the client VM:

C:\Personal\Java>java benchmarks.fib.Fib
fibFaster = 701408733 and took 0 ms
fibFaster = 701408733 and took 0 ms

It is unlikely that any compiler could give as large a speed-up as changing from a really bad algorithm to one that is at least OK.
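
As an aside (this sketch is mine, not part of the results above): the closed-form version depends on double precision, which will eventually round incorrectly for larger arguments, so an exact alternative that is still fast is the plain iterative version:

[code]
class FibIterative {
    // Exact, linear-time Fibonacci: no recursion and no floating point,
    // so there is no rounding error to worry about.
    static long fibIterative(int n) {
        long a = 0, b = 1;               // fib(0) and fib(1)
        for (int i = 0; i < n; i++) {
            long next = a + b;
            a = b;
            b = next;
        }
        return a;                        // a now holds fib(n)
    }

    public static void main(String[] args) {
        System.out.println(fibIterative(44));   // prints 701408733, matching the recursive version
    }
}
[/code]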

jdolphin
Offline
Joined: 2005-03-09
Points: 0

I am perfectly aware that this is not the most efficient implementation of fib. That is not the point. I am interested in using an algorithm that is known to be slow as a benchmark to compare HotSpot with JRockit and the IBM JDK.

mthornton
Offline
Joined: 2003-06-10
Points: 0

However, if you propose an obviously artificial benchmark, then the onus is on you to show how it relates to work in a real application. The fib benchmark essentially measures call overhead. Given that HotSpot will inline most short methods, it is hard to think of real examples where the call overhead is significant and yet the method(s) can't be inlined.

jdolphin
Offline
Joined: 2005-03-09
Points: 0

Several benchmarks on the net use this as one of their tests.

hlovatt
Offline
Joined: 2003-11-18
Points: 0

Just because people use it doesn't mean it is any good.

jdolphin
Offline
Joined: 2005-03-09
Points: 0

Not all methods can be inlined by the VM, so method call overhead is still a factor.
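
For example (a rough sketch with made-up class names, not code from this thread), a megamorphic call site, one that keeps seeing several different receiver classes, is typically something the VM will not inline, so a real virtual dispatch remains on every call:

[code]
interface Op { int apply(int x); }

class Inc implements Op { public int apply(int x) { return x + 1; } }
class Dbl implements Op { public int apply(int x) { return x * 2; } }
class Xor implements Op { public int apply(int x) { return x ^ 7; } }

class MegamorphicCall {
    public static void main(String[] args) {
        // Three receiver classes rotate through the same call site, making it
        // megamorphic; the JIT will usually keep a genuine virtual dispatch here.
        Op[] ops = { new Inc(), new Dbl(), new Xor() };
        long start = System.currentTimeMillis();
        int sum = 0;
        for (int i = 0; i < 100000000; i++) {
            sum += ops[i % ops.length].apply(i);
        }
        System.out.println(sum);  // print the result so the loop cannot be optimized away
        System.out.println("millis " + (System.currentTimeMillis() - start));
    }
}
[/code]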

mthornton
Offline
Joined: 2003-06-10
Points: 0

But is it an important factor in REAL code?
If it isn't important, the JVM developers shouldn't waste any time on it.

rossknippel
Offline
Joined: 2005-03-15
Points: 0

Could you post a machine description (cpu, memory, OS),
the versions of the jvms (Sun, IBM, and JRockit),
and any flags/options that were used?

On a 2x AMD 1.8GHz Opteron "244" running XP SP2,
the following three jvms appear to perform similarly
on the Fibonacci benchmark:

HotSpot 150b64 -> 7531 msecs (-server, no other flags)
JRockit 1.5.0 -> 8172
IBM142 "sov" -> 7410

With more detailed information, we should be able to reproduce
the situation you're seeing and provide a better answer.

Thanks.

jdolphin
Offline
Joined: 2005-03-09
Points: 0

> Could you post a machine description (cpu, memory, OS),
> the versions of the jvms (Sun, IBM, and JRockit),
> and any flags/options that were used?
>
> On a 2x AMD 1.8GHz Opteron "244" running XP SP2,
> the following three jvms appear to perform similarly
> on the Fibonacci benchmark:
>
> HotSpot 150b64 -> 7531 msecs (-server, no other flags)
> JRockit 1.5.0 -> 8172
> IBM142 "sov" -> 7410
>
> With more detailed information, we should be able to reproduce
> the situation you're seeing and provide a better answer.
>
> Thanks.

On different machines I get different results.
The numbers I mentioned previously were on a 2GHz Celeron, Windows XP SP1, 768MB RAM.
The (approx) numbers were:
Sun JDK 1.5.0_01: 10.6 seconds (switches: -server)
JRockit 1.4.2_04: 5.4 seconds (switches: -server)
IBM 1.4.2: 7.8 seconds (no switches used)

When I test on my 3.4GHz Pentium 4, Windows XP Pro SP2, 1GB RAM, I get:
Sun JDK 1.5.0_02: 3.656 seconds (-server)
JRockit JDK 1.4.2_04: 3.0 seconds (-server)
IBM 1.4.2: 4.626 seconds (no switches)

jdolphin
Offline
Joined: 2005-03-09
Points: 0

Can't anyone explain the performance gap in this example?

tackline
Offline
Joined: 2003-06-19
Points: 0

The microbenchmark appears to be about inlining of recursive code.

My old machine achieves somewhere from the upper twenties to the upper thirties of cycles per call to fib, depending on whether I use -server or -client. That doesn't seem too unreasonable. Worth comparing against the C equivalent.

GCC 3.2.2 with -O3 gives about the same performance as -client. Curiously, it is slightly slower if fib is marked as static, and faster if not. With conventional compile-linking, static in C permits a function to be inlined (and possibly called without an indirect jump), so apparently inlining is not necessarily a good idea in this case. To be fair, -mcpu=pentium2 increased and evened up the results, but they were still closer to -client than -server.

It reminds me of another microbenchmark from a year or so back. IIRC, it was six nested for loops (nice code), each iterating 16 times. -client and -server gave almost exactly the same performance. However, increasing the iteration count favoured server; lowering it favoured client. There is a compromise between how soon to compile and how hard. Ahead-of-time compilers win hands-down on this particular nonsense test, for sufficiently small iteration counts.
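
For anyone who wants to turn an elapsed time into that cycles-per-call figure: the naive recursion makes 2*fib(n) - 1 method calls in total, so dividing the run time by that count and multiplying by the clock rate gives an approximate cost per call. A rough sketch (the 2GHz clock rate below is an assumed placeholder; substitute your own CPU's):

[code]
class CyclesPerCall {
    static int fib(int n) {
        if (n == 1 || n == 2) return 1;
        return fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        final double clockHz = 2.0e9;   // assumed 2GHz CPU - replace with your own clock rate
        final int n = 44;

        long start = System.currentTimeMillis();
        long result = fib(n);
        long elapsedMs = System.currentTimeMillis() - start;

        // The naive recursion makes 2*fib(n) - 1 calls in total.
        double calls = 2.0 * result - 1.0;
        double cyclesPerCall = (elapsedMs / 1000.0) * clockHz / calls;
        System.out.println("fib(" + n + ") = " + result + " in " + elapsedMs + " ms");
        System.out.println("approx cycles per call: " + cyclesPerCall);
    }
}
[/code]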

jdolphin
Offline
Joined: 2005-03-09
Points: 0

Even if I increase the number of iterations to give HotSpot more time to optimize, JRockit and IBM are both still faster.
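
To rule out compile time skewing the numbers, one tweak worth trying (my own suggestion, only a sketch) is an untimed warm-up pass, so every VM has already compiled fib before measurement starts:

[code]
class WarmFib {
    static int fib(int n) {
        if (n == 1 || n == 2) return 1;
        return fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        // Untimed warm-up so the JIT has compiled fib before we start measuring.
        for (int i = 0; i < 5; i++) {
            fib(30);
        }

        // Timed runs, same as the original benchmark.
        for (int i = 0; i < 10; i++) {
            long start = System.currentTimeMillis();
            System.out.println(fib(44));
            System.out.println("millis " + (System.currentTimeMillis() - start));
        }
    }
}
[/code]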

linuxhippy
Offline
Joined: 2004-01-07
Points: 0

This is basically a very evil microbenchmark - neither useful nor true!

jdolphin
Offline
Joined: 2005-03-09
Points: 0

> This is basically a very evil microbenchmark - neither useful nor true!

The reason I post it is that this test is used in a number of benchmarks. I even know someone who used one of these benchmarks to justify choosing C# over Java. They didn't even test with the server VM!

I'm not sure what you mean by "nor true". Run the test for yourself.
On my PC I get:
HotSpot Server 1.5: 10.8 seconds (best)
HotSpot Client 1.5: 14 seconds
IBM 1.4.2: 7.5 seconds (best)
JRockit: less (but the stoopid license prevents me from disclosing any benchmark results).

sla
Offline
Joined: 2003-06-11
Points: 0

> JRockit: less (but the stoopid license prevents me
> from disclosing any benchmark results).

I believe the license has been changed recently so that you are free to disclose the results.

Regards,
/Staffan

jdolphin
Offline
Joined: 2005-03-09
Points: 0

> > JRockit: less (but the stoopid license prevents me
> > from disclosing any benchmark results).
>
> I believe the license has been changed recently so
> that you are free to disclose the results.
>
> Regards,
> /Staffan

As far as I remember the time was around 5.4 seconds.