Skip to main content

why is this so hard to compile to native code?

14 replies [Last post]
syntern
Offline
Joined: 2005-07-27
Points: 0

Hello, I have created some micro-benchmark, and I found that in this special case Java is _very_ slow compared to C++ compilers. My question is: why? Because I think this kind of numerical processing could be compiled very much the same to bytecode....

*** Edit: see code below

Message was edited by: syntern

Reply viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.
sla
Offline
Joined: 2003-06-11
Points: 0

I modified your code a little bit:

1) Run the test twice so that the adaptive optimizations of the JVMs have a chance to optimize.
2) Remove the redundant array bounds check in the Java code: "if (x > 0 && x < tomb.length)"

Here are my results:

Hotspot server 1.6.0 : 11 sec
JRockit R26.4.0 : 8 sec
C code : 7 sec

I see no reason to be disappointed with the speed of Java, although you should be very skeptical to micro benchmarks such as this one.

This was on a Windows XP machine with 2 Xeon 3GHz processors. I used Microsoft 32-bit C/C++ Optimizing Compiler Version 13.10.3077 with the /O2 switch. No arguments given to the JVMs.

Regards,
/Staffan - yes, I work for BEA

linuxhippy
Offline
Joined: 2004-01-07
Points: 0

also keep in mind that Java does have to check for wether an arary-access does not excess array's bounds. Often these checks can be removed by the JIT, sometimes not.

Its like GC, comfort has its price ;)

lg Clemens

syntern
Offline
Joined: 2005-07-27
Points: 0

Ok, so next try of the source codes (sorry, the forum clears up every indent formatting option I've tried)

cat Test.java

[code]

public class Test {
public static void main (String[] args) {
int[] tomb = new int[65536];
int i = 0, j = 0;
for (i = 0; i < 65536; i++)
tomb[ i ] = (i*i) & 31;
for (i = 0; i < 65536; i++) {
for (j = i+1 ; j < 65536; j++) {
int x = (i+j) & 65535;
if (x > 0 && x < tomb.length)
tomb[x] = (i + tomb[ i ] + tomb[j] * 17) & 31;
}
}
}
}
[/code]

cat test.c

[code]
main () {
int tomb[65536];
int i = 0, j = 0;
for (i = 0; i < 65536; i++)
tomb[ i ] = (i*i) & 31;
for (i = 0; i < 65536; i++) {
for (j = i+1 ; j < 65536; j++) {
tomb[(i+j)&65535] = (i + tomb[ i ] + tomb[j] * 17) & 31;
}
}
}
[/code]

Intel(R) Pentium(R) 4 CPU 3.20GHz
gcc version 4.0.3 (Ubuntu 4.0.3-1ubuntu5)
java version "1.5.0_05"

Message was edited by: syntern

briand
Offline
Joined: 2005-07-11
Points: 0

If you enclose the code in '[ code ]'/'[ /code ]'
pairs, it will keep your structure. Unfortunately,
it still seems to translate the [ i ] to italics.

These formatting options aren't documented anywhere
that I've been able to find; I found them out the hard
way, just like you :-)

BTW - I thought that the brackets might have been dropped
by the forum software, but I didn't want to assume what your real intention was.

Brian

spdenne
Offline
Joined: 2006-02-15
Points: 0

I should also mention that I'm using eclipse's compiler. Sun's produces different code with different characteristics. I've no idea why or how.

linuxhippy
Offline
Joined: 2004-01-07
Points: 0

I don't see a slowdown after recompilation, using javac and mustang-b104.

lg Clemens

spdenne
Offline
Joined: 2006-02-15
Points: 0

By recognising that i + tomb[ i ] is not changed by the inner loop, I can get the time down to 7.7 seconds

[code]
public final class Test {
private static int[] tomb;
private static int i;
public static void main(String[] args) {
long start = System.currentTimeMillis();
tomb = new int[65536];
for (i = 0; i < 65536; i++) tomb[ i ] = (i * i) & 31;;
for (i = 0; i < 65536; i++) innerLoop(i + tomb[ i ]);
System.out.println(System.currentTimeMillis() - start);
}
private static void innerLoop(int i_plus_tomb_i) {
for (int j = i + 1; j < 65536; j++) {
tomb[(i + j) & 65535] = (i_plus_tomb_i + tomb[j] * 17) & 31;
}
}
}
[/code]

linuxhippy
Offline
Joined: 2004-01-07
Points: 0

would be interresting wetter ea would make it even faster ....

could you rerun on mustang with -XX:+DoEscapeAnalysis ?

thank you in advance, lg clemens

spdenne
Offline
Joined: 2006-02-15
Points: 0

On my PC, Escape Analysis aids the original code by about 0.2 seconds, makes the second code I posted slow down on recompile, and the third example takes 7.6 seconds.

Interestingly, without Escape Analysis, that third bit of code I posted slows down on recompile too. With Escape Analysis, it starts at 7.7 then when it recompiles, it runs consistently at around 7.6 seconds.

spdenne
Offline
Joined: 2006-02-15
Points: 0

Your code with timing of just the execution added (i.e. excludes start up time):
[code]
public class Test {
public static void main(String[] args) {
long start = System.currentTimeMillis();
int[] tomb = new int[65536];
int i = 0, j = 0;
for (i = 0; i < 65536; i++)
tomb[ i ] = (i * i) & 31;
for (i = 0; i < 65536; i++) {
for (j = i + 1; j < 65536; j++) {
int x = (i + j) & 65535;
if (x > 0 && x < tomb.length)
tomb[x] = (i + tomb[ i ] + tomb[j] * 17) & 31;
}
}
System.out.println(System.currentTimeMillis() - start);
}
}
[/code]

Is faster on server JVM than client JVM:

java -showversion -XX:+PrintCompilation -verbose:gc Test
java version "1.6.0-rc"
Java(TM) SE Runtime Environment (build 1.6.0-rc-b104)
Java HotSpot(TM) [b]Client[/b] VM (build 1.6.0-rc-b104, mixed mode, sharing)
1 java.lang.String::charAt (33 bytes)
2 java.lang.String::hashCode (60 bytes)
3 java.lang.String::indexOf (151 bytes)
4 java.lang.AbstractStringBuilder::append (40 bytes)
1% Test::main @ 21 (134 bytes)
19628

java -server -showversion -XX:+PrintCompilation -verbose:gc Test
java version "1.6.0-rc"
Java(TM) SE Runtime Environment (build 1.6.0-rc-b104)
Java HotSpot(TM) [b]Server[/b] VM (build 1.6.0-rc-b104, mixed mode)
1% Test::main @ 21 (134 bytes)
1% made not entrant Test::main @ 21 (134 bytes)
2% Test::main @ 58 (134 bytes)
2% made not entrant Test::main @ 58 (134 bytes)
14961

Removing the redundant array bounds check of yours makes it even faster. Repeating it a few more times shows that it gets faster when the server JVM recompiles it, and that the garbage collection time makes next to no difference:

[code]
public final class Test {
public static void main(String[] args) {
for (int k = 0; k < 8; k++) {
long start = System.currentTimeMillis();
int[] tomb = new int[65536];
for (int i = 0; i < 65536; i++)
tomb[ i ] = (i * i) & 31;
for (int i = 0; i < 65536; i++) {
for (int j = i + 1; j < 65536; j++) {
tomb[(i + j) & 65535] = (i + tomb[ i ] + tomb[j] * 17) & 31;
}
}
System.out.println(System.currentTimeMillis() - start);
}
}
}
[/code]

java -showversion -XX:+PrintCompilation -verbose:gc Test
java version "1.6.0-rc"
Java(TM) SE Runtime Environment (build 1.6.0-rc-b104)
Java HotSpot(TM) Client VM (build 1.6.0-rc-b104, mixed mode, sharing)java version "1.6.0-rc"
Java(TM) SE Runtime Environment (build 1.6.0-rc-b104)
Java HotSpot(TM) Client VM (build 1.6.0-rc-b104, mixed mode, sharing)
1 java.lang.String::charAt (33 bytes)
2 java.lang.String::hashCode (60 bytes)
3 java.lang.String::indexOf (151 bytes)
4 java.lang.AbstractStringBuilder::append (40 bytes)
1% Test::main @ 21 (131 bytes)
14882
14811
[GC 724K->112K(5056K), 0.0013471 secs]
14681
14571
14571
[GC 891K->112K(5056K), 0.0005403 secs]
14611
14551
14561

java -server -showversion -XX:+PrintCompilation -verbose:gc Testjava version "1.6.0-rc"
Java(TM) SE Runtime Environment (build 1.6.0-rc-b104)
Java HotSpot(TM) Server VM (build 1.6.0-rc-b104, mixed mode)
1% Test::main @ 21 (131 bytes)
1% made not entrant Test::main @ 21 (131 bytes)
2% Test::main @ 59 (131 bytes)
2% made not entrant Test::main @ 59 (131 bytes)
11357
3% Test::main @ 21 (131 bytes)
9243
[GC 724K->112K(5056K), 0.0023587 secs]
9213
9244
9223
[GC 891K->112K(5056K), 0.0005386 secs]
9203
9213
9204

9.2 seconds on a 1.8GHz centrino

spdenne
Offline
Joined: 2006-02-15
Points: 0

This forum translates [ i ] into italics, making it really hard to read your code.

briand
Offline
Joined: 2005-07-11
Points: 0

Your source code does not compile:

[code]
% javac -version Test.java
javac 1.5.0_09
Test.java:6: incompatible types
found : int
required: int[]
tomb = (i*i) & 31;
^
Test.java:10: operator + cannot be applied to int,int[]
tomb[(i+j)&65535] = (i + tomb + tomb[j] * 17) & 31;
^
Test.java:10: operator & cannot be applied to ,int
tomb[(i+j)&65535] = (i + tomb + tomb[j] * 17) & 31;
^
3 errors

[/code]

It would also be useful if you included some information
about the platform (OS version, CPU type and count, java
version being used, command line arguments), and your
results on that platform.

Brian

linuxhippy
Offline
Joined: 2004-01-07
Points: 0

1.) use the server compiler
2.) microbenchmarks are not really valueable
3.)your benchmark runs simply way too short - the compiler will compile or not and you also measure compilation time.
4.) use the server compiler

best wishes, lg Clemens

syntern
Offline
Joined: 2005-07-27
Points: 0

Sorry for the compile error, I've copy-pasted the code, this forum really messed up it :( .

1, I've used server compiler and for sure, the client as well
2, I've tested on intel 32 bit (P4 3G+) and Sun Opteron (with Solaris) as well, the tendency remains: the C++ compiled code is _much_ (e.g. 10-20 times) faster than Java code
3, I've increased the array size, tried other kind of numerical calculations, increased the total amount of time to run (even logged the compilation and GC events), nevertheless: C++ code outperforms Java in these simple scenatios.
4, Yeah, microbenchmarks is bad, but I've benchmarked many aspects of Java, and except of this, it is always comparable to C++. I am just trying to solve th question:

Why is Java significantly slower in this? The array boundary check is not enough explanation...