
JVM 1.4 and 1.5 are very slow on Solaris 10 AMD64 x86

7 replies
dpjo
Joined: 2006-05-10

Hello All,
JVM 1.4 and 1.5 are very slow on a Solaris 10 AMD64 x86 machine (2 CPUs, 4 GB RAM, plenty of space, and no resource-consuming processes other than the test program).
I have exhausted most of the Java options (like -Xxyz), but none of them shows any performance improvement.
I am running this test loop and measuring the time difference between the start and the end of the loop:

long j = 0;
for (int i = 0; i < 100000000; i++) {
    Date d = new Date();   // java.util.Date: allocates an object and reads the clock
    j = d.getTime();
    j = j + i % 10;
}

On Windows XP Professional (1 GB RAM, 2.8 GHz) it takes only about 13 seconds, but on Solaris it takes more than 25 seconds (I tried various options such as -server/-client and various combinations of the -Xabc options mentioned in the performance-tuning documents on Sun's site). All other Java applications also seem considerably slower on this Solaris box.
I want to know whether this is known behaviour or whether I am missing something.
Any help or pointer is highly appreciated.
Thank you.

dpjo
Joined: 2006-05-10

Thank you both for your replies.

I'll try to explain the real problem briefly.
"Micro-benchmarks are not a good indicator" - I fully agree with briand.
But my real problem is the application that I am testing on
Windows, Sun SPARC and Sun AMD64 (which is a brand new box).

The application is multithreaded (around 25+ threads, with a few synchronized blocks because
the threads share objects, plus DB operations).
I run each test for approximately 12-20 minutes (hopefully enough time to "warm up").
The same application performs well on the 4-CPU Sun SPARC (as expected) but
worst on the 2-CPU Sun AMD64. I also tested with Java 1.5.x and the -XX:+UseBiasedLocking option,
but saw no improvement.
The only difference I see is the older OS on the SPARC box compared
to this box, which has the latest software and patches for Solaris 10 and Oracle 10g.

Thank you again for your time.
-dpjo

briand
Joined: 2005-07-11

Warm up times of 10-20 mins are well within reason.

You are comparing a 4-CPU SPARC (type? clock rate?) to a 2-CPU AMD (type? clock rate?), so your expectations on any direct comparisons should be appropriately adjusted. Depending on the vintages of the CPUs, your expectations may or may not be correct. Other factors, such as differences in the disk subsystems (caching disk arrays vs JBOD SCSI disks, for example), can also contribute to any observed differences.

To help you any more, I would need more detailed information on the hardware configuration and some performance statistics. It might also be useful to collect some application profiles from both the SPARC and AMD64 systems to see how they differ.

Also, I'll be at JavaOne next week, so I probably won't be able to look at any data until the following week.

Brian

dpjo
Offline
Joined: 2006-05-10
Points: 0

Brian,

You gave a good hint; I should explore the hardware side more
and run more standard tests to support my claim.

To keep things simple, I am deliberately excluding SPARC
from the discussion for now. I have a snapshot of data
from Windows and Solaris (Sun AMD) and tried to maintain a similar setup
in terms of load, processes, etc. in both environments.
For each iteration I used different -Xabc parameters to see if they made a difference.
Each iteration runs for 10-20 minutes.

The figures below are "messages processed per second" by the
application (a single Java process), taken from the actual application I mentioned earlier in my post.
By my definition of application performance, the AMD numbers are "bad".

Windows (msg/s)   AMD (msg/s)
57.06134094       46.94835681
64.30868167       46.51162791
58.65102639       47.61904762
66.66666667       47.50593824
55.55555556       47.61904762
61.53846154       47.39336493
64.1025641        47.61904762
62.30529595       49.01960784
58.13953488       47.61904762
62.89308176       47.16981132
61.47540984       47.61904762
53.76344086       47.7326969
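One way to quantify the gap in the table is to average each column. The sketch below (ThroughputSummary is a hypothetical class name) uses only the twelve samples posted above, and it treats the two columns as independent sets of runs rather than paired measurements, which is an assumption.

```java
// Summarizes the "messages processed per second" samples from the table.
// The numbers are copied verbatim; rows are NOT assumed to be paired.
public class ThroughputSummary {

    static final double[] WINDOWS = {
        57.06134094, 64.30868167, 58.65102639, 66.66666667,
        55.55555556, 61.53846154, 64.1025641, 62.30529595,
        58.13953488, 62.89308176, 61.47540984, 53.76344086
    };

    static final double[] AMD = {
        46.94835681, 46.51162791, 47.61904762, 47.50593824,
        47.61904762, 47.39336493, 47.61904762, 49.01960784,
        47.61904762, 47.16981132, 47.61904762, 47.7326969
    };

    // Arithmetic mean of the samples.
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sample standard deviation (n - 1 in the denominator).
    static double stdDev(double[] xs) {
        double m = mean(xs);
        double sq = 0.0;
        for (double x : xs) {
            sq += (x - m) * (x - m);
        }
        return Math.sqrt(sq / (xs.length - 1));
    }

    public static void main(String[] args) {
        double w = mean(WINDOWS);
        double a = mean(AMD);
        System.out.printf("Windows: mean %.2f msg/s, stddev %.2f%n", w, stdDev(WINDOWS));
        System.out.printf("AMD:     mean %.2f msg/s, stddev %.2f%n", a, stdDev(AMD));
        System.out.printf("AMD / Windows ratio: %.3f%n", a / w);
    }
}
```

On these samples the AMD mean is about 0.79 of the Windows mean (roughly a fifth lower), while the AMD runs vary much less from run to run.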

---
AMD Opteron 250, 2 CPUs, 2392 MHz, 4 GB RAM:
SunOS hostname 5.9 Generic_118559-11 i86pc i386 i86pc
64-bit amd64 kernel modules
java version "1.5.0_06"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05)
Java HotSpot(TM) Server VM (build 1.5.0_06-b05, mixed mode)
---
Windows XP Professional (Version 2002), Intel P4 2.8 GHz, 1 GB RAM:
java version "1.5.0_06"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05)
Java HotSpot(TM) Client VM (build 1.5.0_06-b05, mixed mode, sharing)

I will post another set of test results later, when you are back.

Thank you for your help.
-dpjo

dpjo
Joined: 2006-05-10

I have not found a satisfactory answer so far, but doubling
the number of threads seems to be one way to get the desired
numbers from my application for now.
Thanks all for the help.

dpjo
Joined: 2006-05-10

Forgot to add that a test program from Sun's site (HeapTest.java) gives the expected (good) results on Solaris AMD.
So I suspect that too much synchronization in my code is taking
a toll on speed.

briand
Joined: 2005-07-11

There is the potential for a classic micro-benchmarking issue with this code which I covered in my JavaOne presentation last year. You are comparing a multi-processor system to a uni-processor system and the two systems have different default compilation policies that are impacted by operating system scheduling policies. At first I thought that's what was happening here, but I'm not yet convinced of it. Since I typed all this in, I figured I'd post it anyway as it might prove helpful to someone in the future.

When micro-benchmarks run on non-Windows systems, they run in background compilation mode. This means that compilation tasks are handled while the requesting thread continues to execute. On Windows systems, compilation happens in the foreground (-Xbatch vs -XX:+BackgroundCompilation) and the requesting thread blocks until the compilation is completed.

On uni-processor non-Windows systems, background compilations for most micro-benchmarks complete within a single time slice and the compiled code is ready by the time the requesting thread is back on processor. This happens because the compilation threads are typically idle when the request comes in and they have relatively high OS scheduler priority compared to the requesting thread. When the requesting thread is back on processor, it finds that the native code is available and immediately transfers to it, so it looks a lot like the -Xbatch mode used on Windows by default.

On multi-processor non-Windows systems, things are quite different for these micro-benchmarks. When the compilation thread gets a compilation task, it too has probably been idle and gets scheduled rather quickly. However, the requesting thread still has a free processor and therefore does not get preempted. Upon return from queuing the compilation request, it checks if the compile is complete and finds that it is not and so continues to execute in the interpreter. At some point in the not too distant future, the compiler will complete the compilation task and the native code will be ready. However, we won't transfer to it until the next call of the method. Since these micro-benchmarks typically only call the method of interest once (or even embed the measured code in main()), the thread will never transfer to that compiled code. So, we end up comparing interpreted code on the multi-processor system to a hybrid of interpreted and native code on the uni-processor (and multi-processor systems with -Xbatch).
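The three cases above depend on just two facts about the machine: the operating system and the number of processors. The sketch below (CompileModeCheck is a hypothetical name) prints both and maps them to the behaviour described in this post for the 1.4/1.5-era HotSpot JVMs. The mapping is hard-coded from the post; it does not query the running JVM, and flags such as -Xbatch override the platform default.

```java
import java.util.Locale;

// Encodes the three cases described above. This is a hard-coded reading of
// the post, not something the JVM reports about itself.
public class CompileModeCheck {

    static String expectedBehaviour(String osName, int cpus) {
        if (osName.toLowerCase(Locale.ROOT).startsWith("windows")) {
            return "foreground: the requesting thread waits for the compile";
        }
        if (cpus <= 1) {
            return "background, but the compile usually finishes before the thread runs again";
        }
        return "background: a method called only once may stay interpreted";
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("os.name      = " + os);
        System.out.println("java.vm.name = " + System.getProperty("java.vm.name"));
        System.out.println("processors   = " + cpus);
        System.out.println("expected     = " + expectedBehaviour(os, cpus));
    }
}
```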

Simply placing the code of interest into its own method and calling that method a few times before entering the measurement interval is all it really takes to avoid this issue.
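A minimal sketch of that advice applied to the loop from the original post. The class name DateLoopBenchmark, the iteration count (reduced to 10,000,000 from the original 100,000,000 so warm-up stays short) and the five warm-up calls are arbitrary choices, not values from the thread.

```java
import java.util.Date;

// The measured loop lives in its own method and is called several times
// before the timed call, so a compiled version is ready when timing starts.
public class DateLoopBenchmark {

    static final int ITERATIONS = 10000000;   // hypothetical; original used 100000000
    static final int WARMUP_RUNS = 5;         // hypothetical

    // The original loop body, returning j so the JIT cannot discard the work.
    static long dateLoop(int iterations) {
        long j = 0;
        for (int i = 0; i < iterations; i++) {
            Date d = new Date();
            j = d.getTime();
            j = j + i % 10;
        }
        return j;
    }

    public static void main(String[] args) {
        long sink = 0;
        // Warm-up: each call is a fresh invocation, so once a background
        // compile finishes, the next call enters the compiled code.
        for (int r = 0; r < WARMUP_RUNS; r++) {
            sink += dateLoop(ITERATIONS);
        }
        long start = System.nanoTime();
        sink += dateLoop(ITERATIONS);
        long elapsedMs = (System.nanoTime() - start) / 1000000;
        System.out.println("measured run: " + elapsedMs + " ms (sink " + sink + ")");
    }
}
```

Printing the accumulated result keeps the loop's output live, so the measured call cannot be optimized away as dead code.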

That said, this doesn't seem to be what's going on with this micro-benchmark. This micro-benchmark seems to be measuring object allocation time, Date.getTime(), and garbage collection performance. It's possible that it's highlighting some difference in the underlying platform's support for Date or that there's some native interface differences. It's also possible that the multi-processor costs are contributing to the differences. It might also be due to a difference in the optimizations applied by the JIT. Right now, it's not obvious to me what the underlying cause really is.
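To separate the costs listed above, one option is to time the pieces of the loop on their own: the OS clock read, Date allocation with a fixed timestamp, and the original new Date() (whose no-argument constructor reads System.currentTimeMillis() itself). This is a hypothetical decomposition, not something from the thread: the class name, iteration count and three-run warm-up are assumptions, it needs Java 8 for IntToLongFunction, and on newer JVMs escape analysis may remove allocations that never escape, so the allocation-only figure can understate allocation cost.

```java
import java.util.Date;
import java.util.function.IntToLongFunction;

// Times each ingredient of the original loop separately.
public class DateLoopBreakdown {

    // OS clock read only, no allocation.
    static long clockOnly(int n) {
        long j = 0;
        for (int i = 0; i < n; i++) {
            j += System.currentTimeMillis() + i % 10;
        }
        return j;
    }

    // Allocation only: new Date(long) never asks the OS for the time.
    static long allocationOnly(int n) {
        long j = 0;
        for (int i = 0; i < n; i++) {
            Date d = new Date(i);
            j += d.getTime() + i % 10;
        }
        return j;
    }

    // Allocation plus clock read, as in the original loop.
    static long combined(int n) {
        long j = 0;
        for (int i = 0; i < n; i++) {
            Date d = new Date();
            j += d.getTime() + i % 10;
        }
        return j;
    }

    // Warms the variant up, then times one call.
    static long timeMs(String label, IntToLongFunction f, int n) {
        for (int r = 0; r < 3; r++) {
            f.applyAsLong(n);
        }
        long start = System.nanoTime();
        long sink = f.applyAsLong(n);
        long ms = (System.nanoTime() - start) / 1000000;
        System.out.println(label + ": " + ms + " ms (sink " + sink + ")");
        return ms;
    }

    public static void main(String[] args) {
        int n = 10000000;
        timeMs("clock only     ", DateLoopBreakdown::clockOnly, n);
        timeMs("allocation only", DateLoopBreakdown::allocationOnly, n);
        timeMs("new Date()     ", DateLoopBreakdown::combined, n);
    }
}
```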

Micro-benchmarks like these are typically not a good indicator of real application performance because real applications don't look or act like most micro-benchmarks. Your best bet is to run an actual application, and if you are using a multi-processor system, it's best if that application is multi-threaded.

You mention that other Java applications seem slow on this box. Is that a subjective comment, or do you have data to support that theory? The multi-processor nature of the Solaris system does result in runtime synchronization differences within the JVM, which may be contributing to your perception. In this case, I would suggest that you try running these applications on 1.5.0_06 or later with -XX:+UseBiasedLocking. It can help avoid some of the synchronization costs associated with running on a multi-processor machine (particularly, but not necessarily only, for single-threaded code).
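To check whether biased locking makes a difference on a given box, one could time a single thread repeatedly taking an uncontended lock and run the same class twice, for example `java -XX:-UseBiasedLocking UncontendedLock` and `java -XX:+UseBiasedLocking UncontendedLock`. This is a sketch; the class name and counts are hypothetical. The lock object sits in a static field so escape analysis cannot remove the locking entirely, although the JIT may still coarsen adjacent lock regions. UseBiasedLocking is a HotSpot option that was deprecated and disabled by default in JDK 15 and removed in later releases, so the comparison only makes sense on older JVMs.

```java
// A single thread acquiring an uncontended lock many times, the case
// biased locking is meant to make cheaper.
public class UncontendedLock {

    // Published through a static field so the object escapes and the JIT
    // cannot simply elide its locks.
    static final UncontendedLock SHARED = new UncontendedLock();

    private long count;

    // Only one thread ever calls this, so the lock is never contended.
    synchronized void increment() {
        count++;
    }

    synchronized long get() {
        return count;
    }

    // Performs n lock/unlock pairs; returns how many increments this call added.
    static long run(int n) {
        long before = SHARED.get();
        for (int i = 0; i < n; i++) {
            SHARED.increment();
        }
        return SHARED.get() - before;
    }

    public static void main(String[] args) {
        int n = 50000000;
        for (int r = 0; r < 3; r++) {
            run(n);   // warm-up calls so the measured run uses compiled code
        }
        long start = System.nanoTime();
        long done = run(n);
        long ms = (System.nanoTime() - start) / 1000000;
        System.out.println(done + " uncontended lock/unlock pairs took " + ms + " ms");
    }
}
```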

I can tell you that Java performance and scalability on multi-threaded codes on multi-processor Solaris x86 and x64 systems is quite competitive and generally leads any other operating system on the same hardware. If you are going to JavaOne and want proof of that claim, attend our BOF on Thursday evening:

7:30-8:20p BOF-0623 ~ Java™ Technology-Based Performance on Multithreaded Hardware Systems

HTH
Brian

linuxhippy
Joined: 2004-01-07

Well, this could have several causes; I guess all of these add up to the slowdown you're seeing:

1.) On 64-bit machines Java objects are larger, and since your test creates tons of garbage, the GC on AMD64 has more work to do (roughly 20-30% more).

2.) Synchronization overhead is much larger on boxes with 2 or more CPUs. On uniprocessor systems the JVM can use some tricks, but on a 2-CPU system inter-CPU synchronization happens, which is slow.
Since your test is single-threaded, it will also benefit only very slightly from the second CPU (the GC may use it).

3.) Your test is system-dependent, since it asks the underlying OS for the current time.
This may be slower on Solaris than on Windows.

I would not worry about results made by benchmarks like yours ;)

Regards, Clemens