
System.nanoTime sporadic and hangs on multi-processor VMWare

4 replies [Last post]
Joined: 2006-04-10

I would like to know if anyone knows of a configuration or alternate OS to Solaris that works under VMWare ESX.

Consecutive calls to System.nanoTime() return sporadic differences that are zero, negative, or huge, and the value can stop changing for long periods on multi-processor Solaris x86 under VMWare. This has not been reproduced on non-VMWare multi-processor systems or on single-processor VMWare systems.

A trouble ticket was opened with VMWare; they indicated that %RDY was too high, at 50-100% instead of the recommended 5%, and they recommended single-processor VMs over multi-processor VMs. The bottom line is that the code does not run deterministically.

Date Created: Fri Jun 12 08:10:20 MDT 2009
Type: bug
Category: java
Subcategory: classes_lang
release: 6u10
hardware: sun4
OSversion: solaris_10
priority: 4
Synopsis: System.nanoTime stops advancing and GC Logging malfunctions reporting 0 time
Reproduced with both 1.6.0_2 and 1.6.0_13

SunOS 5.10 Generic_138889-06 i86pc i386 i86pc - latest patch level

Reproduced on both AMD Opteron 8356 and Intel Xeon E5345.
This issue has only been observed when running with multiple processors on VMWare ESX. Not observed or reproducible on multi-processor Ubuntu 8.10, Kubuntu 9.04, Gentoo Linux, or Sparc Solaris when they are not running on VMWare. Also not observed on single processor Solaris running on VMWare ESX.

GC logging malfunctions and then stops reporting accurately. The underlying symptom can be reproduced by polling System.nanoTime continuously: it eventually stops incrementing. This condition causes many java.util.concurrent classes that rely on System.nanoTime to malfunction, including ThreadPoolExecutor and the delay and blocking queues, among others.

Specifically, java.util.concurrent.locks.AbstractQueuedSynchronizer.ConditionObject.awaitNanos(long nanosTimeout) will not time out if nanoTime stops incrementing, and client code can hang indefinitely. It has also been observed that System.nanoTime, only on multi-processor VMWare, sporadically advances by zero or by negative amounts between consecutive calls. This has never been observed on non-VMWare multi-processor systems or on single-processor VMWare. System.nanoTime returning sporadic negative differences would also cause the above-referenced awaitNanos to malfunction.
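The hang mechanism can be illustrated with a simplified model of a timed-wait loop (a sketch, not the actual AbstractQueuedSynchronizer source; the class and method names below are illustrative). awaitNanos-style code re-reads System.nanoTime() each pass and subtracts the elapsed delta from the remaining timeout, so a frozen clock yields a delta of zero and the timeout never expires:

```java
import java.util.function.LongSupplier;

public class FrozenClockDemo {

    // Simplified model of a timed-wait loop: each pass subtracts the observed
    // nanoTime delta from the remaining timeout, as awaitNanos-style code does.
    static boolean timesOut(LongSupplier clock, long timeoutNanos, int maxSpins) {
        long remaining = timeoutNanos;
        long last = clock.getAsLong();
        for (int i = 0; i < maxSpins; i++) {
            long now = clock.getAsLong();
            remaining -= (now - last);  // a frozen clock yields a delta of 0
            last = now;
            if (remaining <= 0) {
                return true;            // the timeout expired normally
            }
        }
        return false;                   // still "waiting" after maxSpins passes
    }

    public static void main(String[] args) {
        // A healthy clock that advances 1000 ns per read: the timeout expires.
        long[] tick = {0};
        System.out.println("advancing clock times out: "
                + timesOut(() -> tick[0] += 1000, 5_000, 1_000));  // prints true

        // A clock frozen at the stuck value from the logs: it never expires.
        System.out.println("frozen clock times out: "
                + timesOut(() -> 13956314863692146L, 5_000, 1_000));  // prints false
    }
}
```

With the frozen clock the real awaitNanos would park and re-check forever, which matches the indefinite hangs described above.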

Run the attached test code on multi-processor VMWare and eventually the nanoTime will stop incrementing.

Not expecting nanoTime to stop incrementing indefinitely.
The GC log sporadically reports a large value, then reports zero indefinitely:

Below you can see that GC logging initially is normal, then it reports a very large value ~38 hours, then reports zero indefinitely (for at least 12+ hours).

295.341: [GC [PSYoungGen: 197118K->3205K(204224K)] 578435K->388848K(1387968K), 0.5534078 secs]
296.489: [GC [PSYoungGen: 195013K->5698K(204672K)] 580656K->393592K(1388416K), 0.3963039 secs]
298.493: [GC [PSYoungGen: 197890K->3580K(204608K)] 585783K->392768K(1388352K), 0.4133058 secs]
300.697: [GC [PSYoungGen: 196156K->10969K(202688K)] 585344K->402131K(1386432K), 138476.3001686 secs]
138776.998: [GC [PSYoungGen: 201592K->12755K(203520K)] 592754K->413325K(1387264K), 0.0000000 secs]
138776.998: [GC [PSYoungGen: 203475K->12147K(198912K)] 604045K->423147K(1382656K), 0.0000000 secs]
138776.998: [GC [PSYoungGen: 197491K->8443K(201216K)] 608491K->430399K(1384960K), 0.0000000 secs]

Output from the attached test code: eventually nanoTime stops incrementing indefinitely:

Below it can be seen that nanoTime did not increment for 40-50 thousand consecutive executions.
It can also be observed that the maximum difference between two consecutive nanoTime executions was reported as ~80 days, even though the test ran for only a few minutes.
On non-VMWare it increments on every execution and has never been observed to stop incrementing.

pool-1-thread-9 maxConsec=43314 minDiff=0 maxDiff=6978157632565349 a=13956314863692146 b=13956314863692146 changed=maxConsec
pool-1-thread-22 maxConsec=48786 minDiff=0 maxDiff=6978158312802313 a=13956314863692146 b=13956314863692146 changed=maxConsec
pool-1-thread-24 maxConsec=50552 minDiff=0 maxDiff=6978158093338584 a=13956314863692146 b=13956314863692146 changed=maxConsec
pool-1-thread-16 maxConsec=44805 minDiff=0 maxDiff=6978157852663558 a=13956314863692146 b=13956314863692146 changed=maxConsec
pool-1-thread-27 maxConsec=48969 minDiff=0 maxDiff=6978158252826428 a=13956314863692146 b=13956314863692146 changed=maxConsec
pool-1-thread-19 maxConsec=48628 minDiff=0 maxDiff=6978157632643095 a=13956314863692146 b=13956314863692146 changed=maxConsec
pool-1-thread-1 maxConsec=46531 minDiff=0 maxDiff=6978157792642802 a=13956314863692146 b=13956314863692146 changed=maxConsec
pool-1-thread-3 maxConsec=46575 minDiff=0 maxDiff=6978157752584362 a=13956314863692146 b=13956314863692146 changed=maxConsec
pool-1-thread-12 maxConsec=49622 minDiff=0 maxDiff=6978157472579260 a=13956314863692146 b=13956314863692146 changed=maxConsec
pool-1-thread-17 maxConsec=48469 minDiff=0 maxDiff=6978157912748608 a=13956314863692146 b=13956314863692146 changed=maxConsec
pool-1-thread-28 maxConsec=53015 minDiff=0 maxDiff=6978157674021878 a=13956314863692146 b=13956314863692146 changed=maxConsec
pool-1-thread-13 maxConsec=42159 minDiff=0 maxDiff=6978157512622147 a=13956314863692146 b=13956314863692146 changed=maxConsec
pool-1-thread-28 maxConsec=53016 minDiff=0 maxDiff=6978157674021878 a=13956314863692146 b=13956314863692146 changed=maxConsec
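As a sanity check on the output above, the maxDiff reported by pool-1-thread-9 converts to roughly 80 days (the class and helper below are just for this arithmetic):

```java
public class MaxDiffCheck {

    // Convert a nanosecond interval to days: ns -> s -> days.
    static double nanosToDays(long nanos) {
        return nanos / 1e9 / 86_400;
    }

    public static void main(String[] args) {
        long maxDiffNanos = 6978157632565349L;  // maxDiff from pool-1-thread-9 above
        System.out.printf("maxDiff = %.1f days%n", nanosToDays(maxDiffNanos));  // prints 80.8
    }
}
```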

This bug can be reproduced often.

---------- BEGIN SOURCE ----------
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * @author Troy T. Collinsworth
 */
public class TestNanoTime implements Runnable {

    public static void main(String[] args) throws InterruptedException {
        int threads = 30;

        final ExecutorService es = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            es.execute(new TestNanoTime());
        }
        System.out.println("Completed test, exiting");
    }

    public void run() {
        int loopSize = 10;
        long[] times = new long[loopSize];

        int consec = 0;
        int maxConsec = 0;
        int currConsec = 0;
        long a = System.nanoTime();
        long b = System.nanoTime();
        long minDiff = b - a;
        long maxDiff = b - a;

        String changed = null;
        while (true) {
            for (int i = 0; i < loopSize; i++) {
                times[i] = System.nanoTime();
                a = b;
                b = System.nanoTime();
                long newDiff = b - a;
                if (newDiff < minDiff) {
                    minDiff = newDiff;
                    changed = "minDiff";
                }
                if (newDiff > maxDiff) {
                    maxDiff = newDiff;
                    changed = "maxDiff";
                }
                if (a == b) {
                    consec++;
                    if (consec > currConsec) {
                        currConsec = consec;
                        if (currConsec > maxConsec) {
                            maxConsec = currConsec;
                            changed = "maxConsec";
                        }
                    }
                } else {
                    consec = 0;
                }
            }
            if (changed != null) {
                System.out.println(Thread.currentThread().getName() + " maxConsec=" + maxConsec + " minDiff=" + minDiff
                        + " maxDiff=" + maxDiff + " a=" + a + " b=" + b + " changed=" + changed);
                changed = null;
            }
        }
    }
}
---------- END SOURCE ----------

Message was edited by: tcollinsworth

Joined: 2005-02-28

God bless you, tcollinsworth! This was a nasty issue for us until we found your description here. Your test program accurately reported our problem. More on our problem here:

Joined: 2006-04-10

Dual processor Windows Server 2008 with Java 1.6.0_13 on VMWare does not exhibit zero, negative, or stuck nano times. Consecutive nanoTime calls appear 5-10 times slower than on Solaris.

Joined: 2006-04-10

Dual processor Fedora 11 on VMWare does not exhibit zero, negative, or stuck nano times, so this appears to be a Solaris on VMWare issue.

As it turns out, the VMWare instance originally created for Windows was reloaded with both Fedora 11 and Solaris 5.10, and neither exhibited the issue. So it seems that the differing VM image defaults for Windows vs. Solaris may be the cause.

Message was edited by: tcollinsworth

Joined: 2005-11-01

One reason you might see negative timing from nanoTime is that it takes the clock cycle count on the CPU and divides it by what it believes is the clock speed. However, different CPUs don't start counting at exactly the same time, so it is possible to see time going backward if your thread switches between CPUs.

Linux, for example, works out the difference between the CPUs and compensates for it. Windows XP, for example, doesn't.

I suspect you may not get this issue if you lock your VM or your process to one CPU (even with multiple cores).