java.util.concurrent.locks.Condition.await(timeout, units) hangs forever

neighbour
Offline
Joined: 2007-11-27
Points: 0

During a call to
java.util.concurrent.locks.Condition.await(timeout,
TimeUnit.MILLISECONDS), a thread hangs on Solaris/AMD x64 instead of resuming after the timeout has elapsed.

The environment is:
java version "1.6.0_12"
Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
Java HotSpot(TM) Server VM (build 11.2-b01, mixed mode)
SunOS x2001 5.10 Generic_127128-11 i86pc i386 i86pc
Update from 10.06.2009: Sorry, the version of Solaris is: SunOS x2001 5.10 Generic_137138-09 i86pc i386 i86pc
AMD Opteron 2356 2 CPU x Quad-Core 2312 MHz

Steps to reproduce the bug:
1) Run the attached test application on Solaris/AMD x64 platform.
2) Within 2-20 minutes the bug should be reproduced, with the message "JVM Bug found in 8 threads !!!!" appearing in the log.

Here is part of the source code:

// one thread:

lock.lock();
try {
    try {
        while (queueSize == 0) {
            if (condition.await(AWAIT, TimeUnit.MILLISECONDS)) {
                conditionCount++;
            }
        }
        awaitCount++;
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
} finally {
    lock.unlock();
}

// another thread:

lock.lock();
try {
    bufferPos = (int) (eventsCount % LONG_DATA_COUNT);
    if (queueSize == 0 && Math.random() > SIGNAL_PROBABILITY) {
        condition.signal();
    }
    eventsCount++;
    queueSize++;
} finally {
    lock.unlock();
}
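
The attached test program is no longer available in this thread, so the following is only a minimal sketch of how the timed-await pattern above could be turned into a self-contained check. It is not the original test: the class name AwaitOverrunCheck, the method longestAwaitMillis, and its parameters are hypothetical. It calls await(timeout, unit) repeatedly with nobody signalling and reports the longest time a single call took, which on a healthy system should stay close to the timeout.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper, not the original test attached to this thread.
final class AwaitOverrunCheck {

    // Calls condition.await(timeout) 'rounds' times with no signaller and
    // returns the longest observed duration of a single call, in milliseconds.
    static long longestAwaitMillis(long timeoutMillis, int rounds) throws InterruptedException {
        Lock lock = new ReentrantLock();
        Condition condition = lock.newCondition();
        long worstNanos = 0;
        lock.lock();
        try {
            for (int i = 0; i < rounds; i++) {
                long start = System.nanoTime();
                // Returns false when the timeout elapsed without a signal.
                condition.await(timeoutMillis, TimeUnit.MILLISECONDS);
                worstNanos = Math.max(worstNanos, System.nanoTime() - start);
            }
        } finally {
            lock.unlock();
        }
        return TimeUnit.NANOSECONDS.toMillis(worstNanos);
    }

    public static void main(String[] args) throws InterruptedException {
        long worst = longestAwaitMillis(10, 100);
        System.out.println("Longest await(10 ms) took " + worst + " ms");
    }
}
```

Two caveats apply to this sketch. A call that truly hangs never returns, so a real test would also need a watchdog thread that notices when the loop stops making progress. The measurement itself relies on System.nanoTime, so if the time source is what misbehaves, the reported durations may be misleading as well.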

The bug is NOT reproduced on the platforms:
Solaris / UltraSPARC T1
Solaris / UltraSPARC T2+
Solaris / UltraSPARC IV+
Linux / Xeon

Solaris version is corrected from "Generic_127128-11" to "Generic_137138-09"

Message was edited by: neighbour

dholmes
Offline
Joined: 2005-12-11
Points: 0

Paul,

> the bug seemed to be fixed after we applied the
> latest Solaris patches. After we applied them we were
> no longer able to reproduce the bug ( using the code
> provided in that forum thread ).

Do you recall exactly which patches you applied that fixed this? I'm trying to pin down the root cause.

Thanks,
David Holmes

neighbour
Offline
Joined: 2007-11-27
Points: 0

Unfortunately, the attached test and showrev output were removed somehow.
Here they are: "jvm-await-bug.zip", "x2001-show-rev.txt".

neighbour
Offline
Joined: 2007-11-27
Points: 0

After installing the latest Solaris release from May 2009 ("Kernel version: SunOS 5.10 Generic_139556-08, Solaris 10 5/09 s10x_u7wos_08 X86"), the bug can no longer be reproduced.

dholmes
Offline
Joined: 2005-12-11
Points: 0

Thanks for the update.

So you went from a Solaris 10 update 5 install to Solaris 10 update 7, and that has seemingly fixed the problem.

We were unable to reproduce the problem. I will try to track down what fix in S10u6 or S10u7 might have fixed this.

David Holmes

Edited to correct version info.

Message was edited by: dholmes

dholmes
Offline
Joined: 2005-12-11
Points: 0

It is possible that the root cause here was Solaris bug 6600939, which was fixed in Solaris 10 update 6.

David Holmes

neighbour
Offline
Joined: 2007-11-27
Points: 0

No, we went from Solaris 10 update 6 (Solaris 10 10/08 s10x_u6wos_07b X86) to Solaris 10 update 7 (Solaris 10 5/09 s10x_u7wos_08 X86).

Indeed, the root cause was Solaris bug 6600939; I reported that problem too: http://forums.java.net/jive/thread.jspa?threadID=61998.
But the point is this: Solaris bug 6600939 is stated as fixed in patch 137112-01 (http://sunsolve.sun.com/search/document.do?assetkey=1-21-137112-01-1), dated June 2008. We had "Solaris 10 10/08 s10x_u6wos_07b X86", dated October 2008, so that Solaris release contained this patch, but the bug was still there.

Only when we moved to update 7 did the problem go away.

dholmes
Offline
Joined: 2005-12-11
Points: 0

Neighbor,

Your original post states:

SunOS x2001 5.10 Generic_127128-11 i86pc i386 i86pc

and 127128-11 corresponds to Solaris 5/08, which is update 5. Hence my comment.

If you are indeed seeing this with update 6, then I need to dig further in update 7 to see if there is any additional fix related to 6600939.

When a future time is reported due to this bug, subsequent calls to nanoTime will return the same value until time catches up with the erroneous value. This guarantees the monotonic non-decreasing property of nanoTime (at least on Solaris).
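
To illustrate the behaviour described here, the following is a minimal sketch of a "never go backwards" clamp. It is an assumption-laden illustration, not the actual HotSpot or Solaris code: the class ClampedClock and its report method are hypothetical, and the raw readings are made-up numbers. It shows how one erroneous future reading makes the reported value freeze until the raw source catches up.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of a monotonic clamp; this is not the actual
// HotSpot or Solaris implementation.
final class ClampedClock {
    private final AtomicLong lastReported = new AtomicLong(Long.MIN_VALUE);

    // Returns the larger of rawNanos and the highest value reported so far.
    long report(long rawNanos) {
        while (true) {
            long last = lastReported.get();
            if (rawNanos <= last) {
                return last; // raw source is behind: repeat the old value
            }
            if (lastReported.compareAndSet(last, rawNanos)) {
                return rawNanos;
            }
        }
    }

    public static void main(String[] args) {
        ClampedClock clock = new ClampedClock();
        // 5000 stands in for an erroneous jump into the future.
        long[] raw = {100, 200, 5000, 300, 400, 5100};
        for (long r : raw) {
            System.out.println("raw=" + r + " reported=" + clock.report(r));
        }
    }
}
```

With these made-up readings the reported values are 100, 200, 5000, 5000, 5000, 5100. While the value is frozen, a deadline computed as "now plus timeout" cannot be seen to expire, which would be consistent with a timed wait that appears to hang. Whether that fully explains the hang reported in this thread is not established here.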

David Holmes

neighbour
Offline
Joined: 2007-11-27
Points: 0

The version "Generic_127128-11" was a mistake. The real version was "Generic_137138-09". BTW in the attached file "x2001-show-rev.txt" the correct version is mentioned from the very beginning.

I've updated the top message in this thread accordingly.

chrjohn
Offline
Joined: 2008-03-11
Points: 0

Hi,
does someone still have the attachments available (especially the test program)? They seem to be missing from this thread.

Thanks in advance,

Chris.

dholmes
Offline
Joined: 2005-12-11
Points: 0

Thanks for the clarification. Unfortunately it means we've gone from "problem solved" to having a new mystery. :(

David Holmes

pmehrer
Offline
Joined: 2009-02-23
Points: 0

Hello,

the bug seemed to be fixed after we applied the latest Solaris patches. After we applied them we were no longer able to reproduce the bug ( using the code provided in that forum thread ).

best regards
Paul

rogyeu
Offline
Joined: 2006-07-30
Points: 0

Thanks for the info. neighbour, can you please make sure you have the latest Solaris patches?

-- RY

neighbour
Offline
Joined: 2007-11-27
Points: 0

We've applied the latest Solaris patches to the host and still we are able to reproduce the problem.
So the patches didn't help.

The output of the command "showrev -a" is in the attachment http://forums.java.net/jive/servlet/JiveServlet/download/119-57703-33217...
Perhaps we missed some patches?

Sun has accepted this bug as number 6807483:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6807483

rogyeu
Offline
Joined: 2006-07-30
Points: 0

Please try the suggestion in the bug report comment. We would like to isolate the issue. You may post on the bug report.

Thanks,
RY

neighbour
Offline
Joined: 2007-11-27
Points: 0

Sorry, what is the suggestion you are talking about? What else should I post on the bug report?

dholmes
Offline
Joined: 2005-12-11
Points: 0

The suggestion in the bug report was to replace Thread.sleep with Object.wait to exclude the possibility of Thread.sleep returning early - which is a problem on Solaris in some circumstances.
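
As a rough sketch of what that substitution could look like (the test program's own delay code is not shown in this thread, so the class WaitBasedDelay and its delay method are hypothetical), a delay built on Object.wait can re-check a deadline and wait again after any early or spurious wakeup:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper; not taken from the test program or the bug report.
final class WaitBasedDelay {
    private static final Object MONITOR = new Object();

    // Blocks for at least 'millis' milliseconds using Object.wait, waiting
    // again after early or spurious wakeups until the deadline has passed.
    static void delay(long millis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
        synchronized (MONITOR) {
            long remaining;
            while ((remaining = deadline - System.nanoTime()) > 0) {
                // Round up so that wait(0), which waits forever, is never called.
                MONITOR.wait(TimeUnit.NANOSECONDS.toMillis(remaining) + 1);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        delay(50);
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("Delayed for " + elapsedMillis + " ms");
    }
}
```

Note that the deadline check itself depends on System.nanoTime, so this only rules out an early return from the blocking call; it does not protect against a misbehaving time source.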

The latest request is for a pstack/jstack dump of the hung process showing where the threads are. Taking a few in quick succession would help establish which threads are truly blocked.

There are also issues with synchronization (or lack thereof) in your test program. This is unlikely to be the cause of the problem unless it occurs when values overflow from needing 32 bits to needing 33 bits. Note that volatile longs still need to be updated under a lock, as a compound update such as an increment is NOT atomic.
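
For readers unfamiliar with the point about counters: an increment is a read, an add, and a write, so two threads incrementing the same volatile long can overwrite each other's result. The sketch below shows one possible alternative, java.util.concurrent.atomic.AtomicLong; guarding the counter with the existing lock would work as well. The class CounterDemo and its method are hypothetical and are not part of the test program.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo; the counters in the actual test program are not shown
// in full in this thread.
final class CounterDemo {

    // Starts 'threads' workers that each increment a shared AtomicLong
    // 'incrementsPerThread' times, then returns the final count.
    static long countWithAtomicLong(int threads, final int incrementsPerThread)
            throws InterruptedException {
        final AtomicLong counter = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < incrementsPerThread; j++) {
                        counter.incrementAndGet(); // atomic read-modify-write
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join(); // join makes the workers' updates visible here
        }
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Prints 400000: no increments are lost.
        System.out.println(countWithAtomicLong(4, 100000));
    }
}
```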

Please forward information via the bug report as I don't follow these forums.

Thank you.
David Holmes

neighbour
Offline
Joined: 2007-11-27
Points: 0

> The suggestion in the bug report was to replace Thread.sleep with Object.wait to exclude the possibility of Thread.sleep returning early - which is a problem on Solaris in some circumstances.

In the provided test there is no incorrect usage of any method. Making the reproducible code non-reproducible is not the right way to "solve" the problem, is it? Apart from this test, our production code does not use Thread.sleep at all, but it does use "await(timeout, units)" and other java.util.concurrent methods with timeouts, and this is the point.

> The latest request is for a pstack/jstack dump of the hung process showing where the threads are. Taking a few in quick succession would help establish which threads are truly blocked.

The attached file "park-nanos-pstack-jstack.zip" contains the output of the commands "pstack", "jstack -l", and "jstack -m", taken a few seconds apart.

> Please forward information via the bug report as I don't follow these forums.

It looks like there is no way to attach anything to a ticket on bugs.sun.com; only a text description is allowed. So I have put the attachment here and will provide a link to it from the bug http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6807483.

rogyeu
Offline
Joined: 2006-07-30
Points: 0

Could you please submit a bug report at http://bugreport.sun.com with a complete runnable sample?

Thanks,
Roger Y.

pmehrer
Offline
Joined: 2009-02-23
Points: 0

Hello,

Our new Sun X2200 M2 servers with quad-core Opteron 2356 CPUs and Java 1.5.0_17-b04 have the same issue. We first noticed it in the gc.log: the elapsed time since JVM start suddenly jumps from a valid value of several tens of thousands of seconds to something like 2 million seconds. From then on, Tomcat becomes unstable. We have already submitted a bug report.

Thanks for the code snippet, it seems to be a good indicator.

best regards
Paul
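
A time jump like the one described for the gc.log could in principle be spotted from inside the JVM by comparing how far System.nanoTime and System.currentTimeMillis advance over the same interval. The following is only a minimal sketch under that assumption; the class ClockJumpCheck, the method looksLikeJump, and the tolerance value are hypothetical.

```java
// Hypothetical check; not part of the test program discussed in this thread.
final class ClockJumpCheck {

    // Returns true when the monotonic delta and the wall-clock delta over the
    // same interval differ by more than toleranceMillis.
    static boolean looksLikeJump(long nanoStart, long nanoEnd,
                                 long wallStartMillis, long wallEndMillis,
                                 long toleranceMillis) {
        long monotonicMillis = (nanoEnd - nanoStart) / 1000000L;
        long wallMillis = wallEndMillis - wallStartMillis;
        return Math.abs(monotonicMillis - wallMillis) > toleranceMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long nanoStart = System.nanoTime();
        long wallStart = System.currentTimeMillis();
        Thread.sleep(100);
        boolean jump = looksLikeJump(nanoStart, System.nanoTime(),
                                     wallStart, System.currentTimeMillis(), 1000);
        System.out.println("clock jump suspected: " + jump);
    }
}
```

A mismatch is only a hint, since the wall clock can also be adjusted legitimately (for example by time synchronization), so a real monitor would want to confirm a suspected jump over several samples.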