Posted by praveren
on August 10, 2010 at 12:21 AM PDT
We have a multi-threaded Java application that uses approximately 500 to 600 threads as "worker" threads.
The worker threads are started when the application starts up and are in the wait state till a task is available.
When a task is available, a worker thread is chosen at random from the wait pool; the selected thread completes the assigned task and goes back into the wait pool. The application receives tasks (several hundred per minute) and delegates them to the worker threads. The worker threads all have the same thread priority and compete for processor time.
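To make the setup concrete, here is a minimal sketch of such a worker pool. It is not the actual application code: the class name `WorkerPoolSketch`, the shared `BlockingQueue`, and the small pool size in the demo are assumptions for illustration only. In particular, the sketch lets whichever idle worker wakes first take the task, rather than picking a worker explicitly at random. It is written without Java 7+ syntax so that it compiles on JDK 1.6.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: pre-started workers that wait for tasks on a shared queue.
public class WorkerPoolSketch {
    private final BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<Runnable>();
    private final Thread[] workers;

    public WorkerPoolSketch(int workerCount, int priority) {
        workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++) {
            Thread t = new Thread(new Runnable() {
                public void run() {
                    try {
                        while (true) {
                            tasks.take().run(); // blocks until a task is available
                        }
                    } catch (InterruptedException e) {
                        // interrupted by shutdown(): let the worker exit
                    }
                }
            }, "worker-" + i);
            t.setPriority(priority); // every worker gets the same priority
            t.setDaemon(true);
            workers[i] = t;
        }
        for (Thread t : workers) {
            t.start();
        }
    }

    public void submit(Runnable task) {
        tasks.add(task);
    }

    public void shutdown() {
        for (Thread t : workers) {
            t.interrupt();
        }
    }

    // Runs taskCount trivial tasks on 8 workers at priority 8; returns how many completed.
    public static int runDemo(int taskCount) {
        WorkerPoolSketch pool = new WorkerPoolSketch(8, 8);
        final AtomicInteger completed = new AtomicInteger();
        final CountDownLatch done = new CountDownLatch(taskCount);
        for (int i = 0; i < taskCount; i++) {
            pool.submit(new Runnable() {
                public void run() {
                    completed.incrementAndGet();
                    done.countDown();
                }
            });
        }
        try {
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed tasks: " + runDemo(100));
    }
}
```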
The tasks are short-lived, and a thread typically takes 1 or 2 seconds (on average) to complete one. We are facing the following issue with thread scheduling/execution:
A worker thread A is assigned a task T and, while executing T, enters the 'wait' state until event X occurs. When event X occurs, another thread B
notifies thread A that event X has occurred. After thread A is notified, it should resume its work immediately and continue with task T.
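The hand-off between thread A and thread B can be sketched roughly as follows. This is an illustration, not the application's code: the class `EventGate`, the `occurred` flag, and the timeout are hypothetical. The flag is checked in a loop around `wait()` so that a notification sent before thread A starts waiting is not lost and a spurious wakeup is not mistaken for the event.

```java
// Hypothetical sketch of "thread A waits for event X, thread B notifies".
public class EventGate {
    private final Object lock = new Object();
    private boolean occurred = false; // guarded by lock

    // Called by thread A. Returns true if the event occurred before the timeout.
    public boolean awaitEvent(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1000000L;
        synchronized (lock) {
            while (!occurred) {
                long remainingMillis = (deadline - System.nanoTime()) / 1000000L;
                if (remainingMillis <= 0) {
                    return false; // timed out (also avoids wait(0), which waits forever)
                }
                lock.wait(remainingMillis);
            }
            return true;
        }
    }

    // Called by thread B when event X occurs.
    public void signal() {
        synchronized (lock) {
            occurred = true;
            lock.notifyAll();
        }
    }

    // Thread B signals while the calling thread plays the role of thread A.
    public static boolean runDemo() {
        final EventGate gate = new EventGate();
        Thread b = new Thread(new Runnable() {
            public void run() {
                gate.signal();
            }
        }, "thread-B");
        b.start();
        try {
            return gate.awaitEvent(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("event observed: " + runDemo());
    }
}
```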
However, it takes 5 to 15 seconds for thread A to begin executing again (i.e., to get processor time). We added debug traces to check whether the notification from thread B is delayed, but the traces confirm that thread A is notified immediately after event X occurs (which itself occurs within roughly a few milliseconds after thread A enters the WAIT state).
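One way to trace this kind of delay is to timestamp the notification and the wake-up with `System.nanoTime()`, as in the sketch below. The class `WakeLatencyProbe` and its fields are hypothetical and are not the traces we actually used. Note that the waiting thread can only return from `wait()` after it reacquires the monitor, so the measured value includes any time the notifier (or another thread) keeps holding the lock after `notifyAll()`, in addition to scheduling delay.

```java
// Hypothetical probe: measures the time between notifyAll() and the waiter resuming.
public class WakeLatencyProbe {
    private final Object lock = new Object();
    private boolean occurred = false;  // guarded by lock
    private long notifiedAtNanos;      // guarded by lock

    // Called by thread A; returns nanoseconds elapsed since the notification.
    public long awaitAndMeasureNanos() throws InterruptedException {
        synchronized (lock) {
            while (!occurred) {
                lock.wait();
            }
            // Reached only after the monitor has been reacquired.
            return System.nanoTime() - notifiedAtNanos;
        }
    }

    // Called by thread B when event X occurs.
    public void signal() {
        synchronized (lock) {
            occurred = true;
            notifiedAtNanos = System.nanoTime();
            lock.notifyAll();
        }
    }

    // Thread B signals after about 20 ms; the caller acts as thread A.
    public static long runDemo() {
        final WakeLatencyProbe probe = new WakeLatencyProbe();
        Thread b = new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(20);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    probe.signal(); // always signal so the waiter cannot hang
                }
            }
        }, "thread-B");
        b.start();
        try {
            return probe.awaitAndMeasureNanos();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1L;
        }
    }

    public static void main(String[] args) {
        System.out.println("wake-up latency (ns): " + runDemo());
    }
}
```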
This behavior is sporadic and happens 4 or 5 times in every 10,000 tasks (on average). The threads normally get notified and resume execution immediately after the event occurs, but in some cases we see a huge delay, on the order of 15 seconds, which is unacceptable for the application's throughput.
We are unable to determine the cause of this behavior and wonder whether it is related to the thread scheduling/time-slicing algorithms of Java or the underlying OS.
Any input or ideas on how to find the root cause or resolve the problem?
NOTE: We are using JRE/JDK version 1.6.0 update 17 on the Windows Server 2003 platform, and all worker threads have the same thread priority of 8.
Thanks in advance.