
Garbage Collection Pauses & Non-interruptible System Calls

3 replies
Joined: 2003-06-11

I have a problem that I need some advice on.

We wrote a custom JNI library for file I/O that sits underneath the Java NIO FileChannel. One of our driving requirements is high-performance file I/O. We achieved this by doing DMA I/O from large, memory-aligned direct buffers. The JNI layer is trivial - it just takes a buffer and performs the appropriate system call based on the parameters given to it. All of the logic for calculating offsets, managing buffers, etc. lives in our implementation of java.nio.FileChannel.

Here's our problem: we are required to respond to some messages in as little as 250 ms. During this time, we're doing file writes of 128 MB that take around 200 ms. When GC kicks in, it tries to pause all threads. Because the DMA write is non-interruptible, GC waits for the I/O to complete before it can pause the thread and run. That means a GC pause can take well over 200 ms, putting us in grave danger of missing our deadlines. Worse, there is always the chance the write will hang due to a bad filesystem. We've seen this cause the JVM to hang indefinitely, forcing us to cycle the process.

Unless we find a solution that allows GC to continue while this I/O is in progress, we will convert all the code to C++. While that might solve the deadline problem for that particular process, we have many less performance-critical processes that use our JNI FileChannel libraries and that would still hang if a filesystem goes bad.

We've tuned the filesystem device timeouts down to their minimum, but they are still very high (on the order of several seconds to minutes). It would be nice if the JVM had a similar timeout for pausing threads, i.e., where the pause attempt times out after X milliseconds. We'd be willing to accept a larger heap and postpone GC in the hope that the next time GC ran, we wouldn't be in the middle of a non-interruptible system call.

The only solution being batted around here is pushing the system calls out of Java threads and into native threads. The JNI call would push the info for the I/O call onto a native C++ queue, where a small number of native threads (3?) would pull the data off the queue and perform the actual system call. The trick is finding an implementation where the Java thread blocked waiting on a response from the native thread is interruptible. All this assumes GC doesn't try to pause native threads. We thought about using pthreads, but were concerned about their signal interaction with the JVM. So, we're leaning towards using pipes to push data from one thread to another.
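To make the "interruptible wait" half of this idea concrete, here is a minimal Java-side sketch. It is only an illustration under stated assumptions: the worker pool here is made of ordinary Java threads (not the native threads proposed above), `slowWrite` is a hypothetical stand-in that sleeps instead of issuing a real system call, and the class and method names are invented for the example. It shows the caller waiting on a `Future` with a deadline, so the calling thread can give up even if the write itself never returns. It does not demonstrate anything about how GC treats the worker.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class InterruptibleWriteSketch {

    // Hypothetical stand-in for the blocking native write.
    // It sleeps to simulate a slow device and returns a pretend byte count.
    static int slowWrite(long millis) throws InterruptedException {
        Thread.sleep(millis);
        return 42;
    }

    // Hands the write to the I/O pool and waits at most deadlineMillis for it.
    // The wait in Future.get is interruptible and bounded by the deadline.
    static String writeWithDeadline(ExecutorService ioPool, long writeMillis, long deadlineMillis)
            throws InterruptedException {
        Future<Integer> pending = ioPool.submit(() -> slowWrite(writeMillis));
        try {
            return "wrote " + pending.get(deadlineMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Frees the caller. A truly non-interruptible system call would
            // ignore this and keep its worker thread occupied.
            pending.cancel(true);
            return "timed out";
        } catch (ExecutionException e) {
            return "failed: " + e.getCause();
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService ioPool = Executors.newFixedThreadPool(3); // the "(3?)" workers
        System.out.println(writeWithDeadline(ioPool, 10, 1000));  // completes in time
        System.out.println(writeWithDeadline(ioPool, 2000, 100)); // misses the deadline
        ioPool.shutdownNow();
    }
}
```

With a real hung filesystem the worker would stay stuck, so a fixed pool of three could eventually be exhausted; the sketch only shows that the requesting thread can meet its own deadline and report the failure.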

If you have any suggestions or advice, we are desperate for your wisdom.


Joined: 2005-07-11

You should consider looking into real-time Java. With little change to your code, you
can run your high-priority threads at priorities higher than the garbage collector. With
more change to your code, you can get your response times down significantly from
your 250 ms limit, giving you plenty of headroom.


Joined: 2003-07-06

You could introduce clustering into the picture. The load balancer would only hand requests to instances that are not in imminent need of a collection (which you could determine by inspecting memory pools via JMX). The load balancer can be notified in either "pull" or "push" fashion. In the "push" scenario, a monitoring thread would ask for its instance to be removed from the cluster, force a GC on itself, and re-add itself to the cluster, fresh and rested :)
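As a rough sketch of the "inspect memory pools via JMX" step, the snippet below reads the heap pools through the standard `java.lang.management` API and reports the highest occupancy. The class name, the `shouldLeaveCluster` helper, and the 0.8 threshold are assumptions made up for illustration; how an instance actually removes itself from the load balancer is not shown.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class GcPressureProbe {

    // Returns the highest used/max ratio across heap pools that define a max.
    static double maxHeapPoolOccupancy() {
        double worst = 0.0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null || usage.getMax() <= 0) {
                continue; // getMax() is -1 when the pool has no defined maximum
            }
            worst = Math.max(worst, (double) usage.getUsed() / usage.getMax());
        }
        return worst;
    }

    // Hypothetical policy: step out of the cluster once a pool is this full.
    static boolean shouldLeaveCluster(double occupancy, double threshold) {
        return occupancy >= threshold;
    }

    public static void main(String[] args) {
        double occupancy = maxHeapPoolOccupancy();
        System.out.printf("max heap pool occupancy: %.2f%n", occupancy);
        if (shouldLeaveCluster(occupancy, 0.8)) {
            // In the "push" scenario, this is where the instance would ask to be
            // removed, call System.gc(), and then re-register itself.
            System.out.println("would leave cluster and collect");
        }
    }
}
```

Note that `System.gc()` is only a request to the JVM, so a real implementation would probably re-check the pools before rejoining.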

I did not understand what solution you are looking for with regard to the bad-filesystem case. As is, you seem to be violating the SLA if that happens. But you want to keep the SLA and tell the client that the save failed, right? The solution is fairly trivial if the client is remote, since its accepting thread can be interrupted and its connection can time out. If the client is in-process, I'd like to think a queue-based solution can be built to give you the behavior you are looking for.

Joined: 2003-06-10

Do the DMA I/O from a thread not associated with the Java runtime?