
Java I/O clarification required

6 replies
georgew

Can someone please confirm or deny that the following code for file copy:

public static void io (InputStream in, OutputStream out)
throws IOException {
    byte[] buffer = new byte[8192];
    int amount;
    while ((amount = in.read (buffer)) >= 0)
        out.write (buffer, 0, amount);
}

will result in:

1) one buffer-size of data is read into the buffer and copied to the output stream

2) the same buffer-size is written to disc from the output stream.

3) the next buffer-size of data is APPENDED to the output stream

4) and is also written to disc, and so on.

With the net result that, at the end of the copy, the output stream in memory is as large as the input file, while the physical input and output are handled only one buffer at a time.

I am trying to ascertain how much memory is really being used by the copy. Is it only one buffer size, or the whole file size, or the whole file size times two?
Many thanks

georgew

Thank you Noah and Christian for your replies and further clarifications. It would seem, then, that there is no easy way to guarantee that only one buffer's worth of real memory is used, because the operating system will try to optimise for speed even when we would rather it didn't, for instance when memory matters more than speed.
Thank you for the interesting code, Christian; I'll look at it to see if I can use it to advantage.
Cheers

christian_schli...

If you're mainly concerned about memory rather than performance (is this for J2ME?), then you might consider using a static member which is a SoftReference to a buffer, protected by a synchronized statement.

This would avoid the "new" allocation on every method call. Please remember that there is no guarantee that, upon exit from your method, the memory allocated by the "new" statement is actually reclaimed; that depends on the behaviour of the garbage collector, which normally works asynchronously.
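
As a rough sketch of the idea (not TrueZIP code; the Streams class name is made up and the 8192-byte size is simply carried over from the io(...) example above), it could look like this:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.lang.ref.SoftReference;

public class Streams {

    // Cached copy buffer, shared by all callers. The SoftReference lets the
    // garbage collector reclaim the buffer whenever memory gets tight.
    private static SoftReference bufferRef;

    public static void io(InputStream in, OutputStream out)
            throws IOException {
        // The synchronized block makes the shared buffer safe to use from
        // multiple threads, at the price of serializing all copies.
        synchronized (Streams.class) {
            byte[] buffer = (bufferRef != null) ? (byte[]) bufferRef.get() : null;
            if (buffer == null) {
                buffer = new byte[8192];
                bufferRef = new SoftReference(buffer);
            }
            int amount;
            while ((amount = in.read(buffer)) >= 0)
                out.write(buffer, 0, amount);
        }
    }
}

A plain per-call new byte[8192] is simpler and usually fine; the SoftReference variant only pays off when the method is called very frequently.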

Regards,
Christian

christian_schli...

This topic is actually much more complex than it seems...

Your I/O routine works, but there are two performance bottlenecks:

1) A new buffer has to be allocated on the heap for each call to io(...). While this is thread safe, it puts a big burden on the JVM.

2) The copying is synchronous, so the read side is idling while the write side is busy and vice versa.

This results in really bad copying performance. Hence, for my upcoming release of TrueZIP 5.0, I have developed an asynchronous, thread-safe copy routine named cat(InputStream, OutputStream). This implementation uses a separate thread (the local Reader class) to read the source and store the result in a set of shared buffers cached in SoftReferences.

In various tests I have measured that cat(...) cuts the gross run time for copying files within the same directory by about 33% (!) compared to the plain synchronous routine. Depending on your system configuration, results may of course vary.

Please note that InputIOException is a plain subclass of IOException and that you have to set a suitable size for the shared buffers. According to some tests, 64KB (!) seems to be a good balance for various kinds of Input/OutputStreams (FileInput/OutputStream, ZipFile/ZipOutputStream, etc.) when copying files in the same directory.

If the InputStream and OutputStream refer to resources on different disks, the results will be even better, because the routine then benefits from the fact that these devices can work truly concurrently.

I am currently working on improving this algorithm with an automatic buffer sizing strategy, so that the routine can deliver the best possible performance for all combinations of devices.

package de.schlichtherle.io;

public class File /*...*/ {
    /*... */

    /**
     * Copies all data from one stream to another without closing them.
     *
     * This method uses asynchronous read/write operations on shared buffers
     * in order to provide a performance equivalent to JSE's nio
     * package. The buffers are reclaimable by the garbage collector.
     *
     * Like any of the cat methods in this class, this method
     * is guaranteed never to close any parameter stream.
     *
     * @throws InputIOException If copying the data fails because of an
     *         IOException in the input stream.
     * @throws IOException If copying the data fails because of an
     *         IOException in the output stream.
     */
    public static void cat(final InputStream in, final OutputStream out)
    throws IOException {
        // Note that we do not use PipedInput/OutputStream, because these
        // classes are slooowww. This is partially because they are using
        // Object.wait()/notify() in a suboptimal way and partially because
        // they copy data to and from an additional buffer byte array, which
        // is redundant if the data to be transferred is already held in
        // another byte array.
        // As an implication of the latter reason, although the idea of
        // adopting the pipe concept to threads looks tempting it is actually
        // bad design: Pipes are a good means of interprocess communication,
        // where processes cannot access each others data directly without
        // using an external data structure like the pipe as a commonly shared
        // FIFO buffer.
        // However, threads are different: They share the same memory and thus
        // we can use much more elaborated algorithms for data transfer.
        // Finally, in this case we will simply cycle through an array of
        // byte buffers, where an additionally created reader thread will fill
        // the buffers with data from the input and the current thread will
        // flush the filled buffers to the output.

        final Buffer[] buffers = allocateBuffers();

        /*
         * The task that cycles through the buffers in order to fill them
         * with input.
         */
        class Reader extends Thread {

            /** The index of the next buffer to be written. */
            int off;

            /** The number of buffers filled with data to be written. */
            int len;

            /** The IOException that happened in this task, if any. */
            volatile InputIOException exception;

            Reader() {
                super("Reader@File.cat(...)");
                //setDaemon(true); // not required.
            }

            public void run() {
                // Cache some data for better performance.
                final InputStream _in = in;
                final Buffer[] _buffers = buffers;
                final int _buffersLen = buffers.length;

                // The writer thread interrupts this thread to signal that it
                // cannot handle more input because there has been an
                // IOException during writing.
                // We stop processing in this case.
                int read;
                do {
                    // Wait until a buffer is available.
                    final Buffer buffer;
                    synchronized (this) {
                        while (len >= _buffersLen) {
                            try {
                                wait();
                            } catch (InterruptedException interrupted) {
                                return;
                            }
                        }
                        buffer = _buffers[(off + len) % _buffersLen];
                    }

                    // Fill buffer until end of file or buffer.
                    // This should normally complete in one loop cycle, but
                    // we do not depend on this as it would be a violation
                    // of InputStream's contract.
                    final byte[] buf = buffer.buf;
                    try {
                        read = _in.read(buf, 0, buf.length);
                    } catch (IOException failure) {
                        read = -1;
                        exception = new InputIOException(failure);
                    }
                    if (Thread.interrupted())
                        read = -1; // throws away buf - OK in this context
                    buffer.read = read;

                    // Advance head and notify writer.
                    synchronized (this) {
                        len++;
                        notify(); // only the writer could be waiting now!
                    }
                } while (read != -1);
            }

            private void shutdown() {
                interrupt();
                // Synchronization with this reader thread is required so that
                // a re-entry to the cat(...) method by the same thread cannot
                // reuse the same shared buffers that an unfinished reader
                // thread of a previous call is still using.
                while (true) {
                    try {
                        join();
                        break;
                    } catch (InterruptedException ignored) {
                    }
                }
            }
        }

        try {
            final Reader reader = new Reader();
            reader.start();

            // Cache some data for better performance.
            final int buffersLen = buffers.length;

            int write;
            while (true) {
                // Wait until a buffer is available.
                final int off;
                final Buffer buffer;
                synchronized (reader) {
                    while (reader.len <= 0) {
                        try {
                            reader.wait();
                        } catch (InterruptedException ignored) {
                        }
                    }
                    off = reader.off;
                    buffer = buffers[off];
                }

                // Stop on last buffer.
                write = buffer.read;
                if (write == -1)
                    break; // reader has terminated because of EOF or exception

                // Process buffer.
                final byte[] buf = buffer.buf;
                try {
                    out.write(buf, 0, write);
                } catch (IOException failure) {
                    reader.shutdown();
                    throw failure;
                }

                // Advance tail and notify reader.
                synchronized (reader) {
                    reader.off = (off + 1) % buffersLen;
                    reader.len--;
                    reader.notify(); // only the reader could be waiting now!
                }
            }

            if (reader.exception != null)
                throw reader.exception;
        } finally {
            releaseBuffers(buffers);
        }
    }

    private static final Buffer[] allocateBuffers() {
        synchronized (buffersList) {
            Buffer[] buffers;
            for (Iterator i = buffersList.iterator(); i.hasNext(); ) {
                buffers = (Buffer[]) ((Reference) i.next()).get();
                i.remove();
                if (buffers != null)
                    return buffers;
            }
        }

        // A minimum of two buffers is required.
        // The actual number is optimized to compensate for oscillating
        // I/O bandwidths like e.g. in networks.
        final Buffer[] buffers = new Buffer[4];
        for (int i = buffers.length; --i >= 0; )
            buffers[i] = new Buffer();

        return buffers;
    }

    private static final void releaseBuffers(Buffer[] buffers) {
        synchronized (buffersList) {
            buffersList.add(new SoftReference(buffers));
        }
    }

    /**
     * Holds a soft reference to an array initialized with {@link Buffer}
     * instances.
     */
    private static final List buffersList = new LinkedList();

    private static class Buffer {
        /** The byte buffer used for asynchronous reading and writing. */
        byte[] buf = new byte[ZipConstants.FLATER_BUF_LENGTH];

        /** The actual number of bytes read into the buffer. */
        int read;
    }

    /*...*/
}

georgew

Thanks tackline, you have clarified some of the issues, but I am still unclear whether there is a way (in Java 1.4) to avoid or prevent the operating system from building up the whole stream in memory; I want to make sure memory use is limited to the size of the buffer.
Cheers

noahcampbell

Keep in mind that the OS and the hardware rely on caches for a majority of their performance. The easiest way to demonstrate this is to copy a file using the loop above with and without a call to flush(). This is an instruction for the OS to move data out of the cache and back to the file system. It may or may not release the memory used for the logical file, but its contract is to write it to disk.

Digging a little deeper, the write may still fail, due to, say, three feet of water surrounding the computer, in which case neither your program nor the disk will get any information about the success of the write. If you want to be absolutely certain, you need to write to a raw partition, something Java may be able to do given an appropriate JNI library for the particular OS. However, your application will then be either more complex or slower.
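
For illustration, a sketch of that timing experiment (file names are placeholders; getFD().sync() is included because it is the call that explicitly asks the OS to push its cached data through to the device):

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class SyncCopy {
    public static void main(String[] args) throws IOException {
        FileInputStream in = new FileInputStream("source.bin");
        FileOutputStream out = new FileOutputStream("copy.bin");
        try {
            byte[] buffer = new byte[8192];
            int amount;
            long start = System.currentTimeMillis();
            while ((amount = in.read(buffer)) >= 0) {
                out.write(buffer, 0, amount);
                out.flush();        // FileOutputStream itself does not buffer, so this
                                    // mainly matters when a BufferedOutputStream is used
                out.getFD().sync(); // asks the OS to write its cached data to the device
            }
            System.out.println((System.currentTimeMillis() - start) + " ms");
        } finally {
            in.close();
            out.close();
        }
    }
}

Comparing the run time with and without the sync() call shows how much work the OS cache is normally saving you.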

Noah

tackline

> public static void io (InputStream in, OutputStream
> out)

I assume these are FileInputStream and FileOutputStream respectively.

> I am trying to ascertain how much memory is really
> being used by the copy. Is it a buffer size only, or
> the whole file size, or the whole file size times
> two?

Most operating systems will do some level of buffering. If you have sufficient free memory, both files may end up in the system file cache. It would be unfortunate if the disc head had to move across the disc twice for each iteration of your loop (assuming both files are on the same physical drive). If you are unlucky, your operating system may swap out your application to make way.

There is no need for the whole of either file to be resident in the process memory space at once, nor for any allocations to be made within the loop, unless one of the streams is, say, a ByteArrayInputStream or ByteArrayOutputStream.

If it's performance you are after, you can move the transfer outside of application space:

http://download.java.net/jdk6/docs/api/java/nio/channels/FileChannel.html#transferTo(long, long, java.nio.channels.WritableByteChannel)
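
A minimal sketch of that channel-based approach (file names are placeholders; FileChannel and transferTo are available since J2SE 1.4):

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

public class ChannelCopy {
    public static void main(String[] args) throws IOException {
        FileChannel src = new FileInputStream("source.bin").getChannel();
        FileChannel dst = new FileOutputStream("copy.bin").getChannel();
        try {
            long position = 0;
            long size = src.size();
            // transferTo() may move fewer bytes than requested, so loop until done.
            while (position < size)
                position += src.transferTo(position, size - position, dst);
        } finally {
            src.close();
            dst.close();
        }
    }
}

With transferTo() the operating system can often move the bytes between the files without copying them into the Java heap at all, which also keeps the application-side memory use minimal.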