Skip to main content

Copy-on-write optimization for StringBuilder and StringBuffer

11 replies [Last post]
serverperformance
Offline
Joined: 2008-08-29
Points: 0

Hello everybody.

The old 1.4 version of StringBuffer had a very nice implementation of a "copy-on-write" (COW), [as have the C++ String and the .NET StringBuilder, I think]. In few words: when the toString() method is called, it doesn't copy the char array to the String, but shares it with the String and marks as "shared" inside; and if there are future call to insert or remove or others in the "immutable" section of the char array, then creates a copy of the array and goes on (so the String keep immutable and the StringB can mute.

For some reason, this changed in JDK 1.5, becoming in a "always copy" strategy. Why? Haven't find it. I have only found the forum entry dated in 2005: http://forums.java.net/jive/thread.jspa?messageID=29032&#29032.

I don't understand what kind of concurrency problem that the COW approximation has. Can someone put the light on me?

For testing, I have hacked AbstractBuilder, StringBuilder and StringBuffer with the COW hack. And in a microbenchmark there is a significant boost in both cases, also NetBeans, Glassfish, my J2EE apps and so on worked great with this patch. Sorry but no deep multithreading testing...

Why not reintroducing it in JDK 7? No API changes, cheap change, and a good boost.

Reply viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.
mthornton
Offline
Joined: 2003-06-10
Points: 0

The trouble with security flaws is that someone will try to use them deliberately, so their rarity in normal code is irrelevant.

Yes escape analysis could be used to avoid the copy in such cases, but whether it is worth the effort (and increase in size of the JVM) is another question.

serverperformance
Offline
Joined: 2008-08-29
Points: 0

@telio: That is! I hadn't found the bug but I knew it had to be threr. Very well explained, at leasts this issue has been taken seriously by JDK guys during 2005 and 2006.

@mthornton: I agree. I think it worths it, should be considered.

@all: Nothing to add to nthornto explanations nor the conclussions taken in 2006 (shown in the Bug database)... But now let's talk about JDK 7 and newest Hostpot versions. Three years later. I think it is important enough to re-test, re-think, re-consider, re-escape.

We have seen last months several "JDK 6 performance release"s. I can't believe this issue doesn't matter.

Nevertheless, I have some ideas in this direction, in the following days I will open other thread i this forum with an API change proposal...

linuxhippy
Offline
Joined: 2004-01-07
Points: 0

> We have seen last months several "JDK 6 performance
> release"s. I can't believe this issue doesn't matter.
Well, I trust the guys at Sun to understand the "global" impact of this de-optimization quite well, so in this case it doesn't seem to matter a lot outside of micro-benchmarks.

> Nevertheless, I have some ideas in this direction, in
> the following days I will open other thread i this
> forum with an API change proposal...
OpenJDK is open and free, so feel free to hack :)

lg Clemens

serverperformance
Offline
Joined: 2008-08-29
Points: 0

As I promised, I have worked in my idea. No rare things or magic, just a different point of view.

You can see the details, source code, microbenchmark and results in the OpenJDK core-libs development forum... http://mail.openjdk.java.net/pipermail/core-libs-dev/2008-September/0007...

As you can see in my microbenchmark results, executed in Linux x64 and
Windows 32 bits (-server, -client, and -XX:+AggressiveOpts versions), we can
achieve a performance increase between 1% and 167% (depends on the scenario and
architecture)...

The abstract is:

- Create a new class java.lang.StringAppender implements Appendable
1. Mostly same in its exposed public constructors and methods than
StringBuilder, but the only operations are the “append()” ones (no insert,
no delete, no replace)
2. Internally represented as a String array
3. Only arraycopy() or create char arrays once, inside the toString() method
(well, this isn’t completely true: also arraycopies when appending
objects/variables other than String instances or char arrays, but the most
typical operation is appending strings!)

- Add a new constructor in the java.lang.String class (actually 5 new
constructors for performance reasons, see below):
1. public String(String... strs)
2. public String(String str0, String str1)
3. public String(String str0, String str1, String str2)
4. public String(String str0, String str1, String str2, String str3)
(NOTE: these 3 additional constructors are needed to boost appends of a
small number of Strings, in which case the overload of creating the array
and then looping inside is much greater than passing 2, 3 or 4 parameters in
the constructor invocation).

- Change the javac behavior: the String + operator must be translated into
“new String(String... )” instead of “new StringBuilder().append().append()... ..toString()”

Comments?

perrito666
Offline
Joined: 2006-08-20
Points: 0

On the same note, is it possible to provide two more optimizations on StringBuilder

a) raw access to the char buffer backing stringbuilder thru callback

say

stringbuilder.runThisCallback(new StringBuilderCallback {
void handleData(final char[] data, int length) {
}
}

This will allow us to access the internal data of string builder very very efficiently without the need to create a new string or call charAt() on each

b) Port all the fancy methods on String to operate on StringBuilder directly so that one do not need to create a new string everytime he wants to play with the data.

Since StringBuilder is like stl string, i feel it should have a lot of power and fairly good design behind it.

i30817
Offline
Joined: 2006-05-02
Points: 0

Really man, the only thing that is really needed is to make the CharSequence interface support getChars (that by the way, is the way to copy the chars without a needless copy. Your handleDate would copy or allow internal modification. Not ok even for stringbuilders).

For your perusal i will append these ugly classes that implement a LazyStringBuilderReader (so you can stream chars from a stringbuilder). I'm not sure about these classes - seems like a lot of code just to do a Lazy Char Reader - , but what the hell.

Edit: i refactored the code, it looks better now.

[code]
package util.io;

import java.io.IOException;
import java.io.Reader;

/**
* A character stream whose source is a StringBuilder.
*
* This class exists to stream stringbuilders without copies.
* Just copied the (open)jdk class and did some alterations.
*/
public class StringBuilderReader extends Reader {
StringBuilder str;
int length;
int next = 0;
int mark = 0;

/**
* Creates a new string reader.
*
* @param s String providing the character stream.
*/
public StringBuilderReader(StringBuilder s) {
super();
this.str = s;
this.length = s.length();
}

/** Check to make sure that the stream has not been closed */
private void ensureOpen() throws IOException {
if (str == null)
throw new IOException("Stream closed");
}

/**
* Reads a single character.
*
* @return The character read, or -1 if the end of the stream has been
* reached
*
* @exception IOException If an I/O error occurs
*/
public int read() throws IOException {
synchronized (lock) {
ensureOpen();
if (next >= length)
return -1;
return str.charAt(next++);
}
}

/**
* Reads characters into a portion of an array.
*
* @param cbuf Destination buffer
* @param off Offset at which to start writing characters
* @param len Maximum number of characters to read
*
* @return The number of characters read, or -1 if the end of the
* stream has been reached
*
* @exception IOException If an I/O error occurs
*/
public int read(char cbuf[], int off, int len) throws IOException {
synchronized (lock) {
ensureOpen();
if ((off < 0) || (off > cbuf.length) || (len < 0) ||
((off + len) > cbuf.length) || ((off + len) < 0)) {
throw new IndexOutOfBoundsException();
} else if (len == 0) {
return 0;
}
if (next >= length)
return -1;
int n = Math.min(length - next, len);
str.getChars(next, next + n, cbuf, off);
next += n;
return n;
}
}

/**
* Skips the specified number of characters in the stream. Returns
* the number of characters that were skipped.
*
*

The ns parameter may be negative, even though the
* skip method of the {@link Reader} superclass throws
* an exception in this case. Negative values of ns cause the
* stream to skip backwards. Negative return values indicate a skip
* backwards. It is not possible to skip backwards past the beginning of
* the string.
*
*

If the entire string has been read or skipped, then this method has
* no effect and always returns 0.
*
* @exception IOException If an I/O error occurs
*/
public long skip(long ns) throws IOException {
synchronized (lock) {
ensureOpen();
if (next >= length)
return 0;
// Bound skip by beginning and end of the source
long n = Math.min(length - next, ns);
n = Math.max(-next, n);
next += n;
return n;
}
}

/**
* Tells whether this stream is ready to be read.
*
* @return True if the next read() is guaranteed not to block for input
*
* @exception IOException If the stream is closed
*/
public boolean ready() throws IOException {
synchronized (lock) {
ensureOpen();
return true;
}
}

/**
* Tells whether this stream supports the mark() operation, which it does.
*/
public boolean markSupported() {
return true;
}

/**
* Marks the present position in the stream. Subsequent calls to reset()
* will reposition the stream to this point.
*
* @param readAheadLimit Limit on the number of characters that may be
* read while still preserving the mark. Because
* the stream's input comes from a string, there
* is no actual limit, so this argument must not
* be negative, but is otherwise ignored.
*
* @exception IllegalArgumentException If readAheadLimit is < 0
* @exception IOException If an I/O error occurs
*/
public void mark(int readAheadLimit) throws IOException {
if (readAheadLimit < 0){
throw new IllegalArgumentException("Read-ahead limit < 0");
}
synchronized (lock) {
ensureOpen();
mark = next;
}
}

/**
* Resets the stream to the most recent mark, or to the beginning of the
* string if it has never been marked.
*
* @exception IOException If an I/O error occurs
*/
public void reset() throws IOException {
synchronized (lock) {
ensureOpen();
next = mark;
}
}

/**
* Closes the stream and releases any system resources associated with
* it. Once the stream has been closed, further read(),
* ready(), mark(), or reset() invocations will throw an IOException.
* Closing a previously closed stream has no effect.
*/
public void close() {
str = null;
}
}
[/code]

[code]
package util.io;

import java.io.IOException;
import java.io.Reader;

/**
* A reader that reads its characters lazily and without copy
* from a StringBuilder.
*
* The fillStringBuilder() method should manipulate the
* stringbuilder, in each evocation, the stringbuilder
* is drained, so new contents can be written without
* calling setLength(0).
*
* When you want to stop reading, make isReady() return false.
*
*
* Be careful working with this, since if fillStringBuilder()
* uses an collection iterator to iterate over the data
* and the collection is used afterwards,
* but before the LazyStringBuilderReader read everything,
* it will throw a concurrent modification exception the next
* time that LazyStringBuilderReader tries to read.
*
* Specifically, if the fillStringBuilder() uses an iterator,
* you may need to lazily create it (on the isReady() method).
*
* @author i30817
*/
public abstract class LazyStringBuilderReader extends Reader {

private StringBuilderReader inner;

/**
* Instantiate the LazyStringBuilderReader with the
* an StringBuilderReader of size stringBuilderSize

* @param stringBuilderSize
*/
public LazyStringBuilderReader(int stringBuilderSize) {
super();
inner = new StringBuilderReader(new StringBuilder(stringBuilderSize));
}

/**
*
* @return the StringBuilder the reader uses
*/
protected final StringBuilder getStringBuilder() {
return inner.str;
}

/**
* Fill the StringBuilder in this method
*/
protected abstract void fillStringBuilder();

/**
*
* @return false when you want to stop reading
*/
protected abstract boolean isReady();

@Override
public int read(char[] cbuf, int off, int len) throws IOException {
int result = inner.read(cbuf, off, len);
//fill the stringbuffer. Optionally call stop
if (result == -1 && isReady()) {
inner.str.setLength(0);
fillStringBuilder();
inner.next = 0;
inner.length = inner.str.length();
return inner.read(cbuf, off, len);
}
return result;
}

@Override
public void close() throws IOException {
inner.close();
}
}

[/code]

Example:
[code]

return new LazyStringBuilderReader(180) {

Iterator it;

@Override
public boolean isReady() {
if (it == null) {
it = map.entrySet().iterator();
}
return it.hasNext();
}

@Override
public void fillStringBuilder() {
Entry e = it.next();
getStringBuilder().append(e.getValue());
//do lots of conditional appends here.
}
};
[/code]

Message was edited by: i30817

mthornton
Offline
Joined: 2003-06-10
Points: 0

If the StringBuffer's array is significantly bigger than the current length, then copy on write wastes a lot of memory (because that unused space becomes 'locked' by the String). A variety of strategies have been tried in the past but all created performance penalties for some types of code. The current approach is simple and has easily understood performance characteristics, albeit slower in some cases.

With StringBuilder there are synchronization problems. Suppose two threads both have access to a single StringBuilder instance. One thread is performing toString, while the other is continuing to mutate the buffer. Then the lack of synchronization means that the String might change after creation. The mutating thread won't see the 'copied' flag until some time after the String has been created. Without some form of synchronization it may never see the change in the 'copied' state.

serverperformance
Offline
Joined: 2008-08-29
Points: 0

Ok, good reasons but...

[b]Regarding synchronization problems:[/b]

StringBuilder explicitly is thread-unsafe. From its javadoc: [i]"This class provides an API compatible with StringBuffer, but with no guarantee of synchronization. This class is designed for use as a drop-in replacement for StringBuffer in places where the string buffer was being used by a single thread (as is generally the case)"[/i]

And, as far as I know System.arraycopy() isn't threadsafe, is it? So the current "always-copy" StringBuilder implementation in JDK 1.5, 1.6 and 1.7 would have exactly the same synchronization/mutability problem you mentioned if inserting during an arraycopy operation.

[b]Regarding heap size problems:[/b]
Ok, I understand the problem: the String keeps a reference to, say, 2x the needed size in the worst case if the builder is no reused (after reusing, can be more).

[i]albeit slower in some cases[/i]
I would say "in most cases". All aroung the world, all the source codes are plagued with .append().append().append().append().toString(), or a+b+c+d has the same problem (and some more). Including JDK, application servers and final applications.

[b]And now my proposal to JDK 7 or whatever[/b]

So, if the problem is only the memory consumption, I propose for both StringBuffer, StringBuilder, AbstractStringBuilder:
(*) Only share if a static booean flag (copyOnWriteEnabled) has been set, as happens with String.stringCacheEnabled, which in turn is activated in the JVM option [b]-XX:+AggressiveOpts[/b].
(*) Of course, also rethinking the 1.4 solution to content heap. Maybe only sharing if (count<<2 >= value.length) or something like that.

And... if (and only if) there are problems after reusing (which I don't see), add to the JDK a new NonReusableStringBuilder class extending AbstractStringBuilder, and change the javac to convert String + operator to it, and evangelize about this new class. As Sun did with the 1.5 API changes.

Regards!

mthornton
Offline
Joined: 2003-06-10
Points: 0

> And, as far as I know System.arraycopy() isn't
> threadsafe, is it? So the current "always-copy"
> StringBuilder implementation in JDK 1.5, 1.6 and 1.7
> would have exactly the same
> synchronization/mutability problem you mentioned if
> inserting during an arraycopy operation.
While the content of the String might not be what you expect, the critical feature is that it doesn't change once created. It is important for security that String is immutable, and copy on write from a StringBuilder can't do this without introducing synchronization where currently there is none.

serverperformance
Offline
Joined: 2008-08-29
Points: 0

Ok, security reasons in a multithreading environment with threads writing in a StringBuilder and threads reading from the derivated String...

It is frustrating, don't you agree?: A security issue in a extremely rare scenario has performance impact in the 99,99% of the StringBuilder main use: create->concatenating->toString->never use again the Builder->work with the String. All in one only thread.

Maybe, someway, can't be this another target for Escape Analysis?

teilo
Offline
Joined: 2004-02-02
Points: 0