Skip to main content

Performance improvements in collections

10 replies [Last post]
brownidj
Offline
Joined: 2003-06-12

Top of my wishlist - making iteration through a collection comparable in speed to iterating through an array - it's the casting that lets the side down. Without a major performance improvement generics is just syntactical sugar. Collections are almost unusable for large datasets.

Reply viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.
tsinger
Offline
Joined: 2003-06-10

And you are sure, that iterating over an collection is the performance-bottleneck in your application?

I'm absolutely sure, that you can save easily 20% to 50% of time somewhere else in your application and don't need to tune collections to save 1%. And, if it really is that important for you, why don't you write your own hyper-fast-special-collection?

Tom

stefanrufer
Offline
Joined: 2003-06-11

Iterating through an ArrayList is slower than iterating through a LinkedList - but the runtimes are not at all long if we talk about 1 mio entries. Have a look at a few recent test cases I wrote if you are interested: http://www.stefanrufer.ch/snipsnap/space/Virtual+Brain/Java/List+Speed

Note that the results were measured using the Collection.contains method call so most of the time is spent comparing strings to each other - traversing is just peanuts.

Where exactly do you have performance problems with your iterators?

kcpeppe
Offline
Joined: 2003-06-15

> Iterating through an ArrayList is slower than
> iterating through a LinkedList -

I think we all agree that there will be performance differences given the different implementations.

> but the runtimes are
> not at all long if we talk about 1 mio entries. Have
> a look at a few recent test cases

I'll be taking a look but... we need to be careful when setting up a benchmark in that we first understand the question that we are asking and secondly, ensuring that the code that we use to answer that question actually does answer that question.

In this instance, we are only asking about the cost of iterating through a collection. Since follow on treatment of the objects retrieved from the collection will vary with the application and thus is not really a cost of the iteration process and should not be considered as part of the question. So we have

Iterator iter = aCollectionOfObjects.iterator();
long start = System.currentTimeMillis();
while ( iter.hasNext()) {
Object o = iter.next();
aPresizedTargetCollection.add( o);
}
long completed = System.currentTimeMillis();

being compare against....

Iterator iter = aCollectionOfString.iterator();
long start = System.currentTimeMillis();
while ( iter.hasNext()) {
String string = (String)iter.next();
aPresizedTargetCollection.add( string);
}
long completed = System.currentTimeMillis();

What we have is a minimal treatment of results. This treatment is needed to ensure that we can control GC and that hotspot doesn't mangle the benchmark. We don't get a pure number but what we can see is a differential that is caused by the cast. In this case, since String extends Object directly, the benchmark is skewed towards hierarchies of depth 1 and is not working from an interface. Maybe not the best but acceptable to show that casting though not insignificant is not a major contributor to overall iteration run time.

Any other treatment of an object after aquisition is your business ;)

rickcarson
Offline
Joined: 2004-03-04

Here's a different take on the same angle:

random lookups into that array or list.

Its a lot less expensive to do an insert at a random place into the list than it is in the array.

So for a little extra cost (the cost of a binary search) you can have an ordered list.

Now, here's the 64 Million $ question: how expensive is it to traverse the links in the list???

IE I'm searching in a sorted list of length 15. The first place I want to look is position #8. Since I'm not doing a comparison, but merely traversing the links, it should be fairly (very) fast...

Timing info on that anyone?

So if you accept a little up front cost (logN for each insert, or NlogN for the whole list, then you can do searches into the list for the following cost(s):
MN + logN, where M is the cost of traversing a link, and logN is the cost of the binary search (measured in String comparisons) (whether or not the string you are looking for is present)

Whereas the average cost of searching a randomly sorted array is 0.5N, measured in String comparisons, if the string is present, or 1.0N if the string is not present...

A 1.3% timing increase for conversion is incredibly small potatoes in the above scenario.

NB: heap advocates pipe up... wait for it... now!

kcpeppe
Offline
Joined: 2003-06-15

> Here's a different take on the same angle:
>
> random lookups into that array or list.

humm, now we are straying into the world of algorithms.

brownidj
Offline
Joined: 2003-06-12

Yes, I do have microbenching evidence that iterating through a collection is much slower than iterating through an array. I will dig up my code and post it in about 48 hours time when I'm back at work!

mthornton
Offline
Joined: 2003-06-10

Its the assertion that "it's the casting that lets the side down" that I dispute, not that iterating is slower than using an array.

kcpeppe
Offline
Joined: 2003-06-15

I should have added that getting rid of the need to explicitly cast would be nice.. The compiler should understand that when I assign an object to a place holder typed String that it should add the cast. Now that's what I call syntatical sweet and low :)

kcpeppe
Offline
Joined: 2003-06-15

I just did a micro-performance benchmark and found that in the JDK 1.4, a simple cast to String added 1.3% to the overall runtime (iterating through an ArrayList containing 1,000,000 String objects). So while it's not free, it's not a performance hotspot running under these conditions.

mthornton
Offline
Joined: 2003-06-10

Do you have any evidence that casting is the reason for slower performance? Escape analysis might help with iterator performance.