
Sorting large data sets

Joined: 2010-01-18

I'm working on something that needs to be able to sort a variable number of records. As I'm reading this data from a file, I'm looking at how a TreeMap would perform if I insert each row as it's read and then iterate through the resulting map to write out the results.
My concern is that as I scale up the number of records, performance will degrade as memory runs short. To try to get round this, I'm planning to split the process: build a map up to a certain size, write it out to a temporary file, repeat this as many times as needed, and then merge the resulting files into a final sorted result.
Can anyone see any problems with this, or suggest alternatives?

Joined: 2008-02-22


Yes, you'll probably run out of memory if you sort a large dataset entirely in memory.

There are a number of solutions to this. For example, you could use a merge sort with files as intermediate storage (an external merge sort). It will take longer because of the extra disk I/O, but memory use stays bounded by the chunk size you choose.
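
Here is a rough sketch of that file-based merge sort, in case it helps. It assumes one record per line, that plain String ordering is the sort order you want, and Java 7+ for java.nio.file. The names ExternalSort, Run and maxLinesInMemory are made up for the example, not from any library. Each chunk is sorted in memory and written to a temporary file, then all the temporary files are merged with a PriorityQueue that holds one line per file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper class: sorts a text file line by line using temporary files.
class ExternalSort {

    // One sorted temporary file plus the line currently at its head.
    private static final class Run implements Comparable<Run> {
        final BufferedReader reader;
        String head;

        Run(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.head = reader.readLine();
        }

        // Moves to the next line; returns false once the file is exhausted.
        boolean advance() throws IOException {
            head = reader.readLine();
            return head != null;
        }

        @Override
        public int compareTo(Run other) {
            return head.compareTo(other.head);
        }
    }

    // maxLinesInMemory is an assumed tuning knob: how many lines one chunk may hold.
    static void sort(Path input, Path output, int maxLinesInMemory) throws IOException {
        List<Path> runs = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            List<String> chunk = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() >= maxLinesInMemory) {
                    runs.add(writeRun(chunk));
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) {
                runs.add(writeRun(chunk));
            }
        }
        merge(runs, output);
        for (Path run : runs) {
            Files.deleteIfExists(run);
        }
    }

    // Sorts one chunk in memory and writes it to a new temporary file.
    private static Path writeRun(List<String> chunk) throws IOException {
        Collections.sort(chunk);
        Path run = Files.createTempFile("sort-run-", ".tmp");
        Files.write(run, chunk, StandardCharsets.UTF_8);
        return run;
    }

    // K-way merge: the queue always yields the run whose head line is smallest.
    private static void merge(List<Path> runs, Path output) throws IOException {
        PriorityQueue<Run> queue = new PriorityQueue<>();
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path path : runs) {
                BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);
                readers.add(reader);
                Run run = new Run(reader);
                if (run.head != null) {
                    queue.add(run);
                }
            }
            while (!queue.isEmpty()) {
                Run smallest = queue.poll();
                out.write(smallest.head);
                out.newLine();
                if (smallest.advance()) {
                    queue.add(smallest);
                }
            }
        } finally {
            for (BufferedReader reader : readers) {
                reader.close();
            }
        }
    }
}
```

You would call it as ExternalSort.sort(inputPath, outputPath, 100000) and tune the last argument to whatever fits comfortably in your heap. Note that the merge keeps every temporary file open at once, so with a very large number of chunks you might need to merge in several passes.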

Other than that, have you tried using Collections.sort() on a List instead of using a TreeMap?
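
Something along these lines, as a minimal sketch. I'm assuming each row is a String and that the sort key is the text before the first comma; that layout and the names InMemorySort, keyOf and sortRows are only for illustration. One thing to keep in mind with the TreeMap approach is that a map holds a single value per key, so rows sharing a key would overwrite each other, whereas a sorted List keeps them all.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper class: sorts rows held in memory by an assumed key.
class InMemorySort {

    // Assumed record layout: the sort key is everything before the first comma.
    static String keyOf(String row) {
        int comma = row.indexOf(',');
        return comma < 0 ? row : row.substring(0, comma);
    }

    // Returns a sorted copy; Collections.sort is stable, so rows with equal
    // keys keep the order in which they were read.
    static List<String> sortRows(List<String> rows) {
        List<String> sorted = new ArrayList<>(rows);
        Collections.sort(sorted, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return keyOf(a).compareTo(keyOf(b));
            }
        });
        return sorted;
    }
}
```

The same call also works for sorting each chunk before it goes to a temporary file, if you end up going the file-based route.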