Skip to main content

Picking a Collection and Design for querying 1G Point objects

4 replies [Last post]
ennuages
Offline
Joined: 2009-02-18
Points: 0

Hi everyone, any chance I could get some design tips?

I have one million Point objects where the x and y values are floats with 2 decimal places. These points exist somewhere between a top left corner of (0,0) and a bottom right corner(400,200).

When the user specifies a top left and bottom right Point(for a rectangle) (no slanted diaganol lines), I have to return all of my Point objects that are inside of the Rectangle specified by the user - in less than a second.

I tried making 2 TreeSets -- one ordered by X and one ordered by Y, where they both use the same Point objects, (I'm avoiding instantiating double the points). The 1st treeset is instantiated without problems, but when my application attempts instantiating the 2nd TreeSet, I get an out-of Heap memory exception. If that were to have worked correctly instead, I was planning on using the "subset" method in TreeSet to get all the Point objects in the user's selected X range, .. and then vice-versa on the "Y" treeset for the points matching the user's specified Y rectangle values,.. and then I could take the intersection of those two result lists to arrive at a list with the points inside the user's specified rectangle.

Tweaking the JVM memory is Not going to be an option.

Is there a different design I should be using?
Thanks so much

Reply viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.
ennuages
Offline
Joined: 2009-02-18
Points: 0

Thanks for the tips.

I made a home-made spatial index by using a 2d collection (HashMap of HashMaps of Lists)-- where the first is ordered by X, and the second is ordered by Y, which then contains a list of points for that spatial index grid cell.

Instead of using every exact x/y value, I just used the lowest whole number multiple of 10. Example: When encountering Point(x,y) with a value of (19.9, 20.0) I would put this into my 2d hashmap spatial index grid location of (19,20).

My x hashmap has keys of ex: 10,20,30.... 400

MY Y hashmap has keys of ex: 10,20,30....200

Then when I receive the rectangle query that the user wants,.. I know (based on the rectangle top-left and bottom-right) - the exact hashmap index key values I need to look in for possible point matches. Once I get each list of possible results, - I can then filter them out based on their actual 2 float decimal places.

This ends up running fast on a cheap computer,.. but not as fast as a "Quad Tree" will,

Thanks!

linuxhippy
Offline
Joined: 2004-01-07
Points: 0

I guess your problem is memory consumption, not actually speed. If you write a class Point thats on a 32-bit JVM 8byte for the object header + 8 byte for the two floats and at least a 4-byte reference - otherwise you wouldn't be able to reach it. Makes only for your Point-Class 20byte, let alone container-objects of the datastructure you're using.
If you store your points as two (or even one) float-arrays, each entry takes only 4byte - so even with a tight 64mb memory limit like for applets you're well.
For one milling entries a simple loop should be fast *enough* and its O(n) so isn't that bad.

lg Clemens

walterln
Offline
Joined: 2007-04-17
Points: 0

Looks like you're looking for a QuadTree: http://en.wikipedia.org/wiki/Quadtree
If you want to save extra memory you could use int pairs for a point (and divide by hundred when showing in the GUI to get decimals) (or even shorts if you translate to -200, -200 so you max values is 20000). Assuming the only two decimal places requirement won't change in the future :).

roguexz
Offline
Joined: 2003-08-13
Points: 0

Have you considered using a Rectangle2D.Float object instead? You can override the contains() methods to handle the case of distinct points.
Alternatively, you can create a custom class that implements the Shape interface.
Regarding the data model for the points in question, can these points be modeled mathematically? If not, I would suggest that you store the points information in a data store (like a database or a file) and try searching for points in the file instead.
Creating a series of points in memory does not sound like a very scalable idea.
Hope that helps.