
Limiting the amount of data JPA pulls from the database

11 replies
bpeck
Offline
Joined: 2007-11-09

Hey all,

I'm setting up JPA to connect to my PostgreSQL database; however, when it tries to initialize the JPA objects, the Java VM runs out of heap memory.

I was wondering if there is a way to tell JPA to pull back only a certain number of values from the database. And if so, can it be told after startup to change which ones it holds in memory?

mvatkina
Offline
Joined: 2005-04-04

It's more complicated than just using your own implementation of a List: the provider can replace your implementation with its own for LAZY loading.

Regards,
-marina

bpeck
Offline
Joined: 2007-11-09

Thanks, at first I was confused because I thought I had already taken precautions to handle that and thought JPA was just pulling it in on creation.

But after reading the posts implying that there is no JPA issue and that I should just watch what I'm pulling into memory, I went back through my code and found a spot I forgot to fix.

Everything is happy now
- Thanks again.

kevinatgz
Offline
Joined: 2009-04-30

Hi bpeck, can you describe what the problem was?

jeffreyrodriguez
Offline
Joined: 2007-02-23

How many objects are you talking about? What's your query look like? What about your objects, what do they look like? Which values do you want back?

You can also increase your heap size, but I suspect that's not your problem.

bpeck
Offline
Joined: 2007-11-09

Well I can narrow it down to only a single table coming back (1 object?).

This one table has ~25 fields and it's ok in memory at 15k rows. But at 100k when JPA starts up I get the heap memory error.

And 100k isn't our max; we need to simulate 1M rows in that table.

But I can't query the database, since it dies during the creation of the JPA objects (I think that's where it's dying; my stack trace was giving me the heap error from the TopLink persistence methods).

lancea
Offline
Joined: 2003-06-13

Well you can invoke Query.setMaxResults() and Query.setFirstResult() and write your application so it uses these methods for pagination.
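
A minimal sketch of what such a pagination loop could look like. To keep it runnable without a persistence provider, the hypothetical `PageSource` interface stands in for a JPA query; in a real application its body would presumably be something like `query.setFirstResult(first).setMaxResults(max).getResultList()`. The names `PageSource`, `PagingSketch`, `forEachPage`, and `fakeTable` are illustrative assumptions, not part of JPA.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for a JPA query: returns at most max rows starting at offset first. */
interface PageSource<T> {
    List<T> fetch(int first, int max);
}

class PagingSketch {

    /**
     * Walks the whole result one page at a time, so the caller only ever
     * sees pageSize rows at once. Returns the number of rows visited.
     */
    static <T> int forEachPage(PageSource<T> source, int pageSize, Consumer<List<T>> handler) {
        int first = 0;
        while (true) {
            // With JPA this call would be roughly:
            // query.setFirstResult(first).setMaxResults(pageSize).getResultList()
            List<T> page = source.fetch(first, pageSize);
            if (page.isEmpty()) {
                return first;
            }
            handler.accept(page);
            first += page.size();
            if (page.size() < pageSize) {
                return first; // short page: nothing left to fetch
            }
        }
    }

    /** In-memory fake of a table with the given number of rows, for demonstration only. */
    static PageSource<Integer> fakeTable(int rows) {
        return (first, max) -> {
            List<Integer> page = new ArrayList<>();
            for (int i = first; i < Math.min(rows, first + max); i++) {
                page.add(i);
            }
            return page;
        };
    }

    public static void main(String[] args) {
        int visited = forEachPage(fakeTable(2500), 1000,
                page -> System.out.println("page of " + page.size() + " rows"));
        System.out.println("visited " + visited + " rows");
    }
}
```

With a real query, an ORDER BY on a unique column is usually needed so that pages neither overlap nor skip rows. Depending on the provider, it may also be necessary to call `EntityManager.clear()` between pages so that already-processed entities do not pile up in the persistence context.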

Regards
Lance

whartung
Offline
Joined: 2003-06-13

Yea, this is a kind of "Doc, it hurts when I..." thing.

Simply put, don't suck 1M rows into the heap, or get more heap.

There's basically no reason for (most) applications to load, and cache, that many rows in one gulp.

The only way JPA would do that is if you:

a) Specifically told it to: select o from MyTable o, where MyTable has a gazillion rows, or

b) You have a parent object with a OneToMany relationship to a table with a huge number of rows, and then you access the parent's list, triggering JPA to load the entire collection.

If a) is the case, don't do that. Bring in chunks of data, either through careful filtering or, as mentioned, with setMaxResults and setFirstResult.

If b) is the case, then, well, don't do that either. If you have that kind of structure, having JPA configured to lazily load such a huge collection is like running with scissors. You will trip sometime, and you will fall.

So, don't have JPA manage that relationship, but handle it yourself.

Through careful use of the Value List Pattern and some clever caching, you can easily manage tables with millions of rows without obliterating your memory. But slurping all of the rows into RAM really isn't the best option here.

ljnelson
Offline
Joined: 2003-08-04

There's something interesting about all this, and it goes to the heart of API design and program evolution. This is a bit of a ramble but I was curious what others thought on the subject.

Suppose you have a parent object with children. Call them Order and OrderLine, just for fun. Anyone who does back-of-the-envelope object design will come up with something like an Order having many OrderLines via a List or array getter:
[code]public List getOrderLines();[/code]

I wonder: should domain object API design just always take into account paging concerns? Should that be:
[code]public List getOrderLines(final int startingIndex, final int numberOfLines);[/code]
...?

We all start out by blindly doing the getter/setter/Javabean/property thing, but I wonder...I wonder if paging shouldn't make it into the domain layer as a standard architecture component--maybe even a footnote in a JavaBeans specification update.

This is a contrived example because you would hopefully never make an Order that was so big that paging would actually be a concern, but you see what I mean.

Less contrived: What about other areas of an application? System.getUsers(), for example, might be one of those APIs where you start out by thinking that your application won't be very popular, but then it becomes the next MySpace.

So what's my point? I guess to the extent I have one it's just that it's easy to wave our hands and say "use JPA's paging facilities", but those concerns bubble up to the domain layer.

It would be interesting to hear where and how others have brought paging "up" into their domain layer.

Best,
Laird

whartung
Offline
Joined: 2003-06-13

> This is a contrived example because you would
> hopefully never make an Order that was so big that
> paging would actually be a concern, but you see what
> I mean.
>
> Less contrived: What about other areas of an
> application? System.getUsers(), for example, might
> be one of those APIs where you start out by thinking
> that your application won't be very popular, but then
> it becomes the next MySpace.
>
> So what's my point? I guess to the extent I have one
> it's just that it's easy to wave our hands and say
> "use JPA's paging facilities", but those concerns
> bubble up to the domain layer.

Yes, of course, the painful reality when purity of design meets crushing realities.

One could simply argue that the domain model does not change, but rather the implementations of the internal constructs change.

Using your example, you'll notice that you provide a List. Thankfully (and rightly so), List is an interface. One could suggest that internally the actual list implementation is OMGThisListIsHuge, and it manages the heartache for us.

So, in that light, we shouldn't have to compromise the domain layer.

Truth is harsher than that, of course, so many just punt.

But it doesn't necessarily have to be that way.
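
As a rough illustration of the "OMGThisListIsHuge" idea, here is a minimal sketch of a read-only List that holds only one block of rows in memory at a time. Everything in it is an assumption made for the example: `BlockLoader` is a hypothetical callback standing in for whatever paged query fetches the rows, and `LazyPagedList` is not something TopLink or JPA provides. Since a provider may replace a mapped collection with its own implementation, a list like this would probably live behind a service or repository method rather than on a mapped entity field.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical loader: returns up to max elements starting at index first. */
interface BlockLoader<T> {
    List<T> load(int first, int max);
}

/** Read-only List that keeps only one block of elements in memory at a time. */
class LazyPagedList<T> extends AbstractList<T> {
    private final BlockLoader<T> loader;
    private final int size;       // total row count, e.g. from a COUNT query
    private final int blockSize;
    private int blockStart = -1;  // index of the first cached element, -1 if none
    private List<T> block = new ArrayList<>();
    private int loads = 0;        // counts loader calls, for demonstration

    LazyPagedList(BlockLoader<T> loader, int size, int blockSize) {
        this.loader = loader;
        this.size = size;
        this.blockSize = blockSize;
    }

    @Override
    public T get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        int wantedStart = (index / blockSize) * blockSize;
        if (wantedStart != blockStart) {
            // Cache miss: drop the old block and fetch the one containing index.
            block = loader.load(wantedStart, blockSize);
            blockStart = wantedStart;
            loads++;
        }
        return block.get(index - blockStart);
    }

    @Override
    public int size() {
        return size;
    }

    int loadCount() {
        return loads;
    }
}

class LazyPagedListDemo {
    public static void main(String[] args) {
        int total = 1_000_000;
        // Fake loader that fabricates rows instead of querying a database.
        BlockLoader<String> loader = (first, max) -> {
            List<String> rows = new ArrayList<>();
            for (int i = first; i < Math.min(total, first + max); i++) {
                rows.add("line-" + i);
            }
            return rows;
        };
        LazyPagedList<String> lines = new LazyPagedList<>(loader, total, 100);
        System.out.println(lines.get(999_999) + " after " + lines.loadCount() + " load(s)");
    }
}
```

Callers still see a plain `List`, so the domain API is unchanged; the cost is that random access across blocks triggers a reload each time, which is where the "truth is harsher" part comes in.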

ljnelson
Offline
Joined: 2003-08-04

> One could simply argue that the domain model does not
> change, but rather the implementations of the
> internal constructs change.

I've played both sides of this game. I guess what I'm musing on is: does "paging" (for want of a better term) constitute purely an implementation detail, or is it significant enough to warrant consideration inside the domain model? One could argue the flip side of your position: write a cover method, order.getOrderLines(), that delegates to order.getOrderLines(0, Integer.MAX_VALUE).
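
A minimal sketch of that cover-method arrangement, assuming an in-memory list in place of real persistence. The `Order` and `OrderLine` classes here are illustrative only; in a real application the paged method would presumably delegate to a paged query rather than to `subList`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class OrderLine {
    final String description;

    OrderLine(String description) {
        this.description = description;
    }
}

class Order {
    // Stand-in for persistent storage; a real Order would not hold every line in memory.
    private final List<OrderLine> lines = new ArrayList<>();

    void addLine(OrderLine line) {
        lines.add(line);
    }

    /** Paged accessor: at most numberOfLines lines, starting at startingIndex. */
    List<OrderLine> getOrderLines(int startingIndex, int numberOfLines) {
        if (startingIndex < 0 || numberOfLines < 0) {
            throw new IllegalArgumentException("negative index or count");
        }
        int from = Math.min(startingIndex, lines.size());
        // long arithmetic so that from + Integer.MAX_VALUE cannot overflow
        int to = (int) Math.min((long) from + numberOfLines, lines.size());
        return Collections.unmodifiableList(new ArrayList<>(lines.subList(from, to)));
    }

    /** Cover method: the classic getter simply asks the paged one for everything. */
    List<OrderLine> getOrderLines() {
        return getOrderLines(0, Integer.MAX_VALUE);
    }
}

class OrderDemo {
    public static void main(String[] args) {
        Order order = new Order();
        for (int i = 0; i < 5; i++) {
            order.addLine(new OrderLine("line-" + i));
        }
        System.out.println(order.getOrderLines().size() + " lines in total");
        System.out.println(order.getOrderLines(1, 2).size() + " lines in the requested page");
    }
}
```

The design question in this thread is which of the two methods callers should be steered toward; the sketch only shows that both can coexist on the same domain object.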

I really don't have an opinion yet, and the usual annoying disclaimer about how this will change depending on your requirements etc. still holds, but I've never really seen people discuss paging as anything other than a slap-on UI-driven infrastructure need--and I'm wondering if instead it shouldn't get some first-class love at the domain level. The thought would be something like this: look, deciding how many chunks to retrieve is something that callers really should be thinking about all the time. It isn't tied to a particular technology. For these reasons, the logic would go, make sure every layer down the stack can expose paging as a first-class concern.

I don't know whether I buy that or not; my jury's still out. Anyhow, I thought it would be interesting to see if others have thoughts on this subject--thanks for chiming in!

Cheers,
Laird

whartung
Offline
Joined: 2003-06-13

Well, we can play semantic games.

Paging IS solely a UI manifestation. For example, I don't know of any order posting routine that wants its line items in pages.

And, for good or ill, it is performance dependent. Getting and/or caching 10-20 rows is essentially a "free" operation all said and done.

But at some point, the number of rows passes the intangible "this is too slow" point.

To address your point, tho, there is really no reason why the persistence layer can't implement, as an optimization through a configuration option, a paging lazy loader.

Toplink, for example, offers the feature that if you pass an unmanaged object by reference (i.e. to a class in the WAR from a Session Bean), it still maintains its lazy load behavior, even though it's not officially managed. I think it wouldn't be a horrible hack to tweak that code to be page sensitive, so that when you iterate from objectItems[10] thru [20], the lazy loader is smart enough to load, say, 2 or 3 blocks of rows of data.

But a lot of it really depends on the JDBC implementation. Just because you can specify a scrolling cursor to the DB (a typical mechanism to implement something like this), doesn't mean it performs well. It would be rather yucky to have the driver load the first 1000 rows so it can show rows 1000-1010.

But that doesn't mean that they couldn't offer such functionality thru the Toplink implementation, so that if your DB and JDBC driver support such paging-like queries well, you'd be able to leverage them.