garbage collection in large webapps

3 replies [Last post]
mpiercejava
Joined: 2005-11-04

I work for a company that maintains a moderately large ecommerce site. We keep a large group of objects (user sessions) alive for several hours each. They eventually get moved to the old generation heap. By the end of the day the old generation is bloated with live sessions, and full garbage collections take 15 or 30 seconds during which our website is frozen. We do not call System.gc(), but many of the open source jars we use do. We could turn explicit garbage collection off, but then the old generation would continue to grow until it did one large collection that might take minutes, which we obviously want to avoid. Ideally one full garbage collection would get run every day at something like 5:00 in the morning.
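For illustration only, here is a minimal sketch of how an application could request one collection per day at 5:00. The class name NightlyGc and its methods are hypothetical, and it sticks to java.util.Timer and Calendar calls that should be available on a 1.4 JVM. Note that System.gc() is only a request to the JVM, that it becomes a no-op if explicit GC is disabled with -XX:+DisableExplicitGC, and that this does nothing about System.gc() calls made by third-party jars.

```java
import java.util.Calendar;
import java.util.Timer;
import java.util.TimerTask;

/** Hypothetical helper: requests one full collection per day at a quiet hour. */
final class NightlyGc {
    static final long DAY_MILLIS = 24L * 60 * 60 * 1000;

    /** Milliseconds from 'now' until the next occurrence of hourOfDay:00:00 in now's time zone. */
    static long millisUntilNext(Calendar now, int hourOfDay) {
        Calendar next = (Calendar) now.clone();
        next.set(Calendar.HOUR_OF_DAY, hourOfDay);
        next.set(Calendar.MINUTE, 0);
        next.set(Calendar.SECOND, 0);
        next.set(Calendar.MILLISECOND, 0);
        if (!next.after(now)) {
            // Today's slot has already passed (or is exactly now), so use tomorrow's.
            next.add(Calendar.DAY_OF_MONTH, 1);
        }
        return next.getTimeInMillis() - now.getTimeInMillis();
    }

    /** Starts a daemon timer that calls System.gc() once every 24 hours. */
    static Timer start(int hourOfDay) {
        Timer timer = new Timer(true); // daemon thread, will not block JVM shutdown
        TimerTask task = new TimerTask() {
            public void run() {
                System.gc(); // a hint to the JVM, not a guarantee
            }
        };
        long delay = millisUntilNext(Calendar.getInstance(), hourOfDay);
        timer.scheduleAtFixedRate(task, delay, DAY_MILLIS);
        return timer;
    }
}
```

Calling NightlyGc.start(5) once at application startup would arm the timer. Because the period is a fixed 24 hours, the wall-clock time of the request shifts by an hour across daylight saving changes.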

Looking through the Java documentation, the concurrent mark-sweep collector and the incremental collector both look like they are made for this problem, but we get terrible throughput when running either.

This whole thing seems like it would be a common problem. How has it been dealt with?

ysramakrishna
Joined: 2005-04-25

You may need to tune the size and layout of the heap
manually when using the concurrent collector in order
to take good advantage of it. In particular, the default
behaviour of the concurrent collector is to tenure all
objects that survive a scavenge. For various reasons, this
might be a suboptimal policy for your application, so some
tuning of the survivor spaces and the tenuring threshold
may be in order.

If you have GC logs (with PrintGCDetails and PrintGCTimeStamps)
available, feel free to post appropriate snippets here to
illustrate the issue.

You also want to ensure that the concurrent collector
thread does not run all the time and adversely impact
mutator throughput. This might involve tuning the heap
size and possibly the CMS initiating threshold.
If you are running on a platform with few CPUs you
might want to consider the use of the incremental
version of the concurrent collector; please refer to
the GC documentation for JDK 1.5.0.

mpiercejava
Joined: 2005-11-04

We're running Java 1.4.2_03 on Solaris. We've got 4 processors in production. We've been running parallel scavenge with a 2 gig total heap, with 1 gig for eden. Moving to 1.5 is not an option for us yet due to some third party stuff we're tied to that won't support it. We have room to make the heap bigger if needed. Generally, by the end of the day, we have approximately 800 megs of live objects in the old generation.

I've discovered through research since my first post that the concurrent collector uses 1 processor (for the concurrent work), so part of the reason I may have been getting bad results is that in QA, where I've been testing, we only have 2 processors.

It seems like by switching to concurrent we will lose a little throughput but decrease our pauses, and I feel like we can't tell to what degree this will happen until we try it on the 4 processor box.
My only remaining concern is that the concurrent collector kicks in early (by default at 68% of the old generation), and that if our live objects exceed 68% of the old generation we will be doing concurrent GC perpetually. I know this can be adjusted, but the Java documentation warns that making the number too high risks OOM exceptions.
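To put rough numbers on that concern, here is a small arithmetic sketch. It assumes the old generation is about 1024 MB (the 2 gig heap minus the 1 gig young generation, ignoring survivor spaces), which is a simplification rather than a measured figure. The class and method names are hypothetical.

```java
/** Hypothetical helper for reasoning about the CMS initiating threshold. */
final class CmsThresholdMath {

    /** Percentage of the old generation occupied by live data. */
    static double occupancyPercent(long liveMb, long oldGenMb) {
        return 100.0 * liveMb / oldGenMb;
    }

    /** Smallest old generation size (MB, rounded up) at which liveMb is at or below thresholdPercent. */
    static long minOldGenMb(long liveMb, int thresholdPercent) {
        return (liveMb * 100 + thresholdPercent - 1) / thresholdPercent;
    }

    public static void main(String[] args) {
        long liveMb = 800;     // end-of-day live data, from the post above
        long oldGenMb = 1024;  // assumption: 2 GB heap minus 1 GB young generation
        int threshold = 68;    // default initiating threshold quoted above

        System.out.println("occupancy %: " + occupancyPercent(liveMb, oldGenMb));
        System.out.println("min old gen MB: " + minOldGenMb(liveMb, threshold));
    }
}
```

Under these assumptions the end-of-day live set is about 78% of the old generation, already above the 68% default, so the old generation would need to be roughly 1177 MB or larger (or the threshold raised) before the concurrent collector could ever fall idle.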

eileeny
Joined: 2005-07-13

Could you tell us which version of Java you're running and what flags you're using?

Thanks,
Eileen