
GC killing me

hunterp
Offline
Joined: 2007-04-30
Points: 0

I have an app and GC is killing me. Everything is fine until memory usage reaches the heap limit; then GC takes forever (while load is still being applied) and the app is completely unresponsive (wait times of up to 2 minutes).

I thought the time ratio combined with incremental mode would make the collector work more aggressively (I have lots of free CPU). But with the parameters below, over a 4-minute run the JVM spent well under a minute in GC before it hit the wall and blew up. FYI, the PSOldGen space is where 75% of the garbage stays until the wall. I've tried -XX:MinHeapFreeRatio=30 (instead of the default 40), but it made no difference.

Adding memory only delays the onset of the "wall". So my goal is to get the JVM to collect more of the old gen and push the wall out as far as possible.

Here are the OPTS:
-Xmx2G -Xms2G -Xmn500M -XX:GCTimeRatio=1 -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:CMSIncrementalSafetyFactor=50 -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 -XX:MaxPermSize=256m -Xss128k
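
For reference, one way to see exactly what those long pauses are is to turn on GC logging. A minimal sketch, assuming the Sun JDK 5/6 HotSpot VM (gc.log is just a placeholder path):

[code]
# Hedged sketch: GC logging flags on Sun HotSpot 5/6; gc.log is a placeholder.
# The log shows whether the long stalls are full collections of the old gen.
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log
[/code]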

Thanks.

smoov
Offline
Joined: 2007-05-01
Points: 0

You mentioned this is a load-test scenario, but not which JDK version or environment (OS and server) you are running on, so I recommend backing up and starting with something more generic, like:

OPTS="-server -XX:+AggressiveHeap -Xverbosegc:file=/.log -XX:+HeapDumpOnOutOfMemoryError"

After you run your test, you can either analyze the resulting verbose GC log file to properly tune your heap sizes, or analyze the resulting heap dump to find out where all the object retention is going on.
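
If you do go the heap-dump route, the tools that ship with JDK 6 are enough for a first pass. A rough sketch (the pid and .hprof file name are placeholders):

[code]
# Hedged sketch using the stock JDK 6 tools; <pid> and the dump file name are
# placeholders for your actual process id and HeapDumpOnOutOfMemoryError output.
jmap -histo:live <pid>          # class histogram of live objects in a running VM
jhat java_pid<pid>.hprof        # browse the heap dump at http://localhost:7000
[/code]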

Good luck and report back your findings!
-Ken

olliplough
Offline
Joined: 2006-09-13
Points: 0

I agree with what the others have said so far. I just wanted to give a hint to try out the train garbage collector (there must be a VM switch for enabling it; read the docs). This GC always keeps some waggons of pointers at hand that can be disposed of all at once when memory gets low, freeing lots of memory in one go.

Regards, Oliver

briand
Offline
Joined: 2005-07-11
Points: 0

The train collector is gone in 1.6.0. In 1.5.0, the train collector still exists, but the -Xincgc option that enabled it prior to 1.5.0 now enables incremental CMS. That will tell you how much we think of the train collector. I.e., just say no.

If you have more than 2 processors on your system, then you might want to consider
switching from iCMS to CMS by dropping the following from your command line:

[code]
-XX:+CMSIncrementalMode -XX:CMSIncrementalSafetyFactor=50
-XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0
-XX:CMSIncrementalDutyCycle=10
[/code]

And maybe adding the following:

[code]
-XX:CMSInitiatingOccupancyFraction=1 -XX:+UseCMSInitiatingOccupancyOnly
[/code]

This will result in CMS running concurrently as soon as the old gen is 1% full.
In this configuration, CMS will constantly consume a full CPU. This will obviously
reduce your throughput, as 1 CPU will be fully utilized all the time by GC, but
it will give CMS a chance to keep up with the application. If that's too aggressive,
then use larger numbers.
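
Putting that together with your original sizing flags, the command line would look roughly like the sketch below (the 1% threshold is only a starting point - tune it against your GC logs):

[code]
# Hedged sketch of the revised options: iCMS flags dropped, CMS kept, and the
# initiating occupancy set very low as a starting point (values are examples only).
-Xmx2G -Xms2G -Xmn500M -Xss128k -XX:MaxPermSize=256m
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=1 -XX:+UseCMSInitiatingOccupancyOnly
[/code]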

Another possibility is to play with the young gen sizing, and in particular the survivor space sizing, to see if you can take some pressure off of the old gen. Larger survivor spaces may give objects a better chance to die in the young gen instead of getting promoted to the old gen, where they add more work for the CMS collector to do. However, larger survivor spaces also tend to result in longer minor collection times - only you can decide whether your app can tolerate that.
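
For example, something along these lines (values purely illustrative - check the tenuring distribution in your GC log before settling on anything):

[code]
# Hedged sketch: larger survivor spaces and a higher tenuring threshold so that
# objects get more chances to die young; the numbers are illustrative only.
-XX:SurvivorRatio=4              # smaller eden:survivor ratio => bigger survivor spaces
-XX:MaxTenuringThreshold=15      # let objects survive more minor GCs before promotion
-XX:+PrintTenuringDistribution   # log object ages so you can verify the effect
[/code]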

Of course, if the behavior you are seeing is due to an application memory
leak or other inefficient use of memory, then this too is likely to fail. If that's the
case, you have no recourse other than to solve your application memory
issues.

Incidentally, you should drop -XX:GCTimeRatio=1 when running with CMS.
This option only affects the Parallel GC (-XX:+UseParallelGC, which is the
default on server class machines since 1.5.0).

haskovec
Offline
Joined: 2004-03-20
Points: 0

I have had big GC problems in the past using ConcMarkSweepGC; we used to run into all sorts of issues. I switched to ParallelGC and our app performs much better. I would consider trying that if possible.
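
For what it's worth, the switch itself is just a couple of flags (a sketch only; -XX:+UseParallelOldGC needs a reasonably recent 5.0/6 build):

[code]
# Hedged sketch: throughput (parallel) collector instead of CMS;
# UseParallelOldGC additionally parallelizes old-gen collections on newer 5.0/6 builds.
-XX:+UseParallelGC -XX:+UseParallelOldGC
[/code]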

hunterp
Offline
Joined: 2007-04-30
Points: 0

I did profile our app. I am load testing my application with many concurrent queries. As for physical memory, I have 8G.

Look. My CPU usage is nowhere near 90-100% before I hit the wall. How can I tell the JVM to GC more aggressively, earlier? Once the CPU is being used effectively, then it will be time to look at other issues.

sdo
Offline
Joined: 2005-05-23
Points: 0

How much physical memory is on your machine? The behavior you describe typically occurs when the OS has to swap or page memory during GC - so a smaller heap that fits comfortably in physical memory (along with the other applications on your machine) will actually be better.
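
An easy way to check is to watch paging activity while the load test is running, e.g. (Linux/Solaris; the 5-second interval is arbitrary):

[code]
# Hedged sketch: sample memory/paging stats every 5 seconds; sustained non-zero
# swap/paging columns (si/so on Linux) during the long pauses point at paging, not GC.
vmstat 5
[/code]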

-Scott

ewin
Offline
Joined: 2005-07-15
Points: 0

Don't blame the GC for what is likely a bug in your code.

You are likely eating up memory in an uncontrolled way (e.g. a memory leak), or you have chosen an unsuitable algorithm (e.g. trying to read a 2GB file into 1GB of memory at once). Don't act surprised when you suddenly run out of memory. Fix your code. Get a memory profiler and figure out where your memory is going, instead of wasting time blaming the GC and fiddling with it before you have a clue what is really going wrong.
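
If you don't have a commercial profiler at hand, the hprof agent that ships with the JDK is enough for a first look. A sketch (MyApp is a placeholder for your real main class):

[code]
# Hedged sketch: the built-in hprof agent records allocation sites and writes
# java.hprof.txt on VM exit; MyApp is a placeholder main class, depth is the
# recorded stack-trace depth per allocation site.
java -agentlib:hprof=heap=sites,depth=8 MyApp
[/code]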