
Bad performance when moving from 1.5 to 1.6 with GC as ConcMarkSweepGC

5 replies
sbaronia
Offline
Joined: 2009-04-22

After moving from JDK 1.5 to 1.6 with ConcMarkSweepGC, the program started losing messages coming from the multicast cloud.

We have a HashMap that grows to up to 2 million entries. On average we receive 1 million messages; around half of them add transactions and the other half remove them. In the worst case we see 2 to 4 million messages. Since the messages have to be handled serially, we have three threads: one retrieving messages from the multicast socket, one processing them, and one writing the output to a multicast socket.

The processing thread adds to, removes from, or modifies the HashMap: 45% of the messages are adds, 44% are removes, and the rest are modifications.
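
For context, a minimal sketch of the pipeline as described above (class, queue, and map names here are my assumptions, not our actual code):

import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

public class PipelineSketch {
    private final BlockingQueue<byte[]> inbound = new ArrayBlockingQueue<byte[]>(100000);
    private final BlockingQueue<byte[]> outbound = new ArrayBlockingQueue<byte[]>(100000);
    private final Map<String, byte[]> transactions = new ConcurrentHashMap<String, byte[]>();

    // Thread 1: drain the incoming multicast socket as fast as possible.
    void receiver(MulticastSocket in) throws Exception {
        byte[] buf = new byte[1500];
        while (true) {
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            in.receive(packet);
            // copy out only the bytes actually received, since buf is reused
            inbound.put(Arrays.copyOf(packet.getData(), packet.getLength()));
        }
    }

    // Thread 2: apply each message to the map (the ~45% add / 44% remove / rest modify split).
    void processor() throws Exception {
        while (true) {
            byte[] msg = inbound.take();
            // ... decode msg, then transactions.put / remove / replace accordingly ...
            outbound.put(msg);
        }
    }

    // Thread 3: publish the result on the outgoing multicast socket.
    void sender(MulticastSocket out, InetAddress group, int port) throws Exception {
        while (true) {
            byte[] msg = outbound.take();
            out.send(new DatagramPacket(msg, msg.length, group, port));
        }
    }
}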

500 M to 1.5 M messages are received in a day.

But we lose messages at the incoming multicast socket layer.

Linux Red Hat
JVM is 64 bit 1.6.3
66GB RAM
24 CPU

Options:
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCTaskTimeStamps -XX:PrintCMSStatistics=1 -XX:+PrintGCTaskTimeStamps -XX:+PrintTenuringDistribution -XX:PrintFLSStatistics=1
-Xmx5G -Xms3G -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:+UseParNewGC -XX:ParallelGCThreads=3 -XX:NewSize=300m -XX:MaxNewSize=300m

Garbage collector: 
Name = 'ParNew', Collections = 454, Total time spent = 1 minute
Garbage collector: 
Name = 'ConcurrentMarkSweep', Collections = 0, Total time spent = 0.000 seconds

Two full GC cycles within one day.

Sample GC log:
=================

Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max Chunk Size: 0
Number of Blocks: 0
Tree Height: 0
41371.089: [ParNew
Desired survivor size 15728640 bytes, new threshold 3 (max 4)
- age 1: 11049928 bytes, 11049928 total
- age 2: 3746240 bytes, 14796168 total
- age 3: 3238576 bytes, 18034744 total
: 272123K->26192K(276480K), 0.1709270 secs] 978512K->735868K(3115008K)After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 270709815
Max Chunk Size: 270709815
Number of Blocks: 1
Av. Block Size: 270709815
Tree Height: 1
After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max Chunk Size: 0
Number of Blocks: 0
Tree Height: 0
icms_dc=0 , 0.1711980 secs] [Times: user=0.40 sys=0.02, real=0.17 secs]
Heap after GC invocations=144 (full 2):
par new generation total 276480K, used 26192K [0x00002aaaae030000, 0x00002aaac0c30000, 0x00002aaac0c30000)
eden space 245760K, 0% used [0x00002aaaae030000, 0x00002aaaae030000, 0x00002aaabd030000)
from space 30720K, 85% used [0x00002aaabd030000, 0x00002aaabe9c42a8, 0x00002aaabee30000)
to space 30720K, 0% used [0x00002aaabee30000, 0x00002aaabee30000, 0x00002aaac0c30000)
concurrent mark-sweep generation total 2838528K, used 709675K [0x00002aaac0c30000, 0x00002aab6e030000, 0x00002aabee030000)
concurrent-mark-sweep perm gen total 21248K, used 11612K [0x00002aabee030000, 0x00002aabef4f0000, 0x00002aabf3430000)
}
Total time for which application threads were stopped: 0.1719370 seconds
Application time: 79.7146460 seconds
{Heap before GC invocations=144 (full 2):
par new generation total 276480K, used 271952K [0x00002aaaae030000, 0x00002aaac0c30000, 0x00002aaac0c30000)
eden space 245760K, 100% used [0x00002aaaae030000, 0x00002aaabd030000, 0x00002aaabd030000)
from space 30720K, 85% used [0x00002aaabd030000, 0x00002aaabe9c42a8, 0x00002aaabee30000)
to space 30720K, 0% used [0x00002aaabee30000, 0x00002aaabee30000, 0x00002aaac0c30000)
concurrent mark-sweep generation total 2838528K, used 709675K [0x00002aaac0c30000, 0x00002aab6e030000, 0x00002aabee030000)
concurrent-mark-sweep perm gen total 21248K, used 11612K [0x00002aabee030000, 0x00002aabef4f0000, 0x00002aabf3430000)

It is not a network issue; it is definitely a GC setup issue. Any help would be appreciated.

peter__lawrey
Offline
Joined: 2005-11-01

There is no JVM 1.6.3; perhaps you mean 1.6.0_03. In that case I suggest you try 1.6.0_13.
The GC has only a very tenuous connection with UDP (in that you need to create objects to receive UDP traffic). There is no GC setting that relates to TCP or UDP traffic.
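
Not a GC setting, but one place where the two do meet: during a stop-the-world pause your receiver thread cannot drain the socket, so the kernel's receive buffer has to absorb the burst on its own. A minimal sketch of asking for more headroom, assuming an 8 MB buffer and a placeholder port (on Linux the OS cap, e.g. net.core.rmem_max, also has to allow it):

import java.net.MulticastSocket;

public class ReceiveBufferSketch {
    public static void main(String[] args) throws Exception {
        int port = 5000;                               // placeholder port, not from this thread
        MulticastSocket socket = new MulticastSocket(port);
        socket.setReceiveBufferSize(8 * 1024 * 1024);  // request ~8 MB of kernel buffer
        // the OS may grant less than requested, so verify what was actually applied
        System.out.println("receive buffer: " + socket.getReceiveBufferSize());
        socket.close();
    }
}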

You say you have 2 million messages, but what is the rate of messages per second?

sbaronia
Offline
Joined: 2009-04-22

Peter,

Usually the number of messages varies, but I have seen the program losing messages even when the volume is low, i.e. 800K/min.
Peak is 100K msgs per sec.
Avg is 22K msgs per sec.

I have tried 1.6.0_10 with similar or better results, but it never came close to 1.5.
I will try 1.6.0_13

Thanks
/Sharad

peter__lawrey
Offline
Joined: 2005-11-01

I have tended to avoid UDP when reliable transport is needed.
There are solutions which work with Java 1.6 that I have used, so I believe Java 1.6 can be made to work without having to change the GC settings.
I would need to be convinced that the GC is the cause, although you do have a high peak message rate.
It may be that you have to use Java 1.5...

sbaronia
Offline
Joined: 2009-04-22

Thanks Peter.
We don't have the option of going back to Java 1.5.

Do you know a good tool for tuning GC (other than Visual GC, GCViewer, and JConsole)?

One thing I have noticed is that the old generation grows by 8M with every minor GC collection.
After 2 or 3 full GC collections, it loses more messages. Is this because of fragmentation in memory?
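
For reference, the growth for a single minor GC can be read straight off a ParNew line; taking the one from the sample above:

young gen: 272123K -> 26192K, so the young generation shrank by 245931K
whole heap: 978512K -> 735868K, so the heap as a whole shrank by 242644K
difference: 245931K - 242644K = 3287K, i.e. about 3.2 MB promoted to the old generation by that particular collection.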

I know Java is coming out with the G1 garbage collector, which will replace CMS. G1 is designed to handle memory fragmentation properly.
I have downloaded the Java 1.6 update 14 beta.

Do you think it makes sense to try G1?

/Sharad

linuxhippy
Offline
Joined: 2004-01-07

I have to agree with Peter: nothing in your logs seems to indicate that the GC is the root of the problem.
In particular, I don't believe that fragmentation makes you lose messages; where do you see evidence for that?

The only thing I could imagine is a concurrent mode failure causing a fallback to a stop-the-world collection (this can be caused by fragmentation), but your logs don't indicate that; you would see "(concurrent mode failure)" or "promotion failed" entries in the GC log.

- Clemens