
GMS errors and high load in DAS

DouglasJohnson
Joined: 2013-09-17

Greetings,

I'm wondering whether anyone has seen a problem similar to ours. After running fine for a while, with no errors in server.log and low CPU load, our DAS starts generating many GMS-related errors and its CPU load climbs sharply. Restarting the DAS and all clusters makes the problem go away for a couple of days, after which it returns. Increasing the Java VM heap helped, but only by making the problem take longer to recur. Some information about the system:

* Three VMs: one hosts the DAS, and the other two are nodes that host all the instances
* 11 clusters, each with two instances (one per node)
* Heap size raised to 2 GB: -Xmx2048m, -Xms2048m
* ulimit -n 65536 (open-files limit; the OS is Solaris 10)
* ulimit -s 65536 (stack size limit)
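Because a larger heap only delayed the problem, it may be worth confirming that the -Xmx/-Xms values took effect and logging heap usage over time. A minimal sketch using the standard java.lang.management API (the class name `HeapCheck` is hypothetical). It reads the JVM it runs in, so watching the DAS itself would mean connecting over remote JMX, which is not shown here. The reported maximum may be somewhat below 2048 MiB depending on the garbage collector.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports heap usage of the JVM it runs in.
public class HeapCheck {

    /** Maximum heap in MiB as seen by this JVM, or -1 if the JVM reports no limit. */
    public static long maxHeapMiB() {
        long max = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
        return max < 0 ? -1 : max / (1024 * 1024);
    }

    public static void main(String[] args) throws InterruptedException {
        // Sample heap usage three times, one second apart.
        for (int i = 0; i < 3; i++) {
            MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            System.out.printf("used=%d MiB committed=%d MiB max=%d MiB%n",
                    heap.getUsed() / (1024 * 1024),
                    heap.getCommitted() / (1024 * 1024),
                    maxHeapMiB());
            Thread.sleep(1000);
        }
    }
}
```

A "used" figure that keeps rising over hours would be consistent with a leak, though on its own it is not proof of one.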

The Java process for the DAS domain itself is the one that starts using all the CPU time. Are the threads named in the log messages below part of a particular thread pool? As far as I can tell, they are not. Multicast is enabled and in use on these servers.
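One way to see which threads inside a JVM are actually using CPU, and whether their names point at a particular pool, is the standard ThreadMXBean API. The sketch below ranks the live threads of the JVM it runs in by accumulated CPU time. Pointing it at the DAS would require a remote JMX connection, which is not shown and whose setup is an assumption here. The names `ThreadCpuReport`, `Row` and `topThreads` are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: ranks the live threads of the current JVM by CPU time.
public class ThreadCpuReport {

    /** One report row: thread id, thread name, accumulated CPU time in nanoseconds. */
    public static final class Row {
        public final long id;
        public final String name;
        public final long cpuNanos;

        Row(long id, String name, long cpuNanos) {
            this.id = id;
            this.name = name;
            this.cpuNanos = cpuNanos;
        }
    }

    /** Returns up to {@code limit} live threads, highest CPU time first. */
    public static List<Row> topThreads(int limit) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Row> rows = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return rows; // this JVM cannot measure per-thread CPU time
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id);      // -1 if the thread is no longer alive
            ThreadInfo info = mx.getThreadInfo(id);  // null if the thread is no longer alive
            if (cpu < 0 || info == null) {
                continue;
            }
            rows.add(new Row(id, info.getThreadName(), cpu));
        }
        rows.sort((a, b) -> Long.compare(b.cpuNanos, a.cpuNanos));
        return new ArrayList<>(rows.subList(0, Math.min(limit, rows.size())));
    }

    public static void main(String[] args) {
        for (Row r : topThreads(10)) {
            System.out.printf("%6d  %-40s %10.1f ms%n", r.id, r.name, r.cpuNanos / 1e6);
        }
    }
}
```

CPU time is cumulative since each thread started, so comparing two snapshots taken a few seconds apart shows which threads are busy right now rather than which have been busy historically.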

Any ideas on what I need to adjust to correct this? Here's an example of the error messages that I see:

[#|2013-09-24T09:38:14.122-0700|SEVERE|glassfish3.1.2|com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=423;_ThreadName=Thread-2;|Connection timed out|#]

[#|2013-09-24T09:38:19.873-0700|SEVERE|glassfish3.1.2|com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=520;_ThreadName=Thread-2;|Connection timed out|#]

[#|2013-09-24T09:38:25.062-0700|WARNING|glassfish3.1.2|ShoalLogger|_ThreadID=32;_ThreadName=Thread-2;|GMS1042: failed to send heartbeatmessage with state=aliveandready to member 169.254.182.77:9197:228.9.114.120:7221:Cluster1:Cluster1-Node1-Instance. Reason: IOException:failed to connect to 169.254.182.77:9197:228.9.114.120:7221:Cluster1:Cluster1-Node1-Instance|#]

[#|2013-09-24T09:38:44.164-0700|WARNING|glassfish3.1.2|ShoalLogger|_ThreadID=421;_ThreadName=Thread-2;|failed to send a pong message
java.io.IOException: failed to connect to 169.254.182.77:9111:228.9.0.164:18979:Cluster2:Cluster2-Node1-Instance
at com.sun.enterprise.mgmt.transport.grizzly.grizzly1_9.GrizzlyTCPConnectorWrapper.send(GrizzlyTCPConnectorWrapper.java:132)
at com.sun.enterprise.mgmt.transport.grizzly.grizzly1_9.GrizzlyTCPConnectorWrapper.doSend(GrizzlyTCPConnectorWrapper.java:96)
at com.sun.enterprise.mgmt.transport.AbstractMessageSender.send(AbstractMessageSender.java:74)
at com.sun.enterprise.mgmt.transport.grizzly.GrizzlyNetworkManager.send(GrizzlyNetworkManager.java:288)
at com.sun.enterprise.mgmt.transport.grizzly.PingMessageListener.receiveMessageEvent(PingMessageListener.java:82)
at com.sun.enterprise.mgmt.transport.AbstractNetworkManager.receiveMessage(AbstractNetworkManager.java:144)
at com.sun.enterprise.mgmt.transport.grizzly.grizzly1_9.GrizzlyMessageDispatcherFilter.execute(GrizzlyMessageDispatcherFilter.java:75)
at com.sun.grizzly.DefaultProtocolChain.executeProtocolFilter(DefaultProtocolChain.java:137)
at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:104)
at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:90)
at com.sun.grizzly.ProtocolChainContextTask.doCall(ProtocolChainContextTask.java:54)
at com.sun.grizzly.SelectionKeyContextTask.call(SelectionKeyContextTask.java:59)
at com.sun.grizzly.ContextTask.run(ContextTask.java:71)
at com.sun.grizzly.util.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:532)
at com.sun.grizzly.util.AbstractThreadPool$Worker.run(AbstractThreadPool.java:513)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.net.ConnectException: Connection timed out
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
at com.sun.grizzly.TCPConnectorHandler.finishConnect(TCPConnectorHandler.java:297)
at com.sun.grizzly.connectioncache.client.CacheableConnectorHandler.finishConnect(CacheableConnectorHandler.java:230)
at com.sun.enterprise.mgmt.transport.grizzly.grizzly1_9.GrizzlyTCPConnectorWrapper$CloseControlCallbackHandler.onConnect(GrizzlyTCPConnectorWrapper.java:185)
at com.sun.grizzly.CallbackHandlerContextTask.doCall(CallbackHandlerContextTask.java:70)
... 5 more
|#]

[#|2013-09-24T09:38:44.163-0700|SEVERE|glassfish3.1.2|com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=423;_ThreadName=Thread-2;|Connection timed out|#]

[#|2013-09-24T09:38:44.163-0700|SEVERE|glassfish3.1.2|com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=548;_ThreadName=Thread-2;|Connection timed out|#]
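The member identifiers in the GMS1042 warning and the pong failure appear to pack six colon-separated fields. Reading them as host IP, GMS TCP port, multicast group address, multicast port, cluster name and instance name is an assumption based only on the values shown (228.9.x.x falls in the IPv4 multicast range), not a documented Shoal format. A small sketch that splits them that way also makes it easy to check the peer address: 169.254.x.x is the IPv4 link-local range (169.254.0.0/16), which may be worth comparing against the interfaces the nodes are meant to use. The class `GmsMemberId` and its field names are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: splits a member identifier as printed in the log.
// The field meanings are an assumption inferred from the values, not a documented format.
public final class GmsMemberId {
    public final String host;            // e.g. 169.254.182.77 (IPv4 assumed)
    public final int tcpPort;            // e.g. 9197, assumed to be the GMS listener port
    public final String multicastGroup;  // e.g. 228.9.114.120
    public final int multicastPort;      // e.g. 7221
    public final String clusterName;     // e.g. Cluster1
    public final String instanceName;    // e.g. Cluster1-Node1-Instance

    private GmsMemberId(String[] f) {
        host = f[0];
        tcpPort = Integer.parseInt(f[1]);
        multicastGroup = f[2];
        multicastPort = Integer.parseInt(f[3]);
        clusterName = f[4];
        instanceName = f[5];
    }

    /** Parses "host:port:group:port:cluster:instance"; throws on any other shape. */
    public static GmsMemberId parse(String id) {
        String[] f = id.split(":", 6);
        if (f.length != 6) {
            throw new IllegalArgumentException("expected 6 colon-separated fields: " + id);
        }
        return new GmsMemberId(f);
    }

    /** True if the host is a link-local address (169.254.0.0/16 for IPv4). */
    public boolean hostIsLinkLocal() {
        return toAddress(host).isLinkLocalAddress();
    }

    /** True if the group address is a multicast address (224.0.0.0/4 for IPv4). */
    public boolean groupIsMulticast() {
        return toAddress(multicastGroup).isMulticastAddress();
    }

    // getByName does no DNS lookup when given a literal IP address.
    private static InetAddress toAddress(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP address: " + literal, e);
        }
    }

    public static void main(String[] args) {
        GmsMemberId m = parse("169.254.182.77:9197:228.9.114.120:7221:Cluster1:Cluster1-Node1-Instance");
        // prints "Cluster1-Node1-Instance host=169.254.182.77 linkLocal=true"
        System.out.println(m.instanceName + " host=" + m.host + " linkLocal=" + m.hostIsLinkLocal());
    }
}
```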

Thanks in advance for any suggestions!