
Cluster sometimes stops without any exception

sebigavril
Joined: 2011-09-28

I have the following problem. Our GlassFish 3.1 server is configured with a cluster for some heavy processing. From time to time, quite rarely, the cluster shuts down without any known reason. I do not find any exceptions in the log, just the server logging shutdown messages. I am attaching 2 fragments of the log file, one from the DAS and the other from one of the cluster instances. All the nodes are on the same machine. One issue I do notice in the logs is the number of members listed in the GMS view: we only have 3 members (the DAS and two cluster instances), but 10 are listed. Would this have anything to do with the cluster being stopped? Any help on this would be most welcome.

Thanks

DAS
11:46:45,322 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [file:/E:/Glassfish3/glassfish/domains/domain1/generated/jsp/managementtool-1.12.0.20/loader_1608711619/logback.xml]
11:46:45,416 |-ERROR in ch.qos.logback.classic.joran.action.InsertFromJNDIAction - [SMTPServer] has null or empty value
11:46:45,416 |-INFO in ch.qos.logback.classic.joran.action.InsertFromJNDIAction - Setting context variable [LOG_LOCATION] to [E:\\v1\\logs]
11:46:45,416 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.rolling.RollingFileAppender]
11:46:45,416 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [FILE]
11:46:45,478 |-INFO in c.q.l.core.rolling.TimeBasedRollingPolicy - No compression will be used
11:46:45,494 |-INFO in c.q.l.core.rolling.TimeBasedRollingPolicy - Will use the pattern E://v1//logs/managementtool/spm-%d{yyyy-MM-dd}.%i.log for the active file
11:46:45,494 |-INFO in ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP@468b2c18 - The date pattern is 'yyyy-MM-dd' from file name pattern 'E://v1//logs/managementtool/spm-%d{yyyy-MM-dd}.%i.log'.
11:46:45,494 |-INFO in ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP@468b2c18 - Roll-over at midnight.
11:46:45,494 |-INFO in ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP@468b2c18 - Setting initial period to Fri Jun 29 11:46:45 EEST 2012
11:46:45,494 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property
11:46:45,588 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[FILE] - Active log file name: E://v1//logs/managementtool/spm-2012-06-29.0.log
11:46:45,588 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[FILE] - File property is set to [null]
11:46:45,588 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]
11:46:45,603 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [STDOUT]
11:46:45,603 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property
11:46:45,603 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - Setting level of ROOT logger to INFO
11:46:45,603 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDOUT] to Logger[ROOT]
11:46:45,603 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [com.mypackage.central] to DEBUG
11:46:45,603 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [com.mypackage.managementtool] to DEBUG
11:46:45,603 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration.
11:46:45,603 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@80e1ba6 - Registering current configuration as safe fallback

[#|2012-06-29T11:47:09.479+0300|INFO|glassfish3.1.1|javax.enterprise.system.tools.admin.org.glassfish.deployment.admin|_ThreadID=111;_ThreadName=Thread-2;|managementtool-1.12.0.20 was successfully deployed in 49,564 milliseconds.|#]

[#|2012-06-29T11:49:57.263+0300|INFO|glassfish3.1.1|ShoalLogger|_ThreadID=19;_ThreadName=Thread-2;|GMS1015: Received Group Shutting down message from member: server of group: POD_Processing_Cl01|#]

[#|2012-06-29T11:49:57.263+0300|INFO|glassfish3.1.1|ShoalLogger|_ThreadID=18;_ThreadName=Thread-2;|GMS1092: GMS View Change Received for group: POD_Processing_Cl01 : Members in view for MASTER_CHANGE_EVENT(before change analysis) are :

1: MemberId: POD_Processing_Cl01_ins01, MemberType: CORE, Address: 10.220.10.67:9171:228.9.103.177:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins01
2: MemberId: POD_Processing_Cl01_ins01, MemberType: CORE, Address: 10.220.20.110:9181:228.9.103.177:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins01
3: MemberId: POD_Processing_Cl01_ins01, MemberType: CORE, Address: 10.220.20.195:9134:228.9.103.177:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins01
4: MemberId: POD_Processing_Cl01_ins02, MemberType: CORE, Address: 10.220.10.67:9106:228.9.103.177:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins02
5: MemberId: POD_Processing_Cl01_ins02, MemberType: CORE, Address: 10.220.20.110:9152:228.9.103.177:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins02
6: MemberId: POD_Processing_Cl01_ins02, MemberType: CORE, Address: 10.220.20.195:9090:228.9.103.177:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins02
7: MemberId: server, MemberType: SPECTATOR, Address: 10.220.10.67:9157:228.9.103.177:16084:POD_Processing_Cl01:server
8: MemberId: server, MemberType: SPECTATOR, Address: 10.220.20.110:9154:228.9.103.177:16084:POD_Processing_Cl01:server
9: MemberId: server, MemberType: SPECTATOR, Address: 10.220.20.195:9149:228.9.103.177:16084:POD_Processing_Cl01:server
10: MemberId: server, MemberType: SPECTATOR, Address: 10.220.20.197:9129:228.9.103.177:16084:POD_Processing_Cl01:server

and

Cluster
2012-06-29 11:43:27,396 [com-mypackage-worker-Execution slot for task {runId=4121, environmentId=33}}] DEBUG envId{33} - sesId{} - com.mypackage.worker.taskprocessing.flow.stage.DispatcherNotificationStage[61] - Beginning stage WORKER_DISPATCHER_NOTIFICATION...
2012-06-29 11:43:27,396 [com-mypackage-worker-Execution slot for task {runId=4121, environmentId=33}}] INFO  envId{33} - sesId{} - com.mypackage.worker.communication.jms.JMSCompletionMessageSender[45] - Sending to Dispatcher JMS completion message for task {runId=4121, environmentId=33}, status=RUN_COMPLETED}...
2012-06-29 11:43:27,411 [com-mypackage-worker-Execution slot for task {runId=4121, environmentId=33}}] INFO  envId{33} - sesId{} - com.mypackage.worker.communication.jms.JMSCompletionMessageSender[45] - Sending to Dispatcher JMS completion message for task {runId=4121, environmentId=33}, status=RUN_COMPLETED}...DONE
2012-06-29 11:43:27,411 [com-mypackage-worker-Execution slot for task {runId=4121, environmentId=33}}] DEBUG envId{33} - sesId{} - com.mypackage.worker.taskprocessing.flow.stage.DispatcherNotificationStage[61] - Ending stage WORKER_DISPATCHER_NOTIFICATION...
2012-06-29 11:43:27,411 [com-mypackage-worker-Execution slot for task {runId=4121, environmentId=33}}] INFO  envId{33} - sesId{} - com.mypackage.worker.taskprocessing.slotpool.SlotPool[49] - Releasing execution slot for completed task {runId=4121, environmentId=33}}...
2012-06-29 11:43:27,411 [com-mypackage-worker-Execution slot for task {runId=4121, environmentId=33}}] INFO  envId{33} - sesId{} - com.mypackage.worker.taskprocessing.slotpool.SlotPool[49] - Releasing execution slot for completed task {runId=4121, environmentId=33}}...DONE
2012-06-29 11:43:27,411 [com-mypackage-worker-Execution slot for task {runId=4121, environmentId=33}}] DEBUG envId{33} - sesId{} - com.mypackage.worker.taskprocessing.slotpool.SlotPool[61] - Ending stage WORKER_SLOT_POOL...task {runId=4121, environmentId=33}}
2012-06-29 11:43:27,411 [com-mypackage-worker-Execution slot for task {runId=4121, environmentId=33}}] DEBUG envId{33} - sesId{} - com.mypackage.worker.taskprocessing.ProcessingSlot[61] - Ending method doProcess()...
2012-06-29 11:43:27,411 [com-mypackage-worker-Execution slot for task {runId=4121, environmentId=33}}] DEBUG envId{33} - sesId{} - com.mypackage.worker.taskprocessing.ProcessingSlot[61] - Ending stage WORKER_SLOT...
2012-06-29 11:44:00,443 [admin-thread-pool-24850(5)] DEBUG envId{} - sesId{} - com.mypackage.worker.service.WorkerInstancePropertiesService[61] - Beginning method destroy()...
2012-06-29 11:44:00,459 [admin-thread-pool-24850(5)] DEBUG envId{} - sesId{} - com.mypackage.worker.service.WorkerInstancePropertiesService[61] - Ending method destroy()...
2012-06-29 11:44:00,459 [admin-thread-pool-24850(5)] DEBUG envId{} - sesId{} - com.mypackage.worker.WorkerImpl[61] - Beginning method destroy()...
2012-06-29 11:44:00,459 [admin-thread-pool-24850(5)] DEBUG envId{} - sesId{} - com.mypackage.worker.WorkerImpl[61] - Ending method destroy()...
2012-06-29 11:44:00,459 [admin-thread-pool-24850(5)] DEBUG envId{} - sesId{} - com.mypackage.worker.taskrepository.TaskPreallocationQueue[61] - Beginning method shutdown()...
2012-06-29 11:44:00,459 [admin-thread-pool-24850(5)] DEBUG envId{} - sesId{} - com.mypackage.worker.taskrepository.TaskPreallocationQueue[61] - Ending method shutdown()...
2012-06-29 11:44:00,459 [admin-thread-pool-24850(5)] INFO  envId{} - sesId{} - org.springframework.context.support.ClassPathXmlApplicationContext[1020] - Closing org.springframework.context.support.ClassPathXmlApplicationContext@aedb61f: startup date [Fri Jun 29 11:31:34 EEST 2012]; root of context hierarchy
2012-06-29 11:44:00,459 [admin-thread-pool-24850(5)] DEBUG envId{} - sesId{} - com.mypackage.appframework.persistence.PersistenceServiceImpl[74] - Destroying datasource for environment id : 33
2012-06-29 11:44:00,599 [admin-thread-pool-24850(5)] INFO  envId{} - sesId{} - com.mypackage.appframework.persistence.PersistenceServiceImpl[76] - Destroyed data source for environment id : 33
2012-06-29 11:44:00,599 [admin-thread-pool-24850(5)] INFO  envId{} - sesId{} - org.hibernate.impl.SessionFactoryImpl[925] - closing
2012-06-29 11:44:00,615 [admin-thread-pool-24850(5)] INFO  envId{} - sesId{} - org.springframework.jmx.export.MBeanExporter[429] - Unregistering JMX-exposed beans on shutdown
2012-06-29 11:44:00,646 [Duration logger] INFO  envId{} - sesId{} - com.mypackage.appframework.util.Consumer[75] - Duration logger thread is shutting down
2012-06-29 11:44:00,646 [Alert reporter] INFO  envId{} - sesId{} - com.mypackage.appframework.util.Consumer[75] - Alert reporter thread is shutting down
2012-06-29 11:44:00,709 [admin-thread-pool-24850(5)] INFO  envId{} - sesId{} - com.mypackage.appframework.cache.metadata.MetadataRegistrationService[78] - Successfully unregistered for environment with id 33 and name Env1
2012-06-29 11:44:00,724 [admin-thread-pool-24850(5)] INFO  envId{} - sesId{} - org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean[441] - Closing JPA EntityManagerFactory for persistence unit 'central'
2012-06-29 11:44:00,740 [admin-thread-pool-24850(5)] INFO  envId{} - sesId{} - org.hibernate.impl.SessionFactoryImpl[925] - closing
2012-06-29 11:44:00,740 [admin-thread-pool-24850(5)] DEBUG envId{} - sesId{} - com.mypackage.central.persistence.CentralDataSource[57] - Destroying central data source.

jfialli
Joined: 2003-06-16

On 11/26/12 9:03 AM, forums@java.net wrote:
> I have the following problem. Our Glassfish 3.1 server is configured
> with a
> cluster for some heavy processing. From time to time, quite rarely, the
> cluster shuts down without any known reason. I do not find any
> exceptions in
> the log, just the server logging shut down messages. I am attaching 2
> fragments of the log file, one from the DAS and the other from one of the
> cluster instances.
I did not see any ShoalLogger instance shutdown messages in the submitted log file entries. You should look over all log file entries that contain ShoalLogger to track the life cycle of each clustered instance.
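For example, something like the following will pull out the Shoal GMS life cycle messages for the DAS and for each clustered instance (the paths assume the default layout under your E:/Glassfish3 install and a placeholder node name; on Windows, findstr can be used in place of grep):

% grep ShoalLogger E:/Glassfish3/glassfish/domains/domain1/logs/server.log
% grep ShoalLogger E:/Glassfish3/glassfish/nodes/<node-name>/POD_Processing_Cl01_ins01/logs/server.log
% grep ShoalLogger E:/Glassfish3/glassfish/nodes/<node-name>/POD_Processing_Cl01_ins02/logs/server.log

Taken together, the GMS1015 (group shutdown), GMS1092 (view change) and join/failure messages show when each member entered and left the group.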

> All the nodes are on the same machine. One issue that i
> find in the logs is related to the number of members listed when
> receiving
> GMS instances. We only have 3 memebers (DAS and two cluster instances)
> but 10
> are listed. Would this have anything to do with the cluster being
> stopped?
Here is a fragment taken from the GMS view in the submitted log.

There is not enough information provided to answer your questions, but I can provide an interpretation of the one GMS view that was submitted.

Note that the 3 clustered instances below all have the same instance name "POD_Processing_Cl01:POD_Processing_Cl01_ins01", but they have distinct Shoal GMS listener network addresses and ports. Since all 3 instances are using the same multicast group address and port, they are joining the same Shoal GMS cluster, POD_Processing_Cl01.

1: MemberId: POD_Processing_Cl01_ins01, MemberType: CORE, Address: 10.220.10.67:9171:228.9.103.177:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins01

2: MemberId: POD_Processing_Cl01_ins01, MemberType: CORE, Address: 10.220.20.110:9181:228.9.103.177:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins01

3: MemberId: POD_Processing_Cl01_ins01, MemberType: CORE, Address: 10.220.20.195:9134:228.9.103.177:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins01

The first IP address and port in each Address field, such as 10.220.10.67:9171, is the network interface address and port of the Shoal GMS listener for that clustered instance.
The second IP address and port, such as 228.9.103.177:16084, is the multicast group address and multicast port being used by Shoal GMS on that clustered instance.

I am uncertain whether someone else on your subnet is running the exact same configuration (they should not be sharing the same manually set multicast group address and port), or whether you are running on one machine with virtual machines that produce the multiple IP addresses. Just be aware that there are definitely 3 distinct clustered instances in your network environment with the same instance name, listening on different ports, and joining the same cluster.

Shoal GMS will allow multiple clustered instances with the exact same name on the same subnet AS LONG AS the multicast group addresses and ports are different.
A configuration error could lead to the above, but it is not the only way it could happen.

-Joe


sebigavril
Joined: 2011-09-28

Thanks for the reply. The server was configured with an automated tool. There are multiple machines in our network that use the same tool to configure GlassFish. Thus, all instances will have the same name (the tool creates 2 cluster instances). If I understand correctly, what you are saying is that some instances might share (besides the same name) the same multicast group address and multicast port used by Shoal GMS. Does this mean that if a shutdown message is sent to an instance on one machine, there is a chance of two or more instances on different machines actually receiving the signal? If this is the case, I believe we should make sure that all instances have different Shoal GMS addresses and/or ports. Can you please confirm?

jfialli
Joined: 2003-06-16

On 11/26/12 1:11 PM, forums@java.net wrote:
> Thanks for the reply. The server was configured with an automated
> tool. There
> are multiple machines in our network that use the same tool to configure
> Glassfish. Thus, all instances will have the same name (the tool
> creates 2
> cluster instances). If i understand correctly, what you are saying is
> that
> some instances might share (besides the same name) the same multicast
> group
> address and multicast port being used by Shoal GMS.
If one just creates the cluster using the command

% asadmin create-cluster

and does that twice against two different DASes, you would not expect to get the same multicast group address and port, even for the same cluster name.

It looks to me that the tool is explicitly providing the multicast group
address and port.
When that is done, the local admin must ensure it is unique for each
different
combination of cluster name and DAS.

Here is the command that I suspect the automated tool is using:

% asadmin create-cluster --multicastaddress 228.9.103.177 --multicastport 16084 POD_Processing_Cl01

The tool should not be explicitly specifying that information.

You can find the cluster configuration in the cluster element of /glassfish3/glassfish/domains/*/config/domain.xml, and on the clustered instances in /glassfish3/glassfish/nodes/*/config/domain.xml.

Here is a sample:
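The sketch below uses the cluster name and multicast values from your log; the remaining attributes and child elements are typical defaults and may differ in your file:

<cluster name="POD_Processing_Cl01" config-ref="POD_Processing_Cl01-config"
         gms-enabled="true"
         gms-multicast-address="228.9.103.177"
         gms-multicast-port="16084">
  <server-ref ref="POD_Processing_Cl01_ins01"/>
  <server-ref ref="POD_Processing_Cl01_ins02"/>
</cluster>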

The attribute values gms-multicast-port and gms-multicast-address are generated by create-cluster AS LONG AS no one specifies them. It looks like in your case they are being specified, or someone is copying domain.xml around after it is created the first time. Running asadmin create-cluster with the same cluster name against different DASes would not generate the same multicast group address and the same multicast port.

-Joe

sebigavril
Joined: 2011-09-28

No, the address and port are not specified when configuring GlassFish. Is it possible to somehow end up with the same IP address and port after using the "% asadmin create-cluster" approach tens of times on hundreds of machines on the same network?

jfialli
Joined: 2003-06-16

On 11/27/12 7:30 AM, forums@java.net wrote:
> No, the address and port are not specified when configuring glassfish.
> Is it
> possible to somehow end up having the same ip address and port after
> using a
> "% asadmin create-cluster" approach tens of times on hundreds of
> machines on
> the same network?

Rather than venture a guess, I will just state the ranges that these values are generated from.

The multicast port is a random number selected between 2048 and 32000 when the "asadmin create-cluster" command is run. It is recorded and placed in the domain.xml configuration file. The default ephemeral port range is 32k to 64k on some operating systems.

One can manually set the multicast port for each cluster (and thus ensure there are no collisions when many clusters are being created) by providing a value between 2048 and 49151 as the --multicastport parameter to the "asadmin create-cluster" command. (To use the 32k to 49k range, the native OS mechanism for configuring the ephemeral port range must be used to ensure no collisions with the ephemeral ports used for temporary client sockets.)

For the generated multicast group address, the first 2 parts of the IP address are fixed at "228.9", as they are in your addresses, so the address was very likely generated. The last 2 parts of the multicast group address are random numbers between 0 and 255.

Copying domain.xml around may be more likely than the 3-way collision that you reported, but such a collision is definitely not impossible.

If that randomness is not sufficient for how frequently you create clusters with the same name, explicitly setting the multicast group address and/or multicast port is the way to avoid collisions.
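For example (the address, port, and cluster name below are placeholders chosen to illustrate the syntax, not recommended values):

% asadmin create-cluster --multicastaddress 228.9.1.2 --multicastport 3333 myCluster

You can also read back whatever values a DAS ended up with, generated or explicit; the dotted name below assumes the usual clusters.cluster.<name> layout of domain.xml:

% asadmin get clusters.cluster.myCluster.gms-multicast-address
% asadmin get clusters.cluster.myCluster.gms-multicast-port

Comparing that output across your DASes is a quick way to confirm that no two of them share the same address and port for same-named clusters.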

-Joe


sebigavril
Joined: 2011-09-28

I've tried to reproduce this behavior manually. I've got my hands on two machines that have GlassFish installed and configured the same way. I've manually changed the GMS address and port on one of the clusters. Then I've restarted the clusters a couple of times. I couldn't make the other cluster stop (or start). When stopping them, I would get a normal log. However, when starting one of them, I would get the following, expected log:

[#|2012-11-28T00:51:16.879-0800|INFO|glassfish3.1.1|ShoalLogger|_ThreadID=18;_ThreadName=Thread-2;|GMS1092: GMS View Change Received for group: POD_Processing_Cl01 : Members in view for JOINED_AND_READY_EVENT(before change analysis) are :
1: MemberId: POD_Processing_Cl01_ins01, MemberType: CORE, Address: 10.220.20.118:9095:228.9.103.196:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins01
2: MemberId: POD_Processing_Cl01_ins01, MemberType: CORE, Address: 10.220.20.194:9137:228.9.103.196:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins01
3: MemberId: POD_Processing_Cl01_ins02, MemberType: CORE, Address: 10.220.20.118:9104:228.9.103.196:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins02
4: MemberId: POD_Processing_Cl01_ins02, MemberType: CORE, Address: 10.220.20.194:9106:228.9.103.196:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins02
5: MemberId: server, MemberType: SPECTATOR, Address: 10.220.20.194:9143:228.9.103.196:16084:POD_Processing_Cl01:server
|#]

[#|2012-11-28T00:51:16.879-0800|INFO|glassfish3.1.1|ShoalLogger|_ThreadID=18;_ThreadName=Thread-2;|GMS1016: Analyzing new membership snapshot received as part of event: JOINED_AND_READY_EVENT for member: POD_Processing_Cl01_ins01 of group: POD_Processing_Cl01|#]

[#|2012-11-28T00:51:16.879-0800|INFO|glassfish3.1.1|ShoalLogger|_ThreadID=18;_ThreadName=Thread-2;|GMS1025: Adding Joined And Ready member: POD_Processing_Cl01_ins01 group: POD_Processing_Cl01 StartupState: INSTANCE_STARTUP |#]

[#|2012-11-28T00:51:32.847-0800|INFO|glassfish3.1.1|ShoalLogger|_ThreadID=18;_ThreadName=Thread-2;|GMS1092: GMS View Change Received for group: POD_Processing_Cl01 : Members in view for JOINED_AND_READY_EVENT(before change analysis) are :
1: MemberId: POD_Processing_Cl01_ins01, MemberType: CORE, Address: 10.220.20.118:9095:228.9.103.196:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins01
2: MemberId: POD_Processing_Cl01_ins01, MemberType: CORE, Address: 10.220.20.194:9137:228.9.103.196:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins01
3: MemberId: POD_Processing_Cl01_ins02, MemberType: CORE, Address: 10.220.20.118:9104:228.9.103.196:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins02
4: MemberId: POD_Processing_Cl01_ins02, MemberType: CORE, Address: 10.220.20.194:9106:228.9.103.196:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins02
5: MemberId: server, MemberType: SPECTATOR, Address: 10.220.20.194:9143:228.9.103.196:16084:POD_Processing_Cl01:server
|#]

[#|2012-11-28T00:51:32.847-0800|INFO|glassfish3.1.1|ShoalLogger|_ThreadID=18;_ThreadName=Thread-2;|GMS1016: Analyzing new membership snapshot received as part of event: JOINED_AND_READY_EVENT for member: POD_Processing_Cl01_ins02 of group: POD_Processing_Cl01|#]

[#|2012-11-28T00:51:32.847-0800|INFO|glassfish3.1.1|ShoalLogger|_ThreadID=18;_ThreadName=Thread-2;|GMS1025: Adding Joined And Ready member: POD_Processing_Cl01_ins02 group: POD_Processing_Cl01 StartupState: INSTANCE_STARTUP |#]

Still, only one cluster is up, not both of them. Are there any other configuration changes that could have been made in the first case, when both clusters came down?

Thank you for your time and patience.

jfialli
Joined: 2003-06-16

I am uncertain what you are trying to illustrate with this email; however, I do have a simple solution for you to use if your development mode is such that all clustered instances for a cluster are on the same machine.

Use the following command to limit the multicast to the local machine when all clustered instances for a cluster are on one machine:

% asadmin create-cluster --properties "GMS_MULTICAST_TIME_TO_LIVE=0" ....

The above results in a cluster that can only be seen on that single machine. See asadmin create-cluster --help for a description of this property. It limits the UDP multicast broadcast done by Shoal GMS to the local machine only. This mode is sufficient for development/testing where all clustered instances for a cluster are on the same machine. It should definitely not be used in a development or production environment where the clustered instances are spread across multiple machines (but one would not expect a production environment to have hundreds of clusters with the same name).

Limiting the scope of the Shoal GMS UDP multicast broadcast works around the following constraint not being maintained in your environment. Namely, the following tuple must be unique within the subnet for each unique cluster:

cluster name, multicast group address, multicast port

It is not a bug in the system if that constraint is violated and you observe multiple instances from different machines seeing each other.
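If you want to check which machines can currently see each other over a given multicast group address and port, GlassFish 3.1 includes a validate-multicast subcommand; run it on each machine at the same time with the values from your domain.xml (the ones below are taken from your earlier log), while the clusters using that address and port are stopped:

% asadmin validate-multicast --multicastaddress 228.9.103.177 --multicastport 16084

Machines whose output reports receiving data from each other are in the same Shoal GMS multicast domain.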

-Joe


sebigavril
Joined: 2011-09-28

Thanks for the solution you provided. But in order to validate that the solution does indeed work for my initial problem, I should be able to reproduce it. I cannot, so I am wondering if there are things we have not taken into consideration.

jfialli
Joined: 2003-06-16

On 11/29/12 2:20 AM, forums@java.net wrote:
> Thanks for the solution you provided. But in order to validate that the
> solution does indeed work for my *initial problem* i should be able to
> reproduce it. I can not, so i am wondering if there are things we have
> not
> taken into consideration.
>
Sorry, but I do not have any other thoughts on what could have been happening in your environment.

The solution I provided, setting GMS_MULTICAST_TIME_TO_LIVE to 0, will definitely isolate clusters that share the same name, multicast group address, and multicast port, as long as each cluster is truly only supposed to have clustered instances on one machine.

-Joe

sebigavril
Joined: 2011-09-28

Thanks Joe so much for your help! I appreciate it. Have a good day. :-)