Need help for glassfish 3.1 clustering

jihad r hamza Guest
Offline
Joined: 2011-06-17

Hi,

I am a software developer and new to GlassFish. I am currently trying to
set up a GlassFish cluster across three physical machines (DAS, node1,
node2).
I am able to create the cluster and to start and stop it.
I run this command to list the instances:

./asadmin list-instances -l

It shows:
NAME       HOST             PORT   PID    CLUSTER   STATE
instance1  xxx.xxx.xxx.xxx  24848  15316  cluster1  running
instance2  xxx.xxx.xxx.xxx  24848  13194  cluster1  running

But when I run the command

get-health cluster1

it shows:
instance1 not started
instance2 not started

Can you please guide me on this?

--
Thanks and Regards,
Jihad R Hamza

vanya_void
Offline
Joined: 2011-06-22

Hi,

I have a similar problem with GlassFish clustering.

Sessions from my app are not replicated to the other node, but in all other respects the cluster works correctly.

asadmin list-instances -l
NAME              HOST         PORT   PID   CLUSTER         STATE
portal-instance1  localhost    24848  6343  portal-cluster  running
portal-instance3  fss-portal3  24848  4628  portal-cluster  running
Command list-instances executed successfully.

asadmin get-health portal-cluster
portal-instance1 started since Wed Jun 22 17:17:20 MSD 2011
portal-instance3 started since Wed Jun 22 17:17:21 MSD 2011
Command get-health executed successfully.

When I run

asadmin validate-multicast --multicastaddress 224.0.0.251 --multicastport 24567 --bindaddress 172.17.12.172 --timeout 45

on the second node, this error appears in the logs:

[#|2011-06-22T17:35:16.680+0400|WARNING|glassfish3.1|ShoalLogger|_ThreadID=29;_ThreadName=Thread-1;|GMS1071: damaged multicast packet discarded
java.lang.IllegalArgumentException: magic number is not valid
at com.sun.enterprise.mgmt.transport.MessageImpl.parseHeader(MessageImpl.java:172)
at com.sun.enterprise.mgmt.transport.BlockingIOMulticastSender$MessageProcessTask.run(BlockingIOMulticastSender.java:349)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

bbissett
Offline
Joined: 2003-06-16

On 6/22/11 9:50 AM, forums@java.net wrote:
> Hi,
>
> I have similar problem with glassfish clustering.
>
> sessions from my app, not replicated to another node, but in all other
> aspects cluster works correctly.

Can you start a new thread with a subject that describes the issue?
Including the "asadmin get-health" output is very helpful, as it
shows that GMS is working and the problem is somewhere higher up in the
stack. Before you do, though, and my apologies if this is obvious -- did
you deploy your app with the "availabilityenabled" flag set to true? You
can verify that HA is on for your app in the admin console as well.

One more comment below:
>
> asadmin list-instances -l
> NAME HOST PORT PID CLUSTER STATE
> portal-instance1 localhost 24848 6343 portal-cluster running
> portal-instance3 fss-portal3 24848 4628 portal-cluster running
> Command list-instances executed successfully.
>
> asadmin get-health portal-cluster
> portal-instance1 started since Wed Jun 22 17:17:20 MSD 2011
> portal-instance3 started since Wed Jun 22 17:17:21 MSD 2011
> Command get-health executed successfully.
>
> When I run asadmin validate-multicast --multicastaddress 224.0.0.251
> --multicastport 24567 --bindaddress 172.17.12.172 --timeout 45 on the
> second node, this error appears in the logs:
>
> [#|2011-06-22T17:35:16.680+0400|WARNING|glassfish3.1|ShoalLogger|_ThreadID=29;_ThreadName=Thread-1;|GMS1071: damaged multicast packet discarded
> java.lang.IllegalArgumentException: magic number is not valid
> at com.sun.enterprise.mgmt.transport.MessageImpl.parseHeader(MessageImpl.java:172)
> at com.sun.enterprise.mgmt.transport.BlockingIOMulticastSender$MessageProcessTask.run(BlockingIOMulticastSender.java:349)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)

That means you're receiving messages from some other running server
while the validate-multicast check is going on. You can see what those
messages are by running with the --verbose flag. You might want to check
your server.log as well to see if there are similar errors. If you have
other processes using the same multicast channels, there's a chance they
could be interfering with each other, though that's not supposed to
happen in GF 3.1.
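
To illustrate why a receiver would log "magic number is not valid": many binary protocols start every packet with a fixed marker and discard anything that does not begin with it, which is what happens when two different senders share a multicast address and port. The sketch below is only an illustration of that idea. The marker value, the header layout, and the function names are assumptions made up for the example and are not Shoal's actual wire format.

```python
import struct

# Hypothetical 4-byte marker; NOT the real Shoal/GMS value.
MAGIC = 0x47534D31
# Hypothetical header: magic number, then payload length (both big-endian).
HEADER = struct.Struct(">II")


def make_packet(payload: bytes) -> bytes:
    """Prefix the payload with the header a matching receiver expects."""
    return HEADER.pack(MAGIC, len(payload)) + payload


def parse_packet(data: bytes) -> bytes:
    """Return the payload, or raise ValueError for a foreign or damaged packet."""
    if len(data) < HEADER.size:
        raise ValueError("packet too short for header")
    magic, length = HEADER.unpack_from(data)
    if magic != MAGIC:
        # A packet from some other program on the same channel ends up here.
        raise ValueError("magic number is not valid")
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return payload
```

A packet built by make_packet parses cleanly, while plain text sent by another tool on the same address and port fails the marker check and is dropped.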

Cheers,
Bobby

tmueller
Offline
Joined: 2005-10-31

The get-health command provides the status from a GMS perspective, which
requires that GMS be enabled for the cluster and that multicast is
working between the servers. There is an "asadmin validate-multicast"
command that can be used to see if multicast is set up properly.
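
As a rough way to spot this situation, the two command outputs can be compared: an instance that list-instances reports as running but get-health does not report as started is one that GMS cannot see. The sketch below assumes the output layouts shown in this thread (six whitespace-separated columns for list-instances, and "name started since ..." or "name not started" for get-health); the function names are made up for the example.

```python
def running_instances(list_output):
    """Names of instances whose STATE column is 'running' in list-instances -l output."""
    names = []
    for line in list_output.splitlines():
        cols = line.split()
        # Skip the header, the trailing "Command ..." line, and anything malformed.
        if len(cols) < 6 or cols[0] in ("NAME", "Command"):
            continue
        if cols[5:] == ["running"]:
            names.append(cols[0])
    return names


def gms_started(health_output):
    """Names of instances that get-health reports as started."""
    started = set()
    for line in health_output.splitlines():
        name, _, status = line.strip().partition(" ")
        if name and name != "Command" and status.startswith("started"):
            started.add(name)
    return started


def running_but_unseen(list_output, health_output):
    """Instances that are running but that GMS does not report as started."""
    seen = gms_started(health_output)
    return [name for name in running_instances(list_output) if name not in seen]
```

Any name this returns is a candidate for a multicast or GMS configuration problem rather than a server that failed to start.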

Tom

On 6/15/2011 6:05 AM, jihad r hamza wrote:
> Hi,
>
> I am a software developer and new to GlassFish. I am currently trying to
> set up a GlassFish cluster across three physical machines (DAS,
> node1, node2).
> I am able to create the cluster and to start and stop it.
> I run this command to list the instances:
>
> ./asadmin list-instances -l
>
> It shows:
> NAME HOST PORT PID CLUSTER STATE
> instance1 xxx.xxx.xxx.xxx 24848 15316 cluster1 running
> instance2 xxx.xxx.xxx.xxx 24848 13194 cluster1 running
>
> But when I run the command
>
> get-health cluster1
>
> it shows:
> instance1 not started
> instance2 not started
>
> Can you please guide me on this?

bbissett
Offline
Joined: 2003-06-16

On 6/17/11 8:50 AM, Tom Mueller wrote:
> The get-health command provides the status from a GMS perspective,
> which requires that GMS be enabled for the cluster and that multicast
> is working between the servers. There is an "asadmin
> validate-multicast" command that can be used to see if the multicast
> is set up properly.

As Tom writes, the 'validate-multicast' command is your tool for
debugging setup and/or network issues that prevent the cluster from
working properly. See this blog for a full description of how to use the
tool to troubleshoot:

http://blogs.oracle.com/bobby/entry/validating_multicast_transport_where_d

Cheers,
Bobby

jihadrh
Offline
Joined: 2011-06-19

Hi,

I tried validate-multicast with the defaults and with the address and port from domain.xml.

I get this response:

Received no multicast data
Command validate-multicast failed.

Any idea on this?

bbissett
Offline
Joined: 2003-06-16

Have you tried increasing time-to-live and specifying the network
adapter(s) on your system? Those are steps 2 and 3 of the blog:

http://blogs.oracle.com/bobby/entry/validating_multicast_transport_where_d

If that doesn't help you may need to talk with the network administrator
to verify that UDP multicast transport is available.
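
For anyone wondering what those two settings correspond to at the socket level, here is a minimal sketch using Python's standard socket module. It is not how the asadmin tool is implemented; the function names and the default TTL of 4 are assumptions for illustration. The operating system's default multicast TTL is 1, which keeps packets on the local network segment, and an unset outgoing interface means the OS picks one, which may be the wrong adapter on a multi-homed machine.

```python
import socket
import struct


def make_multicast_sender(ttl=4, bind_address=None):
    """Create a UDP socket for sending multicast with an explicit TTL.

    bind_address, if given, is the IPv4 address of the local adapter
    that outgoing multicast packets should leave from.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Each router hop decrements the TTL; packets with TTL 1 never leave the segment.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("B", ttl))
    if bind_address is not None:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                        socket.inet_aton(bind_address))
    return sock


def membership_request(group, bind_address="0.0.0.0"):
    """Build the 8-byte value a receiver passes to IP_ADD_MEMBERSHIP.

    It is the group address followed by the local adapter address;
    0.0.0.0 lets the OS choose the adapter.
    """
    return socket.inet_aton(group) + socket.inet_aton(bind_address)
```

If traffic only appears once the adapter is named explicitly, the default interface choice was the problem; if it only appears with a higher TTL, there is at least one router between the machines.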

Cheers,
Bobby

On 6/20/11 1:55 AM, forums@java.net wrote:
> Hi,
>
> I tried validate-multicast with the defaults and with the address and port
> from domain.xml.
>
> I get this response:
>
> Received no multicast data
> Command validate-multicast failed.
>
> Any idea on this?

jihadrh
Offline
Joined: 2011-06-19

Hi all, thanks for the responses.

I checked with the network team and modified iptables.

Now I am getting the loopback multicast message from the same node, but the other node is not responding.

When I run this command:

./asadmin validate-multicast --multicastport=2379 --multicastaddress=228.9.54.73 --timeout 45 --verbose

I get: Unexpected exception occurred: java.lang.StringIndexOutOfBoundsException: String index out of range: -1

But node 1 is responding, since that node is on the same machine.

Any idea?

Thanks in advance.

bbissett
Offline
Joined: 2003-06-16

On 6/21/11 6:26 AM, forums@java.net wrote:
> Hi all, thanks for the responses.
>
> I checked with the network team and modified iptables.
>
> Now I am getting the loopback multicast message from the same node, but
> the other node is not responding.

That's progress! :)

>
> When I run this command:
> ./asadmin validate-multicast --multicastport=2379 --multicastaddress=228.9.54.73 --timeout 45 --verbose
>
> I get: Unexpected exception occurred:
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1

I think you must be running GlassFish at the same time as the tool, and
so the tool is getting messages from the app server that it can't
understand. It's documented not to do this, but it was still a bug in
the tool that I didn't account for it, now fixed in GlassFish 3.1.1:
http://java.net/jira/browse/SHOAL-114

Make sure you're not running your app server at the same time and see if
you're getting the results you want. You can also run the
validate-multicast command with the --verbose flag and that will output
every message that it receives, showing you what's coming in. The output
you're looking for will be something like:

McastReceiver: received
'228.9.3.1|hostname|edc691d1-81c2-4dcb-85b6-18b4eda7fdc6'
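
When the verbose output is long, a small script can pick out which hosts were actually heard from. The sketch below assumes each received message is printed on one line in the form shown above (three fields separated by "|", with the hostname in the middle); other versions of the tool may print a different layout, so treat the pattern and the function names as assumptions.

```python
import re

# Matches the quoted payload in lines such as: McastReceiver: received 'a|b|c'
RECEIVED = re.compile(r"received\s+'([^']*)'")


def parse_received(line):
    """Return the three fields of a received message, or None if the line does not match."""
    match = RECEIVED.search(line)
    if match is None:
        return None
    fields = match.group(1).split("|")
    if len(fields) != 3:
        return None
    address, hostname, sender_id = fields
    return {"address": address, "hostname": hostname, "id": sender_id}


def remote_hosts(lines, local_hostname):
    """Hostnames seen in received messages, excluding this machine's own loopback."""
    hosts = set()
    for line in lines:
        parsed = parse_received(line)
        if parsed and parsed["hostname"] != local_hostname:
            hosts.add(parsed["hostname"])
    return hosts
```

An empty result from remote_hosts means only loopback messages arrived, which points at the network between the machines and not at the tool.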

Cheers,
Bobby

jihadrh
Offline
Joined: 2011-06-19

Hi,

Thanks, Bobby.

I tried with GlassFish 3.2. Now I am not getting that StringIndexOutOfBoundsException. When I run the command

./asadmin validate-multicast --multicastport=30593 --multicastaddress=228.9.50.202 --timeout 45

Timeout set to 45 seconds
Will use port 30593
Will use address 228.9.50.202
Will use bind interface null
Will use wait period 2,000 (in milliseconds)

Listening for data...
Sending message with content "SRVR3" every 2,000 milliseconds
Received data from SRVR3 (loopback)

Exiting after 45 seconds. To change this timeout, use the --timeout command line option.
Command validate-multicast executed successfully.

And for the health command:

./asadmin get-health cluster1

instance1 not started
instance2 started since Wed Jun 22 06:21:21 UTC 2011
Command get-health executed successfully.

I am not getting a response from SRVR4. I also tried stopping iptables on both servers.

Any idea on this?

Thanks,
Jihadrh

bbissett
Offline
Joined: 2003-06-16

This may be an obvious question, but you're running the command on both
machines at the same time, right? Does SRVR4 see its own loopback message?

If they're both seeing their own loopback messages, but not each other,
then make sure they're on the same subnet. If so, then you could try
increasing time-to-live. Beyond that, you'll have to work with the
network admin to get it sorted out.
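
A quick way to check the subnet question yourself, given each machine's IP address and netmask, is sketched below with Python's standard ipaddress module. The function name and the sample addresses in the usage note are made up for the example. Note that a successful ping does not answer this, because ping is unicast and is routed between subnets, while multicast with a low time-to-live is not.

```python
import ipaddress


def same_subnet(ip_a, ip_b, netmask):
    """True if both IPv4 addresses fall in the same network for the given netmask."""
    # strict=False lets us pass a host address and have the network derived from it.
    net_a = ipaddress.ip_network(f"{ip_a}/{netmask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{netmask}", strict=False)
    return net_a == net_b
```

For example, same_subnet("172.17.12.172", "172.17.13.10", "255.255.255.0") is False, so those two hosts would need a router that forwards multicast and a time-to-live greater than 1.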

Cheers,
Bobby

On 6/22/11 2:35 AM, forums@java.net wrote:
> Hi,
>
> Thanks, Bobby.
>
> I tried with GlassFish 3.2. Now I am not getting that
> StringIndexOutOfBoundsException. When I run the command
>
> ./asadmin validate-multicast --multicastport=30593 --multicastaddress=228.9.50.202 --timeout 45
>
> Timeout set to 45 seconds
> Will use port 30593
> Will use address 228.9.50.202
> Will use bind interface null
> Will use wait period 2,000 (in milliseconds)
> Listening for data...
> Sending message with content "SRVR3" every 2,000 milliseconds
> Received data from SRVR3 (loopback)
>
> Exiting after 45 seconds. To change this timeout, use the --timeout command line option.
> Command validate-multicast executed successfully.
>
> But I am not getting a response from SRVR4. I stopped iptables on both
> servers.
>
> Any idea on this?
>
> Thanks,
> Jihadrh

jihadrh
Offline
Joined: 2011-06-19

Hi bbissett,

I am getting the loopback message on both servers, SRVR3 and SRVR4, and I am running the multicast check on both at the same time.

When I check on SRVR4 with tcpdump, it shows packets received.

I am also able to ping each server from the other. Is that sufficient to answer the subnet question? I will check further on this with the network team. Both servers are behind the same firewall.

When I check the log, I see entries like this:

"discarded damaged multicast , magic number is not valid".

I tried with the --verbose option. It sends the message SRVR3-<some keys>|ip|..... from SRVR3

and SRVR4-<somekeys>|ip|.... from SRVR4.

I tried with --timeout 45 also.

bbissett
Offline
Joined: 2003-06-16

On 6/22/11 11:10 PM, forums@java.net wrote:
> Hi bbissett,
>
> I am getting the loopback message on both servers, SRVR3 and SRVR4, and I
> am running the multicast check on both at the same time.
>
> When I check on SRVR4 with tcpdump, it shows packets received.
>
> I am also able to ping each server from the other. Is that sufficient to
> answer the subnet question? I will check further on this with the
> network team. Both servers are behind the same firewall.

Yes, you'll have to check with the network team to make sure both
machines are on the same subnet and that the routers/switches allow UDP
multicast traffic. Network setups are all different, so I can't help any
more than to say that both machines seem to be doing the right thing --
they're just not seeing each other.

> When I check the log, am able to see logs with this,
>
> *"discarded damaged multicast , magic number is not valid".*

This is the GlassFish server log? Since you're running the tool and the
server at the same time, using the same multicast info, GlassFish is
receiving these messages that it doesn't understand. I'd recommend not
running the tool and the server at the same time, but it doesn't
actually hurt anything (except in earlier versions of the tool that
failed with unexpected info, but I think you already saw that).

Cheers,
Bobby