
Two node cluster in-memory session replication not working in 2.1.1.b31g

chinesco
Offline
Joined: 2010-05-14

I set up a two-node cluster with identical hardware on each server and the following software:

- Ubuntu 9.10 karmic
- Glassfish v2.1.1-b31g-linux
- Java 1.6.0_20-b02
- Openntpd

I followed the regular instructions, including this tutorial: http://www.randombugs.com/java/glassfish/how-to-install-and-configure-a-....

The installation goes very well and the cluster starts with no problems, but in-memory session replication never works for the clusterjsp sample app: I always get a different session ID when I hit each node's URL.

I have already verified the obvious issues:
- Server name resolution: all hosts can see each other, with names hardcoded in each host's hosts file (a rough sketch of these checks follows this list)
- Multicast communication works properly on all nodes, verified with the Shoal tests running as sniffer and as client on each side; I do receive the nine expected messages
- Both servers are in the same subnet and on the same switch under the same router
- The OpenNTP daemon is up and running on each server to keep the clocks in sync
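
For reference, this is roughly the kind of check I mean; the hostnames, the second IP, and the exact entries below are placeholders rather than the actual values from this setup:

# Illustrative /etc/hosts entries on each box (names and IPs are examples only)
192.168.18.107   node1
192.168.18.108   node2

# Name resolution and reachability checked from each box
getent hosts node1 node2
ping -c 3 node2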

Below is the relevant output from the server.log of the first instance, which is started by the first agent. Once the agent finishes starting up, replication is disabled and the app falls back to memory-only persistence.

This instance runs on the same server that acts as the DAS. I also tried running the first agent on a different server, so that the cluster ran with no agents on the DAS machine, and got the same result from the remote agent; see below:

[#|2010-05-21T14:08:23.606-0700|INFO|sun-appserver2.1|javax.enterprise.system.container.web|_ThreadID=17;_ThreadName=RMI TCP Connection(7)-192.168.18.107;/clusterjsp;replicated;web-method;session;|WEB0130: Enabling ha-based persistence for web module [/clusterjsp]'s sessions: persistence-type = [replicated] / persistenceFrequency = [web-method] / persistenceScope = [session]|#]

[#|2010-05-21T14:08:33.159-0700|WARNING|sun-appserver2.1|javax.enterprise.system.container.web|_ThreadID=18;_ThreadName=Thread-46;_RequestID=ff6b9def-a602-4328-b199-bcbc0afdd96d;|ReplicationHealthChecker:replication health now ok: currentPartner: node1-instance|#]

[#|2010-05-21T14:08:33.161-0700|INFO|sun-appserver2.1|com.sun.enterprise.ee.web.sessmgmt.pipe|_ThreadID=18;_ThreadName=Thread-46;|bound pipes: 5 brokenClosedPipes:0 pipepool.size=5 total pipes:5|#]

[#|2010-05-21T14:08:39.193-0700|INFO|sun-appserver2.1|com.sun.enterprise.ee.web.sessmgmt.pipe|_ThreadID=17;_ThreadName=RMI TCP Connection(7)-192.168.18.107;|JxtaSenderPipeManager::pipePoolCount decremented = 4|#]

[#|2010-05-21T14:08:40.196-0700|INFO|sun-appserver2.1|com.sun.enterprise.ee.web.sessmgmt.pipe|_ThreadID=17;_ThreadName=RMI TCP Connection(7)-192.168.18.107;|JxtaSenderPipeManager::pipePoolCount decremented = 3|#]

[#|2010-05-21T14:08:41.198-0700|INFO|sun-appserver2.1|com.sun.enterprise.ee.web.sessmgmt.pipe|_ThreadID=17;_ThreadName=RMI TCP Connection(7)-192.168.18.107;|JxtaSenderPipeManager::pipePoolCount decremented = 2|#]

[#|2010-05-21T14:08:42.200-0700|INFO|sun-appserver2.1|com.sun.enterprise.ee.web.sessmgmt.pipe|_ThreadID=17;_ThreadName=RMI TCP Connection(7)-192.168.18.107;|JxtaSenderPipeManager::pipePoolCount decremented = 1|#]

[#|2010-05-21T14:08:43.202-0700|INFO|sun-appserver2.1|com.sun.enterprise.ee.web.sessmgmt.pipe|_ThreadID=17;_ThreadName=RMI TCP Connection(7)-192.168.18.107;|JxtaSenderPipeManager::pipePoolCount decremented = 0|#]

[#|2010-05-21T14:08:43.203-0700|WARNING|sun-appserver2.1|javax.enterprise.system.container.web|_ThreadID=17;_ThreadName=RMI TCP Connection(7)-192.168.18.107;_RequestID=4ed6cd02-2870-4981-8b47-2780a3d25d48;|Out of pipes in JxtaReplicationSender pipepool. Disabling replication.|#]

[#|2010-05-21T14:08:43.207-0700|INFO|sun-appserver2.1|com.sun.enterprise.ee.web.sessmgmt.pipe|_ThreadID=17;_ThreadName=RMI TCP Connection(7)-192.168.18.107;|beginning attempt to reconnect|#]

[#|2010-05-21T14:08:49.214-0700|INFO|sun-appserver2.1|com.sun.enterprise.ee.web.sessmgmt.pipe|_ThreadID=17;_ThreadName=RMI TCP Connection(7)-192.168.18.107;|attempt to reconnect succeeded|#]

[#|2010-05-21T14:08:50.216-0700|INFO|sun-appserver2.1|com.sun.enterprise.ee.web.sessmgmt.pipe|_ThreadID=17;_ThreadName=RMI TCP Connection(7)-192.168.18.107;|DIAGNOSTIC Message: JxtaReplicationSender.putPipeWrapper => about to call decrementPipePoolCount because thePipeWrapper is null|#]

[#|2010-05-21T14:08:50.217-0700|INFO|sun-appserver2.1|com.sun.enterprise.ee.web.sessmgmt.pipe|_ThreadID=17;_ThreadName=RMI TCP Connection(7)-192.168.18.107;|JxtaSenderPipeManager::pipePoolCount decremented = -1|#]

[#|2010-05-21T14:08:50.383-0700|INFO|sun-appserver2.1|com.sun.enterprise.ee.web.sessmgmt|_ThreadID=17;_ThreadName=RMI TCP Connection(7)-192.168.18.107;|com.sun.enterprise.ee.web.sessmgmt.JxtaSocketChannel created server socket successfully|#]

[#|2010-05-21T14:08:50.394-0700|INFO|sun-appserver2.1|com.sun.enterprise.ee.web.sessmgmt|_ThreadID=17;_ThreadName=RMI TCP Connection(7)-192.168.18.107;|PlainSocketChannel created server socket at /0.0.0.0:9999, PublishAddress = {LISTENPORT=9999, HOSTIP=192.168.18.107}|#]

[#|2010-05-21T14:08:50.398-0700|WARNING|sun-appserver2.1|javax.enterprise.system.container.web|_ThreadID=17;_ThreadName=RMI TCP Connection(7)-192.168.18.107;_RequestID=4ed6cd02-2870-4981-8b47-2780a3d25d48;|Default Removal size threshold: 1|#]

[#|2010-05-21T14:08:50.405-0700|WARNING|sun-appserver2.1|javax.enterprise.system.container.web|_ThreadID=17;_ThreadName=RMI TCP Connection(7)-192.168.18.107;_RequestID=4ed6cd02-2870-4981-8b47-2780a3d25d48;|Default Removal interval threshold: 15000|#]

Any help or additional pointers would be greatly appreciated.

Thanks.

mk111283
Offline
Joined: 2005-03-29

Ignore my previous post. You have got the right build.

Could you run with the logging level set to FINE and post the server.log?
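
For anyone following along, here is a rough sketch of one way to do that from the command line; the dotted name assumes the cluster's config is named portal-cluster-config, which may differ in your domain, and the Log Levels page in the Admin Console achieves the same thing:

asadmin set --port 4848 portal-cluster-config.log-service.module-log-levels.web-container=FINE
asadmin get --port 4848 portal-cluster-config.log-service.module-log-levels.web-container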

Thanks.

chinesco
Offline
Joined: 2010-05-14

Thanks for your reply,

Attached you will find the server.log with the log level set to FINE, covering the period from when the agent instance was started until session replication was disabled by the cluster.

This is the command I used to deploy the clusterjsp ear file:

sudo ./asadmin deploy --target portal-cluster --port 4848 --availabilityenabled=true ../samples/quickstart/clusterjsp/clusterjsp.ear

Also note that this error occurred in the agent instance running on the same server where the DAS is installed.

I get the same result if I deploy to an agent instance on a different machine.

mk111283
Offline
Joined: 2005-03-29

From the server.log, it looks like only the DAS and node1-instance are present in the group view, based on the following lines:

[#|2010-05-25T09:49:58.113-0700|INFO|sun-appserver2.1|ShoalLogger|_ThreadID=13;_ThreadName=ViewWindowThread:portal-cluster;|GMS View Change Received for group portal-cluster : Members in view for ADD_EVENT(before change analysis) are :
1: MemberId: server, MemberType: SPECTATOR, Address: urn:jxta:uuid-59616261646162614A787461503250332DE658F932AB436995B78E0CB3E080DA03
2: MemberId: node1-instance, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033568A5452017A416495CB80D9BEE9620D03
|#]

The replication subsystem needs at least two instances to work properly.

Could you ensure that there are indeed two server instances running other than the DAS, and that both can communicate with each other? Also ensure that they are all in the same subnet.
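
A quick way to double-check that, sketched here with placeholder names for the second instance (only node1-instance appears in the logs above):

asadmin list-node-agents --port 4848
asadmin list-clusters --port 4848
# if the second instance is stopped, start it through its node agent, e.g.:
asadmin start-instance --port 4848 node2-instance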

Please send us the server.log of both instances and the DAS server.log.

(Thanks Joe Fialli for the help)

chinesco
Offline
Joined: 2010-05-14

Thanks again,

Logs included are from both physical servers:

- server1.log from the machine that has the DAS, one node agent and one instance
- server2.log from the machine that has only one node agent and one instance

Both logs cover the period from when the agents were started until right after the clusterjsp app was deployed and the session test was run against each node's URL; the session is still not replicated.

Both machines can see each other, and multicast is working, as validated with the Shoal tests.

I see only some CORBA-related errors, but nothing specific to the session.

Shreedhar Ganapathy

Just wanted to ensure this has been done: have you set the availability-enabled attribute to true for the clusterjsp application? That tells the app server that the app's state needs to be replicated.

chinesco
Offline
Joined: 2010-05-14

Yes, please see my second post above; the command I used to deploy was:

asadmin deploy --target portal-cluster --port 4848 --availabilityenabled=true ../samples/quickstart/clusterjsp/clusterjsp.ear

Actually, if I have multiple agents on the same physical server, replication works fine across the instances, but it never works when an instance is deployed on another machine, even though the logs show communication between the servers.

mk111283
Offline
Joined: 2005-03-29

These logs look a lot better than the previous server.log

I do see a couple of valveSave entries in server.log, and I also see a couple of "receiving id: 15759421730ffb40ae4f3cb70b00[ver:1]" messages, which show that data is indeed getting replicated.

I think the problem is that when you hit the second server, the correct JSESSIONID is not passed as a cookie. Front-ending the setup with an LB will solve the issue.
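
One way to confirm this without a load balancer is to replay the session cookie by hand, for example with curl; the hostnames, port, and page name below are placeholders based on the default clusterjsp sample, not values taken from this thread:

# create a session on instance 1 and capture the JSESSIONID it issues
curl -sD - -o /dev/null http://node1:38080/clusterjsp/HaJsp.jsp | grep JSESSIONID

# send that exact id to instance 2; if replication works, the page should show the same session data
curl -s -b "JSESSIONID=<id copied from above>" http://node2:38080/clusterjsp/HaJsp.jsp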

chinesco
Offline
Joined: 2010-05-14

Thanks for reviewing.

I thought cluster replication did not depend at all on having an LB in front of the nodes; could you please clarify this?

I will set up a load balancer in front, run the test again, and then bring down one box to see whether the failover to the other node carries the session with it, in order to verify your suggestion. I will post my results as soon as I am done.

Thank you.

Shreedhar Ganapathy

When the instances are on the same machine, the hostname stays the same as you move from one instance to another, so the cookie sent by the browser carries the same information (hostname, session ID, etc.) as was used by the instance that originally served the request. That is why you can see replication working in that case.

When the instances are on different machines and you point your browser from the server on one machine to the server on the other to verify session replication, the cookie sent by the browser cannot be used by the container to retrieve the session from the second server, because the hostname has changed.

Fronting the two instances with a load balancer removes this problem, as the LB does the required translation when a request fails over to the instance on the second machine. You will have to stop the instance that served the initial request in order to let the LB fail over to the second instance.

Hope this is helpful.

chinesco
Offline
Joined: 2010-05-14

Thank you both for the support on this issue. In fact, I tested again with a load balancer in front of the cluster, and now replication works as expected.

The explanation above clarified my problem a lot. I think the server's cluster tutorial and/or documentation should include a note about this; even if it is obvious, I think it is worth mentioning, especially for cluster newbies like me:

"In order for session replication to work in a cluster with two or more physical servers, the cluster should always be fronted by a load balancer."

Thanks.

Shreedhar Ganapathy

Thanks. That is good feedback. Will have the docs team update the doc to say so.

Cheers
Shreedhar

mk111283
Offline
Joined: 2005-03-29

We had a similar issue, but it was fixed in the 9.1 final release. Could you try this on b58g, which is the final release build?
https://glassfish.dev.java.net/downloads/v2-b58g.html

Also, could you run with the log level set to FINE?