Glassfish cluster + Remote instance fails to recover EJB timers

Joined: 2013-01-24

Our setup is: Glassfish version -

1. DAS and instance-1 run on the same machine, while instance-2 runs on another machine in the same network, set up as a config node.
2. We have set up transaction logging in a shared directory as per the Glassfish High Availability Guide:
3. We are using a unicast configuration for cluster communication, since we have a Network Load Balancer running in multicast mode in the network.
4. Our application (an .ear containing multiple .war files) has 2 persistent timers (persistent because we need only one running instance of each timer in the cluster at any time).

When instance-1 (or instance-2) is shut down normally, the other instance picks up the timers from the shut-down instance as expected. When instance-2 crashes or goes offline abnormally, instance-1 recovers its timers (again, as expected). But when instance-1 crashes, instance-2 does not recover its timers.

As far as I can see from the logs, instance-2 receives the proper failover message for instance-1 and starts the recovery, but finishes it without recovering any transactions or timers for the failed instance.

Can anyone tell me what the problem might be? (Should I provide any more information?)

Joined: 2013-01-24

After 2 weeks or so of work, we have finally found the problem.

It seems that when an instance in a cluster goes down, the recovering instance checks whether the failed instance is still up by trying to connect to the "node-host":"admin-node-port" of that instance. If you are using the default node created on the DAS (as we were), the node-host is set to "localhost".

So, instance-2 was checking whether instance-1 was down by connecting to "localhost" instead of "instance-1-ip", as it should have been. On instance-2's machine, "localhost" is instance-2's own host, so the connection succeeded, instance-1 was falsely marked as running, and the recovery didn't go ahead.
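
To make the failure mode concrete, here is a minimal sketch of a TCP liveness probe of the kind described above. This is not GlassFish's actual recovery code: the class name NodeLivenessSketch, the method isListening, and the timeout are all made up for illustration. The point is only that a probe run on instance-2's machine against "localhost" tests instance-2's own host, whatever state instance-1 is in.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Illustrative only: a hypothetical probe, not the check GlassFish performs internally.
final class NodeLivenessSketch {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    // Any I/O failure (refused, timed out, unknown host) is treated as "not listening".
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java NodeLivenessSketch <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        // The host name is resolved on the machine that runs the probe,
        // so "localhost" always means the probing machine itself.
        System.out.println(host + ":" + port + " listening: " + isListening(host, port, 2000));
    }
}
```

Running this on the second machine with "localhost" and then with the first machine's real address (using whatever admin port your instances listen on) should show the difference while instance-1 is down.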

We had to change the node-host of the instance-1 node directly in the domain's domain.xml to fix this, since the configuration of the default localhost- node cannot be changed through asadmin or the admin console.
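
If you want to check whether your own domain configuration has the same problem, something like the following sketch could help. It parses the configuration file and lists nodes whose node-host points at the loopback address. I am assuming here that nodes are stored as "node" elements with "name" and "node-host" attributes; verify that against your own file before relying on it. The class name LoopbackNodeFinder is made up for this example.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative helper: the element and attribute names are assumptions, see the note above.
final class LoopbackNodeFinder {

    // Returns the "name" of every <node> element whose node-host is a loopback address.
    static List<String> findLoopbackNodes(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList nodes = doc.getElementsByTagName("node");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element node = (Element) nodes.item(i);
            String host = node.getAttribute("node-host");
            if (host.equalsIgnoreCase("localhost") || host.equals("127.0.0.1")) {
                result.add(node.getAttribute("name"));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java LoopbackNodeFinder <path-to-config-file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println("Nodes with a loopback node-host: " + findLoopbackNodes(in));
        }
    }
}
```

Any node it reports that hosts an instance other instances must recover from is a candidate for the same fix. Stop the domain and back up the file before editing it by hand.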

Joined: 2005-05-05

Hi ameyc,

I'm trying to create a GlassFish cluster with:

Node A on Host A (1 instance here)

Node B on Host B (1 instance here)

Then I'm deploying an EJB timer (@Schedule), but when I stop the instance where the timer is running, the recovery process does not work.

Also, I've configured the EJB timer service to use an XA DataSource.

Could you please share the domain.xml you used to get this working?

I know this post is very old, but you are the only one I can find who has managed to get this working.

Thanks in advance,

Joined: 2013-01-24

Hi Luciano,

I don't have access to the domain.xml right now. Can you tell me if there are any errors in the server log during the recovery attempt?