Glassfish cluster + Remote instance fails to recover EJB timers
Our setup (Glassfish version 126.96.36.199):
1. The DAS and instance-1 run on the same machine, while instance-2 runs on another machine in the same network, set up as a config node.
2. We have set up transaction logging in a shared directory as per the Glassfish High Availability Guide: http://docs.oracle.com/cd/E18930_01/html/821-2416/gjjpy.html#gaxim
3. We are using a unicast configuration for cluster communication, since we have a Network Load Balancer running in multicast mode on the network.
4. Our application (an .ear containing multiple .war modules) has two persistent timers (persistent because we need only one instance of each timer running in the cluster at a time).
When instance-1 (or instance-2) is shut down normally, the other instance recovers the timers from the shut-down instance as expected. When instance-2 crashes or goes offline abnormally, instance-1 recovers instance-2's timers (again, as expected). But when instance-1 crashes, instance-2 does not recover instance-1's timers.
As far as I can see from the logs, instance-2 receives the proper failover message for instance-1 and starts the recovery, but finishes it without recovering any transactions or timers for the failed instance.
Can anyone tell me what the problem might be? (Should I provide any more information?)