JMS HA has issues

Posted by nikolaf on April 11, 2013 at 12:44 PM PDT

Configuration:

glassfish1.corp.local

  • cluster
  • ssh node glassfish1
  • instance1

glassfish2.corp.local

  • ssh node glassfish2
  • instance2

First, I set up a conventional cluster with a master broker. If the master broker is lost (for example, its host is powered off), the instances on the other servers still accept messages, but the messages never reach the MDB (onMessage). See the comment at http://www.java.net/forum/topic/glassfish/glassfish/orb-not-listening-se....

Second, I tried a conventional cluster of peer brokers, using HA-JDBC and Derby on both GlassFish servers so that the JMS configuration store is highly available as well. When I power off the glassfish2 server, messages are sent (via ORB or direct) without error, but they are never picked up from the queue. On glassfish1, instance2 is shown in the stopped state, so the host going down is detected.

This is the code I use in my unit test to wait until the queue is empty:

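    public final void waitForEmptyQueue(final QueueSession queueSession,
            final Queue queue) throws Exception {
        log.debug("waitForEmptyQueue");
        // Poll until a browser on the queue sees no messages.
        // There is no timeout, so this loops forever if nothing consumes.
        boolean hasMoreItem = true;
        while (hasMoreItem) {
            final QueueBrowser browser = queueSession.createBrowser(queue);
            try {
                hasMoreItem = browser.getEnumeration().hasMoreElements();
            } finally {
                // Close the browser even if browsing fails
                browser.close();
            }
            if (hasMoreItem) {
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag before propagating
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(e);
                }
            }
        }
    }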
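
Because that loop has no timeout, the test hangs forever whenever the messages are never consumed. A minimal sketch of a bounded wait follows, using only the JDK. The names `BoundedWait` and `waitUntil` are hypothetical and not part of the original test; the condition passed in would wrap the QueueBrowser check.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of the original test class.
final class BoundedWait {
    private BoundedWait() {}

    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Returns true if the condition became true, false on timeout or
     * if the waiting thread was interrupted.
     */
    static boolean waitUntil(BooleanSupplier condition,
                             long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            final long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return false; // gave up: the condition never held in time
            }
            try {
                // Never sleep past the deadline
                Thread.sleep(Math.min(pollMillis, Math.max(1, remainingMillis)));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag
                return false;
            }
        }
    }
}
```

In the test, the condition would be a lambda that creates the browser, checks that hasMoreElements() is false, and closes the browser. A return value of false can then fail the test with a clear message instead of hanging.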

If I then power on glassfish2 but do not start GlassFish on it, things work as expected. This is a big issue, since you cannot guarantee that a host will come back online in a timely manner. Is there a configuration setting that prevents this zombie condition?

I also tried this with glassfish1 powered off (leaving glassfish2 running), and the hang only occurs when going direct:

13:47:49.705 [main] WARN  javax.jms - [C4003]: Error occurred on connection creation [glassfish1.corp.local:27676]. - cause: java.net.ConnectException: Connection refused
13:47:52.936 [main] WARN  javax.jms - [C4003]: Error occurred on connection creation [glassfish1.corp.local:27676]. - cause: java.net.ConnectException: Connection refused
13:47:56.051 [main] DEBUG com.bhn.services.jms.ClusterTest - waitForEmptyQueue

When I power glassfish1 back on, things start working again. It appears that the host must be up, even if GlassFish is not running on it.
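
To see what a client actually observes in each state (host off, host on without GlassFish, GlassFish running), a small TCP probe may help. This is only a diagnostic sketch using the JDK; `PortProbe` and its `Result` values are hypothetical names, and it does not confirm the cause of the hang. In general, a host that is up with nothing listening refuses the connection right away, while a host that is powered off usually does not answer at all, so the connect attempt runs until its timeout.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical diagnostic helper, not part of GlassFish or Open MQ.
final class PortProbe {
    enum Result { OPEN, REFUSED, TIMED_OUT, OTHER_ERROR }

    private PortProbe() {}

    /** Attempts one TCP connect and classifies the outcome. */
    static Result probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return Result.OPEN;        // something accepted the connection
        } catch (SocketTimeoutException e) {
            return Result.TIMED_OUT;   // no answer within timeoutMillis
        } catch (ConnectException e) {
            // Typically "Connection refused"; some platforms also report
            // an OS-level connect timeout with this exception type.
            return Result.REFUSED;
        } catch (IOException e) {
            return Result.OTHER_ERROR; // e.g. unknown host, no route to host
        }
    }
}
```

For example, `PortProbe.probe("glassfish1.corp.local", 27676, 3000)` uses the host and port from the log above. Running it in each of the three states would show whether the client sees a refusal or a timeout in the case that hangs.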