
JMS HA has issues

Posted by fnikola on April 11, 2013 at 12:57 PM PDT



  • cluster
  • ssh node glassfish1
  • instance1


  • ssh node glassfish2
  • instance2

First, I started with a conventional cluster with a master broker. If you lose the master broker (i.e., power it off), the instances on the other servers still accept messages, but the messages never reach the MDB (onMessage), see comment

Second, I tried a conventional cluster of peer brokers, using HA-JDBC and Derby on both GlassFish servers so that the JMS configuration store is highly available as well. When I power off the glassfish2 server, messages are sent (via ORB or direct) without error, but they are never picked up from the queue. On glassfish1 you can see that instance2 is in the stopped state, so the cluster does detect that the host went down.

This is the code I use in my unit test to see if the queue is empty:

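    // Needs javax.jms.Queue, javax.jms.QueueBrowser and javax.jms.QueueSession.
    public final void waitForEmptyQueue(final QueueSession queueSession,
            final Queue queue) throws Exception {
        // Wait for queue to empty
        boolean hasMoreItem = true;
        while (hasMoreItem) {
            final QueueBrowser browser = queueSession.createBrowser(queue);
            hasMoreItem = browser.getEnumeration().hasMoreElements();
            browser.close(); // release the browser before the next poll
            try {
                // The sleep call was lost from the post; the 1 s interval is an assumption.
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
    }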

If I then power glassfish2 back on but do not start GlassFish on it, things work as expected. This is a big issue, since you cannot guarantee that a host will come back online in a timely manner. Is there some configuration that will prevent this zombie condition?
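Because the wait loop in the unit test has no upper bound, a queue that never drains makes the test hang instead of fail. Below is a minimal sketch of a bounded polling helper, assuming it is acceptable for the test to give up after a deadline. The class name BoundedWait, the method waitUntil and its parameters are hypothetical and are not part of JMS or GlassFish. It only bounds the polling; it would not help if a JMS call itself blocks.

```java
import java.util.function.BooleanSupplier;

// Hypothetical test helper: polls a condition until it holds or a deadline passes.
final class BoundedWait {
    private BoundedWait() {
    }

    /**
     * Returns true as soon as the condition reports true, or false once
     * timeoutMillis has elapsed without the condition being met.
     */
    static boolean waitUntil(final BooleanSupplier condition,
            final long timeoutMillis, final long pollMillis)
            throws InterruptedException {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Compare via subtraction so nanoTime overflow cannot break the check.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(pollMillis);
        }
    }
}
```

In the test, the condition would wrap the QueueBrowser check from waitForEmptyQueue (translating its checked JMSException), and a false return value would be turned into a test failure rather than an endless wait.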

I also tried this with glassfish1 (leaving glassfish2 running), and the hang only occurs when going direct:

13:47:49.705 [main] WARN  javax.jms - [C4003]: Error occurred on connection creation [glassfish1.corp.local:27676]. - cause: Connection refused
13:47:52.936 [main] WARN  javax.jms - [C4003]: Error occurred on connection creation [glassfish1.corp.local:27676]. - cause: Connection refused
13:47:56.051 [main] DEBUG - waitForEmptyQueue

When I power glassfish1 back on, things start working again. It appears that the host must be up, even if GlassFish is not running on it.