Deadlock in Transaction Recovery in Glassfish 3.1.2.2

sbarlabanov
Joined: 2011-09-24

Hi,
we just discovered a deadlock between several Glassfish threads when starting the server.
We have a simple @Singleton @Startup EJB with a @PostConstruct method. The code inside this @PostConstruct method reads some data from the database using an injected JPA EntityManager. The Glassfish main thread hangs while executing this singleton: it tries to obtain a connection from a pool, invokes RecoveryManager.waitForResync, and waits there on an EventSemaphore. Two other threads take part in the deadlock: the JTS Resync Thread (waiting on an EventSemaphore inside RecoveryManager.proceedWithXARecovery) and the Recovery Helper Thread (blocked on com.sun.hk2.component.SingletonInhabitant). So it seems that the main thread and the Recovery Helper Thread are blocking each other: the Recovery Helper Thread apparently has to finish its work and then notify all parties waiting on the EventSemaphore, but it cannot complete because it is blocked in SingletonInhabitant.get on a lock owned by the main thread.
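For illustration only, the cycle described above can be sketched in plain Java without any Glassfish classes. This is a minimal model under stated assumptions, not the actual Glassfish code: `singletonLock` is a hypothetical stand-in for the lock behind SingletonInhabitant.get, `resyncDone` is a hypothetical stand-in for the EventSemaphore, and the wait uses a timeout so that the sketch terminates instead of hanging the way the real server does.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class StartupDeadlockSketch {

    /**
     * Models the reported cycle. Returns true if the "main" thread timed out
     * waiting for the resync signal, i.e. the helper could not make progress.
     */
    static boolean run(long timeoutMillis) {
        // Hypothetical stand-in for the lock behind SingletonInhabitant.get.
        final Object singletonLock = new Object();
        // Hypothetical stand-in for the EventSemaphore used by waitForResync.
        final CountDownLatch resyncDone = new CountDownLatch(1);
        // Makes sure the helper starts competing only after main owns the lock.
        final CountDownLatch mainHoldsLock = new CountDownLatch(1);

        Thread helper = new Thread(() -> {
            try {
                mainHoldsLock.await();
                // The helper needs the lock before it can signal completion.
                synchronized (singletonLock) {
                    resyncDone.countDown();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "recovery-helper-sketch");
        helper.setDaemon(true);
        helper.start();

        try {
            boolean signalled;
            // "Main" thread: owns the lock while waiting for the signal.
            synchronized (singletonLock) {
                mainHoldsLock.countDown();
                // With an untimed wait this would block forever.
                signalled = resyncDone.await(timeoutMillis, TimeUnit.MILLISECONDS);
            }
            // Once the lock is released the helper can finish.
            helper.join();
            return !signalled;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while modelling the cycle", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("cycle reproduced: " + run(200));
    }
}
```

In this model the helper can only signal after the lock is released, and the lock holder only releases after the signal (or the timeout), which is the same ordering problem we see in the stack traces.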
We configured the transaction service with the following properties in our domain.xml:

      <transaction-service tx-log-dir="${com.sun.aas.instanceRoot}/logs" 
              keypoint-interval="2048" automatic-recovery="true">
        <property name="pending-txn-cleanup-interval" value="600"></property>
      </transaction-service>

The stack traces of the main thread and the two JTS threads are attached to the topic.
Is this a bug, or did we misconfigure something?

Best regards,
Sergiy

Attachment: stacktraces.txt (11.91 KB)

mvatkina
Joined: 2005-04-04

Do you have a RAR in your app? If yes, can you make it a stand-alone RAR
instead? Another option is to disable auto-recovery on startup (but that
would also disable the delegated recovery in a cluster).

-marina

sbarlabanov
Joined: 2011-09-24

No, we do not have any RAR in our application. Our application is a JEE6 WAR.
Disabling auto-recovery is not an option for us; we have only just switched it on because we need it.
To me it looks like a bug in the auto-recovery implementation of Glassfish, since it is a classic deadlock between Glassfish's own threads. With this bug, auto-recovery is not really usable with JEE6 startup singletons that access XA resources, and I do not think it is rare to have such singletons in a JEE application.

Best regards,
Sergiy

mvatkina
Joined: 2005-04-04

Without a RAR involved, we have never been able to reproduce the problem (even with a RAR it is not easily reproducible). Our own devtest creates a timer (timers are stored in an XA resource) in the PostConstruct of a @Startup @Singleton on a clustered instance, and it never fails.

If you can create a reproducible test case, please file a JIRA issue.

sbarlabanov
Joined: 2011-09-24

We investigated the case and found the piece of code in Glassfish that causes the deadlock. Details are here: http://www.java.net/forum/topic/glassfish/glassfish/deadlock-appserverst....