Glassfish hangs after a few days

24 replies [Last post]
kolmis
Offline
Joined: 2006-12-07
Points: 0

We're experiencing hangs with GlassFish v2ur1. A few webapps and web services are deployed on our installation. The server stops responding: we are able to establish a connection to it, but that is as far as we get. No response is received and the connection just waits for input.

We noticed that GlassFish CPU utilization is at 100% when the "hang" happens. The server is not able to process any requests afterwards and can run in this state for hours (at 100% CPU). The server log doesn't contain any ERROR/WARNING messages or OutOfMemory exceptions (as was happening with v1).

We have acceptor threads count set to 1, request processing threads max set to 60.

We suspect that one of our poorly written webapps (a webapp for large table reports that uses Apache Tiles templates with huge string contents) is causing these "freezes". So it's probably an end-user-triggered problem.

The JVM is 1.6.0_02 and the system runs Linux.

Does anyone have a suggestion for tracking down the source of this problem?

ggierer
Offline
Joined: 2007-12-09
Points: 0

There were issues with v2u1 (we had exactly the same problems). Try v2u2 or better still v2.1 with a newer jdk. This should fix any problems.

britton_laroche
Offline
Joined: 2009-06-16
Points: 0

[b]HTTP Connection Hanging on Glassfish[/b]
----------------------------------------------------------

I believe the problem is in the Glassfish thread management system. Glassfish uses the grizzly project under the covers. Thread management issues made it difficult for a server to scale to thousands of users. The Grizzly NIO and Web framework has been designed to help developers to take advantage of the Java™ NIO API. The goal of Grizzly is to help developers to build scalable and robust servers using NIO and extended framework components such as the Web Framework (HTTP/S).

It is here in the grizzly web framework component that we encounter the following bugs.

Bug 547 seems to explain the http connection hanging.

Grizzly Bug 547
-----------------
https://grizzly.dev.java.net/issues/show_bug.cgi?id=547
Blocking close(): Grizzly makes Glassfish hang if HTTP clients hang

[b]HTTPS connection hanging on Glassfish[/b]
----------------------------------------------------------

We've noticed a similar https hang on Glassfish as well.

We are running on Glassfish v2.1 and we have 1000 Acceptor Threads configured for https. After about 2 weeks, all of the https connections hang. We cannot connect to the Glassfish application server via https when this happens. We can still connect via http, which works fine. Leading up to this problem we see intermittent broken pipes and JDBC connectivity issues in the server.log over a period of time. The theory is that the https connections are not being closed when a network-related error occurs. This eventually consumes all 1000 acceptor threads.

After investigating the https connectivity issues we found Grizzly bug number 680 and we think it may be related to the https hang.

Grizzly Bug 680
-------------------
https://grizzly.dev.java.net/issues/show_bug.cgi?id=680
SSL client may hang with 1.9.16 when send series packets to server then wait for a reply
Fixed in Grizzly 1.9.17 (August 12th 2009)
Fixed Issue List: http://tinyurl.com/ohyjs4

Additionally, we may have a separate, unrelated application error consuming heap space. We found bug number 730, which may be related to the heap space memory error.

Grizzly Bug 730
-------------------
https://grizzly.dev.java.net/issues/show_bug.cgi?id=730
ServletAdapter.destroy() doesn't destroy wrapped HttpServlet

We think upgrading to grizzly version 1.9.18a will solve this problem. We'd like to know if replacing the grizzly files in our current configuration would be supported.
Netstat shows about 5,000 https connections in CLOSE_WAIT state.

TCP: IPv4
Local Address Remote Address Swind Send-Q Rwind Recv-Q State
-------------------- -------------------- ----- ------ ----- ------ -----------
172.18.244.177.443 172.18.244.5.9144 17520 0 33304 0 CLOSE_WAIT
172.18.244.177.443 172.18.244.6.41429 17520 0 33304 0 CLOSE_WAIT
172.18.244.177.443 172.18.244.5.51464 17520 0 33304 0 CLOSE_WAIT
172.18.244.177.443 172.18.244.6.20586 17520 0 33304 0 CLOSE_WAIT
172.18.244.177.443 172.18.244.6.55477 17520 0 33304 0 CLOSE_WAIT
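
To track how the CLOSE_WAIT count grows over time, the netstat output above can be tallied per local port. The following is only a minimal Java sketch: the class and method names are hypothetical, and it assumes the Solaris-style column layout shown in the table (local address as `ip.port` in the first column, state in the last column).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: counts CLOSE_WAIT sockets per local port from netstat lines. */
final class CloseWaitCounter {

    /** Returns local port -> number of lines whose last column is CLOSE_WAIT. */
    static Map<Integer, Integer> countByLocalPort(List<String> netstatLines) {
        Map<Integer, Integer> counts = new TreeMap<Integer, Integer>();
        for (String line : netstatLines) {
            String[] cols = line.trim().split("\\s+");
            // Expected columns: local, remote, Swind, Send-Q, Rwind, Recv-Q, State
            if (cols.length < 7 || !"CLOSE_WAIT".equals(cols[cols.length - 1])) {
                continue; // header, separator, or a socket in another state
            }
            // Assumption: the address is printed as ip.port, so the port follows the last dot.
            String local = cols[0];
            int dot = local.lastIndexOf('.');
            if (dot < 0) {
                continue;
            }
            try {
                int port = Integer.parseInt(local.substring(dot + 1));
                Integer old = counts.get(port);
                counts.put(port, old == null ? 1 : old + 1);
            } catch (NumberFormatException ignored) {
                // Not a numeric port; skip the line.
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "172.18.244.177.443 172.18.244.5.9144 17520 0 33304 0 CLOSE_WAIT",
            "172.18.244.177.443 172.18.244.6.41429 17520 0 33304 0 CLOSE_WAIT");
        System.out.println(countByLocalPort(sample));
    }
}
```

Running this against periodic netstat captures would show whether the count on port 443 keeps climbing or levels off.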

The logs show us the following types of errors:

(1) javax.enterprise.system.stream.err in the httpSSLWorkerThread from (com.sun.enterprise.web.connector.grizzly.ssl.SSLWorkerThread.run)
Caused by: java.net.SocketException: Broken pipe

(2) javax.enterprise.system.stream.err in the httpSSLWorkerThread (grizzly.ssl.SSLWorkerThread)
java.lang.NullPointerException

(3) javax.enterprise.system.stream.out in the httpSSLWorkerThread
java.net.ConnectException: Connection refused

(4) java.lang.OutOfMemoryError: Java heap space

(5) java.net.SocketException: Invalid argument

[b]Possible fix, upgrade to Grizzly 1.9.18a (or latest version of Grizzly)[/b]
----------------------------------------------------------

Grizzly 1.9.18a
----------------
https://grizzly.dev.java.net/
https://grizzly.dev.java.net/issues/buglist.cgi?Submit+query=Submit+quer...

Jeanfrancois Arcand

Salut,

glassfish@javadesktop.org wrote:
> We are running on glassfish v2.1 and we have 1000 Acceptor Threads configured for https.

Is there any reason why you set up 1000 Acceptor Threads? Usually that
value should be based on the number of cores/processors. Instead, you
should set the thread-count value of the request-processing element in
domain.xml to 1000.

> After about 2 weeks, all of the https connections hang. We cannot
> connect to the Glassfish application server via https when this happens.
> We can connect via http, it works fine. Leading up to this problem we
> see intermittent broken pipes and JDBC connectivity issues in the
> server.log over a period of time. The theory is that the https
> connections are not disconnecting when an network related error occurs.
> This eventually consumes all 1000 acceptor connector threads.
>
> After investigating the https connectivity issues we found Grizzly bug number 680 and we think it may be related to the https hang.
>
> Additionally, we may have a seperate unrelated application error consuming heap space. We found bug number 730 that may be related to the heap space memory error.
>

Hmm, the issues you are referring to only apply to Grizzly 1.9 and up. The
GlassFish you are using uses Grizzly 1.0.x, which doesn't have those issues.

> Grizzly Bug 730
> -------------------
> https://grizzly.dev.java.net/issues/show_bug.cgi?id=730
> ServletAdapter.destroy() doesn't destroy wrapped HttpServlet

The Grizzly Servlet Container is *not* used by GlassFish v2/v3.
GlassFish uses a modified version of Tomcat 5.

>
> Grizzly Bug 680
> -------------------
> https://grizzly.dev.java.net/issues/show_bug.cgi?id=680
> SSL client may hang with 1.9.16 when send series packets to server then wait for a reply
> Fixed in Grizly 1.9.17 (August 12th 2009)
> Fixed Issue List: http://tinyurl.com/ohyjs4

>
> Grizly 1.9.18a
> ----------------
> https://grizzly.dev.java.net/
> https://grizzly.dev.java.net/issues/buglist.cgi?Submit+query=Submit+quer...
>
>
> We think upgrading to grizzly version 1.9.18a will solve this problem. We'd like to know if replacing the grizzly files in our current configuration would be supported.
> Netstat shows about 5,000 https connections in CLOSE_WAIT state.

Yes, that could be an issue. Which JDK are you using? You should try
the latest Grizzly 1.0.30 jar to see if the problem still occurs.
You can download it from here:

* http://is.gd/3llfs

Just add the jar to your classpath-prefix attribute in domain.xml

>
> TCP: IPv4
> Local Address Remote Address Swind Send-Q Rwind Recv-Q State
> -------------------- -------------------- ----- ------ ----- ------ -----------
> 172.18.244.177.443 172.18.244.5.9144 17520 0 33304 0 CLOSE_WAIT
> 172.18.244.177.443 172.18.244.6.41429 17520 0 33304 0 CLOSE_WAIT
> 172.18.244.177.443 172.18.244.5.51464 17520 0 33304 0 CLOSE_WAIT
> 172.18.244.177.443 172.18.244.6.20586 17520 0 33304 0 CLOSE_WAIT
> 172.18.244.177.443 172.18.244.6.55477 17520 0 33304 0 CLOSE_WAIT
>
> The logs show us the following types of errors:
>
> (1) javax.enterprise.system.stream.err in the httpSSLWorkerThread from (com.sun.enterprise.web.connector.grizzly.ssl.SSLWorkerThread.run)
> Caused by: java.net.SocketException: Broken pipe
>
> (2) javax.enterprise.system.stream.err in the httpSSLWorkerThread (grizzly.ssl.SSLWorkerThread)
> java.lang.NullPointerException

Do you have the complete stack trace?

>
> (3) javax.enterprise.system.stream.out in the httpSSLWorkerThread
> java.net.ConnectException: Connection refused
>
> (4) java.lang.OutOfMemoryError: Java heap space

That one won't help for sure.

>
> (5) java.net.SocketException: Invalid argument
> [Message sent by forum member 'britton_laroche' (britton_laroche@yahoo.com)]

This is a JDK issue:

* http://bugs.sun.com/view_bug.do?bug_id=6693490

A+

-- Jeanfrancois

>
> http://forums.java.net/jive/thread.jspa?messageID=364125
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net
> For additional commands, e-mail: users-help@glassfish.dev.java.net
>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net
For additional commands, e-mail: users-help@glassfish.dev.java.net

britton_laroche
Offline
Joined: 2009-06-16
Points: 0

Jeanfrancois,

Thank you so much for your clarification. I do not know where or how to find out what version of Grizzly we have. We are using the 32-bit Java JDK 1.6.0_10 on a 64-bit Solaris 10 Sun X4100 server. Where exactly is the Grizzly jar referenced in the domain.xml file? I can't find it. How exactly would we replace the Grizzly jar files?

Jar files listed in our domain.xml
-----------------------------------------------
-Dcom.sun.enterprise.taglibs=appserv-jstl.jar,jsf-impl.jar
-Dcom.sun.enterprise.taglisteners=jsf-impl.jar

Acceptor thread configuration
-----------------------------------------------








You suggest changing thread-count="5" to 1000,
i.e. thread-count="1000"

Should we also increase the keep alive thread-count setting to 1000?

What about the http-listener acceptor-threads="1000" should we lower that? I'm thinking 1,000 is a bit high for a two processor dual core 2280 (2.4GHz) on a Sun X4100, perhaps 100 or 250 max?

Stack Traces
-----------------------------------------------------
Attached is a file with some stack trace examples of things we are seeing in the server.log

Jeanfrancois Arcand

Salut,

glassfish@javadesktop.org wrote:
> Jeanfrancois,
>
> Thank you so much for your clarification. I do not know where or how to find out what version of grizzly we have.

Probably 1.0.23 which is included in lib/appserv-rt.jar
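
One way to confirm that a given jar (for example lib/appserv-rt.jar) really bundles the Grizzly 1.0.x classes is to look for the `com.sun.enterprise.web.connector.grizzly` package that appears in the stack traces in this thread. This is only a sketch with hypothetical class and method names. It does not report a version number, it only checks whether classes under that package are present.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper: checks whether a jar contains classes under a package path. */
final class GrizzlyJarCheck {

    // Package path of the Grizzly 1.0.x classes seen in the stack traces in this thread.
    static final String GRIZZLY_1_0_PATH = "com/sun/enterprise/web/connector/grizzly/";

    /** Lists all entry names of the jar at the given path. */
    static List<String> entryNames(String jarPath) throws IOException {
        List<String> names = new ArrayList<String>();
        JarFile jar = new JarFile(jarPath);
        try {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        } finally {
            jar.close();
        }
        return names;
    }

    /** True if any entry name starts with the given package path. */
    static boolean containsPackage(List<String> entryNames, String packagePath) {
        for (String name : entryNames) {
            if (name.startsWith(packagePath)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java GrizzlyJarCheck <path-to-jar>");
            return;
        }
        boolean found = containsPackage(entryNames(args[0]), GRIZZLY_1_0_PATH);
        System.out.println(args[0] + (found ? " contains " : " does not contain ")
                + "classes under " + GRIZZLY_1_0_PATH);
    }
}
```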

>
> We are using java jdk 1.6.0_10.
>
> Where is grizzly jar located in the domain.xml file exactly? I cant find it. How would we replace the grizzly jar files?

Just add the following:

(look for java-config element)

>
> Jar files listed in our domain.xml
> -----------------------------------------------
> -Dcom.sun.enterprise.taglibs=appserv-jstl.jar,jsf-impl.jar
> -Dcom.sun.enterprise.taglisteners=jsf-impl.jar
>
>
>
>
> Acceptor thread configuration
> -----------------------------------------------
>

How many cores/processors are you running on? It would be better to set
acceptor-threads to 1 or to the number of cores, and instead set the 1000 with:

> > crement="1"/>

But 1000 is really really high IMO. I would suspect you will get much
better performance by setting the value between 80 and 120.
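
For reference, the core count the JVM sees can be read directly, which gives a starting value for acceptor-threads under the "number of cores" rule of thumb above. This is a minimal sketch with a hypothetical class name. It only prints a suggestion and does not touch domain.xml.

```java
/** Hypothetical helper: derives an acceptor-threads suggestion from the visible core count. */
final class AcceptorThreadHint {

    /** Number of processors visible to this JVM, never less than 1. */
    static int suggestedAcceptorThreads() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    public static void main(String[] args) {
        System.out.println("Suggested acceptor-threads value: " + suggestedAcceptorThreads());
    }
}
```

Run it with the same JVM and on the same machine as the application server, since the value depends on what that JVM can see.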

> Stack Traces
> -----------------------------------------------------
> Attached is a file with some stack trace examples of things we are seeing in the server.log

Thanks...will take a look!

-- Jeanfrancois

> [Message sent by forum member 'britton_laroche' (britton_laroche@yahoo.com)]
>
> http://forums.java.net/jive/thread.jspa?messageID=364307

britton_laroche
Offline
Joined: 2009-06-16
Points: 0

Salut,

Thanks again for your help, Jeanfrancois. We will try the classpath-prefix to see if that helps...

Is there a reason we should not try the 1.9.18a jar file instead?

Thanks,

Britton

Jeanfrancois Arcand

Salut,

glassfish@javadesktop.org wrote:
> Salut,
>
> Thanks again for your help, Jeanfrancois. We will try the classpath-prefix to see if that helps...
>
> >
> Is there a reason we should not try the 1.9.18a jar file instead?

The APIs are not compatible, e.g. in 1.9 we use com.sun.grizzly, and v2
expects com.sun.enterprise.web.connector.grizzly.

We still maintain Grizzly 1.0.x and do frequent releases under the
http://grizzly.dev.java.net project.

A+

-- Jeanfrancois

>
> Thanks,
>
> Britton
> [Message sent by forum member 'britton_laroche' (britton_laroche@yahoo.com)]
>
> http://forums.java.net/jive/thread.jspa?messageID=364466

britton_laroche
Offline
Joined: 2009-06-16
Points: 0

Jeanfrancois,

You are awesome, one last question

Acceptor Threads Setting
--------------------------
I assume the metric is 1 x the number of cores. We have two dual-core processors (2280, 2.4GHz) on a Sun X4100. That is 4 cores.

So our acceptor thread setting should be 4?

Https Hang
--------------------------

We are able to consistently reproduce the https hang. When we run enough load to cause an application timeout (and intermittently disable network connectivity to other servers like MySQL), we get CLOSE_WAITs on port 443 that eventually clean up, but we cannot connect via https after that. We are testing the latest grizzly-1.0.30.jar right now.

I will let you know the results.

Thanks,

Britton

Jeanfrancois Arcand

Salut,

glassfish@javadesktop.org wrote:
> Jeanfrancois,
>
> You are awesome, one last question
>
> Acceptor Threads Setting
> --------------------------
> I assume the metric is 1 x the number of cores. We have two processor dual core 2280 (2.4GHz) on a Sun X410. That is 4 cores.
>
> So our acceptor thread setting should be 4?

Yes.

>
> We are able to consistently reproduce the https hang.

Great. If you can share a test case I can play with, I will come up with a fix.

> When we run enough load to cause an application time out, we get
> close_waits on port 443 that never clean up.

Yes that's an issue for sure.

> We are testing the latest grizzly-1.0.30.jar right now.
>
> I will let you know the results.

OK, if you can try with acceptor-threads="1" (the default) and with 4, that
would be good and will help me find the issue.

Thanks!

-- Jeanfrancois

>
> Thanks,
>
> Britton
> [Message sent by forum member 'britton_laroche' (britton_laroche@yahoo.com)]
>
> http://forums.java.net/jive/thread.jspa?messageID=364483

britton_laroche
Offline
Joined: 2009-06-16
Points: 0

Salut,

It's a difficult process to repeat. We only get it to hang completely when it's correlated with an out-of-heap-space error. We may have a memory leak in our application that we believe is related to exception handling. We cannot determine whether our memory leak alone is causing the problem.

Regardless, we can reproduce the problem. The difficulty we currently face is in determining whether it's our issue alone, or whether Grizzly has a problem too.

Our time is limited, so our focus will shift to cleaning up our application. If there is an issue in Grizzly, it may be possible to reproduce it with intermittent network errors through JDBC.

A simple JSP page requesting a large data set from a JDBC data source would be the first step. Then run requests over https against the JSP page with load from JMeter. While the test is running, modify the local firewall policy on the box to drop connectivity to the database server, and then bring it back up and down repeatedly. I think this would do the trick. The JDBC connection pooling should be set to table based polling. This way the connection pool will be refreshed. If there is an issue in Grizzly you'd see the CLOSE_WAITs, and eventually you'd see an https freeze that would not be recoverable. We don't have the time to code a simple JDBC app and do testing against it.

Right now I suspect that our application alone is the root cause.

Thanks,

Britton

jfarcand
Offline
Joined: 2003-06-10
Points: 0

Salut,

I did a test today to see if I could reproduce your issue. One thing that scared me is the fact that you are seeing sockets in CLOSE_WAIT. I created an OOM to see how GF reacts. It's possible that the out-of-memory error is causing threads to terminate (with the uncaught exception), with the result that they never call the close method (connections can stay in CLOSE_WAIT forever until close is invoked). I'm not sure I can fix that in Grizzly, but I suspect that's the only way the socket can stay in the CLOSE_WAIT state. So if you fix the application OOM, I suspect sockets will always get closed.
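
The failure mode described above can be illustrated outside the server. The following is a simplified Java sketch, not Grizzly's actual code: `TrackedConnection` is a hypothetical stand-in for a socket, and the OutOfMemoryError is simulated by throwing it directly. It shows that a worker which closes its connection in a `finally` block still closes it when an Error escapes the handler, whereas a worker that only closes on the normal path would leave the connection open.

```java
import java.io.Closeable;

/** Hypothetical illustration: closing a connection even when the handler dies with an Error. */
final class CloseOnError {

    /** Stand-in for a socket or channel; records whether close() was called. */
    static final class TrackedConnection implements Closeable {
        volatile boolean closed;

        public void close() {
            closed = true;
        }
    }

    /** Runs the handler and closes the connection even if an Error or exception escapes. */
    static void serve(TrackedConnection conn, Runnable handler) {
        try {
            handler.run();
        } finally {
            conn.close();
        }
    }

    /** Simulates a handler dying with an OutOfMemoryError; returns whether close() still ran. */
    static boolean closedAfterSimulatedOom() {
        TrackedConnection conn = new TrackedConnection();
        try {
            serve(conn, new Runnable() {
                public void run() {
                    throw new OutOfMemoryError("simulated");
                }
            });
        } catch (OutOfMemoryError expected) {
            // In a real server the worker thread would terminate here.
        }
        return conn.closed;
    }

    public static void main(String[] args) {
        System.out.println("closed after simulated OOM: " + closedAfterSimulatedOom());
    }
}
```

A genuine OutOfMemoryError is less predictable than this simulation, since cleanup code may itself fail to allocate, so fixing the leak remains the more reliable remedy.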

A+

-- Jeanfrancois

tmpsa
Offline
Joined: 2010-02-01
Points: 0

We have the same kind of problem. After some time, netstat on our server shows a lot of sockets in CLOSE_WAIT state, and they never go away. All have a non-empty Recv-Q. After a while, the server webapp hangs completely; no more sockets, I guess.

We run Glassfish v2.1 on Linux. The webapp uses HTTPS.
The connection is not terribly fast, and the clients upload images and other data.

Any ideas on how to tackle this?

Oleksiy Stashok

Please let me know exactly which GF version (build, patch) you're using.

Thank you.


tmpsa
Offline
Joined: 2010-02-01
Points: 0

The Administration Console says:

Sun GlassFish Enterprise Server v2.1.1 ((v2.1 Patch06)(9.1_02 Patch12)) (build b31g-fcs)

Jeanfrancois Arcand

Salut,

glassfish@javadesktop.org wrote:
> We're experiencing glassfish v2ur1 hanging. A few webapps and web services are deployed on our installation. What happens is that the server stops responding - we are able to establish a connection to the server, but that as far as we get, no response is received, the connection just waits for input.
>
> We noticed that glassfish cpu utilization is at 100% when the "hang" happens. The server is not able to process any requests afterwards, can run in this state for hours (utilizing 100% cpu time) The server log doesn't contain any ERROR/WARNING messages or outofmemory exceptions (as was hapenning with v1).
>
> We have acceptor threads count set to 1, request processing threads max set to 60.
>
> We suspect that it's one of our poorly written webapps (a webapp for large table reports, it uses apache tiles templates with huge string contents) that is causing these "freezes". So it's probably an enduser triggered problem.
>
> JVM used is 1.6.0_02, the system runs linux.
>
> Does anyone have a suggestion on tracking the source of this problem ?

Yes, next time it happens, run jstack and post the output here. It will be
easy to determine who is eating the threads (because this is what is
happening: all the threads get busy and new requests get queued by
Grizzly, waiting for a thread to execute them).
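
If running jstack against the process is not convenient, a similar dump can be produced from inside the JVM with the standard `java.lang.management` API. This is only a sketch with a hypothetical class name. It assumes it can run on a thread of its own (for example a timer), since a request thread may never get scheduled while the server is hung.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: produces a jstack-like dump of all live threads in this JVM. */
final class InProcessThreadDump {

    /** Returns each thread's name, state and stack frames as text. */
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip lock details, which not every JVM supports.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```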

Thanks!

-- Jeanfrancois

> [Message sent by forum member 'kolmis' (kolmis)]
>
> http://forums.java.net/jive/thread.jspa?messageID=256946

kolmis
Offline
Joined: 2006-12-07
Points: 0

I'm attaching the jstack log. I'm not sure it's complete. This time there was an OutOfMemory (Java heap space) error logged. Can anything be deduced from the log?

sdo
Offline
Joined: 2005-05-23
Points: 0

There are lots of idle grizzly threads in there, so it doesn't appear to be a container issue. In fact, when you say you can make new connections but not get a response, do you mean not get a response from your app? If you can visit http://hostname:port/index.html and see the standard appserver index.html (which given this stack trace is what ought to happen), that's more confirmation that it's an application error (though the application error I'm about to describe could cause all grizzly threads to hang as well, so if that doesn't work, it's not necessarily that there is a container issue either).

See in the stack in the trace where there is a thread doing some postgres work? That might be blocking some database resource that everything else in your app needs. I'd take jstacks over a few minutes and see if you ever break out of that; it's possible that query is just taking a really long time to process and has blocked the rest of your app. Or perhaps if you examine further stacks over time, you can deduce what's hanging up your app.
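
When comparing several jstack dumps taken a few minutes apart, a quick per-state tally makes it easier to see whether threads are piling up as BLOCKED or WAITING. The following is a minimal Java sketch with hypothetical names. It assumes the dump contains the `java.lang.Thread.State:` lines that HotSpot's jstack prints for Java threads.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: counts threads per state in the text of a jstack dump. */
final class JstackStateSummary {

    private static final Pattern STATE =
            Pattern.compile("java\\.lang\\.Thread\\.State: (\\w+)");

    /** Returns thread state -> number of threads reported in that state. */
    static Map<String, Integer> countStates(String jstackOutput) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        Matcher m = STATE.matcher(jstackOutput);
        while (m.find()) {
            String state = m.group(1);
            Integer old = counts.get(state);
            counts.put(state, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JstackStateSummary <jstack-output-file>");
            return;
        }
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println(countStates(text));
    }
}
```

If the same thread stays RUNNABLE in the same database frame across every dump, that supports the long-running-query theory.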

kolmis
Offline
Joined: 2006-12-07
Points: 0

GlassFish is not able to serve even static content when in this state, i.e. browsing http://serverhost stalls (not sure for how long though). It's possible that the server would recover from this state after a few days/weeks.

Yes, I noticed the postgres thread, but couldn't make anything out of it.

I need to get more acquainted with jstack and find a way of simulating this behavior to get rid of this hanging.

chilak
Offline
Joined: 2006-04-10
Points: 0

Did you figure out what it was?

kolmis
Offline
Joined: 2006-12-07
Points: 0

Not yet. It seems to me that our glassfish installation will need some JVM garbage collector tweaking.

Let's say a single web request to our web app generates a 100MB object, and it takes some time to create it and serve it (through a JSP page). A series of these requests is made to the web app. Which garbage collector would be suitable for handling this situation? What GC options should be considered?
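
Before changing collectors, it may help to watch how the heap and the current collectors behave while such requests run. This is only a sketch using the standard `java.lang.management` beans, with a hypothetical class name. It does not recommend any particular collector, it just reports heap usage plus collection counts and times.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: prints current heap usage and per-collector statistics. */
final class HeapAndGcSnapshot {

    /** Returns a short text report of heap usage and GC activity so far. */
    static String snapshot() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        StringBuilder out = new StringBuilder();
        out.append("heap used=").append(heap.getUsed() >> 20).append("MB")
           .append(" committed=").append(heap.getCommitted() >> 20).append("MB")
           // getMax() returns -1 when the maximum is undefined.
           .append(" max=").append(heap.getMax() < 0 ? "undefined" : (heap.getMax() >> 20) + "MB")
           .append('\n');
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.append(gc.getName())
               .append(": collections=").append(gc.getCollectionCount())
               .append(" timeMs=").append(gc.getCollectionTime())
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
    }
}
```

Logging this periodically while the large reports are generated would show whether collection time climbs sharply before the hang.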

philestine
Offline
Joined: 2008-11-13
Points: 0

Did you guys ever get to the bottom of this?

I am experiencing similar issues. My glassfish hangs after a few weeks of uptime.
Thanks,
Phil

phyl_martin
Offline
Joined: 2009-09-10
Points: 0

Any further discussions on this? I encountered the same problem.

Legolas wood

Can you let me know where I can configure the acceptor thread count and
the request processing thread count?
Thanks

glassfish@javadesktop.org wrote:
> We're experiencing glassfish v2ur1 hanging. A few webapps and web services are deployed on our installation. What happens is that the server stops responding - we are able to establish a connection to the server, but that as far as we get, no response is received, the connection just waits for input.
>
> We noticed that glassfish cpu utilization is at 100% when the "hang" happens. The server is not able to process any requests afterwards, can run in this state for hours (utilizing 100% cpu time) The server log doesn't contain any ERROR/WARNING messages or outofmemory exceptions (as was hapenning with v1).
>
> We have acceptor threads count set to 1, request processing threads max set to 60.
>
> We suspect that it's one of our poorly written webapps (a webapp for large table reports, it uses apache tiles templates with huge string contents) that is causing these "freezes". So it's probably an enduser triggered problem.
>
> JVM used is 1.6.0_02, the system runs linux.
>
> Does anyone have a suggestion on tracking the source of this problem ?
> [Message sent by forum member 'kolmis' (kolmis)]
>
> http://forums.java.net/jive/thread.jspa?messageID=256946

kolmis
Offline
Joined: 2006-12-07
Points: 0

Admin web app -> Configuration -> HTTP service

For request processing tuning, go to the RequestProcessing tab. For acceptor thread modifications, choose an http-listener and modify the Acceptor Threads value (or edit the request-processing tag in domain.xml).

Check this blog entry for a few tips on how to set these values (and an explanation):
http://weblogs.java.net/blog/sdo/archive/2007/12/a_glassfish_tun.html

> Can you let me know where I can configure accepter
> thread count and
> request processing thread count?