
Memory Leak on 9.1_02?

28 replies
jfaldmo
Offline
Joined: 2008-05-02

I have been fighting Application Server performance issues for a couple of weeks now. Our Application Performance Monitoring tool is showing that the heap size is dropping for the application in the clustered environment. Please see attached image. This application is also on Application Server 7 and its heap size stays at 1024.

Can someone confirm if there is a known memory leak in Sun Java System Application Server 9.1_02?

jfaldmo
Offline
Joined: 2008-05-02

I saw this in the log file.
[#|2009-01-10T11:57:56.678-0700|SEVERE|sun-appserver9.1|javax.enterprise.system.tools.deployment|_ThreadID=15;_ThreadName=pool-1-thread-17;param-name;fork;_RequestID=c4963d00-3de3-4837-8ca1-45a04297ff68;|"DPL8007: Invalid Deployment Descriptors element param-name value fork"|#]

[#|2009-01-10T11:57:56.698-0700|SEVERE|sun-appserver9.1|javax.enterprise.system.tools.deployment|_ThreadID=15;_ThreadName=pool-1-thread-17;param-value;false;_RequestID=c4963d00-3de3-4837-8ca1-45a04297ff68;|"DPL8007: Invalid Deployment Descriptors element param-value value false"|#]

jluehe
Offline
Joined: 2004-12-01

Looks like you did not wrap the "fork" param inside an <init-param> element.
If you did, then the complete declaration of the JspServlet in default-web.xml would look as follows:


<servlet>
    <servlet-name>jsp</servlet-name>
    <servlet-class>org.apache.jasper.servlet.JspServlet</servlet-class>
    <init-param>
        <param-name>xpoweredBy</param-name>
        <param-value>true</param-value>
    </init-param>
    <init-param>
        <param-name>fork</param-name>
        <param-value>false</param-value>
    </init-param>
    <load-on-startup>3</load-on-startup>
</servlet>

jfaldmo
Offline
Joined: 2008-05-02

You're right, I put it in the wrong location. Thanks for the complete declaration.

Jan Luehe

On 01/07/09 15:26, glassfish@javadesktop.org wrote:
> Here is an explanation of the issue my co-worker sent to the powers that be within my company. Please reply if you agree or disagree with his assessment. [...]
Great analysis! Thanks!

Another workaround would be to set the JspServlet init parameter named
"fork" to false (default is true), by adding the following snippet to the
declaration of the JspServlet in your domain's default-web.xml:

    <init-param>
        <param-name>fork</param-name>
        <param-value>false</param-value>
    </init-param>

Can you please give this a try and report back to us?

Thanks,

Jan


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net
For additional commands, e-mail: users-help@glassfish.dev.java.net

jfaldmo
Offline
Joined: 2008-05-02

There were problems yesterday and again today. Yesterday's does not appear to be a forked-process problem, but today's does. I had a difficult time getting the server started again, but I enabled the setting and got it working just by restarting multiple times.

jfaldmo
Offline
Joined: 2008-05-02

Here is an explanation of the issue my co-worker sent to the powers that be within my company. Please reply if you agree or disagree with his assessment.
---------------------------------

Overview:
Upon server startup there is a finite number of threads available to service user requests. When a thread comes in requesting a page that has not yet been viewed since the last time the server started, the server will block all other requestors to that page until the page has been compiled.

In order to compile the page the server forks a new process (not an in-process thread) that executes the compile. Upon completion of the compile (successful return code from the fork) the initiating thread is released to service the page request, as are all the additional blocked threads for that same page.

The error occurs when the forked process fails to return. From the system's perspective it is still running. It is important to differentiate this from a full failure in the compile process which would release the lock and display an error message. In this case the process does not return at all and the lock persists indefinitely.

Over time, additional requests for this orphaned page will each take a thread out of the available pool and place it in line behind the lock. Eventually all threads from the finite pool are consumed waiting for the lock, and the server becomes unresponsive and must be restarted.
Ugly Details follow:

Likely root cause: Known issue with JVM on Solaris 10 (exec hangs): http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6671051

Likely solution: upgrade to at least Java 1.6 update 7 (update 11 is the latest version)

Possible workaround: Precompiling JSP files will reduce but not eliminate calls to UNIXProcess.forkAndExec.
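As a sketch of that workaround (the application name and path here are hypothetical), JSPs can be precompiled at deployment time with the deploy command's precompilejsp option, so the server never needs to fork a compiler for a first request:

```
# Hypothetical WAR name/path; --precompilejsp compiles every JSP up front
# at deploy time instead of on first request.
asadmin deploy --precompilejsp=true /path/to/myapp.war
```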

Obviously we will have to do a great deal of testing in order to upgrade the JVM, but I think that an upgrade is the only actual solution. The calls to UNIXProcess.forkAndExec are not isolated to JSP compilation; they are used throughout the application server, and this bug can result in numerous other unexplained process hangs.

Analyzing the threads available on the application server we see the following:
##############################################################################
1 Thread executing the forked process:
Thread t@4193: (state = IN_NATIVE)
- java.lang.UNIXProcess.forkAndExec(byte[], byte[], int, byte[], int, byte[], boolean, java.io.FileDescriptor, java.io.FileDescriptor, java.io.FileDescriptor) @bci=1350864168 (Interpreted frame)

1 Thread in a WAITING state waiting on the return of the forked process:
Thread "http38183-Processor11" thread-id 4,133 thread-stateWAITINGWaiting on lock: java.lang.UNIXProcess@1aa11a8

498 Threads in a BLOCKED state waiting for the lock to be released by the WAITING thread:
Thread "http38183-Processor12" thread-id 4,134 thread-stateBLOCKEDWaiting on lock:
Owned by: http38183-Processor11 Id: 4,133
##############################################################################

This situation persisted over the course of several hours with no change.

Here is the location of the lock in Sun's source code (JspServletWrapper.service()):
    if (!options.getUsePrecompiled()
            && (options.getDevelopment() || firstTime)) {
        // END S1AS 6181923
        synchronized (this) {
            firstTime = false;

            // The following sets reload to true, if necessary
            ctxt.compile();
        }
    } else {
        if (compileException != null) {
            // Throw cached compilation exception
            throw compileException;
        }
    }

It is important to note that the lock itself is entirely appropriate: it would be incorrect to allow two threads to compile the same JSP page simultaneously. At issue is the fact that the forked process behind the lock does not return at all.
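The failure mode described above can be sketched in isolation. This is a simplified stand-in, not the actual Jasper code: one thread holds the per-page monitor while waiting on an external "compile" that never returns, and every later request for the same page blocks behind that monitor until the request pool is exhausted.

```java
import java.util.concurrent.*;

// Simplified model of the starvation: a latch that is never counted down
// stands in for the forked compiler process that never exits.
public class CompileLockSketch {
    private final Object pageLock = new Object();          // one lock per JSP page
    private final CountDownLatch compileDone = new CountDownLatch(1);

    // Stands in for JspServletWrapper.service(): first caller "compiles",
    // all other callers block on the same monitor.
    void service() throws InterruptedException {
        synchronized (pageLock) {
            compileDone.await();   // the hung forked process: never returns
        }
    }

    public static void main(String[] args) throws Exception {
        CompileLockSketch page = new CompileLockSketch();
        ExecutorService pool = Executors.newFixedThreadPool(4); // the request pool
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> { page.service(); return null; });
        }
        Thread.sleep(500);
        // All four pool threads are now stuck: one WAITING inside the "compile",
        // three BLOCKED on the page lock -- no threads remain to serve requests.
        System.out.println("active=" + ((ThreadPoolExecutor) pool).getActiveCount());
        page.compileDone.countDown();   // if the compile ever returned, all would free up
        pool.shutdown();
        System.out.println("terminated=" + pool.awaitTermination(5, TimeUnit.SECONDS));
    }
}
```

Note that releasing the latch immediately frees every blocked thread, which is why the lock itself is appropriate; only the non-returning child process makes it pathological.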

Jeanfrancois Arcand

Thanks!!

From the log you sent me, it is pretty clear that all threads dead-end here:

> Thread t@22429: (state = BLOCKED)
> - org.apache.jasper.servlet.JspServletWrapper.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, boolean) @bci=99, line=341 (Interpreted frame)
> - org.apache.jasper.servlet.JspServlet.serviceJspFile(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, java.lang.String, java.lang.Throwable, boolean) @bci=190, line=470 (Interpreted frame)
> - org.apache.jasper.servlet.JspServlet.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=427, line=364 (Interpreted frame)
> - javax.servlet.http.HttpServlet.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=30, line=831 (Compiled frame)
> - sun.reflect.GeneratedMethodAccessor87.invoke(java.lang.Object, java.lang.Object[]) @bci=48 (Compiled frame)
> - sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=6, line=25 (Compiled frame)
> - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Interpreted frame)
> - javax.security.auth.Subject.doAsPrivileged(javax.security.auth.Subject, java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=60, line=517 (Compiled frame)
> - org.apache.catalina.security.SecurityUtil.doAsPrivilege(java.lang.String, javax.servlet.Servlet, java.lang.Class[], java.lang.Object[], java.security.Principal) @bci=73, line=192 (Compiled frame)
> - org.apache.catalina.core.ApplicationDispatcher.doInvoke(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=444, line=855 (Interpreted frame)
> - org.apache.catalina.core.ApplicationDispatcher.invoke(javax.servlet.ServletRequest, javax.servlet.ServletResponse, org.apache.catalina.core.ApplicationDispatcher$State) @bci=68, line=703 (Interpreted frame)

In GlassFish v3 we will have support for timing out "dead" threads
(forcing them to complete), so those situations will be easier to track.
I will forward your email to the JSP leads to see if they will consider
something similar for GlassFish 2.x.

A+

-- jeanfrancois


jfaldmo
Offline
Joined: 2008-05-02

My brilliant co-worker was able to read the thread dumps/jstack files. He also downloaded the GlassFish code to see the line that was blocking the threads. He was able to determine that the block was happening when the JVM was trying to compile a JSP file. I killed the forked process (saw another process start and then go away) and the server appears to be responding again. A new JSP file was in the generated directory structure.

There are still 138 blocked threads. Here are the two messages from the thread dump. I have also attached the files if you are interested.
Thread "RMI ConnectionExpiration-[192.168.58.45:50336,com.sun.appserv.management.client.AdminRMISSLClientSocketFactory@144719c]" thread-id 28,307 thread-stateTIMED_WAITING
at: java.lang.Thread.sleep(Native Method)
at: sun.rmi.transport.tcp.TCPChannel$Reaper.run(TCPChannel.java:446)
at: java.lang.Thread.run(Thread.java:595)

Thread "http38183-Processor32" thread-id 26,502 thread-stateWAITINGWaiting on lock: org.apache.tomcat.util.threads.ThreadPool$ControlRunnable@c37726
at: java.lang.Object.wait(Native Method)
at: java.lang.Object.wait(Object.java:474)
at: org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:639)
at: java.lang.Thread.run(Thread.java:595)

jfaldmo
Offline
Joined: 2008-05-02

Here are the files I mentioned in my previous post.

jfaldmo
Offline
Joined: 2008-05-02

I ran jstack against both pids and wrote the output to a file. I also copied and pasted what was printed to the screen into its own file. Last of all I did a generate-jvm-report --type thread.

I'll look at these files, not knowing what I am looking for, and see what I can learn. :)

Jeanfrancois Arcand

glassfish@javadesktop.org wrote:
> I ran jstack against both pids and wrote them to a file. I also copied and pasted what was printed to the screen in its own files. Last of all I did a generate-jvm-report --type thread.
>
> I'll look at these files, not knowing what I am looking for, and see what I can learn. :)

Just send me those files directly if you can't share them. I will take a look.

A+

-- jeanfrancois


jfaldmo
Offline
Joined: 2008-05-02

How can I send them to you directly? I don't see an option using the forum.

Jeanfrancois Arcand

glassfish@javadesktop.org wrote:
> How can I send them to you directly? I don't see an option using the forum.

jfarcand@apache.org

-- Jeanfrancois


jfaldmo
Offline
Joined: 2008-05-02

By the way, when I switched to Coyote I also increased the thread count to 500 and the keep-alive thread count to 1.

From what I can gather from your comments, you would recommend I go back to Grizzly and follow Jeanfrancois's suggestions on his blog to optimize it. I'll do that in our QA environment and have my co-worker run some load tests against it.

Thanks again for your help.

jfaldmo
Offline
Joined: 2008-05-02

So 581 out of 601 threads are blocked, mostly with a message like this--
Thread "http38183-Processor117" thread-id 7,020 thread-stateBLOCKEDWaiting on lock: org.apache.jasper.servlet.JspServletWrapper@6bca60

I don't know what that implies though...

I still have a Sun App Server 7 running with java version "1.4.2_08" for this application. (We are trying to migrate everyone to App Server 9, but we are running into this problem.) The developer would like to do a thread dump ("generate-jvm-report") on App Server 7 but it doesn't appear to be an option. Is there a way to get a thread dump on App Server 7?

jfaldmo
Offline
Joined: 2008-05-02

I found out how to do a thread dump in App Server 7: kill -3 <pid>.
The dump then gets written to the server.log file.

swamyv
Offline
Joined: 2005-11-16

If you are using Sun JDK 1.6 or higher then you can also use jstack <pid>. jstack is available in the JDK bin directory.
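For reference, the thread-dump commands mentioned in this thread side by side (12345 is a placeholder for the instance's process id):

```
jstack 12345 > /tmp/threads.txt             # JDK 5/6: per-thread stacks to a file
kill -3 12345                               # SIGQUIT: dump is appended to server.log
asadmin generate-jvm-report --type thread   # App Server 9.x admin CLI equivalent
```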

Jeanfrancois Arcand

Salut,

catching up...

glassfish@javadesktop.org wrote:
> Thanks for the quick reply. The situation is a bit complicated but if you have time to review it I would appreciate it.
>
> Environment--
> Sun Java System Application Server 9.1_02
> 2 physical machines
> 4 Applications deployed to 4 different clusters. One is heavily used, one is moderately used, and the other two are minimally used.
> 2 Sun Web Server 6 for load balancing. (Soon to be removed from environment.)
> https passthrough enabled.
> F5 big-ip in Front of Web Servers, for load balancing to the Web Servers. (Will load balance the applications directly soon.)
>
> The heavily used application will usually slow down in the mornings, M-F, until it becomes unresponsive, usually there is just a blank white page and the browser keeps working on the requests. (On the weekends the application usage is cut in half and we do not see the problem.) The other applications don't appear to be affected. If I go to the application directly (through a proxy that fronts our DMZ) on the http listener then there is no problem with the application. Unfortunately our proxy didn't allow https on the non-standard ports the applications were using. I have since changed this so I can check if the https listener is having problems when users say the application is slow. The logs don't show anything that would point to a root cause from what I can see. When I shut down the cluster instances there are errors but the instances do shut down. I don't know if that is normal as I have not watched the logs while shutting them down in the past.
>
> Initially I thought it was a problem with the web server. I was able to increase the performance significantly by changing these settings on the web server--
> upped the Acceptor Threads from 1 to 4.
> increased the Max Queue length 8182 (I don't think that mattered now as the peak never reached very high.)
> increased the RqThrottle to 512.
>
> But improving the performance on the web server didn't fix the problem. There were some application code changes which didn't help either. The last change I made was changing the http "engine" from grizzly to coyote. This was done last Friday. There have not been any confirmed problems since then.

You might not have configured enough threads. The difference between
Coyote and Grizzly is that Coyote magically creates 150 threads if you
switch to it.

> (This morning there was one site that said the application was slow
> again but I wasn't able to duplicate the problem.

That probably means the application's execution is blocked, most of the
time by a database call that never returns. When you observe a slowdown,
would you be able to grab a thread dump so I can see where the lock
happens? Mainly, all Grizzly worker threads (5 by default) are blocked
and all other requests are piling up in a queue, waiting to be executed.
When you are getting a blank page, it just means Grizzly starts
rejecting requests because its internal queue is full. You can increase
the queue's size by increasing the connection-pool element of
domain.xml, but that will not fix your issue. You either need to
increase the number of threads, or understand why all threads deadlock.
If you can send me a ${java.home}/bin/jstack dump (using JDK 6 if you
can), it will tell us where it locks.
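For illustration only (the attribute values here are made up, and the exact attribute names should be checked against your own domain.xml), the connection-pool element referred to above sits under http-service:

```
<!-- domain.xml: a larger request backlog hides, but does not fix, thread starvation -->
<http-service>
    <connection-pool max-pending-count="4096"
                     queue-size-in-bytes="4096"/>
</http-service>
```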

> When asked for more details they said the performance issues went away.)

Yes, because Coyote has more threads :-)

>
> So hopefully by changing the http listener or engine (what is the proper term?) to coyote fixed the problem.

It will eventually lock up with Coyote as well; it just needs a much
heavier load.

> But since there have been no errors in the logs that would say where the
> problem is I am still looking into other possible explanations as to why
> this is/was happening.
>
> That is why I am thinking it is a Memory Leak in the App Server. The other possibility I am thinking of is a JDBC connection leak.

I would say a lock (I doubt it is related to a memory leak).

> I am looking into enabling monitoring, since it doesn't appear to be
> enabled by default.
>
> Anyway, if you have read until here then thank you for your time.

You are welcome.

A+

-- Jeanfrancois


jfaldmo
Offline
Joined: 2008-05-02

What type of errors, if any, would I expect to see in the logs if there are not enough threads? That is one thing I was looking for, but I didn't see any errors in the logs that would indicate a problem.

Jeanfrancois Arcand

Salut,

glassfish@javadesktop.org wrote:
> What type of errors, if any, would I expect to see in the logs if there are not enough threads?

No errors.

> That is one thing I was looking for but didn't see any errors in the
> logs that would indicate a problem.

The only way right now (fixed in v3) is to look at the error messages
returned to the browser, which contain information about the internal
queue status (full). I know that sucks :-). I can produce a patch that
outputs more logs if that would help.

A+

-- Jeanfrancois


jfaldmo
Offline
Joined: 2008-05-02

It hit us again this morning. One instance in the cluster is fine, but the other is not responding on its https listener. I took that instance out of the load balancer, so users are running on one instance. Do you have suggestions on what I can try, while the problem is occurring, to see where it is?

By the way, on the Sun Web Server 6.1 .perf screen I see a whole bunch of "response service-passthrough" entries. When it is performing normally I usually only see a handful.

Jeanfrancois Arcand

Salut,

glassfish@javadesktop.org wrote:
> It hit us again this morning. One instance in the cluster is fine but the other is not responding on its https listener. I took the instance out of the load balancer so users are running on one instance. Do you have suggestions on what I can try see where the problem is while it is having the problem?

With Coyote or with Grizzly?

When it hangs, can you grab a jstack PID? That's the first piece of
information we can use to help.

A+

-- Jeanfrancois


jfaldmo
Offline
Joined: 2008-05-02

Coyote is being used.

When I went looking for the pid I found two, which I didn't expect! It looks like one is a parent. I first ran /usr/ucb/ps -auxww | grep ionyx-amerigo (from /opt/SUNWappserver/scripts) to get the pids.

I then ran ps -ef | grep 16533 and got the following results.

root 20262 16533 0 Jan 06 ? 0:00 /usr/jdk/instances/jdk1.5.0_14/bin/java -Dcom.sun.aas.instanceRoot=/opt/SUNWapp
root 16533 29643 0 Jan 05 ? 287:04 /usr/jdk/instances/jdk1.5.0_14/bin/java -Dcom.sun.aas.instanceRoot=/opt/SUNWapp

What would cause this?

rpetruzzelli
Offline
Joined: 2007-11-08

This appears related... in the java forum:
Java Virtual Machine (JVM) - jstack shows a thread waiting for monitor entry but it's already acquired:
http://forums.sun.com/thread.jspa?threadID=5234863

rpetruzzelli
Offline
Joined: 2007-11-08

Heap size shrinking is not necessarily a bad thing. Your jpg activity looks okay to me.
Where it could be leaking is in the native threads -- outside the java heap.

jfaldmo
Offline
Joined: 2008-05-02

Thanks for the quick reply. The situation is a bit complicated but if you have time to review it I would appreciate it.

Environment--
Sun Java System Application Server 9.1_02
2 physical machines
4 Applications deployed to 4 different clusters. One is heavily used, one is moderately used, and the other two are minimally used.
2 Sun Web Server 6 for load balancing. (Soon to be removed from environment.)
https passthrough enabled.
F5 BIG-IP in front of the Web Servers, for load balancing to the Web Servers. (Will load balance directly to the application clusters soon.)

The heavily used application usually slows down in the mornings, M-F, until it becomes unresponsive; usually there is just a blank white page and the browser keeps working on the requests. (On the weekends the application usage is cut in half and we do not see the problem.) The other applications don't appear to be affected. If I go to the application directly (through a proxy that fronts our DMZ) on the http listener then there is no problem with the application. Unfortunately our proxy didn't allow https on the non-standard ports the applications were using. I have since changed this so I can check if the https listener is having problems when users say the application is slow. The logs don't show anything that would point to a root cause from what I can see. When I shut down the cluster instances there are errors, but the instances do shut down. I don't know if that is normal, as I have not watched the logs while shutting them down in the past. But a restart usually helps until the next day.

Initially I thought it was a problem with the Web server. I was able to increase performance significantly by changing these settings on the Web server--
upped the Acceptor Threads from 1 to 4.
increased the Max Queue length to 8182. (I don't think that mattered, since the queue peak never got very high.)
increased the RqThrottle to 512.

But improving the performance on the Web server didn't fix the problem. Some application code changes didn't help either. The last change I made was switching the http "engine" from grizzly to coyote. This was done last Friday. There have not been any confirmed problems since then. (This morning one site said the application was slow again, but I wasn't able to duplicate the problem, and when asked for more details they said the performance issues had gone away.)

So hopefully changing the http listener's engine (what is the proper term?) to coyote fixed the problem. But since there have been no errors in the logs that would show where the problem is, I am still looking into other possible explanations for why this is/was happening.

That is why I am thinking it is a memory leak in the App Server. The other possibility I am considering is a JDBC connection leak. I am looking into enabling monitoring for it, since it doesn't appear to be enabled by default.
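To check the connection-leak theory, JDBC pool monitoring can be switched on from the CLI. A sketch, assuming the SJSAS 9.1 dotted names (in a cluster, substitute the instance or config name for `server`):

```shell
# Raise the monitoring level for JDBC connection pools, then dump the stats
# and look at the connections-in-use counters over time.
asadmin set server.monitoring-service.module-monitoring-levels.jdbc-connection-pool=HIGH
asadmin get --monitor "server.resources.*"
```

A pool whose in-use count climbs steadily without coming back down after load drops is the classic sign of connections being acquired but never closed.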

Anyway, if you have read until here then thank you for your time.


Jacob Kessler

The image that you attached doesn't look like you have a memory leak at
all: each of your collections brings you from ~700MB down to ~250MB, and
you're doing that less than once an hour. A memory leak would show up as
the average heap used after a garbage collection (the drops in the
graph) climbing over time until it reached the total heap size, at which
point your servers would spend all of their time collecting garbage
(collections every few seconds with no noticeable drop in heap size) and
performance would die.
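The collection pattern described above can be captured directly with HotSpot GC logging. A sketch, assuming Java 5/6 flag names and an example log path:

```text
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/gc.log
```

Added to the JVM options in the server's java-config, this writes one line per collection; a healthy sawtooth keeps dropping back to a stable floor, while a leak shows that floor climbing toward the maximum heap size.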

Your comment about the heavily used application slowing down while the
others remain responsive would seem to point to an issue specific to
that application, rather than a general cluster problem. Increasing the
acceptor threads is probably a good thing; you might also look into the
number of worker threads you are using. It should be in domain.xml as
something like

<request-processing initial-thread-count="10" request-timeout-in-seconds="30" thread-count="130" thread-increment="10"/>

In general, thread-count should be around the peak number of concurrent
requests that you expect to be serving. If your app is running out of
worker threads (because it can't complete requests as fast as they
are coming in, for example), you could very well see a performance issue
like this as requests pile up in the queue. More tips on making grizzly
perform better can be found on Jean-Francois's blog at
http://weblogs.java.net/blog/jfarcand/archive/2007/03/configuring_gri_2....
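The sizing rule above (thread-count ≈ peak concurrent requests) can be sketched with Little's law. The numbers here are illustrative, not measurements from this deployment:

```java
// Little's law: concurrent requests ≈ arrival rate × average service time.
// A headroom factor (> 1.0) leaves slack for bursts.
public class ThreadCountEstimate {
    public static int estimateThreads(double requestsPerSecond,
                                      double avgServiceTimeSeconds,
                                      double headroom) {
        return (int) Math.ceil(requestsPerSecond * avgServiceTimeSeconds * headroom);
    }

    public static void main(String[] args) {
        // e.g. 100 req/s at ~1s each with 30% headroom -> 130 worker threads
        System.out.println(estimateThreads(100.0, 1.0, 1.3));
    }
}
```

If measured arrival rate and per-request service time put the estimate above the configured thread-count, requests queue and the app appears to hang even though the JVM itself is healthy.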


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net
For additional commands, e-mail: users-help@glassfish.dev.java.net

Ryan de Laplante

FYI, I recently had to diagnose a memory leak in our application running
on SJSAS 9.1 UR2. It took two months of examining heap dumps, but I
finally found the cause. There was a case where it was possible for a
JPA named query to execute with empty string parameters, which
caused it to return all rows in the DB. The DB is many gigabytes, so
JPA never finished eagerly loading the entities before we got an
OutOfMemoryError.
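A cheap safeguard against that empty-parameter case is to validate parameters before they reach the query. This is a generic sketch (class and method names are hypothetical), not code from the application in question:

```java
// Hypothetical guard: reject null/blank values before binding them to a
// JPA named-query parameter, so a bad input can't match every row.
public class QueryParamGuard {
    public static String requireNonBlank(String paramName, String value) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException(
                "Query parameter '" + paramName + "' must not be blank");
        }
        return value;
    }
}
```

Callers would wrap each user-supplied value, e.g. `query.setParameter("name", QueryParamGuard.requireNonBlank("name", input))`, failing fast instead of pulling the whole table into the heap.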

I also found that TopLink Essentials doesn't have cache expiry. Your
options are:

1) cache every entity you've ever loaded, forever
2) disable caching completely
3) selectively enable/disable the cache per entity.

We had tons of entities cached in memory that were only needed once, and
so should never have been cached.
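Option 3 can be expressed as TopLink Essentials persistence-unit properties. A sketch, assuming the `toplink.cache.shared.*` property names; the entity name `OrderHistory` is a placeholder:

```xml
<!-- In persistence.xml: keep the shared cache on by default,
     but exclude one rarely re-read entity from it. -->
<properties>
  <!-- option 3: per-entity opt-out ("OrderHistory" is hypothetical) -->
  <property name="toplink.cache.shared.OrderHistory" value="false"/>
  <!-- option 2 would be:
       <property name="toplink.cache.shared.default" value="false"/> -->
</properties>
```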

Ryan

