Skip to main content

Glassfish 3.1.2.2 randomly stops logging to access logs - GRIZZLY0023?

6 replies [Last post]
arosso
Offline
Joined: 2010-02-19
Points: 0

Hi,

We are running into an issues with Glassfish 3.1.2.2 where it seems to randomly stop logging to the access logs. We have enabled access logging as follows:

<http-service access-logging-enabled="true">
        <access-log buffer-size-bytes="0" write-interval-seconds="0" format="%client.name% %datetime% %request% %status% %response.length% %header.referer% %header.user-agent% %header.x-forwarded-for% %header.host%" rotation-interval-in-minutes="60"></access-log>
        <virtual-server id="server" log-file="/apps/weblogs/www/server.log" default-web-module="docroot" access-log="/apps/weblogs/www/access" network-listeners="http-listener-1,http-listener-2">
       </virtual-server>
</http-service>

The issue seems to happen randomly. We'll get full logs for weeks and then 1 day it'll stop logging at some point during the day. Seems like the only way to get it to start again is to restart glassfish. We have a farm of glassfish servers and this happens randomly across them.

We cannot correlate when this happens with any events. There doesn't seem to be errors in any log files. The only thought is that it might be due to load or some threading issue. We currently get around 100 req/s on a glassfish server all requests mostly hitting simple Java Web Apps (not EJBs, or EARs).

We run on JDK 7 on RHEL 6.

Anyone have any thoughts what could be causing this?

Thanks.

Reply viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.
arosso
Offline
Joined: 2010-02-19
Points: 0

I've found that this happens when the following gets logged in the server.log:

[#|2013-12-15T19:33:07.097+0000|WARNING|glassfish3.1.2|com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=20;_ThreadName=Thread-2;|GRIZZLY0023: Interrupting idle Thread: http-thread-pool-80(12).|#]

Not all GRIZZLY0023 messages cause the access logs to stop but all cases recently of the access logs stopping have one of these interrupt messages right around the same time.

Is there any way to stop Grizzly from not interrupting idle threads?

oleksiys
Offline
Joined: 2006-01-25
Points: 0

You can set request-timeout-seconds = -1 on the listener, where you see this behavior. For example to set this value on http-listener-1 you can use a command like [1].

WBR,
Alexey.

[1] asadmin set
server-config.network-config.protocols.protocol.http-listener-1.http.request-timeout-seconds=-1

arosso
Offline
Joined: 2010-02-19
Points: 0

I've made a few changes to the config and am testing this.

I've first set the min-thread-size on the http thread pool to something much higher than the default of 2. I've set this to 200. From what I could tell, some sort of cleaner process was going through and periodically killing/interrupting threads that were not in use to try to get things back to 2. To try to stop this I've set the min to 200 which should be higher than we need at the moment.

After making the change, the issue was still happening with requests that were taking more than 15 minutes. I have bumped up the request-timeout-seconds to 3600s. I'm a bit afraid to completely disable the timeout behaviour as I do want to have something take care of 'stuck' requests if we run into any.

I'm testing these changes out and will see if they've improved the situation.

I would like to understand why the interrupting of 1 thread in the thread pool completely kills the logging. I have found that by running:

set configs.config.server-config.http-service.access-logging-enabled=false
set configs.config.server-config.http-service.access-logging-enabled=true

I can get the logging restarted if this situation happens but we still miss request when this happens.

Thanks.

bluehadlow
Offline
Joined: 2013-12-16
Points: 0

Hi Arosso

Have seen this problem a lot lately. Seems to be related to a bug in Grizzly, in turn related to Corba and HTTP modules.

Try this:
https://www3.wipo.int/confluence/display/wipoimd/3.3.2.1.-+GlassFish+3.1.2.2+(build+5)+patches

We are busy testing it at the moment so let me know if it works.

arosso
Offline
Joined: 2010-02-19
Points: 0

Hi bluehadlow,

The error description for this doesn't seem quite the same as what I'm seeing. Are you seeing the access logs also stopping when this happens? I'm not seeing a flood of these errors happen. Glassfish seems to work fine, it just stops logging when these thread interrupts happen every so often.

I'll take a closer look at the patch. Maybe it fixes this issue too.

Thanks.

bluehadlow
Offline
Joined: 2013-12-16
Points: 0

Hi Arosso

A few problems could cause this. We found it to be one of 3 problems:
1. Network related most often (seeing any broken pipes?),
2. Sometimes PoolingException: In-use connections equal max-pool-size and expired max-wait-time. We suspect this may also be caused by a network dip causing our connections to become invalid and run forever trying to create new ones.
3. Long running processes on the DB (I think we have fixed most of these by now).

In all of the above cases, the common timing seems to be a process that takes longer than 15 minutes (900 seconds). This seems to correlate to (asadmin dotted names):
1. server.thread-pools.thread-pool.http-thread-pool.idle-thread-timeout-seconds=900
2. server.network-config.protocols.protocol.http-listener-1.http.request-timeout-seconds=900 (changed this to -1)
3. server.network-config.protocols.protocol.admin-listener.http.request-timeout-seconds=900

We managed to replicate the problem by simply changing #3 to 10 seconds, and trying to open the update page from the admin console. It then produces the same error but on the admin thread pool. This error repeats constantly.

----

When our servers (4) are running, and one of the problems (1-3) occur, the first thing we notice is an exception matching the problem, then the

"GRIZZLY0023: Interrupting idle Thread: http-thread-pool-8080(37).|#]"

about 15 minutes later. We also notice that thread 37 will keep logging to the log file (it could be any thread number). If the problem (1-3) persists for any number of seconds, this will effect all our threads (and they all keep logging the same message). Every time this happens to a thread, it becomes non-responsive and will not output any data. The client machines calling us often report no data coming back when they call our web-service. If any threads made it through the problem, they may still respond, but very often, if all threads are locked, our server will not respond to any requests and eventually the log stops...

----

I see now that maybe you have another problem too, but i suspect the thread death may be effecting you if you are seeing this particular error.

Cheers,
Blue