Hang up: Too many open files

21 replies [Last post]
hammoud
Offline
Joined: 2005-06-20

My application server, SJSAS 9.0 b48 with Apache in front, hangs from time to time and nothing is available anymore.
After some time (10-30 min) or after a restart everything works as before.

I had a lot of discussion with the admin at my ISP, and he told me that he increased the maximum number of open files.
But the problem still exists and is not reproducible (for me); it happens at different hours and on different days.

Here is the corresponding trace from the logs:
[#|2007-08-23T15:29:33.456+0200|SEVERE|sun-appserver-pe9.0|org.apache.jk.common.HandlerRequest|_ThreadID=22;_ThreadName=TP-Processor16;_RequestID=6b9bb923-e0cb-408d-b3ac-2f086e5cfe42;|Error decoding request
java.io.IOException
at org.apache.jk.common.JkInputStream.receive(JkInputStream.java:252)
at org.apache.jk.common.HandlerRequest.decodeRequest(HandlerRequest.java:523)
at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:363)
at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:745)
at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:675)
at org.apache.jk.common.SocketConnection.runIt(ChannelSocket.java:868)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:653)
at java.lang.Thread.run(Thread.java:595)
|#]

[#|2007-08-23T15:29:33.457+0200|WARNING|sun-appserver-pe9.0|org.apache.jk.common.ChannelSocket|_ThreadID=22;_ThreadName=TP-Processor16;_RequestID=6b9bb923-e0cb-408d-b3ac-2f086e5cfe42;|processCallbacks status 2|#]

[#|2007-08-23T15:29:47.641+0200|WARNING|sun-appserver-pe9.0|org.apache.jk.common.ChannelSocket|_ThreadID=32;_ThreadName=TP-Processor4;_RequestID=356fbe7c-1553-40ad-829c-186a2cfab1a7;|Exception executing accept
java.io.IOException: Too many open files
at sun.nio.ch.IOUtil.initPipe(Native Method)
at sun.nio.ch.PollSelectorImpl.<init>(PollSelectorImpl.java:40)
at sun.nio.ch.PollSelectorProvider.openSelector(PollSelectorProvider.java:18)
at com.sun.enterprise.server.ss.provider.ASSelectorProvider.openSelector(ASSelectorProvider.java:75)
at java.nio.channels.Selector.open(Selector.java:209)
at com.sun.enterprise.server.ss.provider.ASOutputStream.<init>(ASOutputStream.java:60)
at com.sun.enterprise.server.ss.provider.ASClientSocketImpl.getOutputStream(ASClientSocketImpl.java:153)
at java.net.Socket$3.run(Socket.java:801)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.Socket.getOutputStream(Socket.java:798)
at org.apache.jk.common.ChannelSocket.accept(ChannelSocket.java:313)
at org.apache.jk.common.ChannelSocket.acceptConnections(ChannelSocket.java:638)
at org.apache.jk.common.SocketAcceptor.runIt(ChannelSocket.java:849)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:653)
at java.lang.Thread.run(Thread.java:595)
|#]

...
more of the "Too many open files" exceptions

paulr5930
Offline
Joined: 2007-01-16

For me the problem happened during the day and was reasonably associated with an increase in activity.
By the way, I'm using glassfish v1 build b02-p01 (SJSAS 9.0_01) at the moment, although I expect to move to glassfish v2 after the final is released.

About the morning hours - is there associated activity in the Glassfish access logs or Apache access logs? Is Apache httpd on the same box or is there a firewall between Apache and Glassfish? Do you have access to Apache's mod_jk.log if we need to look there?

I guess the first thing I'm looking for is if there is an avalanche of requests coming in at that time.
-Paul

hammoud
Offline
Joined: 2005-06-20

> is there associated activity in the Glassfish access logs or Apache access logs?
ISP: incoming requests are normal, no attacks or the like.
Statistics: between 5 and 6 is a low-traffic period (4000 requests per hour).
I don't know how to determine associations.

> Is Apache httpd on the same box or is there a firewall between Apache and Glassfish?
yes there is a firewall.

> Do you have access to Apache's mod_jk.log if we need to look there?
i have attached some files

paulr5930
Offline
Joined: 2007-01-16

> Dont know how to determine associations
I just meant associated by time: a noticeable spike in accesses in a small time frame, like maybe 10 minutes or so, possibly some automated systems updating themselves from your web site at a time they believed to be off hours.
>
> > Is Apache httpd on the same box or is there a
> firewall between Apache and Glassfish?
> yes there is a firewall.
>
> > Do you have access to Apache's mod_jk.log if we
> need to look there?
> i have attached some files

I was asking about this because I recently had to deal with an issue that was caused by firewall interaction with mod_jk. The error messages in mod_jk.log were similar to yours, but I don't believe the causes to be the same. I think your error messages from mod_jk are indeed caused by Glassfish no longer responding, where mine were caused by the firewall deciding that a connection was no longer being used. Not the same thing.

Here's a possible scenario that I think is worth trying to prove or disprove as being the root cause of your problem:
1. A cron job kicks off on your Glassfish box at nearly the same time every day. This cron job is very resource-intensive in some way, probably disk I/O. It might be a daily backup, or a file-indexing program, something that pretty much eats all of some resource that Glassfish needs in order to respond quickly. I think a disk backup being run by your ISP is a likely candidate here.
2. Glassfish slows down, can't respond to requests quickly enough to clear them. Requests stack up, clients start timing out. Files remain open in Glassfish until Glassfish can deal with the request - and I think some file handles are not released until finalizers are called during garbage collection (based on watching the rise and fall of open file counts).
3. Your open-file limits, still too low for demanding conditions, are exceeded and everything comes to a grinding screeching halt until Glassfish can be restarted.

Such a scenario would explain the timing, and I've seen backups drag a box down before.
I would suggest trying to find out what else besides Glassfish is running at that time, preferably by personal observation because I tend not to trust what I am told when it comes to troubleshooting - all too often it has been either wrong or misleading.
Look at the manpage for a command called "vmstat". I assume you are also familiar with "top". And there's good old "ps". These tools can help you see if a process is draining resources needed by Glassfish at the critical time.
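As a rough sketch of that kind of observation, a small function like the one below could be called from a loop shortly before the critical time, with its output appended to a log. The function name `snapshot` and the log path in the usage note are made up for illustration; it assumes a Linux box with /proc and the procps version of ps.

```shell
#!/bin/bash
# Print a timestamped load snapshot followed by the five busiest processes.
# Assumes Linux (/proc/loadavg) and procps ps (for --sort).
# Note: pcpu is averaged over each process's lifetime, so a long-running
# hog stands out more clearly than a short burst.
snapshot() {
    echo "=== $(date '+%Y-%m-%d %H:%M:%S') load: $(cut -d' ' -f1-3 /proc/loadavg)"
    ps -eo pid,pcpu,pmem,comm --sort=-pcpu | head -n 6
}
```

For example, `while true; do snapshot >> /tmp/load.log; sleep 60; done` could be left running over the problem window, with `vmstat 5` in a second terminal to watch disk I/O and swapping.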

-Paul

hammoud
Offline
Joined: 2005-06-20

Your possible scenario is really good!

I asked the ISP several times whether there are any scheduled jobs, but they denied it.
But they do run backups, so they must have some schedules.

And now I also remember that the webalizer statistics page shows a timestamp of 05:51,
which is exactly in the problematic time range; perhaps this is the reason.

I will send all of this to the ISP immediately.
thanx

Jeanfrancois Arcand

Hi,

Hum... your ISP might have too many applications running on the same
machine (and that's probably why you can't reproduce the problem). Do
you know if the machine is shared with a lot of mod_jk instances?

I'm not aware of any leak with mod_jk. Which version are they using?

Thanks

-- Jeanfrancois


hammoud
Offline
Joined: 2005-06-20

Hi Jeanfrancois,

>your ISP might have too many applications running on the same machine

No, it's my own managed server; I do not share it with others.
It currently runs 5 applications; maximum requests per hour:
15000 (incoming)
1000 (outgoing - sent by my apps)

> I'm not aware of any leak with mod_jk.

Are you sure that this exception is triggered by mod_jk?
If so, I could rule out that this error comes from one of my applications or from GlassFish.
Then I could tell the ISP that they should check their mod_jk settings/version.

> Which version are they using?

I can't see the mod_jk version; I would have to ask the ISP admin if it's necessary.
I only know they use Apache 1.3.X.

thanx

Jeanfrancois Arcand

Hi,

glassfish@javadesktop.org wrote:
> Hi Jeanfrancois,
>
>
>> your ISP might have too many applications running on the same machine
>
> No its an own managed server, i do not share this server with others.
> Currenly running 5 applications on it, maximum request per hour:
> 15000 (incoming)
> 1000 (outgoing - send by my apps)
>
>
>> I'm not aware of any leak with mod_jk.
>
> Are you sure, that this exception is triggered from mod_jk?

Not at all :-) Do you know if the 5 installed applications open file
descriptors (sockets, database connections, etc.)?

> If so, i could exclude that this error comes from one of my application or from glassfish.
> So i could tell the ISP, that they should check their mod_jk settings/version.

Are you able to log on to that machine and run netstat -an | grep
mod_jk_port (8009)?
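If netstat access is available, the raw output can be summarized per connection state. The sketch below is an assumption about what would be useful here, not something taken from mod_jk or GlassFish: a hypothetical helper `count_states` that reads `netstat -an` output on stdin and counts the TCP connections on a given port by state. It assumes the Linux netstat column layout (local address in column 4, foreign address in column 5, state in column 6).

```shell
#!/bin/bash
# Count TCP connections per state for one port.
# Input: `netstat -an` output on stdin. Argument: the port number.
count_states() {
    local port="$1"
    awk -v port="$port" '
        # Keep tcp/tcp6 rows whose local or foreign address ends in :port
        $1 ~ /^tcp/ && ($4 ~ (":" port "$") || $5 ~ (":" port "$")) { n[$6]++ }
        END { for (s in n) print s, n[s] }
    ' | sort
}
```

Usage would be `netstat -an | count_states 8009`. A large number of CLOSE_WAIT entries would suggest that the local side is not closing sockets the peer has already closed, and each of those still holds a file descriptor.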

I don't suspect a problem with GlassFish, as such an exception usually
comes from grizzly, but grizzly is not used with mod_jk.

Thanks

-- Jeanfrancois


hammoud
Offline
Joined: 2005-06-20

> Do you know if the 5 installed applications open file descriptor (socket, database connection, etc.)?

Every application has a connection pool to MySQL 5.0.
The bigger ones use DBCP with a hardcoded pool setup; the smaller ones use a MySQL pool configured inside GlassFish.

One application uses htmlparser, which opens HTTP connections to spider some other sites every minute.

This worked well for over half a year.
Then the "Too many open files" exceptions suddenly started.
In most cases my database connection monitoring suddenly jumped to 100%.
So the ISP increased the maximum connections to 100.

During today's exception, which I logged in this thread, the MySQL connections did not increase, which is why I started asking in the GlassFish forum.

> If so, i could exclude that this error comes from one of my application or from glassfish.
> So i could tell the ISP, that they should check their mod_jk settings/version.

> Are you able to log on that machine? Doing a netstat -an | grep mod_jk_port (8009)?

I can log in with PuTTY,
but I have no permission to run netstat.
I will ask my ISP admin for this (it takes some time).
It would be good to find the source of this exception somehow.

Binod

Are you on Linux and using JDK 1.5?
If so, JDK NIO uses a lot of file descriptors. You can move to JDK 1.6 and
that would solve the problem.


hammoud
Offline
Joined: 2005-06-20

Hi Binod,

Yes, you're right about Linux and 1.5.
Switching to 1.6 is not so easy: I must check all apps and also whether the ISP allows it.
I want to do it all together next month, when GlassFish V2 is finished.

In addition, I think it would be good if someone started a central page, perhaps in the wiki, that describes all the places that must be configured to raise the maximum number of open files.
I mean everything from the OS and database to Apache + mod_jk and GlassFish.

thanx

hammoud
Offline
Joined: 2005-06-20

We found a possible reason for one hang.
The time at which the ISP updated and restarted Apache coincided with the time of the error logs.
Perhaps GlassFish does not like that.

But the main problem still exists.
Currently GlassFish hangs with these exceptions every day between 5:30 and 6:30 CET.
The ISP increased max processes to 22000 and the MySQL connections, but nothing helped.

It seems they also have no idea what more they can do :(
I don't understand why it is so difficult for the admins to determine which service (java, mysql, glassfish, ...) opens so many files.

Binod

If you can't move to 1.6, try setting quick startup to false:
com.sun.enterprise.server.ss.ASQuickStartup=false

This would reduce the number of NIO selectors used by GlassFish and is
hopefully good enough to handle the spike that happens between 5:30 and 6:30...

1.6 doesn't suffer from this, as the default selector provider there is epoll:
http://blogs.sun.com/alanb/entry/epoll


hammoud
Offline
Joined: 2005-06-20

Thanks for your tip.
I think I will have to try this if we do not find the problem.

But I would prefer to first find the reason for this spike of open files in this particular time range.
It's absolutely illogical to me, because the traffic peak in the statistics is between 19:00 and 20:00, and no exceptions occur at that time.

It is also unclear why we see these messages only in the GlassFish logs and in applications that send HTTP requests or fetch mail via POP3.
The ISP did not find anything at the OS level.

hammoud
Offline
Joined: 2005-06-20

My ISP has now enabled JDK 1.6.0_01, but this did not solve the problem.
This morning's outage was between 05:47 and 06:21 (CET), with many exceptions in the GlassFish and application logs:

Application Server: Too many open files
Email-Scheduler: UnknownHostException: pop3.mydomain.de
Spider-Scheduler: UnknownHostException: www.somedomain.de

The ISP admin says:
1. The UnknownHostException could occur because the name cannot be resolved when too many files are open.
2. The Linux system does not have any log messages, which means that the OS has enough resources available.

The conclusion must be that the problem lies in GlassFish or Java.
There must be a limit on open files inside GlassFish or Java.
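One way to check the admin's second point: Linux uses different error texts for the per-process limit and the system-wide limit. "Too many open files" (EMFILE) means the process hit its own limit, while exhaustion of the system-wide table reads "Too many open files in system" (ENFILE). The exceptions in this thread carry the first text, which points at the per-process limit rather than at the OS as a whole. As a sketch, with a made-up function name, the system-wide figures can be read from /proc/sys/fs/file-nr for comparison:

```shell
#!/bin/bash
# Print the system-wide number of allocated file handles and the maximum.
# /proc/sys/fs/file-nr holds three numbers: allocated, unused, maximum.
system_fd_usage() {
    local allocated unused max
    read -r allocated unused max < /proc/sys/fs/file-nr
    echo "allocated=$allocated max=$max"
}
```

If `allocated` stays far below `max` during the outage, the system as a whole is fine and the limit being hit is the one applied to the java process.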

paulr5930
Offline
Joined: 2007-01-16

I had this problem once a few months ago. The default limit on open files in Linux is rather low, 1024 I think it was.

Check your open files limit with
ulimit -Hn # shows hard limit
ulimit -Sn # shows soft limit

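To fix the problem I added two lines to the /etc/security/limits.conf file, with the user id that runs Glassfish in the first column (shown here as the placeholder <user>):

<user> hard nofile 32768
<user> soft nofile 32768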

Obviously change to the value that is appropriate for your installation.

I don't remember if I rebooted the box or not. I don't think so. I think starting a new shell afterwards was sufficient. Check your limits again with ulimit to know that your change was successful:

[paulr@pascrpdapp02a ~]$ ulimit -Sn
open files (-n) 32768

The number-of-open-files limit was increased to 32768 and I haven't had a problem since. After I increased the limit I looked for a leak but there wasn't one. Glassfish just uses a lot of open files.

paulr5930
Offline
Joined: 2007-01-16

I didn't read previous posts carefully enough. I see now that the ISP sysadmin says he increased the number of open files limit. But perhaps it wasn't increased enough, or perhaps it was done in a way that wasn't useful to glassfish. Or maybe he just did something wrong and his change had no effect.

It may or may not be helpful, but here's a little script I used to watch my open file count to get an idea of what was really happening.

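#!/bin/bash
# Print the open file descriptor count of every java process every 5 seconds.
# Reading /proc/<pid>/fd requires running as the process owner or as root.

while true; do
    for pid in $(ps -A | awk '/java/ && !/awk/ { print $1; }'); do
        echo "$(date +%T) pid $pid: $(ls /proc/$pid/fd | wc -l) open files"
    done
    sleep 5
done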
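To find out which process holds the most descriptors, rather than watching a single one, the same /proc approach can be applied to every process. This is only a sketch under assumptions: the function name `fd_ranking` is invented here, /proc/<pid>/comm needs a reasonably recent kernel, and without root only your own processes are readable.

```shell
#!/bin/bash
# Print "count pid name" for every process whose fd directory is readable,
# sorted with the largest descriptor count first.
fd_ranking() {
    local dir n
    for dir in /proc/[0-9]*; do
        # Unreadable or vanished processes yield a count of 0 and are skipped
        n=$(ls "$dir/fd" 2>/dev/null | wc -l)
        [ "$n" -gt 0 ] && echo "$n ${dir#/proc/} $(cat "$dir/comm" 2>/dev/null)"
    done | sort -rn
}
```

Running `fd_ranking | head` during the problem window would show whether java, mysqld, or httpd is the one approaching the limit.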

hammoud
Offline
Joined: 2005-06-20

Hi Paul,

Over the last two months I wrote plenty of mails to my ISP.
I often asked them to increase the maximum number of file descriptors and also included lots of example settings from other people who got these exceptions (ulimit, limits.conf, ...).

The ISP's answer was that they had increased the maximum number of processes on my system once again.
But when I check ulimit -a it always shows 1024.
I don't know this OS and such things, so I thought this value applies only to my current ssh session and that java/glassfish/... have different settings.
Or is this a global value?

Nevertheless I sent example settings once again (your settings).
And now they have increased it to 2048; 32768 is too high in their opinion.
Now my SSH console shows 2048 too.

I don't understand what the ISP increased before and why they increased it now, even though I wrote many emails about those ulimit settings :(
I will watch over the next days whether this helps.

thanx
J. Hammoud

paulr5930
Offline
Joined: 2007-01-16

My understanding may be wrong in some way, but I think the limits settings work something like this:

ulimit is a command that is built into the shell. It can only have effect on the current shell process and child processes.

Also, ulimit can't go beyond what the system security settings allow, so if you tried to increase the number of open files beyond the default 1024 allowed by the security system a single running process will still only be allowed up to 1024 open files. And therefore the security system must be told that more open files are to be allowed. Typically in Linux the security system's limit settings are to be changed by editing /etc/security/limits.conf.

Limit settings in /etc/security/limits.conf may be applied on either a per-user basis or per-group basis or globally. In the example that I provided I was trying to apply the limit increase only to the user id that would benefit glassfish, not globally.
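A quick experiment, assuming a Bourne-style shell, illustrates the inheritance described above: a limit lowered in a subshell is seen by the child started there, while the parent shell keeps its value. The helper `limit_of` is a hypothetical name; it reads /proc/<pid>/limits, which only exists on newer Linux kernels, and shows the limit a running process actually received.

```shell
#!/bin/bash
# Soft open-files limit of the current shell.
current=$(ulimit -Sn)

# Lower the soft limit inside a subshell; the child process started
# there inherits the lowered value.
child=$( (ulimit -Sn 256; sh -c 'ulimit -n') )

# The parent shell itself is unaffected by the change in the subshell.
after=$(ulimit -Sn)
echo "parent before: $current, child: $child, parent after: $after"

# Soft open-files limit of an already running process, by pid.
limit_of() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}
```

This is why the value shown in an ssh session says nothing certain about the java process: what counts is the limit in effect in the shell or init script that started Glassfish. Where /proc/<pid>/limits is available, `limit_of` with the java pid answers the question directly.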

I'm concerned that a limit of 2048 open files is still not enough. Glassfish can really use a *lot* of open files when it's busy. The number is shockingly high. But you have to remember that this is a single process running a lot of threads, providing a lot of functionality, and generally being very busy. But it's all in one process, it's not in a parent process and a bunch of forked child processes like, say, a typical Apache httpd server on Linux.

I would suggest trying to get 8096 from your ISP. I admit that 32768 seems high, but it's an absolutely-never-worry-about-it-again setting, whereas 8096 is a probably-don't-worry-about-it-again setting and 2048 is a definitely-worry-about-it-again-when-the-server-is-busy setting.

I would also think about trying to get another ISP if that might be an option, because I get the feeling that your current ISP may be coming up a bit short, both in terms of technical ability and in terms of customer responsiveness. Hopefully they'll get better, but if I were dealing with them, right now in my mind they would be "on probation".

Glassfish needs what it needs, and nobody's opinion can change that.

-Paul

hammoud
Offline
Joined: 2005-06-20

First, thanks for the detailed explanation; I understand this a little better now.

But I am still wondering why this only occurs in the morning hours.
When did this occur on your system?

> to get another ISP

I am also thinking about that, but I don't know how to evaluate a new ISP beforehand :)
Perhaps I can find one that sets up only my configuration and provides individual support for the OS and my services with a dedicated admin.

Witold Szczerba

Hello there,
I wanted to join this thread, because I have same problem, but in my
case I can do everything with server (root privileges).

This is AMD box with 32bit Ubuntu 7.10 Server, Java 6, Glassfish v2
b45 right now.

At first, I had IOExceptions because of too many opened files. I
changed limit to 2048 and server could work longer, but limit was hit,
so I changed to 4096 (still didn't help), finally I changed to about
32000... well now it can work even several days, but strange thing
happens: the OpenFileDescriptorCount is growing...growing... and when
it is somewhere between 6000 and 8000 server starts acting strange,
for example, Java WebStart is launching our application (in ACC) and
after user logs in nothing happens, the main window never shows up,
like server was frozen. I have to restart it and everything is fine
again.

It looks like some leak; the OpenFileDescriptorCount chart (made by
jconsole) shows that during the day the number is growing (at
night no one is using the application and the descriptor count is not
growing), but it never goes down. So eventually it will hit any limit
(but in my case strange things happen around 6-8k).
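To narrow down what kind of descriptor is piling up, the targets of the links in /proc/<pid>/fd can be grouped by type. The function below is only a sketch with an invented name (`fd_types`); it assumes Linux and permission to read the process's fd directory.

```shell
#!/bin/bash
# Group the open descriptors of one process by type and count each group.
# Argument: the pid to inspect.
fd_types() {
    local fd target
    for fd in /proc/"$1"/fd/*; do
        # A descriptor may be closed between listing and reading; skip it
        target=$(readlink "$fd") || continue
        case "$target" in
            socket:*)     echo socket ;;
            pipe:*)       echo pipe ;;
            anon_inode:*) echo anon_inode ;;
            /*)           echo file ;;
            *)            echo other ;;
        esac
    done | sort | uniq -c | sort -rn
}
```

Run against the GlassFish pid a few times a day, whichever category keeps growing (sockets, pipes, or regular files) points at where to look. The stack trace earlier in the thread shows a pipe being created when a selector is opened, so a growing pipe count would be consistent with selectors that are never closed.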

I would file an issue, but I have no idea how to reproduce it. Our
application is used by... I am not sure, but it can be something like
50 users simultaneously (this is a testing time, in few weeks the
number of desktop connected can be around 150). It happens only when
such a great number of users are using it. Each client application is
injecting about 60 stateless session beans (I don't know, maybe that
info is useful).

Regards,
Witold Szczerba


paulr5930
Offline
Joined: 2007-01-16

I agree, your case looks like a leak.

The first thing I would do if I were you would be to update to the most recent v2 build, which is 58g (release candidate 8) at the moment. A lot of bugs have been fixed since build 45; we can hope yours is one of them. It might not be, but at least those bugs fixed since build 45 will no longer be a source of concern.

-Paul