Too many open files issue

jsl123

Hi, I've just upgraded to 3.1.2.2 from 3.1 and I'm now experiencing a
problem (especially with the admin console). I started getting a "too
many open files" exception on many operations. A quick look at the open
files shows over 4500 transaction extent files open. Restarting
glassfish doesn't clear these up, only physically deleting them when
glassfish is stopped.

Is this normal? It strikes me that these should be temporary files and
closed reasonably quickly. Any idea where to look to see what is causing
this? The server runs mainly a number of web apps which are based around
JSR using EJBs and CDI beans.

Thanks

John

Lachezar Dobrev

Ohhh...
It seems I am not the only one...

Would you please state your OS, OS Version and JVM.

How often do you get this issue?
Did you manage to find a specific way to reproduce the problem?

In Linux one can use 'lsof -p <pid>' to list the open files. I have
seen extensive 'anon_inode' and 'pipe' descriptors. Please check if
you're having the same issue, or a different one.

Alexey Stashok is helping me pin-point the issue, and it would
probably be very beneficial if we had your input on this.

2012/10/11 John Lister :
> Hi, I've just upgraded to 3.1.2.2 from 3.1 and I'm now experiencing a
> problem (especially with the admin console). I started getting a "too many
> open files" exception on many operations. A quick look at the open files
> shows over 4500 transaction extent files open. Restarting glassfish doesn't
> clear these up, only physically deleting them when glassfish is stopped.
>
> Is this normal? It strikes me that these should be temporary files and
> closed reasonably quickly. Any idea where to look to see what is causing
> this? The server runs mainly a number of web apps which are based around JSR
> using EJBs and CDI beans.
>
> Thanks
>
> John

julesbowden

I can confirm the behaviour. GF 3.1.1 build 12, RHEL 5.5 64bit & Java 1.6.0_21-b06

I have two domains running, neither is clustered. One has ~800 such files, the other around 1900.

Lachezar Dobrev

It is normal for an application server to have a large number of
open files. It is normal for different installations of the same
server to have different number of open files. Remember, that an open
file (or two, or three) is kept for every library file in every
application's WEB-INF/lib directory. Also every connection made to and
from the server is represented by an open file descriptor. That is not
a symptom of a problem.
It would be symptomatic if the number of open files continuously
rises without a visible reason for that.
I have one server with ~750 open files, but it has no applications
installed (just the administrative interface). The other server has
more applications and has ~1K open files.
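
A minimal sketch of how one might watch for that kind of continuous rise, assuming Linux (where /proc/<pid>/fd lists a process's descriptors) and Python 3. The function names and the sampling policy are illustrative assumptions, not anything GlassFish provides:

```python
import os

def count_open_fds(pid):
    """Count the descriptors listed under /proc/<pid>/fd (Linux only)."""
    return len(os.listdir("/proc/%d/fd" % pid))

def is_steadily_rising(samples, min_samples=4):
    """True if the counts never drop between samples and end higher than they began."""
    if len(samples) < min_samples:
        return False
    never_drops = all(later >= earlier
                      for earlier, later in zip(samples, samples[1:]))
    return never_drops and samples[-1] > samples[0]
```

Sampling count_open_fds for the server's PID at a fixed interval (for example from cron) and feeding the collected numbers to is_steadily_rising gives a rough leak indicator. Reading another user's /proc entry needs the same user or root.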

Use the lsof tool to see what files are open.
They can be grouped in:
- JVM open files (/usr/lib/jvm/*, *.so, and some configuration)
- Glassfish files (~glassfish without the applications)
- Application files (~glassfish/domains/*/applications/*)
- TCP listener connections (*:* (LISTEN))
- TCP connected connections (*:*->*:* (ESTABLISHED))
- TCP connections for Database Access (similar to above)
- *other*

It takes a bit of grep-ing to divide the open files into groups, but
once done one can assess which is the dominating file descriptor sink,
and monitor it for rogues or leakage.
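
The grouping described above could be sketched roughly as follows. This assumes Python 3 and lsof's default nine-column output (COMMAND, PID, USER, FD, TYPE, DEVICE, SIZE/OFF, NODE, NAME); the group labels and matching rules are illustrative assumptions, and database connections are not separated from other established TCP connections because that would require knowing the database port:

```python
from collections import Counter

def classify(lsof_line):
    """Assign one lsof output line to a coarse descriptor group."""
    fields = lsof_line.split(None, 8)
    # NAME is the last column and may itself contain spaces, e.g. "*:8080 (LISTEN)".
    name = fields[8].strip() if len(fields) == 9 else ""
    if "(LISTEN)" in name:
        return "tcp-listen"
    if "(ESTABLISHED)" in name:
        return "tcp-established"
    if "/domains/" in name and "/applications/" in name:
        return "application"
    if name.startswith("/usr/lib/jvm/") or name.endswith(".so"):
        return "jvm"
    if "glassfish" in name:
        return "glassfish"
    return "other"

def group_counts(lsof_lines):
    """Count how many descriptors fall into each group."""
    return Counter(classify(line) for line in lsof_lines)
```

Calling group_counts on the saved output of lsof for the server process (minus its header line) shows at a glance which group dominates.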

2012/10/17 :
> I can confirm the behaviour. GF 3.1.1 build 12, RHEL 5.5 64bit & Java
> 1.6.0_21-b06 I have two domains running, neither is clustered. One has ~800
> such files, the other around 1900.
>
> --
>
> [Message sent by forum member 'julesbowden']
>
> View Post: http://forums.java.net/node/891367
>
>

jsl123

Hi, I'm running the latest ubuntu (precise) 32bit on jdk 1.7u7. (linux 3.2.0-31 and 1.7.0_07-b10 jdk to be exact)

All was working fine on ubuntu oneiric and glassfish 3.1.1 (b12) (same jdk) until I upgraded everything this week. I don't know what circumstances caused it but will monitor the situation to see if it occurs again.

I can't see any open files as you describe, but the majority (4000+) files opened by glassfish at the time it started to die were of the following form (taken from lsof)

java 12241 gf 419u REG 8,1 65536 2999196 /srv/glassfishv3/glassfish/domains/domain1/logs/server/tx/extent.85S

Everything else looked normal: the usual jar files, loads of OSGi stuff, etc. Btw, there are a few anon_inode and pipe handles lying around, but not many.

A couple of things struck me as odd about the extent files:
Firstly, the part after the dot was an alphanumeric sequence that bizarrely started at 854 whereas clearing the tx directory seems to reset to 0 (I cleaned the logs, etc before the upgrade in case that matters)
Secondly the files are repeated, each file is opened 7 times
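
To check the "each file is opened 7 times" observation on another system, something like the following sketch could be run over saved lsof output. It assumes Python 3 and lsof's default nine-column layout, and it recognises extent files purely by the "/tx/extent." fragment seen in the path above; the function name is an illustrative assumption:

```python
from collections import Counter

def extent_open_counts(lsof_lines):
    """Count how many descriptors point at each transaction extent file."""
    counts = Counter()
    for line in lsof_lines:
        fields = line.split(None, 8)
        # The ninth column (NAME) holds the file path for regular files.
        if len(fields) == 9 and "/tx/extent." in fields[8]:
            counts[fields[8].strip()] += 1
    return counts
```

A result where every path maps to the same count greater than one would match what is described here, while len() of the result gives the number of distinct extent files.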

Unfortunately the server was urgently needed so I took what information I thought was relevant and then deleted the tx directory before restarting.

if anything else is of interest let me know and I will try to obtain it if it happens again.

John

mvatkina

There shouldn't be that many tx log files. But it is safe to remove the tx directory when the server is down *and* there is no need to recover any resources upon restart.

-marina

jsl123

Restarting the server regularly isn't really practical, unfortunately. I've left the server running overnight and I can see that the number of extent files has crept up (by only 4 files). Would you expect old files to be removed or recycled? Is there any way to find out what their content is, or what caused them to be created and therefore left hanging?

Thanks

mvatkina

The older files should be removed. Do you have automatic-recovery enabled? This setting starts a background thread for periodic recovery and extent removal.
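
One way to see whether the setting is on is to look at domain.xml. The sketch below assumes Python 3, assumes the setting appears as an automatic-recovery attribute on the transaction-service element of each config, and assumes that a missing attribute means the default of "false"; verify these assumptions against your own domain.xml. The function name is hypothetical:

```python
import xml.etree.ElementTree as ET

def automatic_recovery_settings(domain_xml_text):
    """Map each config name to its automatic-recovery value."""
    root = ET.fromstring(domain_xml_text)
    settings = {}
    for config in root.iter("config"):
        service = config.find("transaction-service")
        if service is not None:
            # Assumption: an absent attribute means the default, "false".
            settings[config.get("name")] = service.get("automatic-recovery", "false")
    return settings
```

Passing in the text of domains/domain1/config/domain.xml returns something like a dictionary keyed by config name, so the value for the config your instance uses can be read off directly.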

jsl123

Automatic recovery wasn't enabled. I've enabled it now and will restart shortly to monitor if it improves things. At the minute we are getting a new extent file every 10min or so.
This doesn't explain why upgrading glassfish has caused this problem if as you say the transaction code hasn't changed?

Thanks

John

jsl123

I've enabled the automatic recovery and initially it looked to have worked, when I restarted glassfish after the change, 99% of the extent files were deleted - so at least I shouldn't have to remember to do this manually. However after running for 2 days, they have built up again:(
So either
- the background thread has died/stopped working
- the minimum number of extent files hasn't been reached
- there is a bug somewhere

Any ideas? I can send the config files or any other further information if required..

Thanks

John

mvatkina

How many parallel transactions are running at the same time?

There should be an INFO message in the server.log saying "Asynchronous thread for incomplete tx is enabled with interval ..." - what's the interval (it's in seconds)?
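
A small sketch for locating that message, assuming Python 3. It only matches the fixed part of the text quoted above and returns the whole line, because the exact format of what follows "interval" is not given here; the function name is an illustrative assumption:

```python
MARKER = "Asynchronous thread for incomplete tx is enabled with interval"

def find_recovery_thread_messages(log_lines):
    """Return every log line that contains the recovery-thread INFO message."""
    return [line.rstrip("\n") for line in log_lines if MARKER in line]
```

It can be used with an open file, for example `with open("server.log") as f: print(find_recovery_thread_messages(f))`. An empty result suggests the message was logged in an older, rotated log file or not at all.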

There is also a check on the size of incomplete transactions, but it is supposed to be cleaned up at the end of each transaction, so it seems like a bug. Can you file it under the 'jts' component with all the details that you can share?

thanks,
-marina

mvatkina

Can you try to configure GF to store transaction logs in a database as a work around to your problem? See "To Store Transaction Logs in a Database" under http://docs.oracle.com/cd/E26576_01/doc.312/e24928/transactions.htm#beanq

thanks,
-marina

jsl123

Hi Marina, thanks for the suggestions. However, it looks like the background thread is working, if a little slower than I originally expected. The transaction files are cleaned up about every 2-3 days. I waited before replying to confirm this, and I now believe this is done automatically rather than manually, as the removal doesn't coincide with any restarts. I searched the logs and couldn't find a reference to the "asynchronous thread" entries, so I can't tell you the delay, though I would expect less than 2 days if it is in seconds?

I'm still confused as to why previous versions worked out of the box and why I need to turn on the background thread. For a comparison, our 3.1.2.2 box currently has about 100 open extent files, whereas another 3.1.2 box (which probably has equal or more activity) has usually less than 5 open files (1 currently)

Thanks

arjavdesai

As Marina suggested, can you please file a bug at http://java.net/jira/browse/GLASSFISH against component JTS? Also, can you please provide us with your domain.xml? We would like to reproduce it in a local environment.

mvatkina

John,

When you are filing the bug, please attach your domain.xml (remove any private data). The INFO message will be in the 1st server.log around the time transaction service was being started after instance restart.

thanks,
-marina

smsiebe

We also had this problem, RHEL 6 x64.

We fixed it by telling linux to increase the maximum number of files
allowed to be open by 1) the system and 2) the glassfish user. Our server
has been cranking away for 2 days now without that nasty exception and
resulting 404 to our users, which was previously being thrown at a rate of
several per minute.
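
Before raising limits it helps to know what they currently are. The sketch below assumes Python 3 on a Unix-like system and uses the standard resource module; it reports the limits of the process that runs it, which a server started by the same user from the same shell would normally inherit. The headroom helper is an illustrative assumption:

```python
import resource

def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def headroom(open_count, soft_limit):
    """Fraction of the soft limit still unused (0.0 means exhausted)."""
    if soft_limit == resource.RLIM_INFINITY:
        return 1.0
    if soft_limit <= 0:
        return 0.0
    return max(0.0, 1.0 - open_count / soft_limit)
```

Comparing the descriptor count reported by lsof with the soft limit shows how close the server is to the "Too many open files" error. Note that a higher limit only postpones the error if descriptors are genuinely leaking.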

We found these links to be useful:

http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-f...
http://www.dedoimedo.com/computers/lsof.html

Hope this helps,

S

On Thu, Oct 25, 2012 at 9:12 AM, wrote:

> As Marina suggested can you please file a bug in
> http://java.net/jira/browse/GLASSFISH against component JTS? Also, can you
> please provide us with domain.xml, we would like to reproduce it in local
> environment.

baze985

Hi,

just to give some small feedback on this one :)
We experience the same problem on GF 3.1.2.2 build 5...

[#|2012-10-04T15:53:52.118+0200|WARNING|glassfish3.1.2|com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=12;_ThreadName=Thread-3;|GRIZZLY0006: Exception accepting channel
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)
at com.sun.grizzly.TCPSelectorHandler.acceptWithoutRegistration(TCPSelectorHandler.java:745)
at com.sun.enterprise.v3.services.impl.monitor.MonitorableSelectorHandler.acceptWithoutRegistration(MonitorableSelectorHandler.java:99)
at com.sun.grizzly.http.SelectorThreadHandler.onAcceptInterest(SelectorThreadHandler.java:99)
at com.sun.grizzly.SelectorHandlerRunner.handleSelectedKey(SelectorHandlerRunner.java:301)
at com.sun.grizzly.SelectorHandlerRunner.handleSelectedKeys(SelectorHandlerRunner.java:263)
at com.sun.grizzly.SelectorHandlerRunner.doSelect(SelectorHandlerRunner.java:200)
at com.sun.grizzly.SelectorHandlerRunner.run(SelectorHandlerRunner.java:132)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
|#]

Any attempt by the application to open a socket ended with this kind of error.
Because it happened during the night and I'm not a Linux guru :) I didn't know how to clean the tx files,
so I tried restarting the machine and starting GF again, but no luck. Then comes the fun part: I stopped GlassFish again and started GF 3.0.1, and this one worked without any problem...

Blaze

mvatkina

That's close to impossible. Extents handling hadn't been touched for ages! :(

-marina

cetina

Hi all, I remember a similar issue from some time ago, though I don't know
if it is the same one. When I reported it, the problem was in the logging:
I had enabled an option, something like "write the console info to a log
file" (I don't remember exactly), and that was the cause, because GlassFish
was opening a new file every time and never closing the previously opened
one. The option I mention wasn't enabled by default. I reported this bug
about 2 years ago. My solution at the time was to disable that option.

What I can tell you is that the problem occurred when GlassFish wrote
content to the file: GlassFish never closed the file.

Sorry for my English.
On 12/10/2012 04:27, wrote:

> Hi, just to give a small feedback on this one:) We expiriance the same
> problem on GF3.1.2.2 build5...
> [#|2012-10-04T15:53:52.118+0200|WARNING|glassfish3.1.2|com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=12;_ThreadName=Thread-3;|GRIZZLY0006:
> Exception accepting channel java.io.IOException: Too many open files at
> sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)
> at
> com.sun.grizzly.TCPSelectorHandler.acceptWithoutRegistration(TCPSelectorHandler.java:745)
> at
> com.sun.enterprise.v3.services.impl.monitor.MonitorableSelectorHandler.acceptWithoutRegistration(MonitorableSelectorHandler.java:99)
> at
> com.sun.grizzly.http.SelectorThreadHandler.onAcceptInterest(SelectorThreadHandler.java:99)
> at
> com.sun.grizzly.SelectorHandlerRunner.handleSelectedKey(SelectorHandlerRunner.java:301)
> at
> com.sun.grizzly.SelectorHandlerRunner.handleSelectedKeys(SelectorHandlerRunner.java:263)
> at
> com.sun.grizzly.SelectorHandlerRunner.doSelect(SelectorHandlerRunner.java:200)
> at com.sun.grizzly.SelectorHandlerRunner.run(SelectorHandlerRunner.java:132)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619) |#] Any attempt of the application
> to open file ended with this kind of error, because happened during night and
> Im not a linux guru:) I didnt know how to clean the tx files, so I tried
> restarting the machine and starting gf again but no luck, then comes the fun
> part I stop again glasfish and start gf 3.0.1 and this one worked without any
> problem... Blaze

Roel_D

Please consider that this might not be a bug but an improvement.

The old GF version might not have been CAPABLE of handling this number of files, while the new version is.

Search your code for errors or forgotten "file close" statements.

Kind regards,

The out-side

On 12 Oct 2012 at 10:43, forums@java.net wrote:

> Hi, just to give a small feedback on this one:) We expiriance the same
> problem on GF3.1.2.2 build5...
> [#|2012-10-04T15:53:52.118+0200|WARNING|glassfish3.1.2|com.sun.grizzly.config.GrizzlyServiceListener|_ThreadID=12;_ThreadName=Thread-3;|GRIZZLY0006:
> Exception accepting channel java.io.IOException: Too many open files at
> sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)
> at
> com.sun.grizzly.TCPSelectorHandler.acceptWithoutRegistration(TCPSelectorHandler.java:745)
> at
> com.sun.enterprise.v3.services.impl.monitor.MonitorableSelectorHandler.acceptWithoutRegistration(MonitorableSelectorHandler.java:99)
> at
> com.sun.grizzly.http.SelectorThreadHandler.onAcceptInterest(SelectorThreadHandler.java:99)
> at
> com.sun.grizzly.SelectorHandlerRunner.handleSelectedKey(SelectorHandlerRunner.java:301)
> at
> com.sun.grizzly.SelectorHandlerRunner.handleSelectedKeys(SelectorHandlerRunner.java:263)
> at
> com.sun.grizzly.SelectorHandlerRunner.doSelect(SelectorHandlerRunner.java:200)
> at com.sun.grizzly.SelectorHandlerRunner.run(SelectorHandlerRunner.java:132)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619) |#] Any attempt of the application
> to open file ended with this kind of error, because happened during night and
> Im not a linux guru:) I didnt know how to clean the tx files, so I tried
> restarting the machine and starting gf again but no luck, then comes the fun
> part I stop again glasfish and start gf 3.0.1 and this one worked without any
> problem... Blaze