
File Descriptors leaking

Anonymous

Hello colleagues,

Recently we switched from Tomcat to Glassfish.
However, I noticed that at a certain point (as yet undetermined) the
Glassfish server stops responding. I can't even stop it cleanly
(asadmin stop-domain hangs!).

- Ubuntu Server - 12.04 (precise)
- Intel Xeon (x64 arch)
- java version "1.7.0_03"
- OpenJDK 64-Bit Server VM (build 22.0-b10, mixed mode)
- Glassfish 3.1.2 (no upgrades pending)

The server is accessed through an Apache façade server via a JK connector (mod_jk).

The server runs only three applications (plus the admin interface).
All applications use the Spring Framework. One uses JPA against a
PostgreSQL database on the local host, one uses JPA with ObjectDB,
and two use pooled JDBC connections to a remote Microsoft SQL Server.

The culprit seems to be some kind of file descriptor leak.
Initially the server died within a day or two. I increased the
open-files limit from (s1024/h4096) to (s65536/h65536), thinking the
server might simply need that many open files. However, that only
postponed the server's death to about one week of uptime.

I was able to run some checks at the latest crash, since I happened
to be awake at 3 AM. What I found was an unbelievable number of
leaked (unclosed) pipes:

> java 30142 glassfish 467r FIFO 0,8 0t0 4659245 pipe
> java 30142 glassfish 468w FIFO 0,8 0t0 4659245 pipe
> java 30142 glassfish 469u 0000 0,9 0 6821 anon_inode
> java 30142 glassfish 487r FIFO 0,8 0t0 4676297 pipe
> java 30142 glassfish 488w FIFO 0,8 0t0 4676297 pipe
> java 30142 glassfish 489u 0000 0,9 0 6821 anon_inode
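
If I read this correctly, each such triple is exactly what an
epoll-based NIO Selector opens on Linux: a wakeup pipe (one read end,
one write end) plus the epoll instance itself (the anon_inode line).
A minimal sketch that should reproduce the same lsof pattern (the
class name is just for illustration, not from our code):

    import java.io.IOException;
    import java.nio.channels.Selector;
    import java.util.ArrayList;
    import java.util.List;

    // Sketch: on Linux, every Selector.open() costs three descriptors --
    // two FIFO ends for the wakeup pipe and one anon_inode for the epoll
    // instance. Never closing them yields FIFO/FIFO/anon_inode triples
    // like the lsof output above.
    public class SelectorLeakDemo {
        public static void main(String[] args)
                throws IOException, InterruptedException {
            List<Selector> leaked = new ArrayList<>();
            for (int i = 0; i < 100; i++) {
                leaked.add(Selector.open()); // never closed: +3 fds each
            }
            Thread.sleep(60000);             // inspect with: lsof -p <pid>
        }
    }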

The logs show a very long quiet period; just before the failure
there is a normal log line from one of the applications doing its
regular work. Then the log rolls, and keeps rolling every second.
The failures start with the messages in the attached error_one.txt.

The only line that has been obfuscated is the one with .... in it.
The com.planetj... entry is a filter used to implement gzip
compression (input and output), since I could not find out how to
configure that in Glassfish.
The org.springframework... entries are obviously the Spring Framework.

The log contains an enormous number of those messages (2835 in 19
seconds). They are all logged from the same thread (same _ThreadID
and _ThreadName), which leads me to believe they all result from the
processing of a single request.
Afterwards the server begins dumping a lot of messages like those in
the attached error_two.txt.
The server is effectively blocked from that point on.

At that point lsof shows 64K open files for Glassfish, the vast
majority being open pipes (three descriptors each).
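
To correlate the leak's growth with traffic between crashes, the
descriptor count can also be sampled from inside the JVM; a sketch
using the com.sun.management extension that OpenJDK ships on Linux
(class name is illustrative):

    import java.lang.management.ManagementFactory;
    import com.sun.management.UnixOperatingSystemMXBean;

    // Sketch: periodically log the process's open-descriptor count so
    // the leak's growth rate can be matched against request traffic.
    public class FdWatch {
        public static void main(String[] args) throws InterruptedException {
            UnixOperatingSystemMXBean os = (UnixOperatingSystemMXBean)
                    ManagementFactory.getOperatingSystemMXBean();
            while (true) {
                System.out.printf("open fds: %d / %d%n",
                        os.getOpenFileDescriptorCount(),
                        os.getMaxFileDescriptorCount());
                Thread.sleep(60000);
            }
        }
    }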

I am at a loss here... The server currently needs either a periodic
restart, or I need to 'kill' it when it blocks.

I've been digging around the Internet for this error, and the
closest match I've found was caused by unclosed (leaked) Selectors.
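
If unclosed Selectors are indeed the cause, the defensive pattern on
the application side (wherever short-lived Selectors are opened at
all) would be try-with-resources; a sketch, assuming Java 7:

    import java.io.IOException;
    import java.nio.channels.Selector;

    // Sketch: Selector implements Closeable on Java 7, so
    // try-with-resources guarantees the pipe pair and the epoll
    // descriptor are released even on exceptional exits. (Glassfish's
    // internal Selectors are of course not under application control.)
    public class SelectorUse {
        static void poll() throws IOException {
            try (Selector sel = Selector.open()) {
                sel.selectNow(); // register channels and select here
            }                    // sel.close() frees all three descriptors
        }
    }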
Please advise!

Lachezar Dobrev

Hello all...
I have received no responses to this problem.

I am still having this issue once or twice a week.

After a number of searches over the past weeks I've gained little
understanding of what is happening.

In my search I found defect reports against Oracle's JVM that might
be connected to the issue:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7118373
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=2223521

I also came across a mention in a blog:

http://blog.fuseyism.com/index.php/category/openjdk/
(sorry, I could not find a more authoritative source).

From the blog post I can see that the mentioned defect is noted
under 'Release 2.3.0 (2012-08-15)', which is, funnily, just two days
after my post. May I have your comments? Does this sound like an
OpenJDK defect? Is it possible that it has been fixed in the
meantime? From the looks of it, my machine still uses IcedTea 2.1.1
(openjdk-7-jdk 7~u3-2.1.1~pre1-1ubuntu3).
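
To confirm which build the domain's JVM actually runs (rather than
whatever java is first on the PATH), a quick sketch (class name is
illustrative):

    // Sketch: print the properties identifying the exact JVM build;
    // run it with the same JVM that launches the Glassfish domain and
    // compare against the IcedTea release the package claims.
    public class JvmInfo {
        public static void main(String[] args) {
            System.out.println(System.getProperty("java.runtime.version"));
            System.out.println(System.getProperty("java.vm.version"));
            System.out.println(System.getProperty("java.vm.name"));
        }
    }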

Please advise!

oleksiys

Hi,

Can you please try GF version 3.1.2.2?
Also, if possible, please attach the GF domain.xml.

Thanks.

WBR,
Alexey.

Lachezar Dobrev

1. Can I «upgrade» to 3.1.2.2 using the update manager?
If not, where can I read about the procedure for upgrading a
production server?

2. domain.xml attached.
Please be advised that sensitive information inside has been
obfuscated. Only values were obfuscated; the structure has not been
altered.
