Too many open files with jxta 2.4.1 on nfs. Any solution?

20 replies [Last post]
lutey
Offline
Joined: 2007-08-30
Points: 0

Hello.

I am developing a distributed computing network using jxta.
The directory that the application and jxta is located in is mounted via nfs.

The problem is that .nfsXXXX files keep building up in .jxta/cm/uuid-SOMETHING/srdi/ and are never removed, leading to a "too many open files" exception once the ulimit on open files (1024 in my case) is reached.

According to an NFS FAQ (http://nfs.sourceforge.net/; see 'D2. What is a "silly rename"? Why do these .nfsXXXXX files keep showing up?'):
"Unix applications often open a scratch file and then unlink it. They do this so that the file is not visible in the file system name space to any other applications, and so that the system will automatically clean up (delete) the file when the application exits. This is known as "delete on last close", and is a tradition among Unix applications.
Because of the design of the NFS protocol, there is no way for a file to be deleted from the name space but still remain in use by an application. Thus NFS clients have to emulate this using what already exists in the protocol. If an open file is unlinked, an NFS client renames it to a special name that looks like ".nfsXXXXX". This "hides" the file while it remains in use. This is known as a "silly rename." Note that NFS servers have nothing to do with this behavior."
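
The "delete on last close" behaviour described in the FAQ can be reproduced with a minimal Java sketch. It assumes a POSIX-like system; the class and method names are made up for illustration and have nothing to do with the JXTA source. The scratch file is created in the default temp directory, which is usually local; on an NFS mount the client would rename the file to .nfsXXXX at the point of the delete instead of removing it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of JXTA.
class UnlinkWhileOpenDemo {

    /** Creates a scratch file, unlinks it while a stream is open, then reads it through that stream. */
    static String readAfterUnlink() {
        try {
            Path scratch = Files.createTempFile("scratch", ".tmp");
            Files.write(scratch, "still readable".getBytes(StandardCharsets.UTF_8));
            try (InputStream in = Files.newInputStream(scratch)) {
                // The name disappears here, but the open descriptor keeps the data alive.
                // On NFS the client performs the "silly rename" to .nfsXXXX at this point.
                Files.delete(scratch);
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            } // Closing the stream releases the descriptor; only then is the data really gone.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAfterUnlink());
    }
}
```

If the stream were never closed, the descriptor (and on NFS the .nfsXXXX file) would stay around until the process exits, which matches the symptom described above.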

My question now is:
Does anyone know where in the jxta source such file operations take place, or whether jxta is known not to close file streams properly?
Where would changes to that behaviour have to be made?
Or might there be another solution to the problem (increasing the ulimit would just lead to a later "death")?

Thanks very much in advance.
Andreas

Message was edited by: lutey

adamman71
Offline
Joined: 2007-01-31
Points: 0

I have opened an enhancement for this issue: https://jxta-jxse.dev.java.net/issues/show_bug.cgi?id=137.

turbogeek
Offline
Joined: 2003-06-10
Points: 0

Great analysis! I think that we need to add the fix for removing peer groups too, but it would improve things quite a bit. I think it should let you pick any database via configuration. Peers generally don't need much of a database, while rendezvous peers (RDV) need to be more efficient.

I agree about cleaning up the objects over time. You need to be sure that any short-lived data is both removed from the database and related memory. It could be an issue with some of the embedded databases.

The next question, do you have time to fix it? I'm a little swamped, but that has been true for about five years :o)

Had not heard of Prevayler. Thanks for the tip!

boylejohnr
Offline
Joined: 2008-10-27
Points: 0

I will probably fix this in the next three weeks. However, I'm not sure how to contribute back, and my concern is that I will likely only have time to test for my own purposes, although since it is a relatively straightforward refactor, a unit test should cover most of it... The project I'm currently working on is at a decision point: the original design was to use peer groups for better isolation and security. Unfortunately there are other significant overheads with peer groups, namely threads, so I need to look at the complete picture before proceeding. I expect to fix this within the next three weeks, based on an architecture decision in a couple of weeks.

I will post here anything that I implement. I'm trying to work out how to extend the current implementation without a core change, but CM being a final class is a pain.

For me, JXTA JXSE needs some love to become a lightweight embeddable library and be truly powerful for our purposes, and I will be giving this some focus in general.

turbogeek
Offline
Joined: 2003-06-10
Points: 0

Normally the process is to create an issue and attach a diff or files to the issue. These are approved by at least one committer and, if there are no vetoes from others who look at it, we put it on the trunk branch for the release.

Of course, there are plenty of us that would be happy to have the files posted here :o) It is a great way for a person interested in this thread to try the files. There are plenty of committers that would do the above work.

Of course, as long as you follow the rules, one of the project owners can give you commit rights.

I'd be happy to test too as soon as you have something.

boylejohnr
Offline
Joined: 2008-10-27
Points: 0

Simple enough. I guess I will be getting on board for 2.6 as well; I want to make this a leaner, meaner product. I need to scale to hundreds of thousands of peer groups in one process, or think of something else to use.

As for testing, I appreciate the support. Thanks.

As mentioned, I will tackle this in the next few weeks.

exocetrick
Offline
Joined: 2003-08-11
Points: 0

In order to address some of these deployment issues the cache manager in jxta-c is configurable. The jxta-c implementation can be configured to use no files or tuned by using multiple files. It's dependent on the application/deployment.

The current jxta-c implementation uses one or more embedded SQLite databases. A SQLite database creates one file. By default, a separate database is created for each group, which means one file per group. However, it can be configured to share one database amongst all groups, so the JXTA peer would use only one file regardless of the number of groups.

There is also the ability to create separate databases (AddressSpaces) to store certain advertisement types (NameSpaces). Since not all NameSpaces are created equal, this gives the application/deployment more control over performance and reliability.

One advantage is a database can be configured as ephemeral or persistent. An ephemeral database can use an in-memory database that SQLite provides for high performance. All NameSpaces that don't require persistence would be directed to this database creating a high performance/low file usage deployment. Persistent resources would be directed to a database that can also be tuned for reliability. For example, an application may choose to store all JXTA NameSpaces (jxta:PA, jxta:RA, etc.) in an ephemeral space and all of its custom advertisements in a persistent space or it can store some JXTA NameSpaces in an ephemeral space (jxta:RA) and some in a persistent space (jxta:PipeAdvertisement).

The configurable options are easily modified using the PlatformConfig.

hamada
Offline
Joined: 2003-06-12
Points: 0

Avoid running jxta homed in an NFS-mounted directory. JXTA caches advertisements and indexes on disk, and doing so in a frequently accessed NFS-mounted directory will affect the running node.

turbogeek
Offline
Joined: 2003-06-10
Points: 0

Part of the problem is just the BTree database. I'd call this a fairly ancient problem. As you can see, BTree is file handle heavy. Another implementation, such as an in-memory database like JDB, could be used. I'd prefer something even lighter.

This might be alleviated if you change the database you use. Supposedly the interface to the database is isolated and this can be done, but I have never tried it.

You might ask why the data is persisted at all. There is a belief that persisting this information between runs will save you from rediscovering information about the network. However, this isn't always true, depending on the use of the network and its stability.

There are a few issues I have encountered that make this worse. For example, the peer ID of a new group ends up being new each time I create an instance in the group. Because the peer info is part of these files, the information is worthless.

Part of the problem is that the cache is not a true cache. It is devilishly hard to control. Just take pipe advertisements. You end up with a lot of them if you are using the random pipe ID method. Pipe advertisements become stale almost immediately. But the lifetime causes them to stick around. You don't have much recourse other than to ensure all pipe adverts have a short lifetime. Well-known pipes don't have the same issue, but a pipe is still effectively stale when not in use.
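
To make the lifetime point concrete, here is a minimal sketch of a cache that honours a per-entry lifetime and can purge stale entries. It is an illustration only: `ExpiringCache`, its methods, and the injected clock are hypothetical names, not the JXSE cache manager API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical illustration; not the JXSE CM class.
final class ExpiringCache<V> {

    private static final class Entry<V> {
        final V value;
        final long expiresAt;
        Entry(V value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry<V>> entries = new HashMap<>();
    private final LongSupplier clock; // injected so lifetimes can be tested without sleeping

    ExpiringCache(LongSupplier clock) { this.clock = clock; }

    /** Stores a value that becomes stale lifetimeMillis after insertion. */
    void put(String id, V value, long lifetimeMillis) {
        entries.put(id, new Entry<>(value, clock.getAsLong() + lifetimeMillis));
    }

    /** Returns the value, or null if it is missing or expired (expired entries are dropped). */
    V get(String id) {
        Entry<V> e = entries.get(id);
        if (e == null) return null;
        if (clock.getAsLong() >= e.expiresAt) {
            entries.remove(id);
            return null;
        }
        return e.value;
    }

    /** Removes every expired entry and returns how many were removed. */
    int purgeExpired() {
        long now = clock.getAsLong();
        int before = entries.size();
        entries.values().removeIf(e -> now >= e.expiresAt);
        return before - entries.size();
    }

    int size() { return entries.size(); }

    public static void main(String[] args) {
        ExpiringCache<String> cache = new ExpiringCache<>(System::currentTimeMillis);
        cache.put("random-pipe", "<pipe advertisement>", 0);      // stale immediately
        cache.put("peer", "<peer advertisement>", 60_000);        // kept for a minute
        System.out.println("purged: " + cache.purgeExpired() + ", remaining: " + cache.size());
    }
}
```

Giving short-lived pipe advertisements a short lifetime and calling something like `purgeExpired()` periodically is one way to keep such entries from sticking around.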

A secondary issue is just the nature of the system. You might end up with the same peer ID, but your info is always re-written because the peer info changes because your IP changes.

It is my belief (untested) that you could do away with the persistence of advertisements for most applications. This is untested because I have not created a non-persistent version of JXTA to compare it to. However, by observation there is no human-noticeable lag between starting up a peer with an existing database versus starting one that has the cache emptied.

The other issue you are encountering is the simple reality of how the database was designed. On the one hand it is efficient for one peer group; on the other it is very silly for large numbers of peer groups. This is because a new group creates a new set of tables. I get a minimum of 18 files per group. A redesign could greatly increase the efficiency by moving to a simpler static database design where there is a table of peer groups rather than tables per peer group.
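
The "one table of peer groups rather than tables per peer group" idea can be sketched as a single store whose key includes the group ID. This is a simplified in-memory illustration; `GroupKey` and `SharedAdvertisementTable` are hypothetical names and not part of JXSE.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical composite key: (group ID, advertisement ID).
final class GroupKey {
    final String groupId;
    final String advId;

    GroupKey(String groupId, String advId) {
        this.groupId = Objects.requireNonNull(groupId);
        this.advId = Objects.requireNonNull(advId);
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof GroupKey)) return false;
        GroupKey k = (GroupKey) o;
        return groupId.equals(k.groupId) && advId.equals(k.advId);
    }

    @Override public int hashCode() { return Objects.hash(groupId, advId); }
}

// One shared table for all groups, instead of a set of files per group.
final class SharedAdvertisementTable {
    private final Map<GroupKey, String> rows = new HashMap<>();

    void put(String groupId, String advId, String xml) {
        rows.put(new GroupKey(groupId, advId), xml);
    }

    String get(String groupId, String advId) {
        return rows.get(new GroupKey(groupId, advId));
    }

    /** Drops every row belonging to a group that is no longer used; returns the number removed. */
    int dropGroup(String groupId) {
        int before = rows.size();
        rows.keySet().removeIf(k -> k.groupId.equals(groupId));
        return before - rows.size();
    }

    int size() { return rows.size(); }
}
```

With this layout the number of backing files (if the table were persisted) would not grow with the number of groups, and removing an unused group becomes a delete over one key prefix rather than a directory cleanup.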

One last issue. The current design can cause too much time to be spent accessing the disk. Lots of new advertisements and working with groups can affect performance because of file access. Some of this has been alleviated over time, but your patterns of use or hardware can cause issues. BTree is efficient, but it is still a file-based database.

boylejohnr
Offline
Joined: 2008-10-27
Points: 0

I have taken a more detailed look, and provisionally I think this is a simple thing to fix. At first glance everything goes through the CM object, which uses BTreeFiler. There is already a MemFiler based on hash maps, plus there is the Filer interface. It appears that CM uses BTreeFiler methods, so I think the path to happiness is to create an interface for BTreeFiler and use MemFiler underneath. With the BTreeFiler interface in place we can then use a factory to choose between the existing implementation, an in-memory one, or a third-party one: Oracle Berkeley DB if you want kicking performance, a mapping onto JDB (seems heavyweight), or the JPA spec with our pick of vendor. Or the cheap and cheerful Prevayler, which does what it says on the tin, but we would need to manage memory aggressively because, judging by the number and total size of the files, Java memory would become an issue.
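
The interface-plus-factory approach proposed in this post might look roughly like the sketch below. All names (`IndexStore`, `InMemoryIndexStore`, `IndexStores`) are hypothetical; this is not the actual Filer, BTreeFiler, or MemFiler API from net.jxta.impl.xindice, only an illustration of the shape of the refactoring.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical storage abstraction the cache manager would code against. */
interface IndexStore extends AutoCloseable {
    void put(String key, byte[] value);
    Optional<byte[]> get(String key);
    boolean remove(String key);
    @Override void close(); // narrowed: no checked exception
}

/** In-memory implementation: opens no files at all. */
final class InMemoryIndexStore implements IndexStore {
    private final Map<String, byte[]> entries = new ConcurrentHashMap<>();

    @Override public void put(String key, byte[] value) { entries.put(key, value.clone()); }

    @Override public Optional<byte[]> get(String key) {
        return Optional.ofNullable(entries.get(key)).map(v -> v.clone());
    }

    @Override public boolean remove(String key) { return entries.remove(key) != null; }

    @Override public void close() { entries.clear(); }
}

/** Factory that picks an implementation from configuration. */
final class IndexStores {
    private IndexStores() { }

    static IndexStore create(String kind) {
        switch (kind) {
            case "memory":
                return new InMemoryIndexStore();
            // A file-backed BTree store or a third-party database would be registered here.
            default:
                throw new IllegalArgumentException("No store registered for kind: " + kind);
        }
    }
}
```

A caller would obtain a store with `IndexStores.create("memory")` and never reference a concrete class, so swapping the backing database becomes a configuration change.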

adamman71
Offline
Joined: 2007-01-31
Points: 0

I think this is an interesting idea.

There is an incident (https://jxta-jxse.dev.java.net/issues/show_bug.cgi?id=1) mentioning that:

"Once a peer group has been instantiated the cache metadata that gets created never expires. Specifically there is nothing to remove old CM directories for unused groups."

J.

hamada
Offline
Joined: 2003-06-12
Points: 0

In JXTA we don't create scratch files. While the application is running, inspect the process's list of open files and see if that sheds more light on the issue.

hamada
Offline
Joined: 2003-06-12
Points: 0

Also, try doing so by running the SrdiIndex unit test.

lutey
Offline
Joined: 2007-08-30
Points: 0

Hello.

The SrdiIndex unit test completes without any failure.

But: I just discovered that the same thing also happens on a non-NFS filesystem:
The number of open files grows until it reaches the ulimit.
Interestingly, most of the files are marked "deleted", but the descriptors for them are still held.

The attachments show the output of ls -l /proc/pid/fd for jxta on non-NFS and on NFS partitions.
Might that be a system problem rather than one of jxta?
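
The same check can be done from inside the JVM. The following is a Linux-only sketch that reads the symlinks under /proc/self/fd and counts those whose target is marked "(deleted)" or is an .nfsXXXX file; the class and method names are invented for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; Linux only (relies on /proc).
class OpenFdReport {

    /** Returns the symlink target of every descriptor currently open in this process. */
    static List<String> currentTargets() {
        List<String> targets = new ArrayList<>();
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(Paths.get("/proc/self/fd"))) {
            for (Path fd : fds) {
                try {
                    targets.add(Files.readSymbolicLink(fd).toString());
                } catch (IOException e) {
                    // The descriptor was closed between listing and reading; skip it.
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return targets;
    }

    /** True for files that were unlinked while still open (local "(deleted)" or NFS .nfsXXXX). */
    static boolean isUnlinkedButOpen(String target) {
        String name = target.substring(target.lastIndexOf('/') + 1);
        return target.endsWith(" (deleted)") || name.startsWith(".nfs");
    }

    static long countUnlinkedButOpen(List<String> targets) {
        return targets.stream().filter(OpenFdReport::isUnlinkedButOpen).count();
    }

    public static void main(String[] args) {
        List<String> targets = currentTargets();
        System.out.println(targets.size() + " descriptors open, "
                + countUnlinkedButOpen(targets) + " unlinked but still open");
    }
}
```

Logging this count periodically from the running application would show whether the number of unlinked-but-open files grows steadily or in bursts.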

Any suggestions or help would be great.
Andreas

hamada
Offline
Joined: 2003-06-12
Points: 0

This may be a problem specific to the unit test; I'll check.

lutey
Offline
Joined: 2007-08-30
Points: 0

Sorry if I didn't point it out clearly:
It did not happen during the unit test, but during "normal" use of jxta.

hamada
Offline
Joined: 2003-06-12
Points: 0

OK, I will look into this further. I would appreciate it if you logged an issue on this.

Thanks

enygma2002
Offline
Joined: 2008-12-22
Points: 0

I am experiencing the "Too many open files" error while running unit tests for the project I am working on, on a Linux system (Fedora and others too).

Raising the value of ulimit is not a good solution because it requires root access, and that makes the build process flawed.

Is there any progress on this issue or are there pointers to reducing the number of opened files jxta uses? (a cleanup method perhaps?)

I am running on jxta 2.5 right now.

boylejohnr
Offline
Joined: 2008-10-27
Points: 0

I am also having this problem on 2.5 now. It seems like 10 open files per peer group, coupled with the pipes we are opening. This is resulting in a significant inability to scale our application.

Is there any way to tune the number of files the cache keeps open?

I think longer term we should move to something like Berkeley DB, but this has obvious license restrictions.

bondolo
Offline
Joined: 2003-06-11
Points: 0

There have been some very recent improvements to srdi behaviour in the JXSE nightly builds but I am rather uncertain that they would correct your problem.

The "silly rename" feature is indeed used by Java applications whenever a temporary file is created with File.createTempFile() or a file is set to delete upon close via File.deleteOnExit(). However, I can't find any usage of either of these APIs in the JXSE source.

Do the files disappear when you close JXTA?

Does JXTA continue to create the files while it runs, or is only a small number of files created with each run? (The latter is what I would expect.)

If you delete the files manually while JXTA is not running how quickly does JXTA recreate the files?

The relevant source packages, if you wish to investigate further:

net.jxta.impl.cm.CM.java
net.jxta.impl.xindice.filer

Mike

lutey
Offline
Joined: 2007-08-30
Points: 0

The files are deleted when jxta is shut down (this is consistent with the NFS "silly rename" practice, where they are removed once the application exits).
That also means jxta continually creates them while running.

The files are not created at regular intervals;
jxta ran for about five and a half hours before about 1000 .nfsXXXX files (plus some other open files) had been created and the ulimit of 1024 open files was reached.

Thank you for the pointers to the files.
I already looked at CM.java but didn't notice anything,
so I'll have to look at the other one.

Thanx.
Andreas