
Deployment experiences with JXTA 2.5?

3 replies
franco_dalmolin
Offline
Joined: 2007-11-27
Points: 0

We have deployed our production system on JXTA 2.5 in a multi-rdv, multi-relay configuration. We have been struggling with it for many weeks and still have not solved our deployment issues with the new 2.5 super peers. The core problem seems to relate to the new NIO-based TCP transport implementation (a.k.a. the "epoll bug") and its behavior on certain OSes, Linux kernels, and Java versions. Currently we run on CentOS with kernel 2.6.18 and Java 6 update 3. We also ran tests on Fedora with kernel 2.6.23, and it seemed fine in a test simulating 100 concurrent edge peers that ran all weekend.

Are there any other experiences out there? I would love to hear your stories on the topic, not as a development discussion (our tech lead is working on that in the other mailing lists), but specifically to share real-life experiences of deploying JXTA 2.5 for (preferably, but not only) production systems. We seem to have landed in a "bleeding edge" area and have reached high frustration levels. Even if you are not (yet) on 2.5 but have deployed JXTA in a commercial/production system, it would be great to hear from you and share lessons learned. Thanks!

Franco Dal Molin
http://www.collanos.com

marco1a
Offline
Joined: 2005-11-14
Points: 0
alexkaidalov
Offline
Joined: 2007-07-16
Points: 0

Hello,

Working with JXTA 2.5 deployed on Linux, we found several problems running Rendezvous and Relay under load. After a while, peers can no longer connect to the rendezvous quickly: periodically a connection takes about 2 minutes to complete, and this happens for several peers simultaneously. The strange part is that a connecting peer sends just two messages, first the welcome and then the lease request. The response to the first arrives immediately; the response to the second arrives after 2 minutes.
We ran several investigation sessions and found that the selector that receives incoming events was simply blocked and could not be woken up. That means our second message is not processed on the rendezvous for 2 minutes or more. We took thread dumps of the rendezvous with kill -3 [pid] on Linux, which let us see what its threads were doing.
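The delayed-connection symptom described above can also be measured from the client side. The following is a minimal sketch, not JXTA code: it uses plain java.net sockets, and the class name ConnectLatencyProbe, the method names, and the one-byte greeting are all hypothetical. It assumes the server writes something as soon as a connection is accepted; the local stand-in server here does exactly that, and whether a real rendezvous behaves the same way would need to be checked.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConnectLatencyProbe {

    /**
     * Opens `clients` TCP connections concurrently and returns, per client,
     * the time in milliseconds from connect() until the first byte arrives.
     * Assumption: the server sends at least one byte right after accepting.
     */
    public static List<Long> probe(String host, int port, int clients, int timeoutMs)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < clients; i++) {
                futures.add(pool.submit(() -> {
                    long start = System.nanoTime();
                    try (Socket s = new Socket()) {
                        s.connect(new InetSocketAddress(host, port), timeoutMs);
                        s.setSoTimeout(timeoutMs); // read() fails instead of hanging forever
                        if (s.getInputStream().read() < 0) {
                            throw new IOException("closed before first byte");
                        }
                    }
                    return (System.nanoTime() - start) / 1_000_000L;
                }));
            }
            List<Long> latencies = new ArrayList<>();
            for (Future<Long> f : futures) {
                latencies.add(f.get()); // rethrows any client failure
            }
            return latencies;
        } finally {
            pool.shutdownNow();
        }
    }

    /** Hypothetical stand-in server for local runs: greets `n` connections with one byte each. */
    public static ServerSocket startGreeter(int n) throws IOException {
        ServerSocket server = new ServerSocket(0, n); // ephemeral port, backlog of n
        Thread t = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                try (Socket c = server.accept()) {
                    c.getOutputStream().write('J');
                } catch (IOException e) {
                    return; // server socket closed
                }
            }
        }, "greeter");
        t.setDaemon(true);
        t.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = startGreeter(20)) {
            List<Long> ms = probe("127.0.0.1", server.getLocalPort(), 20, 5000);
            long max = 0;
            for (long v : ms) max = Math.max(max, v);
            System.out.println("clients=" + ms.size() + " max latency ms=" + max);
        }
    }
}
```

Run periodically against a real host and port, a probe like this would show the stall as a cluster of clients whose latency jumps at the same moment, rather than as one slow client.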

It turned out that sun.nio.ch.FileDispatcher.preClose0(Native Method) in one thread blocks the selector in select() for ~2 min, which completely stops processing of all incoming TCP connections. Does anybody know why this native call, sun.nio.ch.FileDispatcher.preClose0, would wait so long?

Stack traces (thread 1 blocks the selector in thread 2):
1.

"BlockingMessenger self destruct timer"
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcher.preClose0(Native Method)
...
at java.util.TimerThread.run(Timer.java:462)

2.
"TCP Transport MessengerSelectorThread for net.jxta.impl.endpoint.tcp.TcpTransport@1ebfa498"
java.lang.Thread.State: BLOCKED (on object monitor)
at sun.nio.ch.SocketChannelImpl.kill(SocketChannelImpl.java:713)
...
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
at net.jxta.impl.endpoint.tcp.TcpTransport$MessengerSelectorThread.run(TcpTransport.java:1047)
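The relationship shown in the two traces (one thread BLOCKED on a monitor that another thread holds) can also be reported from inside the JVM, as an alternative to kill -3. This is a minimal sketch using the standard java.lang.management API; the class name BlockedThreadReport and the demo threads are hypothetical and are not part of JXTA.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class BlockedThreadReport {

    /** Returns one line per BLOCKED thread, naming the thread that owns the contended monitor. */
    public static List<String> findBlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        // Only request monitor/synchronizer details if this JVM supports them.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(), mx.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                report.add("\"" + info.getThreadName() + "\" blocked on " + info.getLockName()
                        + " held by \"" + info.getLockOwnerName() + "\"");
            }
        }
        return report;
    }

    /** Hypothetical demo: one thread holds a monitor while a second thread blocks on it. */
    public static List<String> demo() throws InterruptedException {
        Object monitor = new Object();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (monitor) {
                held.countDown();
                try {
                    release.await(); // keep the monitor until the report is taken
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "demo-holder");
        Thread waiter = new Thread(() -> {
            synchronized (monitor) {
                // acquired only after the holder releases the monitor
            }
        }, "demo-waiter");
        holder.start();
        held.await();
        waiter.start();
        while (waiter.getState() != Thread.State.BLOCKED) {
            Thread.sleep(10);
        }
        List<String> report = findBlockedThreads();
        release.countDown();
        holder.join();
        waiter.join();
        return report;
    }

    public static void main(String[] args) throws InterruptedException {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

A report like this only shows Java-level monitor contention. It would identify which thread holds the lock the selector thread is waiting for, but not why the native call in the owning thread takes so long.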

Aleksandr Kaydalov

Message was edited by: alexkaidalov

franco_dalmolin
Offline
Joined: 2007-11-27
Points: 0

Just found this: "Conventional & Interruptable IO - Selector is broken in Linux", here: http://forum.java.sun.com/thread.jspa?threadID=5135128&messageID=9498436

Seems like a bad idea to deploy super peers on Linux!? Ouch....
Next we will try to deploy on Solaris and/or Windows and will report our findings.
Are Linux edge peers possibly also negatively affected by this?

Please continue to report about your JXTA 2.5 deployment experiences. Thanks.