asadmin start-node-agent times out after 15 mins

14 replies [Last post]
lunobili
Offline
Joined: 2008-10-14

Hi,
I am trying to set up a cluster of two machines, but when I try to start the node agent for the second machine (the one not running the DAS) the command hangs.

Please note that I have the following situation:
1) Node agent on machine1 starts correctly
2) DAS on machine1 knows about node agent on machine2 (rendezvousOccurred=true). This, in my opinion, should prove that communication between boxes occurs correctly and that the node agent on machine2 was correctly created.
3) I did some packet sniffing in my network. When the node agent on machine2 tries to start up there is indeed some communication between the boxes on port 8686, but I could not read anything useful in the data of the packets.

This is an extract of the log for the node agent on machine2:

[#|2008-12-30T11:47:48.747+0900|INFO|sun-appserver9.1|javax.ee.enterprise.system.nodeagent|_ThreadID=10;_ThreadName=main;|NAGT0038:Executing Synchronization for node-agent With DAS|#]

[#|2008-12-30T12:02:56.595+0900|INFO|sun-appserver9.1|javax.ee.enterprise.system.tools.synchronization|_ThreadID=10;_ThreadName=main;|SYNC001: Unable to communicate with Domain Administration Server. Skipping synchronization.|#]

[#|2008-12-30T12:02:56.599+0900|SEVERE|sun-appserver9.1|javax.ee.enterprise.system.nodeagent|_ThreadID=10;_ThreadName=main;|NAGT0035:The NodeAgent failed to complete the intial synchronization with the DAS. Please make sure the DAS is running and is accessible from the NodeAgents server|#]

[#|2008-12-30T12:02:58.605+0900|WARNING|sun-appserver9.1|javax.ee.enterprise.system.nodeagent|_ThreadID=10;_ThreadName=main;|NAGT0013:Stopping Node Agent...|#]

Note that there is a gap of about 15 minutes between the first line and the others, so there is clearly a timeout somewhere.
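
For anyone who wants to measure the gap between two log lines rather than eyeball it, here is a minimal Python sketch. It assumes the timestamps keep the layout seen in the excerpt above; the function name `gap_seconds` is just an illustrative choice, not part of any GlassFish tooling.

```python
from datetime import datetime

# Timestamp layout as it appears in the log excerpt,
# e.g. 2008-12-30T11:47:48.747+0900 (assumed to be stable across records).
LOG_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def gap_seconds(start: str, end: str) -> float:
    """Return the number of seconds elapsed between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TS_FORMAT)
    t1 = datetime.strptime(end, LOG_TS_FORMAT)
    return (t1 - t0).total_seconds()


if __name__ == "__main__":
    sync_started = "2008-12-30T11:47:48.747+0900"  # NAGT0038 record
    sync_failed = "2008-12-30T12:02:56.595+0900"   # SYNC001 record
    gap = gap_seconds(sync_started, sync_failed)
    print(f"{gap:.3f} s (about {gap / 60:.1f} min)")
```

For the two timestamps in the excerpt this works out to roughly 15 minutes and 8 seconds, which is consistent with a fixed timeout rather than an immediate connection failure.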

Has anybody seen this behavior before? Is there any log at the DAS side I can look at?

Thanks in advance for your kind reply,
Luca Nobili

vicjalan
Offline
Joined: 2009-01-11

Hi all,

Has anyone figured this issue out? I'm having the same issue.

I'm using GlassFish 2UR2 in a two-server cluster.

Machine 1 hosts the DAS, which is running; node-agent1 is also running on this server.

On Machine 2 I have installed GlassFish 2UR2 and created the agent, but the agent will not start. Looking at the log, I get the same error as lunobili.

I'm using 32-bit RHEL 5 with Java 6. I know that Machine 2 can see Machine 1 because it can ping it, the agent appeared in the web console on Machine 1 (just with a status of off), and I can also telnet to port 8686. I have SELinux disabled, as well as the firewall.

Any ideas?

Thanks a great deal.

Victor.

Satyajit.Tripathi@Sun.COM

Dear Victor, Luca,

I do not have a Linux system with me right now, so I am unable to test the
use case. I should be getting a system soon to test this case on Linux.

One of the suspects here is the authentication between the DAS and the
node agent while starting the node agent. Can you reverify the passwords
you are providing for the admin user and the master password?

Default password for admin is 'adminadmin' and master is 'changeit'.

--------------------------------
asadmin> start-node-agent n1
Please enter the admin user name>admin
Please enter the admin password>
Please enter the master password [Enter to accept the default]:>
--------------------------------

The other thing to check is whether those systems have hostnames, either
assigned or taken by default from the network, and whether the systems
can reach each other using those hostnames. You have confirmed
that there is no firewall between the two machines.
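
A quick way to run the hostname check suggested above is a small Python sketch like the following. It is only an illustration: it asks the operating system's resolver, which is assumed (not verified here) to give the same answer the application server's JVM would get. The helper names `is_loopback`, `resolve_all` and `report` are made up for this example.

```python
import ipaddress
import socket


def is_loopback(address: str) -> bool:
    """True for any loopback address (127.0.0.0/8 or ::1)."""
    # Drop an IPv6 scope suffix such as "%eth0" before parsing.
    return ipaddress.ip_address(address.split("%", 1)[0]).is_loopback


def resolve_all(hostname: str) -> list:
    """Return the distinct addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def report(hostname: str) -> None:
    """Print every address a name resolves to and flag loopback results."""
    try:
        addresses = resolve_all(hostname)
    except socket.gaierror as exc:
        print(f"{hostname}: does not resolve ({exc})")
        return
    for address in addresses:
        note = "LOOPBACK" if is_loopback(address) else "ok"
        print(f"{hostname} -> {address} [{note}]")


if __name__ == "__main__":
    # Run on each machine: first its own name, then the peer's name.
    report(socket.gethostname())
```

Running it on both machines, once for the machine's own name and once for the other machine's name, shows whether any of the names resolves to a loopback address instead of an address the other box can reach.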

Let me know your findings.

If you would need a quick reference, please see the Presentation :
*GlassFish-V2-Clustering-Simplified.pdf*

Thanks & regards
--Satya


--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SATYAJIT TRIPATHI
ISV Engineering APAC

Sun Microsystems India Private Limited
Bangalore 560025
DID : +91 80 66937865
Mobile: +91 9886019892
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

vicjalan
Offline
Joined: 2009-01-11

Hi Satyajit,

Thank you for your reply. The passwords are correct; if they weren't, I'd get a different error. I did some experimentation with this to make sure.

I believe I found the solution, however. I uninstalled the cluster and node agents from all the nodes and recreated them in a different order: I created the cluster on the DAS server, then created the node agent on the second server, and lastly created the node agent on the DAS server. Each node agent was able to start up just fine.

Not sure why this made a difference, but it did, and I'm up and running.

Thanks for your help and suggestions once again.

lunobili
Offline
Joined: 2008-10-14

Ok, I finally found the problem!!!
Each machine needs to resolve its own host name to its real IP address and NOT to 127.0.0.1.

I had in my /etc/hosts file the following lines:

127.0.0.1 localhost
127.0.0.1 lnobili-kubuntu

This is BAD!!! What it should look like is:

127.0.0.1 localhost
x.x.x.x lnobili-kubuntu

where x.x.x.x is the IP address that any other machine would use to communicate with this host.
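
The same check can be scripted. The sketch below is a hedged illustration in Python: it scans the text of a hosts file and lists every name, other than the usual localhost aliases, that is bound to a loopback address. The function name `loopback_aliases` and the set `LOCAL_NAMES` are assumptions made for this example; adjust the alias set to your distribution.

```python
import ipaddress

# Names that are expected to point at loopback (an assumed, non-exhaustive set).
LOCAL_NAMES = {"localhost", "localhost.localdomain", "ip6-localhost", "ip6-loopback"}


def loopback_aliases(hosts_text: str) -> list:
    """Return (address, name) pairs where a non-localhost name maps to loopback."""
    findings = []
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        address, *names = line.split()
        try:
            loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # first field is not an IP address; skip the line
        if not loopback:
            continue
        for name in names:
            if name not in LOCAL_NAMES:
                findings.append((address, name))
    return findings


if __name__ == "__main__":
    with open("/etc/hosts", encoding="utf-8") as handle:
        for address, name in loopback_aliases(handle.read()):
            print(f"{name} is mapped to loopback address {address}")
```

Because it tests for the whole loopback range and not just 127.0.0.1, it also catches entries such as 127.0.1.1, which some Debian and Ubuntu installers write for the machine's own name.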

This is the link that pointed me in the right direction:

http://blogs.sun.com/technical/entry/troubleshooting_cluster_startup

Satya, thanks for trying to help me with this issue.

Luca

Satyajit.Tripathi@Sun.COM

Hi Luca,

I am glad you finally got the resolution.

cheers
--Satya


lunobili
Offline
Joined: 2008-10-14

Ok Satya,
let me retry from scratch. I will document here all the steps I take while I am doing them, so you can correct me if needed.
First of all, the environment has changed a bit. I will be using this:
Machine1:
OS: Kubuntu 8.04
java version "1.6.0_07"
Host name: lnobili-kubuntu

Machine2:
Virtualized inside Machine1 using Virtualbox 2.1
OS: kubuntu 8.04
java version "1.6.0_07"
Host name: kubuntu-vbox

Note: the two machines can ping each other just fine by name (DNS resolution is ok ...and version 2.1 of Virtualbox rocks!!!)

These are my steps:
1) Reinstall glassfish on both machines so we have a clean environment
2) Run "ant -f setup-cluster.xml" on both machines
3) Start the DAS on machine1 with the command "asadmin start-domain". The DAS started with no problem; see the messages on the command line:
lnobili@lnobili-kubuntu:~/apps/glassfish/bin$ ./asadmin start-domain
Starting Domain domain1, please wait.
Log redirected to /home/lnobili/apps/glassfish/domains/domain1/logs/server.log.
Redirecting output to /home/lnobili/apps/glassfish/domains/domain1/logs/server.log
Domain domain1 started.
Domain [domain1] is running [Sun Java System Application Server 9.1_02 (build b04-fcs)] with its configuration and logs at: [/home/lnobili/apps/glassfish/domains].
Admin Console is available at [http://localhost:4848].
Use the same port [4848] for "asadmin" commands.
User web applications are available at these URLs:
[http://localhost:8080 https://localhost:8181 ].
Following web-contexts are available:
[/web1 /__wstx-services ].
Standard JMX Clients (like JConsole) can connect to JMXServiceURL:
[service:jmx:rmi:///jndi/rmi://lnobili-kubuntu:8686/jmxrmi] for domain management purposes.
Domain listens on at least following ports for connections:
[8080 8181 4848 3700 3820 3920 8686 ].
Domain supports application server clusters and other standalone instances.

4) Create the node agent on machine2 with the command "asadmin create-node-agent -H lnobili-kubuntu NodeAgent2" (on the DAS machine I can see that NodeAgent2 is automagically created and the rendezvous occurred. Notice that I do NOT see any other agent, including the default node agent that you told me I should see)
5) Start NodeAgent2 on machine2 with the command "asadmin start-node-agent NodeAgent2", user "admin", password "adminadmin", master password "changeit"
6) AND IT JUST DOESN'T WORK!!!

Let's see the logs and do a bit of problem determination:

- Error message on machine 2:
[#|2009-01-15T14:29:14.409+0900|CONFIG|sun-appserver9.1|javax.ee.enterprise.system.nodeagent|_ThreadID=10;_ThreadName=main;|DAS url = service:jmx:rmi:///jndi/rmi://lnobili-kubuntu:8686/management/rmi-jmx-connector|#]

[#|2009-01-15T14:29:14.482+0900|INFO|sun-appserver9.1|javax.ee.enterprise.system.nodeagent|_ThreadID=10;_ThreadName=main;|NAGT0025:The node agent could not configure logging levels. Default logging level will be used.|#]

[#|2009-01-15T14:29:14.482+0900|INFO|sun-appserver9.1|javax.ee.enterprise.system.nodeagent|_ThreadID=10;_ThreadName=main;|NAGT0038:Executing Synchronization for node-agent With DAS|#]

[#|2009-01-15T14:29:23.553+0900|INFO|sun-appserver9.1|javax.ee.enterprise.system.tools.synchronization|_ThreadID=10;_ThreadName=main;|SYNC001: Unable to communicate with Domain Administration Server. Skipping synchronization.|#]

[#|2009-01-15T14:29:23.555+0900|SEVERE|sun-appserver9.1|javax.ee.enterprise.system.nodeagent|_ThreadID=10;_ThreadName=main;|NAGT0035:The NodeAgent failed to complete the intial synchronization with the DAS. Please make sure the DAS is running and is accessible from the NodeAgents server|#]

[#|2009-01-15T14:29:25.556+0900|WARNING|sun-appserver9.1|javax.ee.enterprise.system.nodeagent|_ThreadID=10;_ThreadName=main;|NAGT0013:Stopping Node Agent...|#]
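
When the log grows, it can help to pull these records apart programmatically. The following Python sketch is a rough illustration that assumes every record follows the `[#|...|#]` layout visible above with six `|`-separated fields. The field names (`timestamp`, `level`, `product`, `logger`, `context`, `message`) are labels chosen for this example, not official names.

```python
import re

# A record starts with "[#|" and ends with "|#]"; it may span several lines.
RECORD = re.compile(r"\[#\|(.*?)\|#\]", re.DOTALL)
# Message codes such as NAGT0035 or SYNC001 at the start of the message text.
CODE = re.compile(r"^([A-Z]+\d+):")


def parse_records(log_text: str) -> list:
    """Split raw log text into dictionaries, one per well-formed record."""
    records = []
    for match in RECORD.finditer(log_text):
        parts = match.group(1).split("|", 5)
        if len(parts) != 6:
            continue  # not the six-field layout assumed here
        timestamp, level, product, logger, context, message = parts
        message = message.strip()
        code = CODE.match(message)
        records.append({
            "timestamp": timestamp,
            "level": level,
            "product": product,
            "logger": logger,
            "context": context,
            "message": message,
            "code": code.group(1) if code else None,
        })
    return records


def problems(log_text: str) -> list:
    """Keep only the records logged at WARNING or SEVERE level."""
    return [r for r in parse_records(log_text) if r["level"] in ("WARNING", "SEVERE")]
```

Filtering on `level` or `code` this way makes it easy to line up the SYNC001 and NAGT0035 records from several start attempts.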

- Do I have problems reaching host lnobili-kubuntu on port 8686? Let's see:

lnobili@kubuntu-vbox:~/apps/glassfish/bin$ telnet lnobili-kubuntu 8686
Trying 172.21.16.201...
Connected to lnobili-kubuntu.
Escape character is '^]'.

...no I do not have problems, I can telnet that port.
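
The telnet test can also be scripted, for example to run it repeatedly while changing the configuration. This is a minimal Python sketch; `can_connect` is a hypothetical helper name, and the host and port are simply the ones used in this post.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    host, port = "lnobili-kubuntu", 8686
    state = "reachable" if can_connect(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```

Keep in mind that a successful connect, like a successful telnet, only shows that the port accepts TCP connections. It says nothing about whether the JMX/RMI exchange that follows will succeed, so a passing check here does not rule out a name resolution problem on either machine.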

What do I see on the machine1 side? Nothing useful: the last line of the log was printed when the server started up. There is no mention of the node agent trying to contact the DAS.

Now I am starting to wonder, how difficult can this be!?!? Am I doing something stupid without realizing it? In what environment have you actually seen this working? Was it with two instances on separate machines, or the usual example of two instances on the same machine?

Thanks again for any further help you might be able to give me,
Luca

lunobili
Offline
Joined: 2008-10-14

Guys pleeeese,
does anybody know how to fix this? I am stuck here. Any help is very welcome.

Satyajit.Tripathi@Sun.COM

Hi Luca Nobili,

Sorry you didn't hear back from me earlier. I must have missed your
previous email in my inbox.

Btw, based on the problem description you provided, I can make a few
guesses at this moment.
--------------------------------------------------------------------------------
[#|2008-12-30T12:02:56.599+0900|SEVERE|sun-appserver9.1|javax.ee.enterprise.system.nodeagent|_ThreadID=10;_ThreadName=main;|NAGT0035:The
NodeAgent failed to complete the initial synchronization with the DAS.
Please make sure the DAS is running and is accessible from the
NodeAgents server|#]
--------------------------------------------------------------------------------

The above error message suggests that you check the following:

* DAS should be STATUS: running on
* You would have executed the following command on machine2:
  asadmin> create-node-agent --host --port <4848/admin-port>
* You may also want to check network accessibility using the hostname.
  If you are on Solaris, verify the file /etc/hosts
* No firewall should be blocking the communication

Let me know if this was useful. You can also refer to my blog post
*GlassFish-V2-Clustering-Simplified.pdf*

Thanks & regards
--Satya


lunobili
Offline
Joined: 2008-10-14

Hi Satyajit,
thank you very much for your reply.
As regards what you are asking me to check: I can telnet to machine1 on port 8686. This, imho, should prove that the DAS is running and that no firewall is blocking the communication.
I confirm that I created the node agent with a command similar to the one you mentioned (indeed, machine1 reports rendezvousOccurred=true for the remote node agent).
At the moment I do not have a name assigned to my DAS, so I need to use the IP address. Is this an issue?

I have a further question: how can I raise the log level of the node agent? That might give me a clue as to what the problem is.

Thanks again,
Luca

Satyajit Tripathi

[att1.html]

lunobili
Offline
Joined: 2008-10-14

Hi Satya,
thanks again for replying.

- DAS runs on a Linux box (Intel Core 2 Duo, 3GB of RAM), while the remote agent runs on Solaris 10 (Intel, 8GB of RAM)

- DAS is using java 1.6.0_07, while remote agent uses java 1.6.0_11

- Both machines run Sun Java System Application Server 9.1_02 (build b04-fcs)

- asadmin list-domains on DAS returns "domain1 running". Is this correct?

- I ran asadmin start-node-agent --syncinstances=true, but the behaviour does not change

- I used this command to create my remote agent "./asadmin create-node-agent -H 172.21.16.201 --agentproperties loglevel=FINEST NodeAgent2 ", however I do not get more information in the log file and I keep on reading this message:
[#|2009-01-09T11:07:35.760+0900|INFO|sun-appserver9.1|javax.ee.enterprise.system.nodeagent|_ThreadID=10;_ThreadName=main;|NAGT0025:The node agent could not configure logging levels. Default logging level will be used.|#]
Looks like changing the log level has no effect.

What else would you suggest I try?

Thanks, Luca

Satyajit Tripathi

[att1.html]

lunobili
Offline
Joined: 2008-10-14

Hi Satya,
as you requested, I privately sent you my domain.xml and server.log files.

Thanks,
Luca

Satyajit Tripathi

[att1.html]