
get-health reports incorrectly

16 replies
windwalker78
Joined: 2011-04-19

Hello,
I managed to get Glassfish 3.1 b43 running as DAS and two remote instances. I have two problems in fact:
1. I have to start the instances manually from the DAS server after I reboot the OS on any of the remote instances (mycluster is not started automatically).
2. After I start both instances and issue get-health mycluster on the DAS server, I get:
asadmin> get-health cluster1
inst1 not started
inst2 not started
Command get-health executed successfully.

The validate-cluster command says multicast is OK, and I have no firewalls between the DAS and the two instances.
I also created a network listener on a custom port and set up load balancing with mod_jk. Applications are also deployed centrally without any problem.
Can you please suggest where to look for the problem? I haven't set a password on the admin account yet, so that I can get everything working first.
Thank you in advance,
Todor

bbissett
Joined: 2003-06-16

[I responded to this on the users list, but forgot to cc the forum, so am resending.]

Something I forgot before, which maybe you've already tried -- can you check your DAS and instance logs to see if there are errors there?

On Apr 19, 2011, at 3:52 PM, forums@java.net wrote:
> 2. After I start both instances and issue: get-health mycluster on the DAS
> server I get:
>
> asadmin> get-health cluster1
> inst1 not started
> inst2 not started
> Command get-health executed successfully.

I've seen the other emails with you and Tom, and want to follow up with this part. First, can you verify that the instances are really running? Can you do 'asadmin list-instances --long' and make sure the DAS knows the instances are up?

If the validate-multicast command shows multicast is working, that's a good start. But it doesn't mean the instances are using the right settings. See if any of the steps here help you:

http://blogs.sun.com/bobby/entry/validating_multicast_transport_where_d

If not, please respond with the list-instances output above, the output from 'validate-multicast' on the DAS and one other node (use the --verbose option as well), and if that doesn't solve it I might need a peek at your domain.xml.

Cheers,
Bobby

tmueller
Joined: 2005-10-31

Regarding #1, see the asadmin create-service command. This can be used
to have instances started automatically after an operating system boot.
A service must be created for each instance.

I assume that by "validate-cluster" you mean validate-multicast, as
there isn't any validate-cluster command. Did you run validate-multicast
on each of the 3 hosts at the same time? Does list-instances show that
all of the instances are running? Are you using an IPv6-only system?

Tom

windwalker78
Joined: 2011-04-19

Hello Tom,
First of all, thank you for your feedback.

tmueller wrote:
> Regarding #1, see the asadmin create-service command. This can be used to have instances started automatically after an operating system boot. A service must be created for each instance.

I already ran this command, and it generated the script Glassfish_domain1 in /etc/init.d/. This script only brings up the domain, not the cluster itself. If the instances are stopped, I have to start them one by one, or select "mycluster" and click "Start cluster". Do I have to add an "asadmin start-cluster mycluster" command to the Glassfish_domain1 script?
tmueller wrote:
> I assume that by "validate-cluster" you mean validate-multicast, as there isn't any validate-cluster command. Did you run validate-multicast on each of the 3 hosts at the same time? Does list-instances show that all of the instances are running? Are you using an IPv6-only system?

Sorry, my mistake: the command is validate-multicast, and its output confirms that each host sees the other two. I do not use IPv6 at all; I think it is disabled, since I have NETWORK_IPV6=no in my network file.
These are the logs of the DAS and one of the instances after a fresh start of the domain and cluster and issuing the get-health command:
server_das.log http://niti95.com/uploads/server_das.log
server_inst1.log http://niti95.com/uploads/server_inst1.log

Can you suggest what to do?
Best regards,
Todor

bbissett
Joined: 2003-06-16

windwalker78 wrote:

> These are the logs of the DAS and one of the instances after fresh start of the domain, cluster and issuing the get-health command:
> server_das.log http://niti95.com/uploads/server_das.log
> server_inst1.log http://niti95.com/uploads/server_inst1.log

My email still hasn't shown up here, but maybe I can simplify things a bit. I didn't see the log URLs before, and looking at them now I see that GMS (the system that supports HA and the get-health command) is working. At least in the inst1 log, I see that it knows about inst2 from this entry (look for ShoalLogger for these):

[#|2011-04-20T22:46:49.581+0300|INFO|glassfish3.1|ShoalLogger|_ThreadID=12;_ThreadName=Thread-1;|GMS1092: GMS View Change Received for group: cluster1 : Members in view for JOINED_AND_READY_EVENT(before change analysis) are :
1: MemberId: inst1, MemberType: CORE, Address: 192.168.3.203:9143:228.9.151.111:25336:cluster1:inst1
2: MemberId: inst2, MemberType: CORE, Address: 192.168.3.204:9195:228.9.151.111:25336:cluster1:inst2

I don't see the DAS there, which would show up as member id "server." Because the timestamps are different in the two logs, I can't compare the inst1 log to what I see in the DAS. If those logs are from the same run, my guess is that the clocks on these two systems aren't in sync, and so the DAS is getting messages from the instances but they can't be processed. In the DAS log after the "GMS1062: GroupStart" message, you should start seeing messages like "GMS1024: Adding Join member: inst1".
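Checking whether the DAS (member id "server") appears in a GMS view can be scripted instead of read by eye. A minimal Python sketch, assuming member lines look exactly like the ShoalLogger excerpt above ("N: MemberId: X, MemberType: Y, Address: Z"); the function names are hypothetical and not part of GlassFish.

```python
import re
from typing import List

# Matches GMS member lines such as
# "1: MemberId: inst1, MemberType: CORE, Address: 192.168.3.203:9143:..."
MEMBER_RE = re.compile(r"^\s*\d+: MemberId: ([^,]+), MemberType: \w+, Address: \S+")


def gms_members(log_text: str) -> List[str]:
    """Return the unique member ids listed in GMS view entries, in order of appearance."""
    members: List[str] = []
    for line in log_text.splitlines():
        m = MEMBER_RE.match(line)
        if m and m.group(1) not in members:
            members.append(m.group(1))
    return members


def das_in_view(log_text: str) -> bool:
    """True if the DAS, which joins the group with member id 'server', is listed."""
    return "server" in gms_members(log_text)
```

Feeding it the instance log should list inst1 and inst2; if das_in_view() stays False in every instance log while the instances see each other, the DAS is not exchanging GMS traffic with them.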

Can you check the times on the two machines and see if they're way off? After that, you can try increasing the log level for ShoalLogger to see what's going on. But multicast is definitely working on your machines, so you can skip debugging that part.
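To put a number on "way off", compare one reading of the clock on each host taken at (roughly) the same moment. A minimal Python sketch, assuming both readings use the timestamp format of the GlassFish log records above (for example 2011-04-20T22:46:49.581+0300); the UTC offset is taken into account, so a host that is merely in another time zone shows no skew. The function names are hypothetical.

```python
from datetime import datetime


def parse_gf_timestamp(ts: str) -> datetime:
    """Parse a GlassFish log timestamp such as '2011-04-20T22:46:49.581+0300'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f%z")


def clock_skew_seconds(ts_host_a: str, ts_host_b: str) -> float:
    """Return (b - a) in seconds for two readings taken at the same real moment.

    Both strings carry a UTC offset, so only a genuinely wrong clock produces
    a non-zero result.
    """
    return (parse_gf_timestamp(ts_host_b) - parse_gf_timestamp(ts_host_a)).total_seconds()
```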

Cheers,
Bobby

tmueller
Joined: 2005-10-31

The create-service command takes an optional argument. If you don't
specify it, it creates a service for the one and only domain. If you
want the service to start an instance, use the --node option and an
instance name as the argument.
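Since one service is needed per instance, the command lines can be generated from an instance-to-node mapping. A minimal Python sketch that only builds the argument lists; it assumes asadmin is on the PATH, and the node names used with it (for example "node1") are hypothetical, so substitute the nodes your cluster actually uses.

```python
from typing import Dict, List


def create_service_commands(instances: Dict[str, str]) -> List[List[str]]:
    """Build one 'asadmin create-service' command per instance.

    `instances` maps instance name -> node name. Each command has to be run
    on the host the node refers to, because the service is installed locally.
    """
    commands = []
    for instance, node in sorted(instances.items()):
        # --node names the node; the operand is the instance to start at boot.
        commands.append(["asadmin", "create-service", "--node", node, instance])
    return commands
```

Each list can be passed to subprocess.run(cmd, check=True) on the matching host, typically with root privileges so the init script can be installed.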

Hopefully somebody that is more of an expert with GMS can help with that.
Tom

windwalker78
Joined: 2011-04-19

Good morning,

tmueller wrote:
> The create-service command takes an optional argument. If you don't specify it, it creates a service for the one and only domain. If you want the service to start an instance, use the --node option and an instance name as the argument. Hopefully somebody that is more of an expert with GMS can help with that.

Tom, thank you very much for the "--node" tip. I ran the command on one of the instances, and now I am sure this instance will start automatically after each reboot.
However, the problem with "get-health cluster1" remains. Do you think I should contact the developers about this?
Another thing I have a problem with is the password alias. Can you give me a tip for this too? This is part of the startup script on my DAS machine:
$ASADMIN start-domain --domaindir /usr/share/glassfish3/glassfish/domains --passwordfile /usr/share/glassfish3/glassfish/domains/domain1/config/passwords.txt domain1

$ASADMIN start-cluster --passwordfile /usr/share/glassfish3/glassfish/domains/domain1/config/passwords.txt cluster1
As you can see, I have set up a password alias for the admin user. The first command executes perfectly and the domain is started, but the second one does not work with the alias. If the passwords.txt file is like this:
"AS_ADMIN_PASSWORD=${ALIAS=jms-password}"
only the first command is executed. If the password is in cleartext like this:
"AS_ADMIN_PASSWORD=MyPasswordHere"
both commands are executed. I really need to hide my password. Do you think this is also a bug?

With kind regards,
Todor

tmueller
Joined: 2005-10-31

I see that Bobby has responded regarding the get-health issue.

Regarding password aliases, there is a mistake in the Security Guide
regarding password aliases for AS_ADMIN_PASSWORD. An alias for
AS_ADMIN_PASSWORD cannot be used in the password file that is passed to
asadmin. If it could, then anyone that knows the name of your alias
would be able to access the server without knowing the password. This
would be a big security hole. I've created issue GLASSFISH-16401 about
the documentation problem.

To store an obfuscated password locally so that you can run asadmin
commands without entering the password every time, use the "asadmin login"
subcommand. This command prompts for the username and password and
stores the obfuscated password in a "$HOME/.asadminpass" file, which is
then used by future asadmin commands.

Password aliases are intended for use within the domain.xml file so that
plaintext passwords do not have to be stored there.
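The rule above can be turned into a quick check on a password file before it is handed to asadmin with --passwordfile. A minimal Python sketch; the rule it encodes (no ${ALIAS=...} value for AS_ADMIN_PASSWORD) comes from this reply, while the function name and the skipping of "#" comment lines are assumptions for illustration.

```python
import re
from typing import List

# A whole value of the form ${ALIAS=some-alias}
ALIAS_RE = re.compile(r"^\$\{ALIAS=[^}]+\}$")


def password_file_problems(text: str) -> List[str]:
    """Flag AS_ADMIN_PASSWORD entries that use a password alias.

    Aliases are meant for domain.xml, not for the file passed to asadmin;
    'asadmin login' is the way to avoid typing the admin password.
    """
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # blank, comment, or not a KEY=VALUE line
        key, value = line.split("=", 1)
        if key.strip() == "AS_ADMIN_PASSWORD" and ALIAS_RE.match(value.strip()):
            problems.append(f"line {lineno}: alias not supported for AS_ADMIN_PASSWORD")
    return problems
```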

Tom

windwalker78
Joined: 2011-04-19

Hello Tom and Bobby,

tmueller wrote:
> To store an obfuscated password locally so that you can run asadmin commands without entering the password every time, use the "asadmin login" subcommand. This command prompts for the username and password and stores the obfuscated password in a "$HOME/.asadminpass" file, which is then used by future asadmin commands.

Thank you again, Tom, for pointing out the asadmin login command. It completely solved my cleartext password issue. :)
bbissett wrote:
> If not, please respond with the list-instances output above, the output from 'validate-multicast' on the DAS and one other node (use the --verbose option as well), and if that doesn't solve it I might need a peek at your domain.xml.

validate-multicast works perfectly and all three nodes see each other.
asadmin> list-instances --long
NAME HOST PORT PID CLUSTER STATE
inst1 glassfishin01 24848 2977 cluster1 running
inst2 glassfishin02 24848 2966 cluster1 running
Command list-instances executed successfully.
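The symptom in this thread is a disagreement between two views of the cluster: list-instances (what the DAS knows) says "running", while get-health (what GMS sees) says "not started". A minimal Python sketch that parses both outputs and reports instances that are running but unknown to GMS; the column layout is taken from the list-instances output above, and stopped instances may be printed differently, so only six-field rows are read.

```python
import re
from typing import Dict, List


def parse_list_instances(output: str) -> Dict[str, str]:
    """Map instance name -> STATE from 'asadmin list-instances --long' output."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # Running rows have 6 fields: NAME HOST PORT PID CLUSTER STATE; skip the header.
        if len(fields) == 6 and fields[0] != "NAME":
            states[fields[0]] = fields[5]
    return states


def parse_get_health(output: str) -> Dict[str, bool]:
    """Map instance name -> True if 'asadmin get-health' reports it as started."""
    health = {}
    for line in output.splitlines():
        m = re.match(r"^(\S+) (not started|started)\b", line)
        if m:
            health[m.group(1)] = m.group(2) == "started"
    return health


def running_but_not_in_gms(list_out: str, health_out: str) -> List[str]:
    """Instances the DAS reports as running that GMS does not report as started."""
    states = parse_list_instances(list_out)
    health = parse_get_health(health_out)
    return sorted(name for name, state in states.items()
                  if state == "running" and not health.get(name, False))
```

With the list-instances output above and the get-health output from the first post, both inst1 and inst2 are reported, which points at GMS communication with the DAS rather than at the instances themselves.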
I uploaded the full information about the domain again, this time with synced clocks. You were right about the clocks: there was a difference of about 4-5 hours.
I fixed this and ran the test again. Unfortunately, nothing changed. get-health cluster1 again said that both instances are not running, although they are 100% operational (the web GUI at localhost:4848 also correctly displays both instances as running), as you can see from the following logs:
http://www.niti95.com/uploads/server_das.log
http://www.niti95.com/uploads/server_inst1.log
http://www.niti95.com/uploads/server_inst2.log
http://www.niti95.com/uploads/domain.xml
I hope this information will be helpful to you.
Regards,
Todor Ivanov

tmueller
Joined: 2005-10-31

Your two instance IP addresses are on 192.168.3, but your DAS is on
10.0.0.1. Are you sure that multicast was validated between the server
running the DAS and the servers running the instances? Typically
multicast is not configured to work across subnets.

In order for get-health to work, multicast is required between the DAS
and the instances.
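The subnet mismatch spotted here can be checked mechanically with Python's standard ipaddress module. A minimal sketch, assuming a /24 netmask on every host (an assumption; use your real prefix length); the addresses in the usage line are the ones from this thread.

```python
import ipaddress
from typing import Iterable, List


def hosts_off_subnet(das_ip: str, instance_ips: Iterable[str], prefix: int = 24) -> List[str]:
    """Return the instance addresses that are not on the DAS address's subnet.

    `prefix` is an assumed netmask length. Multicast is usually not routed
    between subnets, so any address returned is a likely reason for GMS
    (and therefore get-health) not seeing that instance.
    """
    das_net = ipaddress.ip_interface(f"{das_ip}/{prefix}").network
    return [ip for ip in instance_ips if ipaddress.ip_address(ip) not in das_net]
```

For example, hosts_off_subnet("10.0.0.1", ["192.168.3.203", "192.168.3.204"]) returns both instance addresses.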

Tom

bbissett
Joined: 2003-06-16

On Apr 21, 2011, at 3:14 PM, Tom Mueller wrote:
> Your two instance IP addresses are on 192.168.3, but your DAS is on 10.0.0.1. Are you sure that multicast was validated between server running the DAS and the servers running the instances? Typically multicast is not configured to work across subnets.
>
> In order for get-health to work, multicast is required between the DAS and the instances.

I agree with Tom that it looks like the DAS is on a different subnet from the two instances. The instances can see each other just fine, according to the logs.

Can you give the steps here a try and make sure you're running the validate-multicast command on the DAS and one of the instances at the same time (doesn't matter which instance)?

http://blogs.sun.com/bobby/entry/validating_multicast_transport_where_d

For step one in the blog, the multicast address and port that your cluster is using are 228.9.151.111 and 25336. It may be that your DAS machine has more than one network adapter and one of them is actually on the same subnet as the instances. Step 3 of the blog talks in a little more detail about this as well as how to check and how to specify the adapter if you need to.

Cheers,
Bobby

windwalker78
Joined: 2011-04-19

Hello again,

bbissett wrote:
> It may be that your DAS machine has more than one network adapter and one of them is actually on the same subnet as the instances. Step 3 of the blog talks in a little more detail about this as well as how to check and how to specify the adapter if you need to.

Yes, you are right. The "validate-multicast" command works perfectly every time I run it, but the problem is that the DAS has more than one interface, and for some reason domain.xml is using the IP address of eth1 instead of eth0.
I will have to dig a little deeper. I am now following
http://download.oracle.com/docs/cd/E18930_01/html/821-2426/gjfnl.html#gjdlw

to try to make this change, and will post back. Thanks for pointing me in the right direction.

Best regards,
Todor

windwalker78
Joined: 2011-04-19

Hello Tom and Bobby,

I changed the value of gms-bind-interface-address from ${GMS-BIND-INTERFACE-ADDRESS-cluster1} to 192.168.3.201 with the following command:
set clusters.cluster.cluster1.gms-bind-interface-address=192.168.3.201
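Choosing the value for gms-bind-interface-address comes down to picking the local address that shares a subnet with the instances. A minimal Python sketch, assuming you already know each interface's address and prefix length; the eth0/eth1 addresses mirror this thread, while the /24 prefixes are assumptions. It returns the same dotted-name argument as the set command above, and it only builds the string rather than calling asadmin.

```python
import ipaddress
from typing import Dict, Iterable


def pick_gms_bind_address(local_ifaces: Dict[str, str], instance_ips: Iterable[str]) -> str:
    """Return the local address whose subnet contains every instance address.

    `local_ifaces` maps interface name -> "address/prefix",
    e.g. {"eth0": "192.168.3.201/24"}.
    """
    targets = [ipaddress.ip_address(ip) for ip in instance_ips]
    for name in sorted(local_ifaces):
        iface = ipaddress.ip_interface(local_ifaces[name])
        if all(t in iface.network for t in targets):
            return str(iface.ip)
    raise ValueError("no local interface shares a subnet with all instances")


def gms_bind_set_argument(cluster: str, address: str) -> str:
    """Build the dotted-name=value argument for 'asadmin set'."""
    return f"clusters.cluster.{cluster}.gms-bind-interface-address={address}"
```

With {"eth0": "192.168.3.201/24", "eth1": "10.0.0.1/24"} and the two instance addresses, this yields 192.168.3.201, the value used in the command above.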

After that I restarted the domain and cluster and now I have the following output from get-health cluster1:
asadmin> get-health cluster1
inst1 started since Sat Apr 23 16:02:49 EEST 2011
inst2 started since Sat Apr 23 16:02:43 EEST 2011
Command get-health executed successfully.

This was the desired output, so thank you both again!
I was wondering why ${GMS-BIND-INTERFACE-ADDRESS-cluster1} resolves to 10.0.0.1, which belongs to eth1 (an interface used for other purposes). I thought it used eth0, since validate-multicast with default settings said that all instances see each other and the DAS.

With kind regards,
Todor Ivanov

bbissett
Joined: 2003-06-16

Hi again,

I responded a few days ago through email and haven't seen a reply on the forum yet, so am responding directly here. I *tried* to write:

I'm glad that worked out for you. Dealing with these network-level issues can be tricky; your question about why the DAS did one thing and validate-multicast another is a good example.

Without any options given, the validate-multicast subcommand has to use some default values, and the default for network adapter is to not specify one at all. So the JDK can pick whichever one it wants, which may not match what GMS chooses in your host at the time it runs. Though it's a mostly debugging-oriented tool, you can see how validate-multicast could also be used to plan your configuration before creating the cluster (so you can specify the network adapters as needed for each host).
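The same default-adapter behavior applies to any multicast socket: if no interface is named when joining a group, the operating system picks one. A minimal Python sketch of a listener that joins the cluster's group (228.9.151.111, port 25336, from this thread) on an explicitly chosen local address; it is plain socket code under that assumption, not what validate-multicast or GMS do internally, and 192.168.3.201 in the usage line is the DAS's eth0 address from this thread.

```python
import socket


def membership_request(group: str, local_addr: str) -> bytes:
    """Pack a struct ip_mreq: multicast group address, then local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(local_addr)


def open_multicast_listener(group: str, port: int, local_addr: str) -> socket.socket:
    """Join `group` on the interface owning `local_addr` instead of the OS default."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Join the group on the chosen interface only.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, local_addr))
    # Send any outgoing multicast through the same interface.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                    socket.inet_aton(local_addr))
    return sock
```

Usage might look like sock = open_multicast_listener("228.9.151.111", 25336, "192.168.3.201") followed by sock.recvfrom(65535); for experiments it may be safer to use a spare port rather than the one the running cluster uses.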

I have an RFE to make the command (or a similar one) a little more user friendly, such that it reads the configuration information, talks to the instances for you, etc. Based on this, I just added a comment for myself that it should let you know if it's making any assumptions such as network interface:

http://java.net/jira/browse/GLASSFISH-13056?focusedCommentId=307561&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_307561

Cheers,
Bobby

windwalker78
Joined: 2011-04-19

Hello,

I also think it's logical to use the configuration in domain.xml to validate the multicast communication. In my case, it was your and Tom's help that saved me time. I guess other beginners like me will run into this problem too unless you fix it in 3.2.
Sorry for mentioning this here, but I have another post:
http://www.java.net/forum/topic/glassfish/glassfish/bind-gms-more-one-ip...
I just wanted to know whether it is possible to have GMS multicast on more than one interface per host. Yes, bonding is an option, but it adds another level of complexity. I was considering this because I didn't find any DAS failover capabilities and wanted to add both the IP address of my current DAS and the IP address of a backup DAS in case I need it.
Best regards,
Todor

bbissett
Joined: 2003-06-16

windwalker78 wrote:

> I also think it's logical to use the configuration in domain.xml for one to validate the multicast communication. In my case it is your and Tom's help that saved me time. I guess other beginners like me will face this problem again unless you fix it in 3.2.

It's not really a "fix," but a different command that will do more than the current one (and will require the server to be up as well as the instances). The current tool is for general usage, and can be used to diagnose issues or plan a configuration before you create your cluster. How to do all this is also documented in the GF 3.1 HA guide.

I saw your other post but don't have any more info than what Shreedhar responded.

Cheers,
Bobby