Node on 3.12 cluster won't restart
We're having an issue where a node in our cluster quits responding to requests (RMI or HTTP). The logs stop and CPU usage tanks. The problem is that the DAS doesn't recognize that the node has failed and keeps trying to send work there, which basically makes the cluster unavailable because everything piles up on that node.
When this happens, we attempt to restart the node using the GUI Admin. This is unsuccessful: it fails to shut down the existing instance. When I view the log, the following error is there:
There should be only 1 primordial module but 0 primordial modules were found.|#]
The only way around this seems to be kill -9, which I don't like to do. Once the instance has been killed, the GUI Admin can restart the node (I have to click the start button).
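Since the DAS doesn't notice the hang on its own, a stopgap might be an external probe that flags an instance once it stops answering HTTP. Below is a minimal sketch of that idea, not something we run today. The function name `is_responsive` and the URL in the usage note are hypothetical, and it only covers HTTP, not RMI.

```python
import http.client
import urllib.error
import urllib.request


def is_responsive(url, timeout=5.0):
    """Return True if `url` sends back any HTTP response within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with a 4xx/5xx status, so the instance is still alive.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, reset, or a malformed reply:
        # treat the instance as unresponsive.
        return False
```

Calling something like `is_responsive("http://node1.example:28080/", timeout=5.0)` from cron or a monitoring job would at least tell us when a node has gone quiet, so it could be taken out of rotation before work piles up on it. The host and port there are placeholders, not our real values.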
Does anyone have any idea what might be going on, or what I should be looking into?