How to clean up Failed Actions from pcs status of cluster
In a clustered environment, the health of the cluster is of utmost importance. The status of the cluster can be monitored using the pcs status command. It provides information about the current state of the cluster, including the list of nodes, resources, and the status of these resources.
In some cases, the output of the pcs status command may indicate that a particular resource action has failed. A failed action can point to a real availability problem, and its entry stays in the output until it is cleared or expires, even if the cluster has already recovered the resource. In this blog post, we will discuss how to clean up failed actions from the pcs status of the cluster.
Identifying Failed Actions
The first step in cleaning up failed actions is to identify them. When the pcs status command is run, it lists any failed actions in their own section near the end of the output, under a "Failed Actions:" heading (newer Pacemaker versions label it "Failed Resource Actions:").
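On a cluster with many resources the status output can be long. The following is a minimal sketch of filtering it down to the failure section. It assumes the section starts at a line that begins with "Failed" and ends with "Actions:", and that it runs until the next blank line. The function name failed_actions is made up for this illustration and is not part of pcs.

```shell
#!/bin/sh
# Print only the failure section of `pcs status` output read from stdin.
# Assumption: the section begins at a line starting with "Failed" and ending
# in "Actions:", and it ends at the next blank line.
failed_actions() {
    awk '
        /^Failed.*Actions:[[:space:]]*$/ { in_section = 1; print; next }
        in_section && /^[[:space:]]*$/   { in_section = 0 }
        in_section                       { print }
    '
}

# Typical use on a cluster node:
#   pcs status | failed_actions
```

If the function prints nothing, the status output contained no failure section in the assumed layout.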
For example, the output of the pcs status command may show something like this:
Resource Group: myresourcegroup
myresource (ocf::heartbeat:myresource): Started node1
Monitor: mymonitor
myresource_monitor (ocf::heartbeat:myresource): Started node1
Failed Actions:
myresource_monitor_20000 on node2 'not running' (7): call=48, status=complete, exitreason='none',
last-rc-change='Mon May 16 10:41:12 2023', queued=0ms, exec=10004ms
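In this example, the failed action is myresource_monitor_20000 on node2. Pacemaker names operations in the form resource_action_interval, with the interval in milliseconds, so this entry is the recurring 20-second monitor of myresource. It returned code 7, which OCF resource agents use for "not running": the monitor found the resource not running on node2.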
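To make the reading of such an entry concrete, here is a small sketch that splits an operation key into its parts and translates the most common OCF return codes into their symbolic names. The helper names (op_key_resource, op_key_action, op_key_interval, ocf_rc_name) are hypothetical, and the split assumes the action name itself contains no underscore, which holds for monitor, start and stop but not for actions such as migrate_to.

```shell
#!/bin/sh
# Split a Pacemaker operation key of the form <resource>_<action>_<interval-ms>.
# The resource ID may contain underscores, so the key is stripped from the right.
# Assumption: the action name has no underscore (true for monitor/start/stop).
op_key_resource() { k=${1%_*}; printf '%s\n' "${k%_*}"; }
op_key_action()   { k=${1%_*}; printf '%s\n' "${k##*_}"; }
op_key_interval() { printf '%s\n' "${1##*_}"; }

# Map an OCF resource agent return code to its symbolic name.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        *) echo UNKNOWN ;;
    esac
}

key=myresource_monitor_20000
echo "resource=$(op_key_resource "$key") action=$(op_key_action "$key") interval_ms=$(op_key_interval "$key") rc=$(ocf_rc_name 7)"
```

The resource ID obtained this way is the name to pass to the pcs resource commands.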
Cleaning Up Failed Actions
Once the failed actions have been identified, they can be cleaned up. There are a few steps that need to be taken to clean up failed actions:
1. Identify the failed resource: The first step is to identify the resource that is associated with the failed action. In the example above, the operation key myresource_monitor_20000 belongs to the myresource resource; the _monitor_20000 suffix is the action and its interval, not part of the resource name.
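2. Clear the failed action: The next step is to clear the failure record. This is done with the pcs resource cleanup command, which takes the resource ID rather than the operation key. For the example above, the following command can be used:
pcs resource cleanup myresource
This does not stop anything. It erases the failed operation history and fail count for the resource, which removes the entry from the Failed Actions list in the pcs status output, and it makes the cluster re-check the current state of the resource. If the underlying problem still exists, the failure will show up again.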
3. Restart the resource if needed: After the cleanup, the cluster normally recovers the resource on its own. If the resource is still not running correctly, it can be restarted with the pcs resource restart command. For example, to restart the myresource resource from the example above, the following command can be used:
pcs resource restart myresource
This stops the resource and starts it again. Keep in mind that a restart briefly interrupts the service the resource provides.
4. Verify the status: Finally, it is important to verify the status of the resource after the cleanup or restart. The pcs status command does not take a resource name as an argument, so either run pcs status for the full picture or limit the output to the resources section:
pcs status resources
The resource should be listed as Started and the Failed Actions section should no longer contain the entry. To confirm that the fail count was reset as well, run pcs resource failcount show myresource.
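The steps above can be sketched as a small script. This is only an illustration under stated assumptions: the function names run and cleanup_failed_resource and the DRY_RUN variable are invented for this post, and the script uses only the pcs subcommands already mentioned (resource cleanup, resource failcount show, status). With DRY_RUN set, the commands are printed instead of executed, which is a safe way to check what would run.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN is set (hypothetical helper).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Clear the failure history of one resource, then show what the cluster
# reports afterwards. Takes the resource ID, not the operation key.
cleanup_failed_resource() {
    rsc=$1
    if [ -z "$rsc" ]; then
        echo "usage: cleanup_failed_resource <resource-id>" >&2
        return 2
    fi
    run pcs resource cleanup "$rsc" || return 1
    run pcs resource failcount show "$rsc" || return 1
    run pcs status
}

# Example (prints the commands without touching the cluster):
#   DRY_RUN=1; cleanup_failed_resource myresource
```

The restart step is left out on purpose, since the cluster usually recovers the resource after a cleanup and a restart interrupts the service. The cluster may also need a moment to re-check the resource, so the status shown immediately after the cleanup is not necessarily final.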
Conclusion
In a clustered environment, it is important to monitor the status of the cluster and address any issues that arise as quickly as possible. Failed actions can cause issues with the availability of the cluster and must be cleaned up promptly. By following the steps outlined in this blog post, you can identify and clean up failed actions in the pcs status of the cluster, ensuring that it remains healthy and available.