How to clean up Failed Actions from the pcs status output of a cluster

In a clustered environment, the health of the cluster is of utmost importance. The status of the cluster can be monitored using the pcs status command. It provides information about the current state of the cluster, including the list of nodes, resources, and the status of these resources.

In some cases, the output of the pcs status command lists one or more failed resource actions. A failed action can point to a problem that affects the availability of the cluster, so it should be investigated promptly. The entry is also a historical record: it stays in the output until it is cleaned up, even if the cluster has already recovered the resource. In this blog post, we will discuss how to clean up failed actions from the pcs status output of the cluster.

Identifying Failed Actions

The first step in cleaning up failed actions is to identify them. When the pcs status command is run, it lists any failed actions in a dedicated section near the end of the output. The section is headed "Failed Actions:" (newer Pacemaker releases label it "Failed Resource Actions:").

For example, the output of the pcs status command may show something like this:

Resource Group: myresourcegroup
    myresource (ocf::heartbeat:myresource): Started node1
Monitor: mymonitor
    myresource_monitor (ocf::heartbeat:myresource): Started node1

Failed Actions:
myresource_monitor_20000 on node2 'not running' (7): call=48, status=complete, exitreason='none',
    last-rc-change='Mon May 16 10:41:12 2023', queued=0ms, exec=10004ms

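In this example, the failed action is myresource_monitor_20000 on node2. The name follows the pattern resource_operation_interval, so this entry is the monitor operation of the myresource resource, scheduled every 20000 ms (20 seconds). The action returned exit status 7, the OCF return code for "not running": the monitor on node2 found the resource stopped.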
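Reading these entries by eye works for one failure, but on a busy cluster it can help to extract them. The following is a minimal sketch, not part of pcs itself: the function names list_failed_actions, parse_op_key and sample_status are made up for illustration, and it assumes entries shaped like the example above (resource_operation_interval on node). Other pcs or Pacemaker versions may print failed actions differently, so check the pattern against your own output first.

```shell
# Sketch: list failed actions from `pcs status` text and decode each operation key.
# Assumption: entries look like "<resource>_<operation>_<interval-ms> on <node> ...".

# Read pcs status text on stdin and print "key node" for each failed action.
list_failed_actions() {
    awk '
        /^Failed (Resource )?Actions:/ { in_section = 1; next }
        # An unindented "Some Title:" line starts the next section.
        in_section && /^[A-Za-z][A-Za-z ]*:/ { in_section = 0 }
        in_section {
            for (i = 1; i < NF; i++) {
                # The key ends in _<digits> and is followed by the word "on".
                if ($(i + 1) == "on" && $i ~ /_[0-9]+$/) {
                    print $i, $(i + 2)
                    break
                }
            }
        }
    '
}

# Split a key into resource, operation and interval.
# The resource id may contain underscores, so the key is split from the right.
# The body runs in a subshell, which keeps the variables local.
parse_op_key() (
    key=$1
    interval=${key##*_}     # text after the last underscore
    rest=${key%_*}          # everything before it
    operation=${rest##*_}
    resource=${rest%_*}
    printf 'resource=%s operation=%s interval_ms=%s\n' \
        "$resource" "$operation" "$interval"
)

# Sample input copied from the example output in this post.
sample_status() {
    cat <<'EOF'
Failed Actions:
myresource_monitor_20000 on node2 'not running' (7): call=48, status=complete, exitreason='none',
    last-rc-change='Mon May 16 10:41:12 2023', queued=0ms, exec=10004ms
EOF
}

sample_status | list_failed_actions | while read -r key node; do
    echo "$node: $(parse_op_key "$key")"
done
# prints: node2: resource=myresource operation=monitor interval_ms=20000
```

On a live cluster node, the sample can be swapped for the real command: pcs status | list_failed_actions.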

Cleaning Up Failed Actions

Once the failed actions have been identified, they can be cleaned up. There are a few steps that need to be taken to clean up failed actions:

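1. Identify the failed resource: The first step is to identify the resource that is associated with the failed action. In the example above, myresource_monitor_20000 is the monitor operation of the myresource resource, so myresource is the resource to clean up.

2. Clear the failed action: The next step is to clear the recorded failure. This can be done using the pcs resource cleanup command, which takes the resource ID rather than the name of the failed action. For example, to clear the failed action from the example above, the following command can be used:

pcs resource cleanup myresource

This does not stop anything. It removes the failure history and fail count of the resource, so the entry disappears from the list of failed actions in the pcs status output, and the cluster re-checks the current state of the resource. If the underlying problem has not been fixed, the failure will be recorded again. Running pcs resource cleanup without a resource ID cleans up all resources.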

3. Restart the resource if needed: After a cleanup, the cluster usually recovers the resource on its own. If the resource is still stopped or misbehaving, it can be restarted using the pcs resource restart command. For example, to restart the myresource resource from the example above, the following command can be used:

pcs resource restart myresource

This stops the resource and then starts it again. It does not guarantee that the resource is healthy, so check the status afterwards.

4. Verify the status: Finally, it is important to verify the status of the resource after the cleanup or restart. This can be done using the pcs status command, run without a resource name. For example, to check the status of the myresource resource from the example above, the following command can be used:

pcs status

The myresource resource should be shown as Started, and the failed action should no longer be listed. To limit the output to the resources, pcs status resources can be used instead.
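The cleanup and verification steps above can be combined into one small shell function. This is a rough sketch under a few assumptions: it is run as root (or another user allowed to manage the cluster) on a cluster node, pcs resource cleanup is given the resource ID, and failed actions are printed as resource_operation_interval on node. The function name cleanup_and_verify, the settle delay and the grep pattern are illustrative and not part of pcs.

```shell
# Sketch: clear the failure history of a resource, then confirm the entry is gone.
# Usage: cleanup_and_verify <resource-id> [settle-seconds]
# The body runs in a subshell, so `exit` only ends the function call.
cleanup_and_verify() (
    resource=$1
    settle_seconds=${2:-5}

    # Forget the recorded failures so the cluster re-checks the resource.
    pcs resource cleanup "$resource" || exit 1

    # Give the cluster a moment to re-probe before reading the status.
    sleep "$settle_seconds"

    # Failed action keys start with the resource id followed by an underscore.
    # Assumes the resource id contains no regular expression metacharacters.
    if pcs status | grep -q "^[* ]*${resource}_[a-z-]*_[0-9]* on "; then
        echo "Failed action for $resource is still listed" >&2
        exit 1
    fi
    echo "No failed actions listed for $resource"
)
```

Calling cleanup_and_verify myresource returns a non-zero status if the failure shows up again, which usually means the root cause still needs fixing before a restart is worth trying.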

Conclusion

In a clustered environment, it is important to monitor the status of the cluster and address any issues that arise as quickly as possible. Failed actions can cause issues with the availability of the cluster and must be cleaned up promptly. By following the steps outlined in this blog post, you can identify and clean up failed actions in the pcs status of the cluster, ensuring that it remains healthy and available.

DEVOPSZONES
