What does autodisable mean? Why did VCS autodisable my Service Group?
VCS does not allow a Service Group to fail over or be brought online
while it is autodisabled. VCS has to autodisable a Service Group when
VCS on a particular node shuts down *but* the GAB heartbeat is still
running. Once GAB is unloaded, e.g. when the node actually shuts down to
PROM level, reboots, or powers off, VCS on the other nodes can
automatically clear the autodisable flag. Until then, the Group cannot
fail over or be onlined anywhere within the cluster. This is a safety
feature to protect against "split brain", where more than one machine
uses the same resources, such as the same filesystems and virtual IP, at
the same time. Once a node leaves the cluster, VCS has to assume that
the machine may still be under user control before it goes down, and
that someone could log in to it and start services manually. That is why
VCS autodisables the Group on the remaining cluster nodes. VCS does let
you clear the autodisable flag yourself. Once you're sure that the node
that left the cluster doesn't have any services running, you can clear
the autodisable flag with this command:
hagrp -autoenable {name of Group} -sys {name of node}
Repeat the command for each Group that has been autodisabled. The Groups
that are autodisabled and the nodes they are autodisabled for can be
found with this command:
hastatus -sum
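When several Groups are autodisabled, the two commands above can be combined. The following is a minimal sketch, assuming the GROUP STATE rows of the hastatus -sum output start with "B" followed by the Group, System, Probed, AutoDisabled and State columns; check that layout on your VCS version before relying on it. The function name list_autoenable_cmds is made up for illustration. It only prints the hagrp commands so you can review them; it does not run anything.

```shell
#!/bin/sh
# Sketch only: print one "hagrp -autoenable" command for every
# Group/node pair that the summary reports as autodisabled.
# Assumed row layout (verify on your system):
#   B  <Group>  <System>  <Probed>  <AutoDisabled>  <State>
list_autoenable_cmds() {
    awk '$1 == "B" && $5 == "Y" {
        printf "hagrp -autoenable %s -sys %s\n", $2, $3
    }'
}

# Typical use on a cluster node (commands are printed, not executed):
#   hastatus -sum | list_autoenable_cmds
```

Review the printed commands and run only those for nodes you have confirmed are not running any services.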
Most of the time VCS autodisables a Group for a short period and then
clears the autodisable flag without you noticing. If the node that
leaves the cluster actually shuts down, the GAB module is also unloaded,
and VCS running on the other nodes will assume that node has shut down.
VCS will then automatically clear the autodisable flags for you. There's
one catch: by default, VCS on the running cluster requires GAB to be
unloaded within 60 seconds after VCS on that node is stopped. After 60
seconds, if GAB still isn't unloaded, VCS on the remaining nodes will
assume that node isn't shutting down, and will keep the autodisable
flags until the administrator clears them. To increase the 60 second
window to 120 seconds, run this:
hasys -modify {name of node} ShutdownTimeout 120
For large systems that take a long time to shut down, it is a good idea
to increase ShutdownTimeout.
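To make the change stick, the attribute is normally modified with the cluster configuration opened read-write and then saved. The sketch below prints that command sequence for a list of nodes. It assumes the usual haconf -makerw / haconf -dump -makero pair wraps the change and that ShutdownTimeout is set per node with hasys -modify; the function name print_timeout_cmds and the node names are hypothetical. The commands are printed for review, not executed.

```shell
#!/bin/sh
# Sketch only: print the commands that would set ShutdownTimeout on each
# named node. Usage: print_timeout_cmds <seconds> <node> [<node> ...]
print_timeout_cmds() {
    timeout=$1
    shift
    # Reject anything that is not a whole number of seconds.
    case $timeout in
        ''|*[!0-9]*)
            echo "timeout must be a whole number of seconds" >&2
            return 1
            ;;
    esac
    [ "$#" -gt 0 ] || { echo "no nodes given" >&2; return 1; }

    echo "haconf -makerw"            # open the configuration for writing
    for node in "$@"; do
        echo "hasys -modify $node ShutdownTimeout $timeout"
    done
    echo "haconf -dump -makero"      # save it and return to read-only
}

# Example: print_timeout_cmds 120 nodea nodeb
```

Printing first keeps the sketch safe to try on a workstation; on a real cluster, check the output and then run the commands by hand.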