See inline
-----Original Message-----
From: rdo-list-bounces(a)redhat.com [mailto:rdo-list-bounces@redhat.com]
On Behalf Of Andrew Beekhof
Sent: Tuesday, April 07, 2015 9:13 PM
To: rdo-list(a)redhat.com; rhos-pgm
Cc: milind.manjrekar(a)redhat.com; Perry Myers; Marcos Garcia; Balaji
Jayavelu
Subject: [Rdo-list] New deployment model for HA compute nodes - now
with automated recovery of VMs

Previously, in order to monitor the health of compute nodes and the
services running on them, we had to create single-node clusters due to
corosync's scaling limits.
We can now announce a new deployment model that allows Pacemaker to
continue this role, but presents a single coherent view of the entire
deployment while allowing us to scale beyond corosync's limits.
[snip]
With these capabilities in place, we can exploit Pacemaker's node
monitoring and fencing capabilities to drive 'nova host-evacuate' for the
failed compute nodes and recover the VMs elsewhere.
When a compute node fails, Pacemaker will:
1. Execute 'nova service-disable'
See https://review.openstack.org/#/c/169836/
In particular, see the note from Sylvain Bauza on patchset 8 about the
issue when the Ironic and/or VMware drivers are in use. The blueprint
this review addresses introduces notifications similar to this, and is
targeted for Liberty. If the node is down, I'm not sure that
'nova service-disable' will quickly cause the VMs running on it to be
disabled, since any service on that node has likely already crashed and
is unlikely to let anyone know about its death.
In any case, comments on the above blueprint are welcome, and the blueprint
should help with this step.

2. Fence (power off) the failed compute node
3. fence_compute off (waiting for nova to detect the compute node is gone)
4. fence_compute on (a no-op unless the host happens to be up already)
5. Execute 'nova service-enable' when the compute node returns
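The five-step recovery order described in this thread can be summarized as an ordered plan. The Python below is a minimal sketch of that ordering only, not Pacemaker's or fence_compute's implementation: it executes nothing and just builds (tool, operation, host) tuples. The tuple encoding, the "power-fence" name for the node's real fence device, and the host_returned flag are hypothetical, and the nova subcommands are abbreviated (the real CLI also takes the service binary, e.g. nova-compute).

```python
"""Sketch of the compute-node recovery order described in the thread.

Hypothetical model: actions are returned as plain tuples rather than
executed, so nothing here talks to Pacemaker, nova, or a fence device.
"""
from typing import List, Tuple

# Hypothetical encoding: (tool, operation, host)
Action = Tuple[str, str, str]


def recovery_plan(host: str, host_returned: bool) -> List[Action]:
    """Return the ordered actions Pacemaker is described as taking for `host`.

    Step 5 is only emitted once the node has come back (`host_returned`).
    """
    plan: List[Action] = [
        ("nova", "service-disable", host),   # 1. stop scheduling new VMs onto the host
        ("power-fence", "off", host),        # 2. hypothetical name for the node's power fence device
        ("fence_compute", "off", host),      # 3. wait for nova to see the host as gone
        ("fence_compute", "on", host),       # 4. no-op unless the host is already up
    ]
    if host_returned:
        # 5. re-enable the compute service once the node returns
        plan.append(("nova", "service-enable", host))
    return plan


if __name__ == "__main__":
    for step, action in enumerate(recovery_plan("compute-1", host_returned=True), start=1):
        print(step, *action)
```

Keeping step 5 conditional mirrors the thread's wording that 'nova service-enable' runs only when the compute node returns; until then the plan stops after the fence_compute steps.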
-- Andrew
Regards,
John Haller