[Rdo-list] New deployment model for HA compute nodes - now with automated recovery of VMs

Haller, John H (John) john.haller at alcatel-lucent.com
Fri Apr 10 20:12:31 UTC 2015

See inline
> -----Original Message-----
> From: rdo-list-bounces at redhat.com [mailto:rdo-list-bounces at redhat.com]
> On Behalf Of Andrew Beekhof
> Sent: Tuesday, April 07, 2015 9:13 PM
> To: rdo-list at redhat.com; rhos-pgm
> Cc: milind.manjrekar at redhat.com; Perry Myers; Marcos Garcia; Balaji
> Jayavelu
> Subject: [Rdo-list] New deployment model for HA compute nodes - now
> with automated recovery of VMs
> Previously, in order to monitor the health of compute nodes and the
> services running on them, we had to create single node clusters due to
> corosync's scaling limits.
> We can now announce a new deployment model that allows Pacemaker to
> continue this role, but presents a single coherent view of the entire
> deployment while allowing us to scale beyond corosync's limits.
> With these capabilities in place, we can exploit Pacemaker's node
> monitoring and fencing capabilities to drive nova host-evacuate for the
> failed compute nodes and recover the VMs elsewhere.
> When a compute node fails, Pacemaker will:
> 1. Execute 'nova service-disable'

See https://review.openstack.org/#/c/169836/ and, in particular, the note
from Sylvain Bauza on patchset 8 about the issue that arises when the
Ironic and/or VMware drivers are in use. The blueprint that this review
addresses would introduce notifications similar to this, and it is
targeted for Liberty. If the node is down, I'm not sure that
'nova service-disable' will quickly cause the VMs running on it to be
disabled, since any service on that node has likely already crashed and
is unlikely to let anyone know about its death.

In any case, comments on the above blueprint are welcome, and the blueprint
should help with this step.

> 2. fence (power off) the failed compute node
> 3. fence_compute off (waiting for nova to detect the compute node is gone)
> 4. fence_compute on (a no-op unless the host happens to be up already)
> 5. Execute 'nova service-enable' when the compute node returns
> -- Andrew
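
As a rough sketch of the ordering only, the five steps quoted above could be
written down as data in Python. The names Step and recovery_plan are made up
for illustration, the host name is hypothetical, and the "<host> nova-compute"
argument form for the nova commands is an assumption rather than something
stated in this thread. Nothing here is meant to show how Pacemaker or the
fence_compute agent actually implements the sequence.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    """One entry in the recovery sequence (hypothetical helper type)."""
    order: int
    action: str
    note: str


def recovery_plan(host: str) -> List[Step]:
    """Return the five quoted recovery steps, in order, for one failed host.

    This only describes the sequence; it does not run any command.
    The nova argument form used below is an assumption.
    """
    return [
        Step(1, f"nova service-disable {host} nova-compute",
             "executed when the compute node fails"),
        Step(2, f"fence (power off) {host}",
             "power off the failed compute node"),
        Step(3, f"fence_compute off {host}",
             "waits for nova to detect the compute node is gone"),
        Step(4, f"fence_compute on {host}",
             "a no-op unless the host happens to be up already"),
        Step(5, f"nova service-enable {host} nova-compute",
             "executed when the compute node returns"),
    ]
```

Iterating over recovery_plan("compute-1") and printing each step's order,
action and note gives the list in the order Pacemaker is described as
following it. Keeping the plan as plain data makes it easy to see that the
service is disabled before fencing and only re-enabled once the node is back.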

John Haller
