Re: [Rdo-list] New deployment model for HA compute nodes - now with automated recovery of VMs

Friday, 10 April 2015

See inline
...
 -----Original Message-----
 From: rdo-list-bounces(a)redhat.com [mailto:rdo-list-bounces@redhat.com]
 On Behalf Of Andrew Beekhof
 Sent: Tuesday, April 07, 2015 9:13 PM
 To: rdo-list(a)redhat.com; rhos-pgm
 Cc: milind.manjrekar(a)redhat.com; Perry Myers; Marcos Garcia; Balaji
 Jayavelu
 Subject: [Rdo-list] New deployment model for HA compute nodes - now
 with automated recovery of VMs

 Previously, in order to monitor the health of compute nodes and the
 services running on them, we had to create single-node clusters due to
 corosync's scaling limits.
 We can now announce a new deployment model that allows Pacemaker to
 continue this role, but presents a single coherent view of the entire
 deployment while allowing us to scale beyond corosync's limits. [snip]
...
 With these capabilities in place, we can exploit Pacemaker's node
 monitoring and fencing capabilities to drive nova host-evacuate for the
 failed compute nodes and recover the VMs elsewhere.
 When a compute node fails, Pacemaker will:

 1. Execute 'nova service-disable' 
See https://review.openstack.org/#/c/169836/
In particular, see the note from Sylvain Bauza on patchset 8 about the
issue that arises when the Ironic and/or VMware drivers are in use. The
blueprint that this review addresses would introduce notifications
similar to this, and it is targeted for Liberty. If the node is down,
I'm not sure that 'nova service-disable' will quickly cause the VMs
running on it to be disabled, since any service on that node has likely
already crashed and is unlikely to let anyone know about its death.

In any case, comments on the above blueprint are welcome, and the blueprint
should help with this step.

...
 2. fence (power off) the failed compute node
 3. fence_compute off (waiting for nova to detect the compute node is gone)
 4. fence_compute on (a no-op unless the host happens to be up already)
 5. Execute 'nova service-enable' when the compute node returns

 -- Andrew 
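
For reference, the five steps quoted above can be written down as an
ordered plan. This is only a sketch under stated assumptions: Step,
recovery_plan, and describe are hypothetical names, and the
'nova service-disable <host> <binary>' argument form is my assumption
about the CLI rather than something stated in this thread. The fencing
steps are left without a command because the fence agent and its options
depend on the site. Nothing is executed; the code only lists the sequence.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    number: int
    actor: str                 # component that performs the step
    action: str                # description taken from the quoted list
    argv: Optional[List[str]]  # assumed CLI form, or None if not spelled out


def recovery_plan(host: str, binary: str = "nova-compute") -> List[Step]:
    """Return the five recovery steps, in order, for one failed compute node."""
    return [
        Step(1, "nova", "disable the compute service",
             ["nova", "service-disable", host, binary]),  # assumed arguments
        Step(2, "fencing", "power off the failed compute node", None),
        Step(3, "fence_compute", "off: wait for nova to see the node as gone", None),
        Step(4, "fence_compute", "on: no-op unless the host is already up", None),
        Step(5, "nova", "re-enable the compute service when the node returns",
             ["nova", "service-enable", host, binary]),  # assumed arguments
    ]


def describe(plan: List[Step]) -> List[str]:
    """Render each step as one line, showing a command only where one is assumed."""
    lines = []
    for step in plan:
        cmd = " ".join(step.argv) if step.argv else "(handled by the fence agent)"
        lines.append(f"{step.number}. [{step.actor}] {step.action}: {cmd}")
    return lines


if __name__ == "__main__":
    # The host name is a placeholder, not a real node.
    for line in describe(recovery_plan("compute-1.example.com")):
        print(line)
```

Keeping the plan as data makes the ordering explicit: the service is
disabled before fencing, and re-enabled only after the node returns.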

Regards,
John Haller
