Howdy folks,
Just wanted to give a heads up that I plan to replace the
"high-availability" tripleo-quickstart job in the CI promotion
pipeline[1] with a job that has a smaller footprint. In CI, we get a
virthost with 32G of RAM and a mediocre CPU. It is really hard to fit 5
very active VMs on that, and for that reason we have never had the HA
job stable enough to use as a gate.
Instead, we will test the pacemaker code path in tripleo by using a
single controller setup with pacemaker enabled. We were never actually
testing HA (i.e. failover scenarios) in the current job, so this should be
a pretty minimal loss in coverage.
Since this allows us to drop two CPU intensive nodes from the deploy, we
can add a ceph node to that job. This will end up with more code
coverage than the current HA job, and will hopefully end up being
stable enough to use as a gate as well.
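To make the footprint change concrete, here is a rough back-of-the-envelope
sketch. The 32G figure, the 5 VMs, dropping two nodes, and adding a ceph node
all come from the text above; the role breakdown (undercloud, controllers,
compute) and the even split of RAM are my assumptions for illustration only,
and host overhead is ignored.

```python
# Rough sketch only: role names and the even RAM split are assumptions,
# not the real flavor sizes used by the jobs.
HOST_RAM_GB = 32  # virthost RAM available in CI

# Assumed breakdown of the current 5-VM HA job.
ha_job = {"undercloud": 1, "controller": 3, "compute": 1}

# Proposed job: drop two controllers, add one ceph node.
proposed_job = {"undercloud": 1, "controller": 1, "compute": 1, "ceph": 1}


def vm_count(layout):
    """Total number of VMs in a layout."""
    return sum(layout.values())


def even_share_gb(layout, host_ram_gb=HOST_RAM_GB):
    """RAM per VM if the host RAM were split evenly (ignores host overhead)."""
    return host_ram_gb / vm_count(layout)


for name, layout in (("current HA job", ha_job), ("proposed job", proposed_job)):
    print(f"{name}: {vm_count(layout)} VMs, "
          f"{even_share_gb(layout):.1f} GB each if split evenly")
```

Even with a ceph node added, the proposed job runs one VM fewer than the
current one, which is where the extra headroom comes from.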
Longer term, it would be good to restore an actual HA job, maybe even
adding some failure scenario tests to the job. I have a couple of ideas
about how we could do this, but none are feasible in the short term.
1. Use pre-existing servers for deploying[2]
This would allow running the HA job against any cloud, where we could
size the nodes appropriately to make the job stable.
2. Use an OVB cloud for the HA job
Soon we should have an OVB (OpenStack Virtual Baremetal) cloud to run
tests in. OVB would have all of the benefits of the solution above
(unrestricted VM size), and would also let us test Ironic in a more
realistic way, since it mocks IPMI instead of relying on our current
fake ironic driver (which just runs virsh commands over SSH).
3. Add a feature to tripleo-quickstart to bridge multiple virthosts
If we could deploy our virtual machines across 2 different hosts, we
would then have much more room to deploy the HA job.
If anyone has some better ideas, they are very welcome!
-- trown
[1] https://ci.centos.org/view/rdo/view/promotion-pipeline/
[2] https://review.openstack.org/#/c/324777/