[TripleO] Long OpenStack deployment times with RDO recent releases

Wednesday, 22 September 2021

Hey guys,

we migrated an OpenStack installation of one of our customers to TripleO
RDO when Rocky was released. Back then deployment times were fine, as the
whole cluster didn't have that many nodes yet.

In the meantime we upgraded to Ussuri and use 3 controllers and 44 compute
nodes by now. Our deployment times exponentially increased with each set of
compute nodes we added. With our current setup, a complete deployment run
takes about 15 to 16 hours. In our clusters we count the following
ressources right now:

VMs: ~ 2100
Networks: ~500
Ports: ~6100
Volumes: ~5700

We implemented ARA for now so we can get exact measures of each
ansible-playbook runtime to see what is taking the most time. My question
is: How big are your production OpenStack environments and how long does it
take you to deploy?

Which methods do you guys use to scale-up compute nodes? (spoiler:
--skip-deploy-identifier doesn't seem to work properly)
Is Blacklisting all other Compute nodes the right move? Do you even
blacklist the Controllers as well?

Best Regards,

Guido

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017