Hello stackers,
I have a three-controller cluster deployed with TripleO (Queens,
tripleo-current). After every fresh deployment, the first few instance
launches fail: new instances either get stuck in BUILD or go to the ERROR
state. This continues until the "nova_placement" container on the
controller holding the InternalAPI VIP becomes "unhealthy" and I restart it
manually with docker restart. Only after that does everything return to
normal. The problem is consistently reproducible on fresh deployments.
I found a similar description of the symptoms in this bug [1]. The report
is for Rocky, but the symptoms match what I am seeing on Queens.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1630069
Here is an excerpt from /var/log/containers/nova/nova-scheduler.log on one
of the controller nodes:
...
2018-10-06 04:57:37.926 28 INFO nova.scheduler.host_manager
[req-dfa89ff4-cbde-482e-ad40-0696294ffdd1 - - - - -] Host mapping not found
for host overcloud-novacompute-1.localdomain. Not tracking instance info
for this host.
2018-10-06 04:57:37.926 25 INFO nova.scheduler.host_manager
[req-dfa89ff4-cbde-482e-ad40-0696294ffdd1 - - - - -] Host mapping not found
for host overcloud-novacompute-1.localdomain. Not tracking instance info
for this host.
2018-10-06 04:57:37.926 30 INFO nova.scheduler.host_manager
[req-dfa89ff4-cbde-482e-ad40-0696294ffdd1 - - - - -] Host mapping not found
for host overcloud-novacompute-1.localdomain. Not tracking instance info
for this host.
2018-10-06 04:57:37.926 32 INFO nova.scheduler.host_manager
[req-dfa89ff4-cbde-482e-ad40-0696294ffdd1 - - - - -] Host mapping not found
for host overcloud-novacompute-1.localdomain. Not tracking instance info
for this host.
2018-10-06 04:57:37.926 26 INFO nova.scheduler.host_manager
[req-dfa89ff4-cbde-482e-ad40-0696294ffdd1 - - - - -] Host mapping not found
for host overcloud-novacompute-1.localdomain. Not tracking instance info
for this host.
2018-10-06 04:57:37.927 28 INFO nova.scheduler.host_manager
[req-dfa89ff4-cbde-482e-ad40-0696294ffdd1 - - - - -] Received a sync
request from an unknown host 'overcloud-novacompute-1.localdomain'.
Re-created its InstanceList.
2018-10-06 04:57:37.927 25 INFO nova.scheduler.host_manager
[req-dfa89ff4-cbde-482e-ad40-0696294ffdd1 - - - - -] Received a sync
request from an unknown host 'overcloud-novacompute-1.localdomain'.
Re-created its InstanceList.
2018-10-06 04:57:37.927 30 INFO nova.scheduler.host_manager
[req-dfa89ff4-cbde-482e-ad40-0696294ffdd1 - - - - -] Received a sync
request from an unknown host 'overcloud-novacompute-1.localdomain'.
Re-created its InstanceList.
2018-10-06 04:57:37.927 32 INFO nova.scheduler.host_manager
[req-dfa89ff4-cbde-482e-ad40-0696294ffdd1 - - - - -] Received a sync
request from an unknown host 'overcloud-novacompute-1.localdomain'.
Re-created its InstanceList.
2018-10-06 04:57:37.927 26 INFO nova.scheduler.host_manager
[req-dfa89ff4-cbde-482e-ad40-0696294ffdd1 - - - - -] Received a sync
request from an unknown host 'overcloud-novacompute-1.localdomain'.
Re-created its InstanceList.
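In case it helps anyone compare against their own logs, here is a rough
sketch for counting how often each compute host shows up in the "Host
mapping not found" messages above. The function name unmapped_hosts is
just something I made up for illustration, and the pattern is based only
on the message text in this excerpt, so it may need adjusting for other
releases.

```python
import re
from collections import Counter

# Pattern taken from the message text in the excerpt above (an assumption
# about the format); \s+ also tolerates lines wrapped by a mail client.
UNMAPPED_HOST = re.compile(r"Host mapping not found\s+for\s+host\s+(\S+)\.\s")


def unmapped_hosts(log_text):
    """Count 'Host mapping not found' messages per compute host."""
    return Counter(UNMAPPED_HOST.findall(log_text))
```

Feeding it the contents of nova-scheduler.log returns a Counter keyed by
host name, so a host that never gets mapped should stand out quickly.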
Could this be the same bug, affecting Queens as well?
Cody