Hi folks,

Another issue I have noticed involves the nova_placement container when running 3 controllers in HA mode on the Queens release.

More than half the time, users were unable to launch instances after a fresh deployment. Attempts to launch instances would either fail or get stuck in BUILD status. Other times the first one or two launches succeeded, but subsequent launches failed. This behavior would persist until the nova_placement container on one of the controllers became unhealthy and was restarted.

My current "solution" is to preemptively restart the nova_placement container on every controller node after a fresh deployment, before launching any instances. As far as I can recall, this doesn't fix the problem every time, but it does significantly reduce the likelihood of initial launch failures.
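
For reference, the workaround can be sketched roughly as below. This is only an illustration, not the exact procedure I ran: the controller hostnames and the "heat-admin" SSH user are placeholders, and I am assuming the containers are managed by Docker so that "docker restart" applies. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Hypothetical controller hostnames; replace with the real ones.
CONTROLLERS = ["controller-0", "controller-1", "controller-2"]
CONTAINER = "nova_placement"  # container name as used in this report


def build_restart_command(host, container=CONTAINER, user="heat-admin"):
    """Return the ssh argv that restarts one container on one host.

    The SSH user and the use of sudo + docker are assumptions.
    """
    return ["ssh", f"{user}@{host}", "sudo", "docker", "restart", container]


def restart_all(hosts=CONTROLLERS, dry_run=True):
    """Restart the container on each host in turn; return the commands used."""
    commands = [build_restart_command(h) for h in hosts]
    for cmd in commands:
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd))
        else:
            # Stop at the first failure so a broken node is noticed.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    restart_all(dry_run=True)
```

Calling restart_all(dry_run=False) would actually run the restarts, one controller at a time.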

I have collected some logs from the nova_placement container and linked them below. To my untrained eye they offer no obvious clues, but hopefully they will help you debug the issue.

/var/log/nova/nova-scheduler.log : http://paste.openstack.org/show/731930/

/var/log/nova/nova-placement-api.log : http://paste.openstack.org/show/731931/

/var/log/nova-conductor.log : http://paste.openstack.org/show/731932/

/var/log/httpd/placement_wsgi_access.log (with repetitive entries/patterns shortened): http://paste.openstack.org/show/731933/

/var/log/httpd/placement_wsgi_error.log : http://paste.openstack.org/show/731934/


Thank you very much.

Best regards,
Cody