[rdo-users] [tripleo][nova][HA]nova_placement unhealthy after fresh HA deployment

Fri Oct 12 01:22:06 UTC 2018

Hi folks,

Another issue I have noticed is with the nova_placement container using 3
controllers in HA mode with the Queens release.

More than half the time, users were unable to launch instances after a
fresh deployment. Attempts to launch instances would either failed or stuck
at BUILD status. Other times the first one or two launches went
successfully, but subsequent launches would fail. This phenomenon would
last until the nova_placement container on one of the controllers became
unhealthy and was given a restart.

My current "solution" is to preemptively restart every nova_placement
container on every controller node after a fresh deployment before
launching any instances. This doesn't fix the problem every time as I can
recall, but it does significantly reduce the likelihood of initial launch
failures.

I have collected some logs from the nova_placement container and linked
them below. To my untrained eye, I am unable to find out clues. Hopefully
they are useful to help you debug.

/var/log/nova/nova-schedular.log : http://paste.openstack.org/show/731930/

/var/log/nova/nova-placement-api.log :
http://paste.openstack.org/show/731931/

/var/log/nova-conductor.log : http://paste.openstack.org/show/731932/

/var/log/httpd/placement_wsgi_access.log (with repetitive entries/patterns
shortened): http://paste.openstack.org/show/731933/

/var/log/httpd/placement_wsgi_error.log :
http://paste.openstack.org/show/731934/

Thank you very much.

Best regards,
Cody
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.rdoproject.org/pipermail/users/attachments/20181011/8f5406b9/attachment-0001.html>