Got it.
I've noticed that the deployment spends quite some time in the CREATE_IN_PROGRESS state for the
OS::Heat::ResourceGroup resource (on the Controller node).
The overcloud deployment fails after 4h. I will check which config file sets the overcloud
deployment timeout and decrease it.
Thanks,
Anda
-----Original Message-----
From: Raoul Scarazzini [mailto:rasca@redhat.com]
Sent: Tuesday, January 30, 2018 4:28 PM
To: Anda Nicolae; users(a)lists.rdoproject.org
Subject: Re: [rdo-users] RHOSP 10 failed overcloud deployment
On 01/30/2018 02:06 PM, Anda Nicolae wrote:
Hello,
[...]
sudo journalctl -u os-collect-config
I think the problem is on my controller node, because I've noticed the
following messages in the output of the above command:
os-collect-config[2996]: Source [ec2] Unavailable.
os-collect-config[2996]: /var/lib/os-collect-config/local-data not found. Skipping
os-collect-config[2996]: No local metadata found (['/var/lib/os-collect-config/local-data']
These messages repeat several times in the output of the above command.
As far as I can tell, this is perfectly normal. os-collect-config checks that path to see whether
there are actions to perform (driven by the director). The message you see basically says "no
actions to do".
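To make anything unusual stand out in the journal, the routine messages can be filtered away. Below is a minimal Python sketch of that idea; the pattern list contains only the three messages quoted above, and the name drop_benign is made up for illustration, not part of os-collect-config.

```python
# Substrings of the routine os-collect-config messages quoted in this thread.
BENIGN_PATTERNS = (
    "Source [ec2] Unavailable",
    "local-data not found. Skipping",
    "No local metadata found",
)


def drop_benign(lines):
    """Return only the lines that match none of the routine messages.

    `lines` is any iterable of journal lines (hypothetical helper).
    """
    return [
        line for line in lines
        if not any(pattern in line for pattern in BENIGN_PATTERNS)
    ]


# Small demonstration on the first quoted message plus one invented line.
sample = [
    "os-collect-config[2996]: Source [ec2] Unavailable.",
    "os-collect-config[2996]: an invented line that is not routine",
]
for line in drop_benign(sample):
    print(line)
```

To use it on a real node, the output of "sudo journalctl -u os-collect-config" could be piped into a small script that passes sys.stdin to drop_benign.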
On my undercloud VM, I've noticed that the overcloud deployment remains
stuck when running the wait_for_stack_ready function from
/usr/lib/python2.7/site-packages/tripleoclient/utils.py.
I also intend to add some logging in
/usr/lib/python2.7/site-packages/os_collect_config/collect.py to see
what causes the error message: Source [ec2] Unavailable. I think I have
an error in my templates, but I haven't figured out which one yet. Do you
know what may cause this?
It might be useful to check the status of the deployment in depth, as described here
[1]. Maybe you're just low on resources, maybe it's something different, but what
you are seeing are not explicit errors.
[1]
https://docs.openstack.org/tripleo-docs/latest/install/troubleshooting/tr...
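As a rough illustration of that kind of check, here is a minimal Python sketch that picks out the Heat resources that have not completed. It assumes the resource list was exported on the undercloud with "openstack stack resource list --nested-depth 5 -f json overcloud" and that the JSON keys match the column names (resource_name, resource_type, resource_status, stack_name). The sample data and the function name are invented for illustration.

```python
import json


def unfinished_resources(resources):
    """Return the resources whose status does not end in _COMPLETE."""
    return [
        res for res in resources
        if not res.get("resource_status", "").endswith("_COMPLETE")
    ]


# Invented sample; real data would come from the command named above.
SAMPLE_JSON = """
[
  {"resource_name": "Controller", "resource_type": "OS::Heat::ResourceGroup",
   "resource_status": "CREATE_IN_PROGRESS", "stack_name": "overcloud"},
  {"resource_name": "Networks", "resource_type": "OS::TripleO::Network",
   "resource_status": "CREATE_COMPLETE", "stack_name": "overcloud"},
  {"resource_name": "Compute", "resource_type": "OS::Heat::ResourceGroup",
   "resource_status": "CREATE_FAILED", "stack_name": "overcloud"}
]
"""

for res in unfinished_resources(json.loads(SAMPLE_JSON)):
    print(res["stack_name"], res["resource_name"], res["resource_status"])
```

Resources that stay in CREATE_IN_PROGRESS for a long time, or that show CREATE_FAILED, are the ones worth inspecting further.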
--
Raoul Scarazzini
rasca(a)redhat.com