[rdo-users] RHOSP 10 failed overcloud deployment

Wed Jan 31 10:45:44 UTC 2018

On Wed, Jan 31, 2018 at 10:14 AM, Anda Nicolae <anicolae at lenovo.com> wrote:
> I've let the deployment run overnight and it failed after almost 4hrs with the errors below. Do you happen to know the config file where I can decrease the timeout? I looked in /etc/nova/nova.conf and in ironic config files but I couldn't find anything relevant.

This is the default timeout for the heat stack which creates the nodes
via nova.

It is set via the --timeout option to the openstack overcloud deploy
command (the value is in minutes; the default of 240 is the 4 hours
you saw). Setting a lower value means a stalled deployment fails
faster.
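
As a rough illustration, the sketch below assembles a deploy command
line with a shorter timeout. It assumes --timeout is given in minutes;
the --templates flag and the 60-minute value are placeholders for the
example, not taken from this thread, and build_deploy_command is a
hypothetical helper name.

```python
def build_deploy_command(timeout_minutes, extra_args=()):
    """Return the argv list for an overcloud deploy with a custom timeout.

    timeout_minutes is passed to --timeout, which is assumed to be in minutes.
    extra_args stands in for whatever -e/--templates options you already use.
    """
    if timeout_minutes <= 0:
        raise ValueError("timeout must be a positive number of minutes")
    argv = ["openstack", "overcloud", "deploy", "--templates",
            "--timeout", str(timeout_minutes)]
    argv.extend(extra_args)
    return argv


if __name__ == "__main__":
    # Print the command rather than run it, so it can be reviewed first.
    print(" ".join(build_deploy_command(60)))
```

Printing the command instead of executing it keeps the sketch safe to
try on a machine without an undercloud.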

> The errors are:
>
> [overcloud.Compute.0]: CREATE_FAILED  ResourceInError: resources[0].resources.NovaCompute: Went to status ERROR due to "Message: Unknown, Code: Unknown"
> [overcloud.Controller.0]: CREATE_FAILED  Resource CREATE failed: ResourceInError: resources.Controller: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
>
> It is unclear to me why the above errors occur, since in my instackenv.json I declared node capabilities for both the compute and the controller node to be greater than the compute and controller flavors from 'openstack flavor list'.
>
> However, I've found this link and I am looking over it:
> https://docs.openstack.org/ironic/latest/admin/troubleshooting.html#nova-returns-no-valid-host-was-found-error

I would check the nova-scheduler log on the undercloud
(/var/log/nova/nova-scheduler.log) and look carefully at the state of
the ironic nodes. This error means the nova-scheduler can't find any
nodes to deploy to (or not enough to satisfy the requested node
counts). That can have various causes, such as bad node state (not
available and/or in maintenance mode), problems with power control
(e.g. nodes not responding via IPMI), or mistakes with node tagging,
profile matching or flavors.
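
The node-state checks above could be sketched roughly as follows. This
assumes the node list was exported with something like
"openstack baremetal node list -f json" and that the JSON keys are
"Name", "UUID", "Power State", "Provisioning State" and "Maintenance";
verify those key names against your own output. find_unschedulable is
a hypothetical helper, and it only makes sense before a deployment
attempt, since deployed nodes are legitimately not "available".

```python
def find_unschedulable(nodes):
    """Map node name (or UUID) to the reasons nova is unlikely to pick it.

    nodes is a list of dicts as loaded from the JSON node listing.
    The key names used here are assumptions about that output format.
    """
    problems = {}
    for node in nodes:
        reasons = []
        if node.get("Maintenance"):
            reasons.append("in maintenance mode")
        state = node.get("Provisioning State")
        if state != "available":
            reasons.append("provisioning state is %r" % (state,))
        if node.get("Power State") is None:
            reasons.append("power state unknown (check IPMI access)")
        if reasons:
            problems[node.get("Name") or node.get("UUID")] = reasons
    return problems
```

To use it, load the exported file with json.load() and print the
returned dict; an empty dict means none of these three checks flagged
a node, which would point towards tagging, profile or flavor problems
instead.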

The nova-scheduler log is fairly noisy, but if you look carefully you
will often find some clues as to why the nodes aren't being started
correctly.
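
One way to cut through the noise is a small filter like the sketch
below. The patterns are assumptions about what tends to be relevant
(filters that eliminated every host, plus warnings and errors); adjust
them to what your log actually contains. interesting_lines is a
hypothetical helper name.

```python
import re

# Assumed-relevant markers; extend or narrow these for your environment.
PATTERNS = re.compile(r"returned 0 hosts|No valid host|ERROR|WARNING")


def interesting_lines(lines):
    """Return only the log lines that match one of the assumed patterns."""
    return [line.rstrip("\n") for line in lines if PATTERNS.search(line)]
```

For example, open /var/log/nova/nova-scheduler.log, pass the file
object to interesting_lines(), and print each returned line.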

Also check the status of the nova and ironic services on the
undercloud to make sure that e.g. ironic-conductor hasn't failed for
any reason.
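
A minimal sketch of such a service check, assuming the undercloud
services run as systemd units. The unit names listed are assumptions
based on the usual RDO packaging and may differ on your system;
inactive_units and systemctl_is_active are hypothetical helper names.

```python
import subprocess

# Assumed unit names; confirm with "systemctl list-units" on the undercloud.
UNITS = [
    "openstack-ironic-conductor",
    "openstack-ironic-api",
    "openstack-nova-scheduler",
    "openstack-nova-compute",
]


def systemctl_is_active(unit):
    """Return the state string printed by 'systemctl is-active <unit>'."""
    proc = subprocess.Popen(["systemctl", "is-active", unit],
                            stdout=subprocess.PIPE,
                            universal_newlines=True)
    out, _ = proc.communicate()
    return out.strip()


def inactive_units(units=UNITS, probe=systemctl_is_active):
    """Return the units whose reported state is anything but 'active'."""
    return [unit for unit in units if probe(unit) != "active"]
```

The probe argument exists so the logic can be exercised without
systemd; on the undercloud, calling inactive_units() with no arguments
runs the real check.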

Hope that helps,

Steve Hardy

>
> Thanks,
> Anda
>
> -----Original Message-----
> From: Raoul Scarazzini [mailto:rasca at redhat.com]
> Sent: Tuesday, January 30, 2018 8:17 PM
> To: Anda Nicolae; users at lists.rdoproject.org
> Subject: Re: [rdo-users] RHOSP 10 failed overcloud deployment
>
> On 01/30/2018 04:39 PM, Anda Nicolae wrote:
>> Got it.
>>
>> I've noticed that it spends quite some time in CREATE_IN_PROGRESS state for OS::Heat::ResourceGroup resource (on Controller node).
>> Overcloud deployment fails after 4h. I will check which config file sets the overcloud deployment timeout and decrease it.
>>
>> Thanks,
>> Anda
>
> Also check the network settings. Hitting the default 4h timeout usually means something is unreachable.
>
> --
> Raoul Scarazzini
> rasca at redhat.com