[rdo-list] HA overcloud-deploy.sh crashes again ( ControllerOvercloudServicesDeployment_Step4 )
Boris Derzhavets
bderzhavets at hotmail.com
Thu Jun 30 09:19:17 UTC 2016
________________________________
From: rdo-list-bounces at redhat.com <rdo-list-bounces at redhat.com> on behalf of Boris Derzhavets <bderzhavets at hotmail.com>
Sent: Wednesday, June 29, 2016 5:14 PM
To: Dan Sneddon; rdo-list at redhat.com
Subject: Re: [rdo-list] HA overcloud-deploy.sh crashes again ( ControllerOvercloudServicesDeployment_Step4 )
Yes, this was an attempt to deploy the following configuration:
########################
# HA +2xCompute
########################
control_memory: 6144
compute_memory: 6144
undercloud_memory: 8192
# Giving the undercloud additional CPUs can greatly improve heat's
# performance (and result in a shorter deploy time).
undercloud_vcpu: 4
# Create three controller nodes and two compute nodes.
overcloud_nodes:
  - name: control_0
    flavor: control
  - name: control_1
    flavor: control
  - name: control_2
    flavor: control
  - name: compute_0
    flavor: compute
  - name: compute_1
    flavor: compute
# We don't need introspection in a virtual environment (because we are
# creating all the "hardware" we really know the necessary
# information).
introspect: false
# Tell tripleo about our environment.
network_isolation: true
extra_args: >-
  --control-scale 3 --compute-scale 2 --neutron-network-type vxlan
  --neutron-tunnel-types vxlan
  -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml
  --ntp-server pool.ntp.org
deploy_timeout: 75
tempest: false
pingtest: true
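As a rough sanity check on the settings above, the total RAM they request from the virt host can be added up. The sketch below only copies the posted values; the function names are hypothetical, and whether host capacity has anything to do with the failure below is not established here.

```python
# Sketch only: sums the memory requested by the settings posted above.
# The dictionary mirrors the posted values; function names are made up.
config = {
    "control_memory": 6144,      # MB per controller VM
    "compute_memory": 6144,      # MB per compute VM
    "undercloud_memory": 8192,   # MB for the undercloud VM
    "overcloud_nodes": [
        {"name": "control_0", "flavor": "control"},
        {"name": "control_1", "flavor": "control"},
        {"name": "control_2", "flavor": "control"},
        {"name": "compute_0", "flavor": "compute"},
        {"name": "compute_1", "flavor": "compute"},
    ],
}


def count_by_flavor(cfg):
    """Return how many overcloud nodes are defined per flavor."""
    counts = {}
    for node in cfg["overcloud_nodes"]:
        counts[node["flavor"]] = counts.get(node["flavor"], 0) + 1
    return counts


def total_memory_mb(cfg):
    """Return undercloud memory plus the memory of every overcloud node."""
    total = cfg["undercloud_memory"]
    for node in cfg["overcloud_nodes"]:
        total += cfg[node["flavor"] + "_memory"]
    return total


if __name__ == "__main__":
    print(count_by_flavor(config))
    print(total_memory_mb(config), "MB requested in total")
```

The per-flavor counts can also be compared by eye with --control-scale and --compute-scale in extra_args.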
Results during overcloud deployment:
2016-06-30 09:09:31 [NovaCompute]: CREATE_FAILED ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-06-30 09:09:31 [NovaCompute]: DELETE_IN_PROGRESS state changed
2016-06-30 09:09:34 [NovaCompute]: DELETE_COMPLETE state changed
2016-06-30 09:09:44 [NovaCompute]: CREATE_IN_PROGRESS state changed
2016-06-30 09:09:48 [NovaCompute]: CREATE_FAILED ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
. . . . .
2016-06-30 09:11:36 [overcloud]: CREATE_FAILED Resource CREATE failed: ResourceInError: resources.Compute.resources[0].resources.NovaCompute: Went to status ERROR due to "Message: Build of instance bf483c34-7010-48ea-8f58-fe192c91093f aborted: Failed to provision instance bf483c34-7010-48ea-8f58-fe192
2016-06-30 09:11:36 [1]: SIGNAL_COMPLETE Unknown
2016-06-30 09:11:36 [ControllerDeployment]: SIGNAL_COMPLETE Unknown
2016-06-30 09:11:36 [1]: CREATE_COMPLETE state changed
2016-06-30 09:11:36 [overcloud-ControllerCephDeployment-62xh7uhtpjqp]: CREATE_COMPLETE Stack CREATE completed successfully
2016-06-30 09:11:37 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
2016-06-30 09:11:37 [1]: SIGNAL_COMPLETE Unknown
Stack overcloud CREATE_FAILED
Deployment failed: Heat Stack create failed.
+ heat stack-list
+ grep -q CREATE_FAILED
+ deploy_status=1
++ heat resource-list --nested-depth 5 overcloud
++ grep FAILED
++ grep 'StructuredDeployment '
++ cut -d '|' -f3
+ exit 1
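To pick the failures out of a long event trace like the one above, a small filter can help. This is a minimal sketch that assumes each event sits on one line in the "timestamp [resource]: STATUS reason" layout shown above; the regular expression and function name are my own and not part of heat.

```python
import re

# Assumed line layout, taken from the trace above:
#   2016-06-30 09:09:31 [NovaCompute]: CREATE_FAILED <reason...>
EVENT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[([^\]]+)\]: (\S+)\s*(.*)$"
)


def failed_events(lines):
    """Return (timestamp, resource, reason) for every *_FAILED event line."""
    found = []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match and match.group(3).endswith("_FAILED"):
            found.append((match.group(1), match.group(2), match.group(4)))
    return found


sample = [
    '2016-06-30 09:09:31 [NovaCompute]: CREATE_FAILED ResourceInError: '
    'resources.NovaCompute: Went to status ERROR due to "Message: No valid '
    'host was found. There are not enough hosts available., Code: 500"',
    "2016-06-30 09:09:31 [NovaCompute]: DELETE_IN_PROGRESS state changed",
    "2016-06-30 09:11:36 [1]: SIGNAL_COMPLETE Unknown",
]

if __name__ == "__main__":
    for timestamp, resource, reason in failed_events(sample):
        print(timestamp, resource, reason)
```

Lines that do not match the assumed layout (for example the "+ heat stack-list" shell trace) are simply skipped.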
Thanks.
Boris
________________________________
From: rdo-list-bounces at redhat.com <rdo-list-bounces at redhat.com> on behalf of Dan Sneddon <dsneddon at redhat.com>
Sent: Wednesday, June 29, 2016 1:46 PM
To: rdo-list at redhat.com
Subject: Re: [rdo-list] HA overcloud-deploy.sh crashes again ( ControllerOvercloudServicesDeployment_Step4 )
On 06/29/2016 10:42 AM, Dan Sneddon wrote:
> On 06/29/2016 07:03 AM, Boris Derzhavets wrote:
>> Boris Derzhavets has shared a OneDrive file with you. To view it, click
>> the link below.
>>
>> <https://1drv.ms/u/s!AqjiDzRpwaKogSHAekH8ZluOaclk>
>>
>> HeatCrash2.txt 1.gz <https://1drv.ms/u/s!AqjiDzRpwaKogSHAekH8ZluOaclk>
>> [HeatCrash2.txt 1.gz]
>>
>> Reattach gzip archive via One Drive
>>
>>
>>
>> -----------------------------------------------------------------------
>> *From:* rdo-list-bounces at redhat.com <rdo-list-bounces at redhat.com> on
>> behalf of Boris Derzhavets <bderzhavets at hotmail.com>
>> *Sent:* Wednesday, June 29, 2016 9:36 AM
>> *To:* John Trowbridge; shardy at redhat.com
>> *Cc:* rdo-list at redhat.com
>> *Subject:* [rdo-list] HA overcloud-deploy.sh crashes again (
>> ControllerOvercloudServicesDeployment_Step4 )
>>
>>
>> Attempt to follow steps suggested
>> in http://hardysteven.blogspot.ru/2016/06/tripleo-partial-stack-updates.html
>>
>>
>> ./overcloud-deploy.sh crashes
>>
>>
>> 2016-06-29 12:42:41
>> [overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk-ControllerOvercloudServicesDeployment_Step4-nzdoizlgrmx2]:
>> CREATE_FAILED Resource CREATE failed: Error: resources[0]: Deployment
>> to server failed: deploy_status_code : Deployment exited with non-zero
>> status code: 6
>> 2016-06-29 12:42:42 [ControllerOvercloudServicesDeployment_Step4]:
>> CREATE_FAILED Error:
>> resources.ControllerOvercloudServicesDeployment_Step4.resources[0]:
>> Deployment to server failed: deploy_status_code: Deployment exited with
>> non-zero status code: 6
>> 2016-06-29 12:42:43
>> [overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk]: CREATE_FAILED
>> Resource CREATE failed: Error:
>> resources.ControllerOvercloudServicesDeployment_Step4.resources[0]:
>> Deployment to server failed: deploy_status_code: Deployment exited with
>> non-zero status code: 6
>> 2016-06-29 12:42:44 [ControllerNodesPostDeployment]: CREATE_FAILED
>> Error:
>> resources.ControllerNodesPostDeployment.resources.ControllerOvercloudServicesDeployment_Step4.resources[0]:
>> Deployment to server failed: deploy_status_code: Deployment exited with
>> non-zero status code: 6
>> 2016-06-29 12:42:44 [2]: SIGNAL_COMPLETE Unknown
>> 2016-06-29 12:42:45 [2]: SIGNAL_COMPLETE Unknown
>> 2016-06-29 12:42:45 [2]: SIGNAL_COMPLETE Unknown
>> 2016-06-29 12:42:46 [overcloud]: CREATE_FAILED Resource CREATE failed:
>> Error:
>> resources.ControllerNodesPostDeployment.resources.ControllerOvercloudServicesDeployment_Step4.resources[0]:
>> Deployment to server failed: deploy_status_code: Deployment exited with
>> non-zero status code: 6
>> 2016-06-29 12:42:46 [2]: SIGNAL_COMPLETE Unknown
>> 2016-06-29 12:42:47 [2]: SIGNAL_COMPLETE Unknown
>> 2016-06-29 12:42:47 [ControllerDeployment]: SIGNAL_COMPLETE Unknown
>> 2016-06-29 12:42:48 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
>> 2016-06-29 12:42:48 [2]: SIGNAL_COMPLETE Unknown
>> Stack overcloud CREATE_FAILED
>> Deployment failed: Heat Stack create failed.
>> + heat stack-list
>> + grep -q CREATE_FAILED
>> + deploy_status=1
>> ++ heat resource-list --nested-depth 5 overcloud
>> ++ grep FAILED
>> ++ grep 'StructuredDeployment '
>> ++ cut -d '|' -f3
>> + for failed in '$(heat resource-list --nested-depth 5
>> overcloud | grep FAILED |
>> grep '\''StructuredDeployment '\'' | cut -d '\''|'\'' -f3)'
>> + heat deployment-show 655c77fc-6a78-4cca-b4b7-a153a3f4ad52
>> + for failed in '$(heat resource-list --nested-depth 5
>> overcloud | grep FAILED |
>> grep '\''StructuredDeployment '\'' | cut -d '\''|'\'' -f3)'
>> + heat deployment-show 1fe5153c-e017-4ee5-823a-3d1524430c1d
>> + for failed in '$(heat resource-list --nested-depth 5
>> overcloud | grep FAILED |
>> grep '\''StructuredDeployment '\'' | cut -d '\''|'\'' -f3)'
>> + heat deployment-show bf6f25f4-d812-41e9-a7a8-122de619a624
>> + exit 1
>>
>> *****************************
>> Troubleshooting steps :-
>> *****************************
>>
>> [stack at undercloud ~]$ . stackrc
>> [stack at undercloud ~]$ heat resource-list overcloud | grep
>> ControllerNodesPost
>> | ControllerNodesPostDeployment |
>> f1d6a474-c946-46bf-ab0c-2fdaeb55d0b3 |
>> OS::TripleO::ControllerPostDeployment | CREATE_FAILED |
>> 2016-06-29T12:11:21 |
>>
>>
>> [stack at undercloud ~]$ heat stack-list -n | grep "^|
>> f1d6a474-c946-46bf-ab0c-2fdaeb55d0b3"
>> | f1d6a474-c946-46bf-ab0c-2fdaeb55d0b3 |
>> overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk
>> | CREATE_FAILED | 2016-06-29T12:31:11 | None |
>> 17f82f6e-e0ca-44c6-9058-de82c00d4f79 |
>>
>>
>>
>> [stack at undercloud ~]$ heat event-list -m
>> f1d6a474-c946-46bf-ab0c-2fdaeb55d0b3
>> overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk
>>
>> +------------------------------------------------------+--------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+---------------------+
>> | resource_name |
>> id |
>> resource_status_reason
>> | resource_status | event_time |
>> +------------------------------------------------------+--------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+---------------------+
>> | overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk |
>> 10ec0cf9-b3c9-4191-9966-3f4d47f27e2a | Stack CREATE started
>> . . . . . . . . . . . . . . . . .
>> Step1,2,3 succeeded
>> . . . . . . . . . . . . . . . . .
>>
>> | CREATE_IN_PROGRESS | 2016-06-29T12:31:14 |
>> | ControllerPuppetConfig |
>> a2a1df33-5106-425c-b16d-8d2df709b19f | state
>> changed
>> | CREATE_COMPLETE | 2016-06-29T12:35:02 |
>> | ControllerOvercloudServicesDeployment_Step4 |
>> 1e151333-4de5-4e7b-907c-ea0f42d31a47 | state
>> changed
>> | CREATE_IN_PROGRESS | 2016-06-29T12:35:03 |
>> | ControllerOvercloudServicesDeployment_Step4 |
>> 7bf36334-3d92-4554-b6c0-41294a072ab6 | Error:
>> resources.ControllerOvercloudServicesDeployment_Step4.resources[0]:
>> Deployment to server failed: deploy_status_code: Deployment exited with
>> non-zero status code: 6 | CREATE_FAILED |
>> 2016-06-29T12:42:42 |
>> | overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk
>> | e72fb6f4-c2aa-4fe8-9bd1-5f5ad152685c | Resource CREATE failed:
>> Error:
>> resources.ControllerOvercloudServicesDeployment_Step4.resources[0]:
>> Deployment to server failed: deploy_status_code: Deployment exited with
>> non-zero status code: 6 | CREATE_FAILED | 2016-06-29T12:42:43 |
>> +------------------------------------------------------+--------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+---------------------+
>>
>> [stack at undercloud ~]$ heat stack-show
>> overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk | grep
>> NodeConfigIdentifiers
>> | | "NodeConfigIdentifiers":
>> "{u'deployment_identifier': 1467202276, u'controller_config': {u'1':
>> u'os-apply-config deployment 796df02a-7550-414b-a084-8b591a13e6db
>> completed,Root CA cert injection not enabled.,TLS not enabled.,None,',
>> u'0': u'os-apply-config deployment 613ec889-d852-470a-8e4c-6e243e1d2033
>> completed,Root CA cert injection not enabled.,TLS not enabled.,None,',
>> u'2': u'os-apply-config deployment c8b099d0-3af4-4ba0-a056-a0ce60f40e2d
>> completed,Root CA cert injection not enabled.,TLS not enabled.,None,'},
>> u'allnodes_extra': u'none'}" |
>>
>> However, once stack creation has failed, an update does not help.
>>
>> [stack at undercloud ~]$ heat stack-update -x
>> overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk -e update_env.yaml
>> ERROR: PATCH update to non-COMPLETE stack is not supported.
>>
>> This is because the parent stack is itself in CREATE_FAILED state:
>>
>> [stack at undercloud ~]$ heat stack-list
>> +--------------------------------------+------------+---------------+---------------------+--------------+
>> | id                                   | stack_name | stack_status  | creation_time       | updated_time |
>> +--------------------------------------+------------+---------------+---------------------+--------------+
>> | 17f82f6e-e0ca-44c6-9058-de82c00d4f79 | overcloud  | CREATE_FAILED | 2016-06-29T12:11:20 | None         |
>> +--------------------------------------+------------+---------------+---------------------+--------------+
>>
>>
>> The complete output of `heat deployment-show
>> 655c77fc-6a78-4cca-b4b7-a153a3f4ad52` is attached as a gzip archive.
>>
>>
>> Thanks.
>>
>> Boris.
>>
>>
>>
>> _______________________________________________
>> rdo-list mailing list
>> rdo-list at redhat.com
>> https://www.redhat.com/mailman/listinfo/rdo-list
>>
>> To unsubscribe: rdo-list-unsubscribe at redhat.com
>>
>
> The failure occurred during the post-deployment, which means that the
> initial deployment succeeded, but then the steps that are done to the
> completed overcloud failed.
>
> This is most commonly attributable to network problems between the
> Undercloud and the Overcloud Public API. The Undercloud needs to reach
> the Public API in order to do some of the post-configuration steps. If
> this API isn't reachable, you end up with the error you saw above.
>
> You can test this connectivity by pinging the Public API VIP from the
> Undercloud. Starting with the failed deployment, run "neutron
> port-list" against the Undercloud and look for the IP on the port
> named "public_virtual_ip". You should be able to ping this address from
> the Undercloud. If you can't reach that IP, then you need to check the
> connectivity/routing between the Undercloud and the External network on
> the Overcloud.
>
I should also mention common causes of this problem:
* Incorrect value for ExternalInterfaceDefaultRoute in the network
environment file.
* Controllers do not have the default route on the External network in
the NIC config templates (required for reachability from remote subnets).
* Incorrect subnet mask on the ExternalNetCidr in the network environment.
* Incorrect ExternalAllocationPools values in the network environment.
* Incorrect Ethernet switch config for the Controllers.
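
Some of the causes listed above can be checked offline before redeploying. The sketch below uses Python's standard ipaddress module to verify that the default route and the allocation pools fall inside the external CIDR. The function name and the sample addresses are hypothetical, and the start/end pool layout is assumed; substitute the values from your own network environment file. It cannot tell whether the subnet mask itself is the right one for your network.

```python
import ipaddress


def check_external_network(cidr, default_route, pools):
    """Return a list of human-readable problems (empty if none were found).

    cidr          -- value of ExternalNetCidr, e.g. "10.0.0.0/24"
    default_route -- value of ExternalInterfaceDefaultRoute
    pools         -- value of ExternalAllocationPools, assumed to be a list
                     of {"start": ..., "end": ...} dictionaries
    """
    # ip_network() raises ValueError if the CIDR has host bits set.
    network = ipaddress.ip_network(cidr)
    problems = []

    gateway = ipaddress.ip_address(default_route)
    if gateway not in network:
        problems.append(
            "default route %s is outside %s" % (gateway, network)
        )

    for pool in pools:
        start = ipaddress.ip_address(pool["start"])
        end = ipaddress.ip_address(pool["end"])
        if start not in network or end not in network:
            problems.append(
                "pool %s-%s is not inside %s" % (start, end, network)
            )
        elif start > end:
            problems.append("pool start %s is after end %s" % (start, end))

    return problems


if __name__ == "__main__":
    # Sample values only; they are not taken from the failing deployment.
    issues = check_external_network(
        "10.0.0.0/24",
        "10.0.0.1",
        [{"start": "10.0.0.4", "end": "10.0.0.250"}],
    )
    print(issues or "no obvious address problems")
```

An empty result only means the addresses are mutually consistent; routing and switch configuration still have to be checked on the hosts themselves.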
The issue was reproduced four times, on a daily basis starting 06/25/16,
each time with exactly the same error at Step4 of
overcloud-ControllerNodesPostDeployment.
At the moment I cannot reproduce the error.
The 3-node HA Controller + 1 Compute configuration works.
There was one more issue: the 3-node HA Controller + 2 Compute configuration
failed immediately after overcloud-deploy.sh started, because
only 4 nodes could be introspected. I will test it tomorrow morning.
Thanks a lot.
Boris.
--
Dan Sneddon | Principal OpenStack Engineer
dsneddon at redhat.com | redhat.com/openstack
650.254.4025 | dsneddon:irc @dxs:twitter