[rdo-list] HA overcloud-deploy.sh crashes again ( ControllerOvercloudServicesDeployment_Step4 )

Boris Derzhavets bderzhavets at hotmail.com
Thu Jun 30 09:19:17 UTC 2016




________________________________
From: rdo-list-bounces at redhat.com <rdo-list-bounces at redhat.com> on behalf of Boris Derzhavets <bderzhavets at hotmail.com>
Sent: Wednesday, June 29, 2016 5:14 PM
To: Dan Sneddon; rdo-list at redhat.com
Subject: Re: [rdo-list] HA overcloud-deploy.sh crashes again ( ControllerOvercloudServicesDeployment_Step4 )

Yes, I attempted to deploy the following configuration:

########################
#  HA +2xCompute
########################
control_memory: 6144
compute_memory: 6144

undercloud_memory: 8192

# Giving the undercloud additional CPUs can greatly improve heat's
# performance (and result in a shorter deploy time).
undercloud_vcpu: 4

# Create three controller nodes and two compute nodes.
overcloud_nodes:
  - name: control_0
    flavor: control
  - name: control_1
    flavor: control
  - name: control_2
    flavor: control

  - name: compute_0
    flavor: compute
  - name: compute_1
    flavor: compute

# We don't need introspection in a virtual environment (because we are
# creating all the "hardware", we already know the necessary
# information).
introspect: false

# Tell tripleo about our environment.
network_isolation: true
extra_args: >-
  --control-scale 3 --compute-scale 2 --neutron-network-type vxlan
  --neutron-tunnel-types vxlan
  -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml
  --ntp-server pool.ntp.org
deploy_timeout: 75
tempest: false
pingtest: true
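The scale flags in `extra_args` and the `overcloud_nodes` list have to agree: `--control-scale 3 --compute-scale 2` asks Heat for five servers, so the list must define three `control` and two `compute` nodes. Below is a minimal bash sketch for cross-checking the two in a config like the one above. The helper names `count_flavor` and `scale_flag` are hypothetical, and the grep patterns only understand the flat layout shown here, not YAML in general.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for cross-checking a tripleo-quickstart style config.
# They only understand the flat "flavor: <name>" and "--<role>-scale N"
# text used above; this is not a general YAML parser.

# Print how many lines on stdin read "flavor: <name>" for the given flavor.
count_flavor() {
  local flavor="$1"
  # grep -c prints 0 but exits 1 when nothing matches; keep going anyway.
  grep -cE "^[[:space:]]*flavor:[[:space:]]*${flavor}[[:space:]]*$" || true
}

# Print the number passed to --<role>-scale (e.g. role=compute) on stdin.
scale_flag() {
  local role="$1"
  grep -oE -- "--${role}-scale [0-9]+" | head -n 1 | awk '{print $2}'
}
```

Assuming the configuration above is saved as `ha-2compute.yml` (a hypothetical file name), `count_flavor compute < ha-2compute.yml` and `scale_flag compute < ha-2compute.yml` should both print `2`.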

Results during overcloud deployment:

2016-06-30 09:09:31 [NovaCompute]: CREATE_FAILED ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-06-30 09:09:31 [NovaCompute]: DELETE_IN_PROGRESS state changed
2016-06-30 09:09:34 [NovaCompute]: DELETE_COMPLETE state changed
2016-06-30 09:09:44 [NovaCompute]: CREATE_IN_PROGRESS state changed
2016-06-30 09:09:48 [NovaCompute]: CREATE_FAILED ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
. . . . .

2016-06-30 09:11:36 [overcloud]: CREATE_FAILED Resource CREATE failed: ResourceInError: resources.Compute.resources[0].resources.NovaCompute: Went to status ERROR due to "Message: Build of instance bf483c34-7010-48ea-8f58-fe192c91093f aborted: Failed to provision instance bf483c34-7010-48ea-8f58-fe192
2016-06-30 09:11:36 [1]: SIGNAL_COMPLETE Unknown
2016-06-30 09:11:36 [ControllerDeployment]: SIGNAL_COMPLETE Unknown
2016-06-30 09:11:36 [1]: CREATE_COMPLETE state changed
2016-06-30 09:11:36 [overcloud-ControllerCephDeployment-62xh7uhtpjqp]: CREATE_COMPLETE Stack CREATE completed successfully
2016-06-30 09:11:37 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
2016-06-30 09:11:37 [1]: SIGNAL_COMPLETE Unknown
Stack overcloud CREATE_FAILED
Deployment failed:  Heat Stack create failed.
+ heat stack-list
+ grep -q CREATE_FAILED
+ deploy_status=1
++ heat resource-list --nested-depth 5 overcloud
++ grep FAILED
++ grep 'StructuredDeployment '
++ cut -d '|' -f3
+ exit 1
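The `+` and `++` lines are bash xtrace output from the deploy script. Read back, they correspond roughly to the logic below. This is a reconstruction from the trace, not the script's actual source, and `report_failed_deployments` is a hypothetical name. The loop only dumps failed `StructuredDeployment` resources. That explains why its body never runs in the trace above: the failure was a `NovaCompute` server going to ERROR, not a software deployment.

```bash
#!/usr/bin/env bash
# Reconstruction (from the xtrace above) of how the deploy script reports
# failures; the function name report_failed_deployments is hypothetical.
report_failed_deployments() {
  local deploy_status=0 failed
  if heat stack-list | grep -q CREATE_FAILED; then
    deploy_status=1
    # Only failed StructuredDeployment resources are dumped; other failures
    # (for example a NovaCompute server in ERROR) are not listed here.
    for failed in $(heat resource-list --nested-depth 5 overcloud |
        grep FAILED | grep 'StructuredDeployment ' | cut -d '|' -f3); do
      heat deployment-show "$failed"
    done
  fi
  return "$deploy_status"
}
```

Because the loop only prints deployment output, a scheduling failure like the one above is easier to see by running `heat resource-list --nested-depth 5 overcloud | grep FAILED` without the `StructuredDeployment` filter.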


Thanks.

Boris


________________________________
From: rdo-list-bounces at redhat.com <rdo-list-bounces at redhat.com> on behalf of Dan Sneddon <dsneddon at redhat.com>
Sent: Wednesday, June 29, 2016 1:46 PM
To: rdo-list at redhat.com
Subject: Re: [rdo-list] HA overcloud-deploy.sh crashes again ( ControllerOvercloudServicesDeployment_Step4 )

On 06/29/2016 10:42 AM, Dan Sneddon wrote:
> On 06/29/2016 07:03 AM, Boris Derzhavets wrote:
>> Boris Derzhavets has shared a OneDrive file with you:
>>
>> HeatCrash2.txt 1.gz <https://1drv.ms/u/s!AqjiDzRpwaKogSHAekH8ZluOaclk>
>>
>> Reattaching the gzip archive via OneDrive.
>>
>>
>>
>> -----------------------------------------------------------------------
>> *From:* rdo-list-bounces at redhat.com <rdo-list-bounces at redhat.com> on
>> behalf of Boris Derzhavets <bderzhavets at hotmail.com>
>> *Sent:* Wednesday, June 29, 2016 9:36 AM
>> *To:* John Trowbridge; shardy at redhat.com
>> *Cc:* rdo-list at redhat.com
>> *Subject:* [rdo-list] HA overcloud-deploy.sh crashes again (
>> ControllerOvercloudServicesDeployment_Step4 )
>>
>>
>> I attempted to follow the steps suggested
>> in http://hardysteven.blogspot.ru/2016/06/tripleo-partial-stack-updates.html
>>
>>
>> ./deploy-overstack crashes
>>
>>
>> 2016-06-29 12:42:41 [overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk-ControllerOvercloudServicesDeployment_Step4-nzdoizlgrmx2]: CREATE_FAILED Resource CREATE failed: Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
>> 2016-06-29 12:42:42 [ControllerOvercloudServicesDeployment_Step4]: CREATE_FAILED Error: resources.ControllerOvercloudServicesDeployment_Step4.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
>> 2016-06-29 12:42:43 [overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk]: CREATE_FAILED Resource CREATE failed: Error: resources.ControllerOvercloudServicesDeployment_Step4.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
>> 2016-06-29 12:42:44 [ControllerNodesPostDeployment]: CREATE_FAILED Error: resources.ControllerNodesPostDeployment.resources.ControllerOvercloudServicesDeployment_Step4.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
>> 2016-06-29 12:42:44 [2]: SIGNAL_COMPLETE Unknown
>> 2016-06-29 12:42:45 [2]: SIGNAL_COMPLETE Unknown
>> 2016-06-29 12:42:45 [2]: SIGNAL_COMPLETE Unknown
>> 2016-06-29 12:42:46 [overcloud]: CREATE_FAILED Resource CREATE failed: Error: resources.ControllerNodesPostDeployment.resources.ControllerOvercloudServicesDeployment_Step4.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
>> 2016-06-29 12:42:46 [2]: SIGNAL_COMPLETE Unknown
>> 2016-06-29 12:42:47 [2]: SIGNAL_COMPLETE Unknown
>> 2016-06-29 12:42:47 [ControllerDeployment]: SIGNAL_COMPLETE Unknown
>> 2016-06-29 12:42:48 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
>> 2016-06-29 12:42:48 [2]: SIGNAL_COMPLETE Unknown
>> Stack overcloud CREATE_FAILED
>> Deployment failed:  Heat Stack create failed.
>> + heat stack-list
>> + grep -q CREATE_FAILED
>> + deploy_status=1
>> ++ heat resource-list --nested-depth 5 overcloud
>> ++ grep FAILED
>> ++ grep 'StructuredDeployment '
>> ++ cut -d '|' -f3
>> + for failed in '$(heat resource-list         --nested-depth 5
>> overcloud | grep FAILED |
>>         grep '\''StructuredDeployment '\'' | cut -d '\''|'\'' -f3)'
>> + heat deployment-show 655c77fc-6a78-4cca-b4b7-a153a3f4ad52
>> + for failed in '$(heat resource-list         --nested-depth 5
>> overcloud | grep FAILED |
>>         grep '\''StructuredDeployment '\'' | cut -d '\''|'\'' -f3)'
>> + heat deployment-show 1fe5153c-e017-4ee5-823a-3d1524430c1d
>> + for failed in '$(heat resource-list         --nested-depth 5
>> overcloud | grep FAILED |
>>         grep '\''StructuredDeployment '\'' | cut -d '\''|'\'' -f3)'
>> + heat deployment-show bf6f25f4-d812-41e9-a7a8-122de619a624
>> + exit 1
>>
>> *****************************
>> Troubleshooting steps:
>> *****************************
>>
>> [stack at undercloud ~]$ . stackrc
>> [stack at undercloud ~]$ heat resource-list overcloud | grep ControllerNodesPost
>> | ControllerNodesPostDeployment             | f1d6a474-c946-46bf-ab0c-2fdaeb55d0b3          | OS::TripleO::ControllerPostDeployment             | CREATE_FAILED   | 2016-06-29T12:11:21 |
>>
>>
>> [stack at undercloud ~]$ heat stack-list -n | grep "^| f1d6a474-c946-46bf-ab0c-2fdaeb55d0b3"
>> | f1d6a474-c946-46bf-ab0c-2fdaeb55d0b3 | overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk | CREATE_FAILED   | 2016-06-29T12:31:11 | None         | 17f82f6e-e0ca-44c6-9058-de82c00d4f79 |
>>
>>
>>
>> [stack at undercloud ~]$ heat event-list -m f1d6a474-c946-46bf-ab0c-2fdaeb55d0b3 overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk
>>
>> | resource_name | id | resource_status_reason | resource_status | event_time |
>> | overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk | 10ec0cf9-b3c9-4191-9966-3f4d47f27e2a | Stack CREATE started | CREATE_IN_PROGRESS | 2016-06-29T12:31:14 |
>> . . . . . . . . . . . . . . . . .
>> Step1,2,3 succeeded
>> . . . . . . . . . . . . . . . . .
>> | ControllerPuppetConfig | a2a1df33-5106-425c-b16d-8d2df709b19f | state changed | CREATE_COMPLETE | 2016-06-29T12:35:02 |
>> | ControllerOvercloudServicesDeployment_Step4 | 1e151333-4de5-4e7b-907c-ea0f42d31a47 | state changed | CREATE_IN_PROGRESS | 2016-06-29T12:35:03 |
>> | ControllerOvercloudServicesDeployment_Step4 | 7bf36334-3d92-4554-b6c0-41294a072ab6 | Error: resources.ControllerOvercloudServicesDeployment_Step4.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6 | CREATE_FAILED | 2016-06-29T12:42:42 |
>> | overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk | e72fb6f4-c2aa-4fe8-9bd1-5f5ad152685c | Resource CREATE failed: Error: resources.ControllerOvercloudServicesDeployment_Step4.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6 | CREATE_FAILED | 2016-06-29T12:42:43 |
>>
>> [stack at undercloud ~]$ heat stack-show overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk | grep NodeConfigIdentifiers
>> |                       |   "NodeConfigIdentifiers":
>> "{u'deployment_identifier': 1467202276, u'controller_config': {u'1':
>> u'os-apply-config deployment 796df02a-7550-414b-a084-8b591a13e6db
>> completed,Root CA cert injection not enabled.,TLS not enabled.,None,',
>> u'0': u'os-apply-config deployment 613ec889-d852-470a-8e4c-6e243e1d2033
>> completed,Root CA cert injection not enabled.,TLS not enabled.,None,',
>> u'2': u'os-apply-config deployment c8b099d0-3af4-4ba0-a056-a0ce60f40e2d
>> completed,Root CA cert injection not enabled.,TLS not enabled.,None,'},
>> u'allnodes_extra': u'none'}" |
>>
>> However, once stack creation has failed, an update doesn't help:
>>
>> [stack at undercloud ~]$ heat stack-update -x overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk -e update_env.yaml
>> ERROR: PATCH update to non-COMPLETE stack is not supported.
>>
>> This is because the overcloud stack is in CREATE_FAILED state:
>>
>> [stack at undercloud ~]$ heat stack-list
>> +--------------------------------------+------------+---------------+---------------------+--------------+
>> | id                                   | stack_name | stack_status  | creation_time       | updated_time |
>> +--------------------------------------+------------+---------------+---------------------+--------------+
>> | 17f82f6e-e0ca-44c6-9058-de82c00d4f79 | overcloud  | CREATE_FAILED | 2016-06-29T12:11:20 | None         |
>> +--------------------------------------+------------+---------------+---------------------+--------------+
>>
>>
>> The complete output of `heat deployment-show
>> 655c77fc-6a78-4cca-b4b7-a153a3f4ad52` is attached as a gzip archive.
>>
>>
>> Thanks.
>>
>> Boris.
>>
>>
>>
>> _______________________________________________
>> rdo-list mailing list
>> rdo-list at redhat.com
>> https://www.redhat.com/mailman/listinfo/rdo-list
>>
>> To unsubscribe: rdo-list-unsubscribe at redhat.com
>>
>
> The failure occurred during the post-deployment, which means that the
> initial deployment succeeded, but then the steps that are done to the
> completed overcloud failed.
>
> This is most commonly attributable to network problems between the
> Undercloud and the Overcloud Public API. The Undercloud needs to reach
> the Public API in order to do some of the post-configuration steps. If
> this API isn't reachable, you end up with the error you saw above.
>
> You can test this connectivity by pinging the Public API VIP from the
> Undercloud. Starting with the failed deployment, run "neutron
> port-list" against the Undercloud and look for the IP on the port
> named "public_virtual_ip". You should be able to ping this address from
> the Undercloud. If you can't reach that IP, then you need to check the
> connectivity/routing between the Undercloud and the External network on
> the Overcloud.
>
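The connectivity test described above can be scripted. Below is a minimal bash sketch. It assumes `neutron port-list` prints the `fixed_ips` column as `{"subnet_id": "...", "ip_address": "..."}` on the same row as the `public_virtual_ip` port name. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the Public API VIP connectivity check. Assumes the fixed_ips
# column looks like {"subnet_id": "...", "ip_address": "10.0.0.5"} on the
# row of the port named public_virtual_ip. Function names are hypothetical.

# Read `neutron port-list` output on stdin and print the public VIP address.
public_vip_from_port_list() {
  grep 'public_virtual_ip' |
    grep -oE '"ip_address": "[0-9.]+"' |
    grep -oE '[0-9]+(\.[0-9]+){3}' |
    head -n 1
}

# Run on the Undercloud with stackrc sourced: look up the VIP and ping it.
check_public_vip() {
  local vip
  vip=$(neutron port-list | public_vip_from_port_list) || true
  if [ -z "$vip" ]; then
    echo "public_virtual_ip port not found" >&2
    return 2
  fi
  echo "Pinging Public API VIP $vip"
  ping -c 3 "$vip"
}
```

Run `check_public_vip` on the Undercloud after `. stackrc`. A non-zero status from `ping` points towards the connectivity/routing problems described in this thread.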

I should also mention common causes of this problem:

* Incorrect value for ExternalInterfaceDefaultRoute in the network
environment file.
* Controllers do not have the default route on the External network in
the NIC config templates (required for reachability from remote subnets).
* Incorrect subnet mask on the ExternalNetCidr in the network environment.
* Incorrect ExternalAllocationPools values in the network environment.
* Incorrect Ethernet switch config for the Controllers.
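Several of the causes above are address-arithmetic mistakes that can be checked before deploying. The bash sketch below takes the value of `ExternalNetCidr`, the start and end of one `ExternalAllocationPools` range, and `ExternalInterfaceDefaultRoute`. These values are copied by hand from the network environment file; parsing the file is left out. It verifies that all three addresses fall inside the CIDR and that the default route is not inside the allocation pool. The function names and the exact set of checks are assumptions, not a TripleO tool.

```bash
#!/usr/bin/env bash
# Sketch of pre-deployment checks for the External network parameters.
# The values are passed by hand (copied from parameter_defaults in the
# network environment file); the function names are hypothetical.

# Convert a dotted-quad IPv4 address to an integer (no input validation;
# octets with leading zeros would be read as octal by bash arithmetic).
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if IPv4 address $1 lies inside CIDR $2, e.g. 10.0.0.0/24.
in_cidr() {
  local ip net prefix mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  prefix=${2#*/}
  mask=$(( prefix == 0 ? 0 : (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

# check_external CIDR POOL_START POOL_END DEFAULT_ROUTE
# Reports every problem found; returns 1 if there was at least one.
check_external() {
  local cidr=$1 start=$2 end=$3 gw=$4 status=0 addr s e g
  for addr in "$start" "$end" "$gw"; do
    if ! in_cidr "$addr" "$cidr"; then
      echo "$addr is outside $cidr"
      status=1
    fi
  done
  s=$(ip_to_int "$start"); e=$(ip_to_int "$end"); g=$(ip_to_int "$gw")
  if (( s <= g && g <= e )); then
    echo "default route $gw is inside the allocation pool $start-$end"
    status=1
  fi
  return "$status"
}
```

For example, with illustrative values, `check_external 10.0.0.0/24 10.0.0.4 10.0.0.250 10.0.0.1` succeeds, while a default route of `10.0.1.1` is reported as outside `10.0.0.0/24`.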

        The issue has been reproduced 4 times, daily since 06/25/16, always
        with exactly the same error at Step4 of
        overcloud-ControllerNodesPostDeployment.
        For the time being I cannot reproduce the error: the 3-node HA
        Controller + 1 Compute configuration works.
        There was one more issue: 3-node HA Controller + 2 Compute
        failed immediately when overcloud-deploy.sh started, because
        only 4 nodes could be introspected. I will test it tomorrow morning.

        Thanks a lot.
        Boris.
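"No valid host was found" combined with "only 4 nodes could be introspected" suggests that fewer usable nodes were registered than the five requested by `--control-scale 3 --compute-scale 2`. This is one possible cause, not a diagnosis. Below is a bash sketch for checking it on the Undercloud. It assumes the Mitaka-era `ironic node-list` table with a `Provisioning State` column, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: compare the number of Ironic nodes in the "available" provisioning
# state with the number of overcloud nodes requested. Assumes the
# `ironic node-list` table layout with a "Provisioning State" header.

# Read `ironic node-list` output on stdin; print how many rows are available.
count_available_nodes() {
  awk -F'|' '
    !/^[|]/ { next }                 # skip +----+ border lines
    col == 0 {                       # the first |-row is the header
      for (i = 2; i < NF; i++) {
        h = $i; gsub(/^ +| +$/, "", h)
        if (h == "Provisioning State") col = i
      }
      next
    }
    { s = $col; gsub(/^ +| +$/, "", s); if (s == "available") n++ }
    END { print n + 0 }
  '
}

# check_capacity N: succeed if at least N nodes are available
# (N = 5 for --control-scale 3 --compute-scale 2).
check_capacity() {
  local wanted=$1 have
  have=$(ironic node-list | count_available_nodes)
  echo "available nodes: $have, requested: $wanted"
  (( have >= wanted ))
}
```

After `. stackrc`, `check_capacity 5` covers the 3 Controller + 2 Compute layout. If it fails, the missing nodes are a likely place to look before redeploying.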

--
Dan Sneddon         |  Principal OpenStack Engineer
dsneddon at redhat.com |  redhat.com/openstack
650.254.4025        |  dsneddon:irc   @dxs:twitter
