On 06/30/2016 05:19 AM, Boris Derzhavets wrote:
>
>
>
> ________________________________
> From: rdo-list-bounces@redhat.com <rdo-list-bounces@redhat.com> on behalf of Boris Derzhavets <bderzhavets@hotmail.com>
> Sent: Wednesday, June 29, 2016 5:14 PM
> To: Dan Sneddon; rdo-list@redhat.com
> Subject: Re: [rdo-list] HA overcloud-deploy.sh crashes again ( ControllerOvercloudServicesDeployment_Step4 )
>
> Yes, this was an attempt to deploy:
>
> ########################
> # HA +2xCompute
> ########################
> control_memory: 6144
> compute_memory: 6144
>
> undercloud_memory: 8192
>
> # Giving the undercloud additional CPUs can greatly improve heat's
> # performance (and result in a shorter deploy time).
> undercloud_vcpu: 4
Increasing this without also increasing the memory on the undercloud
will usually end in sadness, because more CPUs means more worker
processes, which means more memory consumption. In general, straying
from the values in CI is unlikely to work unless you have significantly
better hardware than what runs in CI (32G hosts with decent CPU).
It will be verified tomorrow with
undercloud_vcpu: 2
That would be a fair test. It will take about 2 hours.
However, I still believe this is not the root cause of the issue with the
configuration 3xController(HA) + 2xCompute having:
undercloud_memory: 8192
undercloud_vcpu: 4
which was tested successfully many times from 06/05 through 06/24
with no problems.
Thank you very much for the feedback.
Boris.
https://github.com/openstack/tripleo-quickstart/blob/master/config/general_config/ha.yml#L13
It is not 100% certain that this is the root cause of your issue, as the
logs below look like we hit issues either with the Ironic deployment to
the nodes or with the Nova scheduler. Note that this is definitely a
different problem (and possibly transient) than the one reported at the
beginning of this thread.
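
As a rough sanity check on the memory point, the VM sizes from the quoted config can simply be added up and compared with the host. This is only a sketch: the 32 GiB host size is borrowed from the CI description and may not match your virt host, the function name is made up for illustration, and it ignores whatever the host OS itself needs.

```python
# Memory figures (MiB) copied from the quoted config.
CONTROL_MEMORY = 6144
COMPUTE_MEMORY = 6144
UNDERCLOUD_MEMORY = 8192

# Assumption: a 32 GiB virt host, as described for CI. Adjust for your host.
HOST_MEMORY_MIB = 32 * 1024


def total_guest_memory_mib(controllers, computes):
    """Sum the memory assigned to the undercloud and all overcloud VMs."""
    return (UNDERCLOUD_MEMORY
            + controllers * CONTROL_MEMORY
            + computes * COMPUTE_MEMORY)


for computes in (1, 2):
    total = total_guest_memory_mib(controllers=3, computes=computes)
    headroom = HOST_MEMORY_MIB - total
    print(f"3 controllers + {computes} compute: "
          f"{total} MiB assigned, {headroom} MiB headroom")
```

If your host has more than 32 GiB the picture changes, so treat this as a way to frame the question rather than a diagnosis.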
>
> # Create three controller nodes and two compute nodes.
> overcloud_nodes:
>   - name: control_0
>     flavor: control
>   - name: control_1
>     flavor: control
>   - name: control_2
>     flavor: control
>
>   - name: compute_0
>     flavor: compute
>   - name: compute_1
>     flavor: compute
>
> # We don't need introspection in a virtual environment (because we are
> # creating all the "hardware" we really know the necessary
> # information).
> introspect: false
>
> # Tell tripleo about our environment.
> network_isolation: true
> extra_args: >-
>   --control-scale 3 --compute-scale 2 --neutron-network-type vxlan
>   --neutron-tunnel-types vxlan
>   -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml
>   --ntp-server pool.ntp.org
> deploy_timeout: 75
> tempest: false
> pingtest: true
>
> Results during overcloud deployment :-
>
> 2016-06-30 09:09:31 [NovaCompute]: CREATE_FAILED ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
> 2016-06-30 09:09:31 [NovaCompute]: DELETE_IN_PROGRESS state changed
> 2016-06-30 09:09:34 [NovaCompute]: DELETE_COMPLETE state changed
> 2016-06-30 09:09:44 [NovaCompute]: CREATE_IN_PROGRESS state changed
> 2016-06-30 09:09:48 [NovaCompute]: CREATE_FAILED ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
> . . . . .
>
> 2016-06-30 09:11:36 [overcloud]: CREATE_FAILED Resource CREATE failed: ResourceInError: resources.Compute.resources[0].resources.NovaCompute: Went to status ERROR due to "Message: Build of instance bf483c34-7010-48ea-8f58-fe192c91093f aborted: Failed to provision instance bf483c34-7010-48ea-8f58-fe192
> 2016-06-30 09:11:36 [1]: SIGNAL_COMPLETE Unknown
> 2016-06-30 09:11:36 [ControllerDeployment]: SIGNAL_COMPLETE Unknown
> 2016-06-30 09:11:36 [1]: CREATE_COMPLETE state changed
> 2016-06-30 09:11:36 [overcloud-ControllerCephDeployment-62xh7uhtpjqp]: CREATE_COMPLETE Stack CREATE completed successfully
> 2016-06-30 09:11:37 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
> 2016-06-30 09:11:37 [1]: SIGNAL_COMPLETE Unknown
> Stack overcloud CREATE_FAILED
> Deployment failed: Heat Stack create failed.
> + heat stack-list
> + grep -q CREATE_FAILED
> + deploy_status=1
> ++ heat resource-list --nested-depth 5 overcloud
> ++ grep FAILED
> ++ grep 'StructuredDeployment '
> ++ cut -d '|' -f3
> + exit 1
>
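
When a deploy log like the one quoted here gets long, it can help to pull out only the failed events and see which resources hit "No valid host". Below is a minimal sketch, assuming the log lines have the `timestamp [resource]: STATUS reason` shape seen in this thread; the function names are hypothetical and the regex is my own guess at the format, not something the deploy script provides.

```python
import re
from collections import Counter

# Matches lines such as:
#   2016-06-30 09:09:31 [NovaCompute]: CREATE_FAILED ResourceInError: ...
EVENT_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<resource>[^\]]+)\]: (?P<status>[A-Z_]+)\s*(?P<reason>.*)$"
)


def failed_events(log_text):
    """Return (time, resource, reason) for every event whose status ends in _FAILED."""
    events = []
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match and match.group("status").endswith("_FAILED"):
            events.append((match.group("time"),
                           match.group("resource"),
                           match.group("reason")))
    return events


def count_no_valid_host(log_text):
    """Count, per resource, the failures whose reason mentions 'No valid host'."""
    return Counter(resource
                   for _, resource, reason in failed_events(log_text)
                   if "No valid host" in reason)


# Sample lines shortened from the log quoted in this thread.
SAMPLE = """\
2016-06-30 09:09:31 [NovaCompute]: CREATE_FAILED ResourceInError: No valid host was found.
2016-06-30 09:09:31 [NovaCompute]: DELETE_IN_PROGRESS state changed
2016-06-30 09:09:48 [NovaCompute]: CREATE_FAILED ResourceInError: No valid host was found.
2016-06-30 09:11:36 [1]: SIGNAL_COMPLETE Unknown
"""

for resource, count in count_no_valid_host(SAMPLE).items():
    print(resource, count)
```

Repeated "No valid host" failures on the same resource usually point at scheduling or node availability rather than at the later puppet steps, which is why they are worth separating out.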
>
> Thanks.
>
> Boris
>
>
> ________________________________
> From: rdo-list-bounces@redhat.com <rdo-list-bounces@redhat.com> on behalf of Dan Sneddon <dsneddon@redhat.com>
> Sent: Wednesday, June 29, 2016 1:46 PM
> To: rdo-list@redhat.com
> Subject: Re: [rdo-list] HA overcloud-deploy.sh crashes again ( ControllerOvercloudServicesDeployment_Step4 )
>
> On 06/29/2016 10:42 AM, Dan Sneddon wrote:
>> On 06/29/2016 07:03 AM, Boris Derzhavets wrote:
>>> Boris Derzhavets has shared a OneDrive file with you. To view it, click
>>> the link below.
>>>
>>>
>>> HeatCrash2.txt 1.gz <https://1drv.ms/u/s!AqjiDzRpwaKogSHAekH8ZluOaclk>
>>>
>>> Reattaching the gzip archive via OneDrive.
>>>
>>>
>>>
>>> -----------------------------------------------------------------------
>>> *From:* rdo-list-bounces@redhat.com <rdo-list-bounces@redhat.com> on
>>> behalf of Boris Derzhavets <bderzhavets@hotmail.com>
>>> *Sent:* Wednesday, June 29, 2016 9:36 AM
>>> *To:* John Trowbridge; shardy@redhat.com
>>> *Cc:* rdo-list@redhat.com
>>> *Subject:* [rdo-list] HA overcloud-deploy.sh crashes again (
>>> ControllerOvercloudServicesDeployment_Step4 )
>>>
>>>
>>> I attempted to follow the steps suggested in
>>> http://hardysteven.blogspot.ru/2016/06/tripleo-partial-stack-updates.html
>>>
>>>
>>> ./deploy-overstack crashes
>>>
>>>
>>> 2016-06-29 12:42:41
>>> [overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk-ControllerOvercloudServicesDeployment_Step4-nzdoizlgrmx2]:
>>> CREATE_FAILED Resource CREATE failed: Error: resources[0]: Deployment
>>> to server failed: deploy_status_code : Deployment exited with non-zero
>>> status code: 6
>>> 2016-06-29 12:42:42 [ControllerOvercloudServicesDeployment_Step4]:
>>> CREATE_FAILED Error:
>>> resources.ControllerOvercloudServicesDeployment_Step4.resources[0]:
>>> Deployment to server failed: deploy_status_code: Deployment exited with
>>> non-zero status code: 6
>>> 2016-06-29 12:42:43
>>> [overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk]: CREATE_FAILED
>>> Resource CREATE failed: Error:
>>> resources.ControllerOvercloudServicesDeployment_Step4.resources[0]:
>>> Deployment to server failed: deploy_status_code: Deployment exited with
>>> non-zero status code: 6
>>> 2016-06-29 12:42:44 [ControllerNodesPostDeployment]: CREATE_FAILED
>>> Error:
>>> resources.ControllerNodesPostDeployment.resources.ControllerOvercloudServicesDeployment_Step4.resources[0]:
>>> Deployment to server failed: deploy_status_code: Deployment exited with
>>> non-zero status code: 6
>>> 2016-06-29 12:42:44 [2]: SIGNAL_COMPLETE Unknown
>>> 2016-06-29 12:42:45 [2]: SIGNAL_COMPLETE Unknown
>>> 2016-06-29 12:42:45 [2]: SIGNAL_COMPLETE Unknown
>>> 2016-06-29 12:42:46 [overcloud]: CREATE_FAILED Resource CREATE failed:
>>> Error:
>>> resources.ControllerNodesPostDeployment.resources.ControllerOvercloudServicesDeployment_Step4.resources[0]:
>>> Deployment to server failed: deploy_status_code: Deployment exited with
>>> non-zero status code: 6
>>> 2016-06-29 12:42:46 [2]: SIGNAL_COMPLETE Unknown
>>> 2016-06-29 12:42:47 [2]: SIGNAL_COMPLETE Unknown
>>> 2016-06-29 12:42:47 [ControllerDeployment]: SIGNAL_COMPLETE Unknown
>>> 2016-06-29 12:42:48 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
>>> 2016-06-29 12:42:48 [2]: SIGNAL_COMPLETE Unknown
>>> Stack overcloud CREATE_FAILED
>>> Deployment failed: Heat Stack create failed.
>>> + heat stack-list
>>> + grep -q CREATE_FAILED
>>> + deploy_status=1
>>> ++ heat resource-list --nested-depth 5 overcloud
>>> ++ grep FAILED
>>> ++ grep 'StructuredDeployment '
>>> ++ cut -d '|' -f3
>>> + for failed in '$(heat resource-list --nested-depth 5
>>> overcloud | grep FAILED |
>>> grep '\''StructuredDeployment '\'' | cut -d '\''|'\'' -f3)'
>>> + heat deployment-show 655c77fc-6a78-4cca-b4b7-a153a3f4ad52
>>> + for failed in '$(heat resource-list --nested-depth 5
>>> overcloud | grep FAILED |
>>> grep '\''StructuredDeployment '\'' | cut -d '\''|'\'' -f3)'
>>> + heat deployment-show 1fe5153c-e017-4ee5-823a-3d1524430c1d
>>> + for failed in '$(heat resource-list --nested-depth 5
>>> overcloud | grep FAILED |
>>> grep '\''StructuredDeployment '\'' | cut -d '\''|'\'' -f3)'
>>> + heat deployment-show bf6f25f4-d812-41e9-a7a8-122de619a624
>>> + exit 1
>>>
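
For anyone reading the trace above, the `grep FAILED | grep 'StructuredDeployment ' | cut -d '|' -f3` pipeline just picks the third `|`-separated field (the physical resource ID) from the failed StructuredDeployment rows. Here is a small sketch of the same filtering, assuming the usual `heat resource-list` table layout; the function name and the sample rows (names and IDs) are invented for illustration.

```python
def failed_deployment_ids(resource_list_output):
    """Mimic: grep FAILED | grep 'StructuredDeployment ' | cut -d '|' -f3.

    Unlike cut, surrounding whitespace is stripped from the returned IDs.
    """
    ids = []
    for line in resource_list_output.splitlines():
        if "FAILED" in line and "StructuredDeployment " in line:
            fields = line.split("|")
            # cut counts fields from 1, so -f3 corresponds to fields[2].
            if len(fields) > 2:
                ids.append(fields[2].strip())
    return ids


# Hypothetical rows in the shape of `heat resource-list --nested-depth 5 overcloud`.
SAMPLE = """\
| ExampleDeployment | 11111111-1111-1111-1111-111111111111 | OS::Heat::StructuredDeployment | CREATE_FAILED | 2016-06-29T12:42:42 |
| OtherDeployment | 22222222-2222-2222-2222-222222222222 | OS::Heat::StructuredDeployment | CREATE_COMPLETE | 2016-06-29T12:35:02 |
| ExampleGroup | 33333333-3333-3333-3333-333333333333 | OS::Heat::StructuredDeployments | CREATE_FAILED | 2016-06-29T12:42:43 |
"""

for deployment_id in failed_deployment_ids(SAMPLE):
    print("heat deployment-show", deployment_id)
```

The trailing space in 'StructuredDeployment ' presumably exists to skip the plural group type, which is why the third sample row is not selected.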
>>> *****************************
>>> Troubleshooting steps :-
>>> *****************************
>>>
>>> [stack@undercloud ~]$ . stackrc
>>> [stack@undercloud ~]$ heat resource-list overcloud | grep
>>> ControllerNodesPost
>>> | ControllerNodesPostDeployment |
>>> f1d6a474-c946-46bf-ab0c-2fdaeb55d0b3 |
>>> OS::TripleO::ControllerPostDeployment | CREATE_FAILED |
>>> 2016-06-29T12:11:21 |
>>>
>>>
>>> [stack@undercloud ~]$ heat stack-list -n | grep "^|
>>> f1d6a474-c946-46bf-ab0c-2fdaeb55d0b3"
>>> | f1d6a474-c946-46bf-ab0c-2fdaeb55d0b3 |
>>> overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk
>>> | CREATE_FAILED | 2016-06-29T12:31:11 | None |
>>> 17f82f6e-e0ca-44c6-9058-de82c00d4f79 |
>>>
>>>
>>>
>>> [stack@undercloud ~]$ heat event-list -m
>>> f1d6a474-c946-46bf-ab0c-2fdaeb55d0b3
>>> overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk
>>>
>>> +------------------------------------------------------+--------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+---------------------+
>>> | resource_name |
>>> id |
>>> resource_status_reason
>>> | resource_status | event_time |
>>> +------------------------------------------------------+--------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+---------------------+
>>> | overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk |
>>> 10ec0cf9-b3c9-4191-9966-3f4d47f27e2a | Stack CREATE started
>>> . . . . . . . . . . . . . . . . .
>>> Step1,2,3 succeeded
>>> . . . . . . . . . . . . . . . . .
>>>
>>> | CREATE_IN_PROGRESS | 2016-06-29T12:31:14 |
>>> | ControllerPuppetConfig |
>>> a2a1df33-5106-425c-b16d-8d2df709b19f | state
>>> changed
>>> | CREATE_COMPLETE | 2016-06-29T12:35:02 |
>>> | ControllerOvercloudServicesDeployment_Step4 |
>>> 1e151333-4de5-4e7b-907c-ea0f42d31a47 | state
>>> changed
>>> | CREATE_IN_PROGRESS | 2016-06-29T12:35:03 |
>>> | ControllerOvercloudServicesDeployment_Step4 |
>>> 7bf36334-3d92-4554-b6c0-41294a072ab6 | Error:
>>> resources.ControllerOvercloudServicesDeployment_Step4.resources[0]:
>>> Deployment to server failed: deploy_status_code: Deployment exited with
>>> non-zero status code: 6 | CREATE_FAILED |
>>> 2016-06-29T12:42:42 |
>>> | overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk
>>> | e72fb6f4-c2aa-4fe8-9bd1-5f5ad152685c | Resource CREATE failed:
>>> Error:
>>> resources.ControllerOvercloudServicesDeployment_Step4.resources[0]:
>>> Deployment to server failed: deploy_status_code: Deployment exited with
>>> non-zero status code: 6 | CREATE_FAILED | 2016-06-29T12:42:43 |
>>> +------------------------------------------------------+--------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+---------------------+
>>>
>>> [stack@undercloud ~]$ heat stack-show
>>> overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk | grep
>>> NodeConfigIdentifiers
>>> | | "NodeConfigIdentifiers":
>>> "{u'deployment_identifier': 1467202276, u'controller_config': {u'1':
>>> u'os-apply-config deployment 796df02a-7550-414b-a084-8b591a13e6db
>>> completed,Root CA cert injection not enabled.,TLS not enabled.,None,',
>>> u'0': u'os-apply-config deployment 613ec889-d852-470a-8e4c-6e243e1d2033
>>> completed,Root CA cert injection not enabled.,TLS not enabled.,None,',
>>> u'2': u'os-apply-config deployment c8b099d0-3af4-4ba0-a056-a0ce60f40e2d
>>> completed,Root CA cert injection not enabled.,TLS not enabled.,None,'},
>>> u'allnodes_extra': u'none'}" |
>>>
>>> However, once stack creation has failed, an update does not help.
>>>
>>> [stack@undercloud ~]$ heat stack-update -x
>>> overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk -e update_env.yaml
>>> ERROR: PATCH update to non-COMPLETE stack is not supported.
>>>
>>> DUE TO :-
>>>
>>> [stack@undercloud ~]$ heat stack-list
>>> +--------------------------------------+------------+---------------+---------------------+--------------+
>>> | id                                   | stack_name | stack_status  | creation_time       | updated_time |
>>> +--------------------------------------+------------+---------------+---------------------+--------------+
>>> | 17f82f6e-e0ca-44c6-9058-de82c00d4f79 | overcloud  | CREATE_FAILED | 2016-06-29T12:11:20 | None         |
>>> +--------------------------------------+------------+---------------+---------------------+--------------+
>>>
>>>
>>> The complete error output of `heat deployment-show
>>> 655c77fc-6a78-4cca-b4b7-a153a3f4ad52` is attached as a gzip archive.
>>>
>>>
>>> Thanks.
>>>
>>> Boris.
>>>
>>>
>>>
>>> _______________________________________________
>>> rdo-list mailing list
>>> rdo-list@redhat.com
>>>
https://www.redhat.com/mailman/listinfo/rdo-list
>>>
>>> To unsubscribe: rdo-list-unsubscribe@redhat.com
>>>
>>
>> The failure occurred during the post-deployment, which means that the
>> initial deployment succeeded, but then the steps that are done to the
>> completed overcloud failed.
>>
>> This is most commonly attributable to network problems between the
>> Undercloud and the Overcloud Public API. The Undercloud needs to reach
>> the Public API in order to do some of the post-configuration steps. If
>> this API isn't reachable, you end up with the error you saw above.
>>
>> You can test this connectivity by pinging the Public API VIP from the
>> Undercloud. Starting with the failed deployment, run "neutron
>> port-list" against the Undercloud and look for the IP on the port
>> named "public_virtual_ip". You should be able to ping this address from
>> the Undercloud. If you can't reach that IP, then you need to check the
>> connectivity/routing between the Undercloud and the External network on
>> the Overcloud.
>>
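
The connectivity test described above can be scripted roughly as follows. This is a sketch under assumptions: it expects the text output of "neutron port-list" to carry the port name and its IPv4 address on the same table row, both helper names are invented, and the sample row uses a documentation address rather than anything from this deployment.

```python
import re
import subprocess

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def public_vip_from_port_list(port_list_output):
    """Return the first IPv4 address on the public_virtual_ip row, or None."""
    for line in port_list_output.splitlines():
        if "public_virtual_ip" in line:
            match = IPV4_RE.search(line)
            if match:
                return match.group(0)
    return None


def can_ping(address, count=3):
    """Run the system ping command; True if it exits with status 0."""
    result = subprocess.run(["ping", "-c", str(count), address],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


# Hypothetical row in the shape of "neutron port-list" output.
SAMPLE = ('| 0a1b2c3d-0000-0000-0000-000000000000 | public_virtual_ip '
          '| fa:16:3e:00:00:01 | {"subnet_id": "4e5f6a7b-0000-0000-0000-000000000000", '
          '"ip_address": "203.0.113.10"} |')

print(public_vip_from_port_list(SAMPLE))
```

On the Undercloud you would source stackrc, feed the real "neutron port-list" output to public_vip_from_port_list(), and pass the result to can_ping(); a False result points back at the routing between the Undercloud and the External network.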
>
> I should also mention common causes of this problem:
>
> * Incorrect value for ExternalInterfaceDefaultRoute in the network
> environment file.
> * Controllers do not have the default route on the External network in
> the NIC config templates (required for reachability from remote subnets).
> * Incorrect subnet mask on the ExternalNetCidr in the network environment.
> * Incorrect ExternalAllocationPools values in the network environment.
> * Incorrect Ethernet switch config for the Controllers.
>
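
Several of the causes listed above are plain addressing mistakes, and those can be caught before deploying. The following is a minimal sketch using Python's ipaddress module; the function name and the example values are hypothetical, the pool format (a list of start/end pairs) is assumed to match what the network environment file uses, and IPv4 is assumed throughout.

```python
import ipaddress


def check_external_network(cidr, default_route, allocation_pools):
    """Return a list of human-readable problems; an empty list means none found."""
    try:
        # strict=True rejects values such as 10.0.0.1/24 (host bits set).
        network = ipaddress.ip_network(cidr, strict=True)
    except ValueError as exc:
        return [f"ExternalNetCidr {cidr!r} is not a valid network: {exc}"]

    problems = []
    gateway = ipaddress.ip_address(default_route)
    if gateway not in network:
        problems.append(
            f"ExternalInterfaceDefaultRoute {gateway} is outside {network}")

    for pool in allocation_pools:
        start = ipaddress.ip_address(pool["start"])
        end = ipaddress.ip_address(pool["end"])
        if start not in network or end not in network:
            problems.append(f"pool {start}-{end} is not inside {network}")
        elif start > end:
            problems.append(f"pool {start}-{end} has start after end")
        elif start <= gateway <= end:
            problems.append(f"pool {start}-{end} contains the default route")
    return problems


# Hypothetical values, not taken from this deployment.
print(check_external_network(
    cidr="10.0.0.0/24",
    default_route="10.0.1.1",
    allocation_pools=[{"start": "10.0.0.4", "end": "10.0.0.250"}],
))
```

This only covers the environment-file values; a missing default route in the NIC config templates or a wrong switch config still has to be checked on the nodes and the network gear.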
> The issue has been reproduced 4 times, daily since 06/25/16, with
> exactly the same error at Step4 of
> overcloud-ControllerNodesPostDeployment.
> In the meantime, I cannot reproduce the error.
> The 3xNode HA Controller + 1xCompute config works.
> There was one more issue: 3xNode HA Controller + 2xCompute
> failed immediately when overcloud-deploy.sh started, because
> only 4 nodes could be introspected. I will test it tomorrow morning.
>
> Thanks a lot.
> Boris.
>
> --
> Dan Sneddon | Principal OpenStack Engineer
> dsneddon@redhat.com | redhat.com/openstack
> 650.254.4025 | dsneddon:irc @dxs:twitter
>
>
>
>