<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr">
<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;background-color:#FFFFFF;font-family:Calibri,Arial,Helvetica,sans-serif;">
<p><br>
</p>
<br>
<br>
<div style="color: rgb(49, 55, 57);">
<div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="x_divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> John Trowbridge <trown@redhat.com><br>
<b>Sent:</b> Thursday, June 30, 2016 10:14 AM<br>
<b>To:</b> Boris Derzhavets; Dan Sneddon; rdo-list@redhat.com<br>
<b>Subject:</b> Re: [rdo-list] HA overcloud-deploy.sh crashes again ( ControllerOvercloudServicesDeployment_Step4 )</font>
<div> </div>
</div>
</div>
<font size="2"><span style="font-size:10pt;">
<div class="PlainText"><br>
<br>
On 06/30/2016 05:19 AM, Boris Derzhavets wrote:<br>
> <br>
> <br>
> <br>
> ________________________________<br>
> From: rdo-list-bounces@redhat.com <rdo-list-bounces@redhat.com> on behalf of Boris Derzhavets <bderzhavets@hotmail.com><br>
> Sent: Wednesday, June 29, 2016 5:14 PM<br>
> To: Dan Sneddon; rdo-list@redhat.com<br>
> Subject: Re: [rdo-list] HA overcloud-deploy.sh crashes again ( ControllerOvercloudServicesDeployment_Step4 )<br>
> <br>
> Yes, I attempted to deploy:<br>
> <br>
> ########################<br>
> # HA +2xCompute<br>
> ########################<br>
> control_memory: 6144<br>
> compute_memory: 6144<br>
> <br>
> undercloud_memory: 8192<br>
> <br>
> # Giving the undercloud additional CPUs can greatly improve heat's<br>
> # performance (and result in a shorter deploy time).<br>
> undercloud_vcpu: 4<br>
<br>
Increasing this without also increasing the memory on the undercloud<br>
will usually end in sadness, because more CPUs means more worker<br>
processes, which means more memory consumption. In general, straying from the<br>
values in CI is unlikely to work unless you have significantly better<br>
hardware than what runs in CI (32G hosts with decent CPU).<br>
<br>
This will be verified tomorrow with<br>
<font size="2"><span style="font-size:10pt;"><b>undercloud_vcpu: 2</b></span></font><br>
That would be a fair test. It will take about 2 hours.<br>
But I still believe that this is not the root cause of the issue with the<br>
3xController (HA) + 2xCompute configuration, which used:<br>
<font size="2"><span style="font-size:10pt;"><b>undercloud_memory: 8192</b></span></font><br>
<font size="2"><span style="font-size:10pt;"><b>undercloud_vcpu: 4</b></span></font><br>
and which had been tested successfully many times, from 06/05 up to 06/24,<br>
with no problems.<br>
<br>
Thank you very much for the feedback.<br>
Boris.<br>
<br>
<a id="LPlnk885822" href="https://github.com/openstack/tripleo-quickstart/blob/master/config/general_config/ha.yml#L13">https://github.com/openstack/tripleo-quickstart/blob/master/config/general_config/ha.yml#L13</a><br>
<br>
It is not 100% certain that this is the root cause of your issue, as the logs below<br>
look like we hit issues either with Ironic deployment to the nodes or<br>
with the Nova scheduler. Note that this is definitely a different<br>
problem (and possibly a transient one) from the one reported at the beginning<br>
of this thread.<br>
<br>
> <br>
> # Create three controller nodes and two compute nodes.<br>
> overcloud_nodes:<br>
> - name: control_0<br>
> flavor: control<br>
> - name: control_1<br>
> flavor: control<br>
> - name: control_2<br>
> flavor: control<br>
> <br>
> - name: compute_0<br>
> flavor: compute<br>
> - name: compute_1<br>
> flavor: compute<br>
> <br>
> # We don't need introspection in a virtual environment (because we are<br>
> # creating all the "hardware" we really know the necessary<br>
> # information).<br>
> introspect: false<br>
> <br>
> # Tell tripleo about our environment.<br>
> network_isolation: true<br>
> extra_args: >-<br>
> --control-scale 3 --compute-scale 2 --neutron-network-type vxlan<br>
> --neutron-tunnel-types vxlan<br>
> -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml<br>
> --ntp-server pool.ntp.org<br>
> deploy_timeout: 75<br>
> tempest: false<br>
> pingtest: true<br>
> <br>
> Results during overcloud deployment :-<br>
> <br>
> 2016-06-30 09:09:31 [NovaCompute]: CREATE_FAILED ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"<br>
> 2016-06-30 09:09:31 [NovaCompute]: DELETE_IN_PROGRESS state changed<br>
> 2016-06-30 09:09:34 [NovaCompute]: DELETE_COMPLETE state changed<br>
> 2016-06-30 09:09:44 [NovaCompute]: CREATE_IN_PROGRESS state changed<br>
> 2016-06-30 09:09:48 [NovaCompute]: CREATE_FAILED ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"<br>
> . . . . .<br>
> <br>
> 2016-06-30 09:11:36 [overcloud]: CREATE_FAILED Resource CREATE failed: ResourceInError: resources.Compute.resources[0].resources.NovaCompute: Went to status ERROR due to "Message: Build of instance bf483c34-7010-48ea-8f58-fe192c91093f aborted: Failed to provision instance bf483c34-7010-48ea-8f58-fe192<br>
> 2016-06-30 09:11:36 [1]: SIGNAL_COMPLETE Unknown<br>
> 2016-06-30 09:11:36 [ControllerDeployment]: SIGNAL_COMPLETE Unknown<br>
> 2016-06-30 09:11:36 [1]: CREATE_COMPLETE state changed<br>
> 2016-06-30 09:11:36 [overcloud-ControllerCephDeployment-62xh7uhtpjqp]: CREATE_COMPLETE Stack CREATE completed successfully<br>
> 2016-06-30 09:11:37 [NetworkDeployment]: SIGNAL_COMPLETE Unknown<br>
> 2016-06-30 09:11:37 [1]: SIGNAL_COMPLETE Unknown<br>
> Stack overcloud CREATE_FAILED<br>
> Deployment failed: Heat Stack create failed.<br>
> + heat stack-list<br>
> + grep -q CREATE_FAILED<br>
> + deploy_status=1<br>
> ++ heat resource-list --nested-depth 5 overcloud<br>
> ++ grep FAILED<br>
> ++ grep 'StructuredDeployment '<br>
> ++ cut -d '|' -f3<br>
> + exit 1<br>
> <br>
> <br>
> Thanks.<br>
> <br>
> Boris<br>
> <br>
> <br>
> ________________________________<br>
> From: rdo-list-bounces@redhat.com <rdo-list-bounces@redhat.com> on behalf of Dan Sneddon <dsneddon@redhat.com><br>
> Sent: Wednesday, June 29, 2016 1:46 PM<br>
> To: rdo-list@redhat.com<br>
> Subject: Re: [rdo-list] HA overcloud-deploy.sh crashes again ( ControllerOvercloudServicesDeployment_Step4 )<br>
> <br>
> On 06/29/2016 10:42 AM, Dan Sneddon wrote:<br>
>> On 06/29/2016 07:03 AM, Boris Derzhavets wrote:<br>
>>> Boris Derzhavets has shared a OneDrive file with you. To view it, click<br>
>>> the link below.<br>
>>><br>
>>> <<a href="https://1drv.ms/u/s!AqjiDzRpwaKogSHAekH8ZluOaclk">https://1drv.ms/u/s!AqjiDzRpwaKogSHAekH8ZluOaclk</a>><br>
>>><br>
>>> HeatCrash2.txt 1.gz <<a href="https://1drv.ms/u/s!AqjiDzRpwaKogSHAekH8ZluOaclk">https://1drv.ms/u/s!AqjiDzRpwaKogSHAekH8ZluOaclk</a>><br>
>>><br>
>>> Reattaching the gzip archive via OneDrive.<br>
>>><br>
>>><br>
>>><br>
>>> -----------------------------------------------------------------------<br>
>>> *From:* rdo-list-bounces@redhat.com <rdo-list-bounces@redhat.com> on<br>
>>> behalf of Boris Derzhavets <bderzhavets@hotmail.com><br>
>>> *Sent:* Wednesday, June 29, 2016 9:36 AM<br>
>>> *To:* John Trowbridge; shardy@redhat.com<br>
>>> *Cc:* rdo-list@redhat.com<br>
>>> *Subject:* [rdo-list] HA overcloud-deploy.sh crashes again (<br>
>>> ControllerOvercloudServicesDeployment_Step4 )<br>
>>><br>
>>><br>
>>> I attempted to follow the steps suggested<br>
>>> in <a href="http://hardysteven.blogspot.ru/2016/06/tripleo-partial-stack-updates.html">http://hardysteven.blogspot.ru/2016/06/tripleo-partial-stack-updates.html</a><br>
>>><br>
>>><br>
>>> ./deploy-overstack crashes<br>
>>><br>
>>><br>
>>> 2016-06-29 12:42:41 [overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk-ControllerOvercloudServicesDeployment_Step4-nzdoizlgrmx2]: CREATE_FAILED Resource CREATE failed: Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6<br>
>>> 2016-06-29 12:42:42 [ControllerOvercloudServicesDeployment_Step4]: CREATE_FAILED Error: resources.ControllerOvercloudServicesDeployment_Step4.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6<br>
>>> 2016-06-29 12:42:43 [overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk]: CREATE_FAILED Resource CREATE failed: Error: resources.ControllerOvercloudServicesDeployment_Step4.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6<br>
>>> 2016-06-29 12:42:44 [ControllerNodesPostDeployment]: CREATE_FAILED Error: resources.ControllerNodesPostDeployment.resources.ControllerOvercloudServicesDeployment_Step4.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6<br>
>>> 2016-06-29 12:42:44 [2]: SIGNAL_COMPLETE Unknown<br>
>>> 2016-06-29 12:42:45 [2]: SIGNAL_COMPLETE Unknown<br>
>>> 2016-06-29 12:42:45 [2]: SIGNAL_COMPLETE Unknown<br>
>>> 2016-06-29 12:42:46 [overcloud]: CREATE_FAILED Resource CREATE failed: Error: resources.ControllerNodesPostDeployment.resources.ControllerOvercloudServicesDeployment_Step4.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6<br>
>>> 2016-06-29 12:42:46 [2]: SIGNAL_COMPLETE Unknown<br>
>>> 2016-06-29 12:42:47 [2]: SIGNAL_COMPLETE Unknown<br>
>>> 2016-06-29 12:42:47 [ControllerDeployment]: SIGNAL_COMPLETE Unknown<br>
>>> 2016-06-29 12:42:48 [NetworkDeployment]: SIGNAL_COMPLETE Unknown<br>
>>> 2016-06-29 12:42:48 [2]: SIGNAL_COMPLETE Unknown<br>
>>> Stack overcloud CREATE_FAILED<br>
>>> Deployment failed: Heat Stack create failed.<br>
>>> + heat stack-list<br>
>>> + grep -q CREATE_FAILED<br>
>>> + deploy_status=1<br>
>>> ++ heat resource-list --nested-depth 5 overcloud<br>
>>> ++ grep FAILED<br>
>>> ++ grep 'StructuredDeployment '<br>
>>> ++ cut -d '|' -f3<br>
>>> + for failed in '$(heat resource-list --nested-depth 5 overcloud | grep FAILED | grep '\''StructuredDeployment '\'' | cut -d '\''|'\'' -f3)'<br>
>>> + heat deployment-show 655c77fc-6a78-4cca-b4b7-a153a3f4ad52<br>
>>> + for failed in '$(heat resource-list --nested-depth 5 overcloud | grep FAILED | grep '\''StructuredDeployment '\'' | cut -d '\''|'\'' -f3)'<br>
>>> + heat deployment-show 1fe5153c-e017-4ee5-823a-3d1524430c1d<br>
>>> + for failed in '$(heat resource-list --nested-depth 5 overcloud | grep FAILED | grep '\''StructuredDeployment '\'' | cut -d '\''|'\'' -f3)'<br>
>>> + heat deployment-show bf6f25f4-d812-41e9-a7a8-122de619a624<br>
>>> + exit 1<br>
>>><br>
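<br>
The failure-report loop in the trace above can be mirrored offline. A minimal sketch, assuming the output of `heat resource-list --nested-depth 5 overcloud` has been saved as text; the function name and the sample rows are hypothetical, and the sample keeps only four of the real columns. It reproduces the script's `grep FAILED | grep 'StructuredDeployment ' | cut -d '|' -f3` filter, so it returns the IDs the script would hand to `heat deployment-show`.<br>
<pre>
```python
def failed_structured_deployment_ids(resource_list_text):
    """Mirror the deploy script's pipeline on captured text:
    heat resource-list --nested-depth 5 overcloud | grep FAILED
      | grep 'StructuredDeployment ' | cut -d '|' -f3
    """
    ids = []
    for line in resource_list_text.splitlines():
        # grep FAILED
        if "FAILED" not in line:
            continue
        # grep 'StructuredDeployment ': the trailing space means the plural
        # OS::Heat::StructuredDeployments type does not match.
        if "StructuredDeployment " not in line:
            continue
        # cut -d '|' -f3: field 1 is the empty text before the first '|',
        # so field 3 is the second table column (the physical resource ID).
        fields = line.split("|")
        if len(fields) >= 3:
            ids.append(fields[2].strip())
    return ids


# Hypothetical, abbreviated rows: resource_name | physical_resource_id |
# resource_type | resource_status (real output has more columns).
SAMPLE = """\
| 0 | 655c77fc-6a78-4cca-b4b7-a153a3f4ad52 | OS::Heat::StructuredDeployment | CREATE_FAILED |
| ControllerOvercloudServicesDeployment_Step4 | step4-group-id | OS::Heat::StructuredDeployments | CREATE_FAILED |
| 1 | ok-deployment-id | OS::Heat::StructuredDeployment | CREATE_COMPLETE |
| NovaCompute | nova-server-id | OS::Nova::Server | CREATE_FAILED |
"""
```
</pre>
One consequence: when the only failed resources are servers such as NovaCompute (as in the 06/30 "No valid host was found" run), nothing passes the filter, so the script reaches `exit 1` without running `heat deployment-show` at all.<br>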
>>> *****************************<br>
>>> Troubleshooting steps :-<br>
>>> *****************************<br>
>>><br>
>>> [stack@undercloud ~]$ . stackrc<br>
>>> [stack@undercloud ~]$ heat resource-list overcloud | grep ControllerNodesPost<br>
>>> | ControllerNodesPostDeployment | f1d6a474-c946-46bf-ab0c-2fdaeb55d0b3 | OS::TripleO::ControllerPostDeployment | CREATE_FAILED | 2016-06-29T12:11:21 |<br>
>>><br>
>>><br>
>>> [stack@undercloud ~]$ heat stack-list -n | grep "^| f1d6a474-c946-46bf-ab0c-2fdaeb55d0b3"<br>
>>> | f1d6a474-c946-46bf-ab0c-2fdaeb55d0b3 | overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk | CREATE_FAILED | 2016-06-29T12:31:11 | None | 17f82f6e-e0ca-44c6-9058-de82c00d4f79 |<br>
>>><br>
>>><br>
>>><br>
>>> [stack@undercloud ~]$ heat event-list -m f1d6a474-c946-46bf-ab0c-2fdaeb55d0b3 overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk<br>
>>><br>
>>> +------------------------------------------------------+--------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+---------------------+<br>
>>> | resource_name | id | resource_status_reason | resource_status | event_time |<br>
>>> +------------------------------------------------------+--------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+---------------------+<br>
>>> | overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk | 10ec0cf9-b3c9-4191-9966-3f4d47f27e2a | Stack CREATE started<br>
>>> . . . . . . . . . . . . . . . . .<br>
>>> Step1,2,3 succeeded<br>
>>> . . . . . . . . . . . . . . . . .<br>
>>><br>
>>> | CREATE_IN_PROGRESS | 2016-06-29T12:31:14 |<br>
>>> | ControllerPuppetConfig | a2a1df33-5106-425c-b16d-8d2df709b19f | state changed | CREATE_COMPLETE | 2016-06-29T12:35:02 |<br>
>>> | ControllerOvercloudServicesDeployment_Step4 | 1e151333-4de5-4e7b-907c-ea0f42d31a47 | state changed | CREATE_IN_PROGRESS | 2016-06-29T12:35:03 |<br>
>>> | ControllerOvercloudServicesDeployment_Step4 | 7bf36334-3d92-4554-b6c0-41294a072ab6 | Error: resources.ControllerOvercloudServicesDeployment_Step4.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6 | CREATE_FAILED | 2016-06-29T12:42:42 |<br>
>>> | overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk | e72fb6f4-c2aa-4fe8-9bd1-5f5ad152685c | Resource CREATE failed: Error: resources.ControllerOvercloudServicesDeployment_Step4.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6 | CREATE_FAILED | 2016-06-29T12:42:43 |<br>
>>> +------------------------------------------------------+--------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+---------------------+<br>
>>><br>
>>> [stack@undercloud ~]$ heat stack-show overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk | grep NodeConfigIdentifiers<br>
>>> | | "NodeConfigIdentifiers":<br>
>>> "{u'deployment_identifier': 1467202276, u'controller_config': {u'1':<br>
>>> u'os-apply-config deployment 796df02a-7550-414b-a084-8b591a13e6db<br>
>>> completed,Root CA cert injection not enabled.,TLS not enabled.,None,',<br>
>>> u'0': u'os-apply-config deployment 613ec889-d852-470a-8e4c-6e243e1d2033<br>
>>> completed,Root CA cert injection not enabled.,TLS not enabled.,None,',<br>
>>> u'2': u'os-apply-config deployment c8b099d0-3af4-4ba0-a056-a0ce60f40e2d<br>
>>> completed,Root CA cert injection not enabled.,TLS not enabled.,None,'},<br>
>>> u'allnodes_extra': u'none'}" |<br>
>>><br>
>>> However, once the stack creation has failed, an update doesn't help:<br>
>>><br>
>>> [stack@undercloud ~]$ heat stack-update -x overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk -e update_env.yaml<br>
>>> ERROR: PATCH update to non-COMPLETE stack is not supported.<br>
>>><br>
>>> This is because the stack is in CREATE_FAILED state:<br>
>>><br>
>>> [stack@undercloud ~]$ heat stack-list<br>
>>> +--------------------------------------+------------+---------------+---------------------+--------------+<br>
>>> | id | stack_name | stack_status | creation_time | updated_time |<br>
>>> +--------------------------------------+------------+---------------+---------------------+--------------+<br>
>>> | 17f82f6e-e0ca-44c6-9058-de82c00d4f79 | overcloud | CREATE_FAILED | 2016-06-29T12:11:20 | None |<br>
>>> +--------------------------------------+------------+---------------+---------------------+------<br>
>>><br>
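<br>
The PATCH error can be anticipated before calling `heat stack-update -x`. A minimal sketch, assuming (from the error text) that Heat only accepts a PATCH update when the stack is in a `..._COMPLETE` state; the helper names are hypothetical, and the parser expects a plain `heat stack-list` table like the one above.<br>
<pre>
```python
def stack_status(stack_list_text, stack_name):
    """Return the stack_status cell for stack_name from captured
    `heat stack-list` output, or None if the stack is not listed."""
    for line in stack_list_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ border lines
        # Columns: id | stack_name | stack_status | creation_time | updated_time
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) >= 3 and cells[1] == stack_name:
            return cells[2]
    return None


def patch_update_allowed(status):
    """Assumption based on the error text: Heat refuses a PATCH update
    (`heat stack-update -x`) unless the stack is in a *_COMPLETE state."""
    return status is not None and status.endswith("_COMPLETE")


# The stack-list output quoted in this thread.
STACK_LIST = """\
+--------------------------------------+------------+---------------+---------------------+--------------+
| id                                   | stack_name | stack_status  | creation_time       | updated_time |
+--------------------------------------+------------+---------------+---------------------+--------------+
| 17f82f6e-e0ca-44c6-9058-de82c00d4f79 | overcloud  | CREATE_FAILED | 2016-06-29T12:11:20 | None         |
+--------------------------------------+------------+---------------+---------------------+--------------+
"""
```
</pre>
Here `stack_status(STACK_LIST, "overcloud")` gives CREATE_FAILED, so `patch_update_allowed` returns False; the nested overcloud-ControllerNodesPostDeployment-2r4tlv5icaxk stack is CREATE_FAILED as well.<br>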
>>><br>
>>> The complete output of `heat deployment-show<br>
>>> 655c77fc-6a78-4cca-b4b7-a153a3f4ad52` is attached as a gzip archive.<br>
>>><br>
>>><br>
>>> Thanks.<br>
>>><br>
>>> Boris.<br>
>>><br>
>>><br>
>>><br>
>>> _______________________________________________<br>
>>> rdo-list mailing list<br>
>>> rdo-list@redhat.com<br>
>>> <a href="https://www.redhat.com/mailman/listinfo/rdo-list">https://www.redhat.com/mailman/listinfo/rdo-list</a><br>
>>><br>
>>> To unsubscribe: rdo-list-unsubscribe@redhat.com<br>
>>><br>
>><br>
>> The failure occurred during the post-deployment, which means that the<br>
>> initial deployment succeeded, but then the steps that are done to the<br>
>> completed overcloud failed.<br>
>><br>
>> This is most commonly attributable to network problems between the<br>
>> Undercloud and the Overcloud Public API. The Undercloud needs to reach<br>
>> the Public API in order to do some of the post-configuration steps. If<br>
>> this API isn't reachable, you end up with the error you saw above.<br>
>><br>
>> You can test this connectivity by pinging the Public API VIP from the<br>
>> Undercloud. Starting with the failed deployment, run "neutron<br>
>> port-list" against the Undercloud and look for the IP on the port<br>
>> named "public_virtual_ip". You should be able to ping this address from<br>
>> the Undercloud. If you can't reach that IP, then you need to check the<br>
>> connectivity/routing between the Undercloud and the External network on<br>
>> the Overcloud.<br>
>><br>
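<br>
The connectivity check described above can be scripted. A minimal sketch, assuming the classic `neutron port-list` table layout (id | name | mac_address | fixed_ips, with fixed_ips printed as `{"subnet_id": ..., "ip_address": ...}`) and a Linux `ping` that supports `-c`; the function names and the sample row (documentation-range IP) are hypothetical.<br>
<pre>
```python
import re
import subprocess

# Extracts the address from a fixed_ips cell such as
# {"subnet_id": "...", "ip_address": "192.0.2.10"}
_IP_RE = re.compile(r'"ip_address": "([^"]+)"')


def find_port_ip(port_list_text, port_name="public_virtual_ip"):
    """Return the first fixed IP of the named port from captured
    `neutron port-list` output, or None if the port is not listed."""
    for line in port_list_text.splitlines():
        # Columns: id | name | mac_address | fixed_ips
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) >= 4 and cells[1] == port_name:
            match = _IP_RE.search(cells[3])
            return match.group(1) if match else None
    return None


def is_reachable(ip, count=3):
    """Ping ip from this host (the Undercloud); True if ping exits with 0."""
    result = subprocess.run(
        ["ping", "-c", str(count), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# Hypothetical sample row.
PORT_LIST = """\
| id   | name              | mac_address       | fixed_ips                                          |
| 0a1b | public_virtual_ip | fa:16:3e:00:00:01 | {"subnet_id": "5c6d", "ip_address": "192.0.2.10"} |
"""
```
</pre>
Typical use: save the `neutron port-list` output on the Undercloud, pass it to `find_port_ip`, then call `is_reachable` on the result. A False result points at connectivity or routing between the Undercloud and the External network.<br>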
> <br>
> I should also mention common causes of this problem:<br>
> <br>
> * Incorrect value for ExternalInterfaceDefaultRoute in the network<br>
> environment file.<br>
> * Controllers do not have the default route on the External network in<br>
> the NIC config templates (required for reachability from remote subnets).<br>
> * Incorrect subnet mask on the ExternalNetCidr in the network environment.<br>
> * Incorrect ExternalAllocationPools values in the network environment.<br>
> * Incorrect Ethernet switch config for the Controllers.<br>
> <br>
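<br>
Several of the causes above (default route, subnet mask, allocation pools) are plain address arithmetic and can be checked before deploying. A minimal sketch using Python's `ipaddress` module, assuming the values are copied from the network environment file as strings, with ExternalAllocationPools as a list of `{'start': ..., 'end': ...}` dicts; the function name is hypothetical, and the gateway-inside-pool test is an extra heuristic rather than one of the listed causes. Missing routes in the NIC templates or switch misconfiguration cannot be detected this way.<br>
<pre>
```python
import ipaddress


def check_external_network(cidr, default_route, allocation_pools):
    """Return a list of problems found in the External network values of a
    network environment file (empty list if nothing obvious is wrong)."""
    try:
        # strict=True rejects a CIDR whose host bits are set, e.g. 192.0.2.5/24,
        # which usually means the prefix length (subnet mask) is wrong.
        net = ipaddress.ip_network(cidr, strict=True)
    except ValueError as exc:
        return [f"ExternalNetCidr {cidr!r} is invalid: {exc}"]

    problems = []
    gateway = ipaddress.ip_address(default_route)
    if gateway not in net:
        problems.append(f"ExternalInterfaceDefaultRoute {gateway} is outside {net}")

    for pool in allocation_pools:
        start = ipaddress.ip_address(pool["start"])
        end = ipaddress.ip_address(pool["end"])
        if start not in net or end not in net:
            problems.append(f"ExternalAllocationPools range {start}-{end} is not inside {net}")
        if start > end:
            problems.append(f"ExternalAllocationPools range starts after it ends: {start}-{end}")
            continue
        # Heuristic: the default route should not be handed out to a node.
        blocks = ipaddress.summarize_address_range(start, end)
        if any(gateway in block for block in blocks):
            problems.append(f"ExternalInterfaceDefaultRoute {gateway} lies inside {start}-{end}")
    return problems
```
</pre>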
> The issue has been reproduced 4 times, on a daily basis since 06/25/16,<br>
> always with exactly the same error at Step4<br>
> of overcloud-ControllerNodesPostDeployment.<br>
> In the meantime, I cannot reproduce the error:<br>
> the 3xNode HA Controller + 1xCompute config works.<br>
> There was one more issue: 3xNode HA Controller + 2xCompute<br>
> failed immediately after overcloud-deploy.sh started, because<br>
> only 4 nodes could be introspected. I will test it tomorrow morning.<br>
> <br>
> Thanks a lot.<br>
> Boris.<br>
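<br>
Requesting more nodes than exist is one possible cause of both the 3xController + 2xCompute failure with only 4 introspected nodes and the "No valid host was found" errors. A minimal sketch that compares the `--control-scale`/`--compute-scale` values in `extra_args` with the `overcloud_nodes` list and an optional count of available nodes; the helper names and the `available_nodes` parameter are hypothetical, and it assumes flavor names match role names as in the quickstart config quoted above. Passing this check does not rule out the Ironic or Nova scheduler problems mentioned earlier in the thread.<br>
<pre>
```python
import shlex
from collections import Counter

# Assumption: flavor names in overcloud_nodes match these role names,
# as in the quickstart config quoted above.
SCALE_FLAGS = {"--control-scale": "control", "--compute-scale": "compute"}


def requested_scale(extra_args):
    """Parse the node counts out of an extra_args string such as
    '--control-scale 3 --compute-scale 2 ...'."""
    tokens = shlex.split(extra_args)
    scale = {}
    # Pair each scale flag with the token that follows it.
    for flag, value in zip(tokens, tokens[1:]):
        if flag in SCALE_FLAGS:
            scale[SCALE_FLAGS[flag]] = int(value)
    return scale


def capacity_problems(overcloud_nodes, extra_args, available_nodes=None):
    """List reasons why the requested scale cannot be met.

    available_nodes is a hypothetical input, e.g. how many nodes Ironic
    actually has ready for deployment.
    """
    scale = requested_scale(extra_args)
    defined = Counter(node["flavor"] for node in overcloud_nodes)
    problems = []
    for flavor, wanted in scale.items():
        if wanted > defined[flavor]:
            problems.append(f"{wanted} {flavor} nodes requested, only {defined[flavor]} defined")
    total = sum(scale.values())
    if available_nodes is not None and total > available_nodes:
        problems.append(f"{total} nodes requested, only {available_nodes} available")
    return problems
```
</pre>
With the config above (3 control + 2 compute nodes, `--control-scale 3 --compute-scale 2`), the definitions are consistent, but `available_nodes=4` reports that 5 nodes are requested while only 4 are available.<br>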
> <br>
> --<br>
> Dan Sneddon | Principal OpenStack Engineer<br>
> dsneddon@redhat.com | redhat.com/openstack<br>
> 650.254.4025 | dsneddon:irc @dxs:twitter<br>
> <br>
</div>
</span></font></div>
</div>
</body>
</html>