[rdo-list] Tripleo QuickStart HA deployment attempts constantly crash

Boris Derzhavets bderzhavets at hotmail.com
Sat Jun 4 07:47:23 UTC 2016


From: John Trowbridge <trown at redhat.com>
Sent: Friday, June 3, 2016 5:43 PM
To: Boris Derzhavets; Lars Kellogg-Stedman
Cc: rdo-list
Subject: Re: [rdo-list] Tripleo QuickStart HA deployment attempts constantly crash



On 06/03/2016 04:53 PM, John Trowbridge wrote:
> I just did an HA deploy locally on master, and I see the same thing wrt
> telemetry services being down due to failed redis import. That could be
> a packaging bug (something should depend on python-redis, maybe
> python-tooz?). That said, it does not appear fatal in my case. Is there
> some issue other than telemetry services being down that you are seeing?
> That is certainly something we should fix, but I wouldn't characterize
> it as the deployment is constantly crashing.

   I said that with regard to comment #3 in https://bugzilla.redhat.com/show_bug.cgi?id=1340865
   Of course, the issue with telemetry services is not "constantly crashing".
>

Confirmed that installing python-redis fixes the telemetry issue by
doing the following from the undercloud:

sudo LIBGUESTFS_BACKEND=direct virt-customize -a overcloud-full.qcow2 \
    --install python-redis
openstack overcloud image upload --update-existing

> Then deleting the failed overcloud stack, and re-running
> overcloud-deploy.sh.

         This doesn't work for me. Re-running fails to recreate the overcloud stack.
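
A quick way to confirm whether the rebuilt image actually carries python-redis is to try, on a controller, the same import that fails in the traceback quoted below. This is only a minimal sketch: the helper name module_available is made up for illustration, and it sticks to calls that exist in both Python 2.7 and Python 3.

```python
import importlib


def module_available(name):
    """Return True if `name` can be imported by this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# The traceback fails on "import redis" inside tooz/drivers/redis.py,
# so "redis" is the module worth checking on each controller.
print("redis importable: %s" % module_available("redis"))
```

If this still reports False on a freshly deployed controller, the node was probably not built from the updated image.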

> On 06/03/2016 11:30 AM, Boris Derzhavets wrote:
>> 1. Attempting to address your concern (if I understood you correctly)
>>
>> First log :-
>>
>>  [root at overcloud-controller-0 ceilometer]# cat  central.log | grep ERROR
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service [req-4db5f172-0bf0-4200-9cf4-174859cdc00b admin - - - -] Error starting thread.
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service Traceback (most recent call last):
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 680, in run_service
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service     service.start()
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/ceilometer/agent/manager.py", line 384, in start
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service     self.partition_coordinator.start()
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/ceilometer/coordination.py", line 84, in start
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service     backend_url, self._my_id)
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/tooz/coordination.py", line 539, in get_coordinator
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service     invoke_args=(member_id, parsed_url, options)).driver
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/stevedore/driver.py", line 46, in __init__
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service     verify_requirements=verify_requirements,
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/stevedore/named.py", line 55, in __init__
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service     verify_requirements)
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/stevedore/extension.py", line 171, in _load_plugins
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service     self._on_load_failure_callback(self, ep, err)
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/stevedore/extension.py", line 163, in _load_plugins
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service     verify_requirements,
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/stevedore/named.py", line 123, in _load_one_plugin
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service     verify_requirements,
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/stevedore/extension.py", line 186, in _load_one_plugin
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service     plugin = ep.load(require=verify_requirements)
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 2260, in load
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service     entry = __import__(self.module_name, globals(),globals(), ['__name__'])
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 27, in <module>
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service     import redis
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service ImportError: No module named redis
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service
>>
>> Second log :-
>>
>> [root at overcloud-controller-0 ceilometer]# cd -
>> /var/log/aodh
>> [root at overcloud-controller-0 aodh]# cat  evaluator.log | grep ERROR
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service [-] Error starting thread.
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service Traceback (most recent call last):
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 680, in run_service
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service     service.start()
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/aodh/evaluator/__init__.py", line 229, in start
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service     self.partition_coordinator.start()
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/aodh/coordination.py", line 133, in start
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service     self.backend_url, self._my_id)
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/tooz/coordination.py", line 539, in get_coordinator
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service     invoke_args=(member_id, parsed_url, options)).driver
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/stevedore/driver.py", line 46, in __init__
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service     verify_requirements=verify_requirements,
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/stevedore/named.py", line 55, in __init__
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service     verify_requirements)
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/stevedore/extension.py", line 171, in _load_plugins
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service     self._on_load_failure_callback(self, ep, err)
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/stevedore/extension.py", line 163, in _load_plugins
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service     verify_requirements,
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/stevedore/named.py", line 123, in _load_one_plugin
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service     verify_requirements,
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/stevedore/extension.py", line 186, in _load_one_plugin
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service     plugin = ep.load(require=verify_requirements)
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 2260, in load
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service     entry = __import__(self.module_name, globals(),globals(), ['__name__'])
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 27, in <module>
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service     import redis
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service ImportError: No module named redis
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service
>>
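
Both logs above end in the same exception. When a grep like this returns many traceback lines, a small filter that keeps only the final exception line can make that easier to see. A rough sketch, assuming the oslo log prefix has the shape seen in these logs (timestamp, PID, level, logger name); final_exceptions is a hypothetical helper, not part of any OpenStack tool.

```python
import re

# Prefix as it appears in the logs above, e.g.
# "2016-06-03 08:50:04.405 17503 ERROR oslo_service.service "
PREFIX = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ \d+ ERROR \S+ ?"
)
# A line such as "ImportError: No module named redis"
EXC = re.compile(r"^[A-Za-z_][\w.]*(?:Error|Exception): .+")


def final_exceptions(lines):
    """Return the distinct 'SomeError: message' lines, in first-seen order."""
    found = []
    for line in lines:
        # Drop mail quoting ("> ") if the log was pasted from a message.
        body = PREFIX.sub("", line.lstrip("> ").rstrip("\n"))
        if EXC.match(body) and body not in found:
            found.append(body)
    return found
```

It can be fed an open log file directly, for example final_exceptions(open("central.log")).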
>>  2. The memory DIMMs (DDR3, Kingston HyperX 1600 MHz) are not the problem.
>>       My board, an ASUS Z97-P, cannot support more than 32 GB. So ....
>>
>>  3. The i7 4790 surprised me when deploying with TripleO Quickstart, in particular Controller + 2 x Compute (--compute-scale 2)
>>
>> Thank you
>> Boris.
>> ________________________________________
>> From: John Trowbridge <trown at redhat.com>
>> Sent: Friday, June 3, 2016 8:43 AM
>> To: Boris Derzhavets; John Trowbridge; Lars Kellogg-Stedman
>> Cc: rdo-list
>> Subject: Re: [rdo-list] Tripleo QuickStart HA deployment attempts constantly crash
>>
>> So this last one looks like telemetry services went down. You could
>> check the logs on the controllers to see if it was OOM killed. My bet
>> would be this is what is happening.
>>
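
To check the OOM theory, one option is to scan the kernel log on each controller for OOM-killer messages. The sketch below assumes the usual kernel wording "Killed process <pid> (<name>)"; the exact wording varies between kernel versions, and oom_victims is a name invented here.

```python
import re

# Matches e.g. "Killed process 1234 (mongod) total-vm:..."
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def oom_victims(lines):
    """Return (pid, process name) for each OOM-killer kill found in `lines`."""
    victims = []
    for line in lines:
        match = KILLED_RE.search(line)
        if match:
            victims.append((int(match.group(1)), match.group(2)))
    return victims
```

The lines could come from dmesg output or from /var/log/messages on the controller. An empty result suggests the services went down for some other reason.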
>> The reason that HA is not the default for tripleo-quickstart is exactly
>> this type of issue. It is pretty difficult to fit a full HA deployment
>> of TripleO on a 32G virthost. I think there is near 100% chance that the
>> default HA config will crash when trying to do anything on the
>> deployed overcloud, due to running out of memory.
>>
>> I have had some success in my local test setup using KSM [1] on the
>> virthost, and then changing the HA config to give the controllers more
>> memory. This results in overcommiting, but KSM can handle overcommiting
>> without going into swap. It might even be possible to try to setup KSM
>> in the environment setup part of quickstart. I would certainly accept an
>> RFE/patch for this [2,3].
>>
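
For anyone trying the KSM route, the kernel exposes its counters under /sys/kernel/mm/ksm, and pages_sharing gives a rough idea of how many pages are being deduplicated. A sketch, assuming a 4 KiB page size unless told otherwise; the function names are made up for illustration and the result is an estimate only.

```python
import os

KSM_DIR = "/sys/kernel/mm/ksm"


def read_ksm_stats(ksm_dir=KSM_DIR):
    """Read a few KSM counters from sysfs into a dict of ints."""
    stats = {}
    for name in ("run", "pages_shared", "pages_sharing"):
        with open(os.path.join(ksm_dir, name)) as handle:
            stats[name] = int(handle.read().strip())
    return stats


def ksm_saved_bytes(stats, page_size=4096):
    """Estimate memory saved: each 'sharing' page is one duplicate avoided."""
    return stats["pages_sharing"] * page_size
```

On the virthost, ksm_saved_bytes(read_ksm_stats()) gives the estimate in bytes. A value of 0 in "run" means KSM is not active.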
>> If you have a larger virthost than 32G, you could similarly bump the
>> memory for the controllers, which should lead to a much higher success rate.
>>
>> There is also a feature coming in TripleO [4] that will allow choosing
>> what services get deployed in each role, which will allow us to tweak
>> the tripleo-quickstart HA config to deploy a minimal service layout in
>> order to reduce memory requirements.
>>
>> Thanks a ton for giving tripleo-quickstart a go!
>>
>> [1] https://en.wikipedia.org/wiki/Kernel_same-page_merging
>> [2] https://bugs.launchpad.net/tripleo-quickstart
>> [3] https://review.openstack.org/#/q/project:openstack/tripleo-quickstart
>> [4]
>> https://blueprints.launchpad.net/tripleo/+spec/composable-services-within-roles
>>
>> On 06/03/2016 06:20 AM, Boris Derzhavets wrote:
>>> =====================================
>>>
>>> Fresh HA deployment attempt
>>>
>>> =====================================
>>>
>>> [stack at undercloud ~]$ date
>>> Fri Jun  3 10:05:35 UTC 2016
>>> [stack at undercloud ~]$ heat stack-list
>>> +--------------------------------------+------------+-----------------+---------------------+--------------+
>>> | id                                   | stack_name | stack_status    | creation_time       | updated_time |
>>> +--------------------------------------+------------+-----------------+---------------------+--------------+
>>> | 0c6b8205-be86-4a24-be36-fd4ece956c6d | overcloud  | CREATE_COMPLETE | 2016-06-03T08:14:19 | None         |
>>> +--------------------------------------+------------+-----------------+---------------------+--------------+
>>> [stack at undercloud ~]$ nova list
>>> +--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
>>> | ID                                   | Name                    | Status | Task State | Power State | Networks            |
>>> +--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
>>> | 6a38b7be-3743-4339-970b-6121e687741d | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=192.0.2.10 |
>>> | 9222dc1b-5974-495b-8b98-b8176ac742f4 | overcloud-controller-1  | ACTIVE | -          | Running     | ctlplane=192.0.2.9  |
>>> | 76adbb27-220f-42ef-9691-94729ee28749 | overcloud-controller-2  | ACTIVE | -          | Running     | ctlplane=192.0.2.11 |
>>> | 8f57f7b6-a2d8-4b7b-b435-1c675e63ea84 | overcloud-novacompute-0 | ACTIVE | -          | Running     | ctlplane=192.0.2.8  |
>>> +--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
>>> [stack at undercloud ~]$ ssh heat-admin at 192.0.2.10
>>> Last login: Fri Jun  3 10:01:44 2016 from gateway
>>> [heat-admin at overcloud-controller-0 ~]$ sudo su -
>>> Last login: Fri Jun  3 10:01:49 UTC 2016 on pts/0
>>> [root at overcloud-controller-0 ~]# .  keystonerc_admin
>>>
>>> [root at overcloud-controller-0 ~]# pcs status
>>> Cluster name: tripleo_cluster
>>> Last updated: Fri Jun  3 10:07:22 2016        Last change: Fri Jun  3 08:50:59 2016 by root via cibadmin on overcloud-controller-0
>>> Stack: corosync
>>> Current DC: overcloud-controller-0 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
>>> 3 nodes and 123 resources configured
>>>
>>> Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>
>>> Full list of resources:
>>>
>>>  ip-192.0.2.6    (ocf::heartbeat:IPaddr2):    Started overcloud-controller-0
>>>  Clone Set: haproxy-clone [haproxy]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  ip-192.0.2.7    (ocf::heartbeat:IPaddr2):    Started overcloud-controller-1
>>>  Master/Slave Set: galera-master [galera]
>>>      Masters: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: memcached-clone [memcached]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: rabbitmq-clone [rabbitmq]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: openstack-core-clone [openstack-core]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Master/Slave Set: redis-master [redis]
>>>      Masters: [ overcloud-controller-1 ]
>>>      Slaves: [ overcloud-controller-0 overcloud-controller-2 ]
>>>  Clone Set: mongod-clone [mongod]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: openstack-aodh-evaluator-clone [openstack-aodh-evaluator]
>>>      Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: openstack-nova-scheduler-clone [openstack-nova-scheduler]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: neutron-l3-agent-clone [neutron-l3-agent]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: neutron-netns-cleanup-clone [neutron-netns-cleanup]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: neutron-ovs-cleanup-clone [neutron-ovs-cleanup]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  openstack-cinder-volume    (systemd:openstack-cinder-volume):    Started overcloud-controller-2
>>>  Clone Set: openstack-heat-engine-clone [openstack-heat-engine]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: openstack-ceilometer-api-clone [openstack-ceilometer-api]
>>>      Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: openstack-aodh-listener-clone [openstack-aodh-listener]
>>>      Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: neutron-metadata-agent-clone [neutron-metadata-agent]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: openstack-gnocchi-metricd-clone [openstack-gnocchi-metricd]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: openstack-aodh-notifier-clone [openstack-aodh-notifier]
>>>      Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: openstack-heat-api-clone [openstack-heat-api]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: openstack-ceilometer-collector-clone [openstack-ceilometer-collector]
>>>      Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: openstack-glance-api-clone [openstack-glance-api]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: openstack-cinder-scheduler-clone [openstack-cinder-scheduler]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: openstack-nova-api-clone [openstack-nova-api]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: openstack-nova-consoleauth-clone [openstack-nova-consoleauth]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: openstack-sahara-api-clone [openstack-sahara-api]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: openstack-sahara-engine-clone [openstack-sahara-engine]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: openstack-glance-registry-clone [openstack-glance-registry]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: openstack-gnocchi-statsd-clone [openstack-gnocchi-statsd]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: openstack-ceilometer-notification-clone [openstack-ceilometer-notification]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: openstack-cinder-api-clone [openstack-cinder-api]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: neutron-dhcp-agent-clone [neutron-dhcp-agent]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: neutron-openvswitch-agent-clone [neutron-openvswitch-agent]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: delay-clone [delay]
>>>      Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: neutron-server-clone [neutron-server]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: openstack-ceilometer-central-clone [openstack-ceilometer-central]
>>>      Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: httpd-clone [httpd]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>  Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor]
>>>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>
>>> Failed Actions:
>>> * openstack-aodh-evaluator_monitor_60000 on overcloud-controller-1 'not running' (7): call=76, status=complete, exitreason='none',
>>>     last-rc-change='Fri Jun  3 08:47:22 2016', queued=0ms, exec=0ms
>>> * openstack-ceilometer-central_start_0 on overcloud-controller-1 'not running' (7): call=290, status=complete, exitreason='none',
>>>     last-rc-change='Fri Jun  3 08:51:18 2016', queued=0ms, exec=2132ms
>>> * openstack-aodh-evaluator_monitor_60000 on overcloud-controller-2 'not running' (7): call=76, status=complete, exitreason='none',
>>>     last-rc-change='Fri Jun  3 08:47:16 2016', queued=0ms, exec=0ms
>>> * openstack-ceilometer-central_start_0 on overcloud-controller-2 'not running' (7): call=292, status=complete, exitreason='none',
>>>     last-rc-change='Fri Jun  3 08:51:31 2016', queued=0ms, exec=2102ms
>>> * openstack-aodh-evaluator_monitor_60000 on overcloud-controller-0 'not running' (7): call=77, status=complete, exitreason='none',
>>>     last-rc-change='Fri Jun  3 08:47:19 2016', queued=0ms, exec=0ms
>>> * openstack-ceilometer-central_start_0 on overcloud-controller-0 'not running' (7): call=270, status=complete, exitreason='none',
>>>     last-rc-change='Fri Jun  3 08:50:02 2016', queued=0ms, exec=2199ms
>>>
>>>
>>> PCSD Status:
>>>   overcloud-controller-0: Online
>>>   overcloud-controller-1: Online
>>>   overcloud-controller-2: Online
>>>
>>> Daemon Status:
>>>   corosync: active/enabled
>>>   pacemaker: active/enabled
>>>   pcsd: active/enabled
>>>
>>>
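
The pcs status output is long, and the stopped resources are easy to miss. A small filter along these lines could pull them out. It assumes the "Clone Set: ... [resource]" and "Stopped: [ ... ]" layout seen in this output, and stopped_resources is a hypothetical helper.

```python
import re

SET_RE = re.compile(r"(?:Clone|Master/Slave) Set: (\S+) \[(\S+)\]")
STOPPED_RE = re.compile(r"Stopped: \[ (.*?) \]")


def stopped_resources(lines):
    """Map resource name -> list of nodes on which pcs reports it stopped."""
    stopped = {}
    current = None
    for line in lines:
        # Drop mail quoting and indentation before matching.
        text = line.lstrip("> ").strip()
        match = SET_RE.match(text)
        if match:
            current = match.group(2)
            continue
        match = STOPPED_RE.match(text)
        if match and current:
            stopped[current] = match.group(1).split()
    return stopped
```

Fed the output quoted above, it should report the stopped aodh and ceilometer resources plus delay.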
>>> ________________________________
>>> From: rdo-list-bounces at redhat.com <rdo-list-bounces at redhat.com> on behalf of Boris Derzhavets <bderzhavets at hotmail.com>
>>> Sent: Monday, May 30, 2016 4:56 AM
>>> To: John Trowbridge; Lars Kellogg-Stedman
>>> Cc: rdo-list
>>> Subject: Re: [rdo-list] Tripleo QuickStart HA deployment attempts constantly crash
>>>
>>>
>>> Done one more time :-
>>>
>>>
>>> [stack at undercloud ~]$ heat deployment-show 9cc8087a-6d82-4261-8a13-ee8c46e3a02d
>>>
>>> Uploaded here :-
>>>
>>> http://textuploader.com/5bm5v
>>> ________________________________
>>> From: rdo-list-bounces at redhat.com <rdo-list-bounces at redhat.com> on behalf of Boris Derzhavets <bderzhavets at hotmail.com>
>>> Sent: Sunday, May 29, 2016 3:39 AM
>>> To: John Trowbridge; Lars Kellogg-Stedman
>>> Cc: rdo-list
>>> Subject: [rdo-list] Tripleo QuickStart HA deployment attempts constantly crash
>>>
>>>
>>> Error every time is the same :-
>>>
>>>
>>> 2016-05-29 07:20:17 [0]: CREATE_FAILED Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
>>> 2016-05-29 07:20:18 [0]: SIGNAL_COMPLETE Unknown
>>> 2016-05-29 07:20:18 [overcloud-ControllerNodesPostDeployment-dzawjmjyaidt-ControllerServicesBaseDeployment_Step2-ufz2ccs5egd7]: CREATE_FAILED Resource CREATE failed: Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
>>> 2016-05-29 07:20:18 [0]: SIGNAL_COMPLETE Unknown
>>> 2016-05-29 07:20:19 [ControllerServicesBaseDeployment_Step2]: CREATE_FAILED Error: resources.ControllerServicesBaseDeployment_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
>>> 2016-05-29 07:20:19 [0]: SIGNAL_COMPLETE Unknown
>>> 2016-05-29 07:20:19 [0]: SIGNAL_COMPLETE Unknown
>>> 2016-05-29 07:20:20 [ControllerDeployment]: SIGNAL_COMPLETE Unknown
>>> 2016-05-29 07:20:20 [overcloud-ControllerNodesPostDeployment-dzawjmjyaidt]: CREATE_FAILED Resource CREATE failed: Error: resources.ControllerServicesBaseDeployment_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
>>> 2016-05-29 07:20:21 [ControllerNodesPostDeployment]: CREATE_FAILED Error: resources.ControllerNodesPostDeployment.resources.ControllerServicesBaseDeployment_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
>>> 2016-05-29 07:20:21 [0]: SIGNAL_COMPLETE Unknown
>>> 2016-05-29 07:20:22 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
>>> 2016-05-29 07:20:22 [0]: SIGNAL_COMPLETE Unknown
>>> 2016-05-29 07:24:22 [ComputeNodesPostDeployment]: CREATE_FAILED CREATE aborted
>>> 2016-05-29 07:24:22 [overcloud]: CREATE_FAILED Resource CREATE failed: Error: resources.ControllerNodesPostDeployment.resources.ControllerServicesBaseDeployment_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
>>> Stack overcloud CREATE_FAILED
>>> Deployment failed:  Heat Stack create failed.
>>> + heat stack-list
>>> + grep -q CREATE_FAILED
>>> + deploy_status=1
>>> ++ heat resource-list --nested-depth 5 overcloud
>>> ++ grep FAILED
>>> ++ grep 'StructuredDeployment '
>>> ++ cut -d '|' -f3
>>> + for failed in '$(heat resource-list         --nested-depth 5 overcloud | grep FAILED |
>>>         grep '\''StructuredDeployment '\'' | cut -d '\''|'\'' -f3)'
>>> + heat deployment-show 66bd3fbe-296b-4f88-87a7-5ceafd05c1d3
>>> + exit 1
>>>
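
The same failure is reported several times as it bubbles up through the nested stacks. To see just which resources failed, the event lines can be reduced to the distinct names marked CREATE_FAILED. A sketch, assuming the "timestamp [resource]: STATUS ..." layout of the events above; failed_resources is an invented name.

```python
import re

# e.g. "2016-05-29 07:20:19 [ControllerDeployment]: CREATE_FAILED Error: ..."
EVENT_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \[([^\]]+)\]: (\S+)"
)


def failed_resources(lines):
    """Return resource names with a CREATE_FAILED event, in first-seen order."""
    names = []
    for line in lines:
        match = EVENT_RE.match(line.lstrip("> "))
        if match and match.group(2) == "CREATE_FAILED":
            if match.group(1) not in names:
                names.append(match.group(1))
    return names
```

The first name in the list is the earliest failure, which is usually the one worth passing to heat deployment-show.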
>>>
>>> Minimal configuration deployments run with no errors and build a completely functional environment.
>>>
>>>
>>> However, this template :-
>>>
>>>
>>> #################################
>>> # Test Controller + 2*Compute nodes
>>> #################################
>>> control_memory: 6144
>>> compute_memory: 6144
>>>
>>> undercloud_memory: 8192
>>>
>>> # Giving the undercloud additional CPUs can greatly improve heat's
>>> # performance (and result in a shorter deploy time).
>>> undercloud_vcpu: 4
>>>
>>> # We set introspection to true and use only the minimal amount of nodes
>>> # for this job, but test all defaults otherwise.
>>> step_introspect: true
>>>
>>> # Define a single controller node and two compute nodes.
>>> overcloud_nodes:
>>>   - name: control_0
>>>     flavor: control
>>>
>>>   - name: compute_0
>>>     flavor: compute
>>>
>>>   - name: compute_1
>>>     flavor: compute
>>>
>>> # Tell tripleo how we want things done.
>>> extra_args: >-
>>>   --neutron-network-type vxlan
>>>   --neutron-tunnel-types vxlan
>>>   --ntp-server pool.ntp.org
>>>
>>> network_isolation: true
>>>
>>>
>>> It picks up the new memory settings but doesn't create the second compute node.
>>>
>>> Every time I get just a controller and one compute node.
>>>
>>>
>>> HW: i7 4790, 32 GB RAM
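
As a sanity check on the memory side, the guest sizes in the template above can be added up and compared with the host. A small sketch, assuming the *_memory values are in MiB and that every node gets the memory of its flavor; the dict simply mirrors the template, and total_guest_memory is an invented helper.

```python
# Values copied from the template above (assumed to be MiB).
config = {
    "control_memory": 6144,
    "compute_memory": 6144,
    "undercloud_memory": 8192,
    "overcloud_nodes": [
        {"name": "control_0", "flavor": "control"},
        {"name": "compute_0", "flavor": "compute"},
        {"name": "compute_1", "flavor": "compute"},
    ],
}


def total_guest_memory(cfg):
    """Sum the undercloud memory and the memory of every overcloud node."""
    per_flavor = {
        "control": cfg["control_memory"],
        "compute": cfg["compute_memory"],
    }
    total = cfg["undercloud_memory"]
    for node in cfg["overcloud_nodes"]:
        total += per_flavor[node["flavor"]]
    return total
```

For this template that comes to 26624 MiB, against 32768 MiB on a 32 GB host, before counting what the host itself needs.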
>>>
>>>
>>> Thanks.
>>>
>>> Boris
>>>
>>> ________________________________
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> rdo-list mailing list
>>> rdo-list at redhat.com
>>> https://www.redhat.com/mailman/listinfo/rdo-list
>>>
>>> To unsubscribe: rdo-list-unsubscribe at redhat.com
>>>