[rdo-list] Tripleo QuickStart HA deployment attempts constantly crash
Boris Derzhavets
bderzhavets at hotmail.com
Sat Jun 4 07:47:23 UTC 2016
From: John Trowbridge <trown at redhat.com>
Sent: Friday, June 3, 2016 5:43 PM
To: Boris Derzhavets; Lars Kellogg-Stedman
Cc: rdo-list
Subject: Re: [rdo-list] Tripleo QuickStart HA deployment attempts constantly crash
On 06/03/2016 04:53 PM, John Trowbridge wrote:
> I just did an HA deploy locally on master, and I see the same thing wrt
> telemetry services being down due to failed redis import. That could be
> a packaging bug (something should depend on python-redis, maybe
> python-tooz?). That said, it does not appear fatal in my case. Is there
> some issue other than telemetry services being down that you are seeing?
> That is certainly something we should fix, but I wouldn't characterize
> it as the deployment is constantly crashing.
That was said by me in regard to comment #3 in https://bugzilla.redhat.com/show_bug.cgi?id=1340865
Of course, the issue with the telemetry services is not "constantly crashing".
>
Confirmed that installing python-redis fixes the telemetry issue by
doing the following from the undercloud:
sudo LIBGUESTFS_BACKEND=direct virt-customize -a overcloud-full.qcow2 \
    --install python-redis
openstack overcloud image upload --update-existing
> Then deleting the failed overcloud stack, and re-running
> overcloud-deploy.sh.
Doesn't work for me. Re-running fails to recreate the overcloud stack.
> On 06/03/2016 11:30 AM, Boris Derzhavets wrote:
>> 1. Attempting to address your concern (if I understood you correctly)
>>
>> First log :-
>>
>> [root at overcloud-controller-0 ceilometer]# cat central.log | grep ERROR
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service [req-4db5f172-0bf0-4200-9cf4-174859cdc00b admin - - - -] Error starting thread.
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service Traceback (most recent call last):
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 680, in run_service
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service service.start()
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/ceilometer/agent/manager.py", line 384, in start
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service self.partition_coordinator.start()
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/ceilometer/coordination.py", line 84, in start
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service backend_url, self._my_id)
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/tooz/coordination.py", line 539, in get_coordinator
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service invoke_args=(member_id, parsed_url, options)).driver
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/stevedore/driver.py", line 46, in __init__
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service verify_requirements=verify_requirements,
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/stevedore/named.py", line 55, in __init__
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service verify_requirements)
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/stevedore/extension.py", line 171, in _load_plugins
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service self._on_load_failure_callback(self, ep, err)
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/stevedore/extension.py", line 163, in _load_plugins
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service verify_requirements,
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/stevedore/named.py", line 123, in _load_one_plugin
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service verify_requirements,
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/stevedore/extension.py", line 186, in _load_one_plugin
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service plugin = ep.load(require=verify_requirements)
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 2260, in load
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service entry = __import__(self.module_name, globals(),globals(), ['__name__'])
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 27, in <module>
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service import redis
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service ImportError: No module named redis
>> 2016-06-03 08:50:04.405 17503 ERROR oslo_service.service
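The traceback above reduces to one missing dependency: tooz's redis driver executes `import redis` at module load time, so the service dies whenever the python-redis package is absent from the image. A quick check to run on a controller (a hedged sketch; the package and module names are taken from the traceback itself):

```shell
# Does the module behind tooz's redis driver import? python-redis
# provides the 'redis' module that tooz/drivers/redis.py needs.
python -c 'import redis' 2>/dev/null \
    && echo "redis importable" \
    || echo "redis missing - install python-redis in the overcloud image"
```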
>>
>> Second log :-
>>
>> [root at overcloud-controller-0 ceilometer]# cd -
>> /var/log/aodh
>> [root at overcloud-controller-0 aodh]# cat evaluator.log | grep ERROR
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service [-] Error starting thread.
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service Traceback (most recent call last):
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 680, in run_service
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service service.start()
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/aodh/evaluator/__init__.py", line 229, in start
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service self.partition_coordinator.start()
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/aodh/coordination.py", line 133, in start
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service self.backend_url, self._my_id)
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/tooz/coordination.py", line 539, in get_coordinator
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service invoke_args=(member_id, parsed_url, options)).driver
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/stevedore/driver.py", line 46, in __init__
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service verify_requirements=verify_requirements,
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/stevedore/named.py", line 55, in __init__
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service verify_requirements)
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/stevedore/extension.py", line 171, in _load_plugins
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service self._on_load_failure_callback(self, ep, err)
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/stevedore/extension.py", line 163, in _load_plugins
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service verify_requirements,
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/stevedore/named.py", line 123, in _load_one_plugin
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service verify_requirements,
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/stevedore/extension.py", line 186, in _load_one_plugin
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service plugin = ep.load(require=verify_requirements)
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 2260, in load
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service entry = __import__(self.module_name, globals(),globals(), ['__name__'])
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 27, in <module>
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service import redis
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service ImportError: No module named redis
>> 2016-06-03 08:46:20.552 32101 ERROR oslo_service.service
>>
>> 2. The DDR3 memory DIMMs (Kingston HyperX 1600 MHz) are not a problem.
>> My ASUS Z97-P board cannot support more than 32 GB. So ....
>>
>> 3. The i7-4790 surprised me with how well it handled the TripleO Quickstart deployment, in particular Controller + 2x Compute ( --compute-scale 2 )
>>
>> Thank you
>> Boris.
>> ________________________________________
>> From: John Trowbridge <trown at redhat.com>
>> Sent: Friday, June 3, 2016 8:43 AM
>> To: Boris Derzhavets; John Trowbridge; Lars Kellogg-Stedman
>> Cc: rdo-list
>> Subject: Re: [rdo-list] Tripleo QuickStart HA deployment attempts constantly crash
>>
>> So this last one looks like telemetry services went down. You could
>> check the logs on the controllers to see if it was OOM killed. My bet
>> would be this is what is happening.
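Checking for OOM kills on a controller is a one-liner; a hedged sketch (/var/log/messages is the CentOS 7 default location used by the TripleO images, adjust for your distro):

```shell
# Look for kernel OOM-killer activity; the fallback message keeps the
# command usable when the log is absent or unreadable.
grep -iE 'out of memory|oom-killer|killed process' /var/log/messages 2>/dev/null \
    || echo "no OOM events found (or log not readable)"
```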
>>
>> The reason that HA is not the default for tripleo-quickstart is exactly
>> this type of issue. It is pretty difficult to fit a full HA deployment
>> of TripleO on a 32G virthost. I think there is near 100% chance that the
>> default HA config will crash when trying to do anything on the
>> deployed overcloud, due to running out of memory.
>>
>> I have had some success in my local test setup using KSM [1] on the
>> virthost, and then changing the HA config to give the controllers more
>> memory. This results in overcommitting, but KSM can handle overcommitting
>> without going into swap. It might even be possible to set up KSM
>> in the environment setup part of quickstart. I would certainly accept an
>> RFE/patch for this [2,3].
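Enabling KSM on the virthost is just a sysfs write; a sketch (the paths are the standard Linux KSM interface, and the writes need root):

```shell
# Enable kernel same-page merging so overcommitted controller VMs can
# share identical memory pages.
if [ -w /sys/kernel/mm/ksm/run ]; then
    echo 1 > /sys/kernel/mm/ksm/run
else
    echo "not root (or no KSM): would write 1 to /sys/kernel/mm/ksm/run"
fi
# Pages currently merged; stays at 0 until KSM has scanned for a while.
cat /sys/kernel/mm/ksm/pages_sharing 2>/dev/null || echo "KSM not available"
```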
>>
>> If you have a larger virthost than 32G, you could similarly bump the
>> memory for the controllers, which should lead to a much higher success rate.
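On such a host, the memory bump is just a settings override; an illustrative sketch (the values are assumptions for a 64G virthost, the variable names follow the quickstart config format):

```yaml
# Illustrative values for a 64G virthost, not tested defaults.
control_memory: 10240    # per controller, x3 for HA
compute_memory: 6144
undercloud_memory: 12288
```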
>>
>> There is also a feature coming in TripleO [4] that will allow choosing
>> what services get deployed in each role, which will allow us to tweak
>> the tripleo-quickstart HA config to deploy a minimal service layout in
>> order to reduce memory requirements.
>>
>> Thanks a ton for giving tripleo-quickstart a go!
>>
>> [1] https://en.wikipedia.org/wiki/Kernel_same-page_merging
>> [2] https://bugs.launchpad.net/tripleo-quickstart
>> [3] https://review.openstack.org/#/q/project:openstack/tripleo-quickstart
>> [4]
>> https://blueprints.launchpad.net/tripleo/+spec/composable-services-within-roles
>>
>> On 06/03/2016 06:20 AM, Boris Derzhavets wrote:
>>> =====================================
>>>
>>> Fresh HA deployment attempt
>>>
>>> =====================================
>>>
>>> [stack at undercloud ~]$ date
>>> Fri Jun 3 10:05:35 UTC 2016
>>> [stack at undercloud ~]$ heat stack-list
>>> +--------------------------------------+------------+-----------------+---------------------+--------------+
>>> | id | stack_name | stack_status | creation_time | updated_time |
>>> +--------------------------------------+------------+-----------------+---------------------+--------------+
>>> | 0c6b8205-be86-4a24-be36-fd4ece956c6d | overcloud | CREATE_COMPLETE | 2016-06-03T08:14:19 | None |
>>> +--------------------------------------+------------+-----------------+---------------------+--------------+
>>> [stack at undercloud ~]$ nova list
>>> +--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
>>> | ID | Name | Status | Task State | Power State | Networks |
>>> +--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
>>> | 6a38b7be-3743-4339-970b-6121e687741d | overcloud-controller-0 | ACTIVE | - | Running | ctlplane=192.0.2.10 |
>>> | 9222dc1b-5974-495b-8b98-b8176ac742f4 | overcloud-controller-1 | ACTIVE | - | Running | ctlplane=192.0.2.9 |
>>> | 76adbb27-220f-42ef-9691-94729ee28749 | overcloud-controller-2 | ACTIVE | - | Running | ctlplane=192.0.2.11 |
>>> | 8f57f7b6-a2d8-4b7b-b435-1c675e63ea84 | overcloud-novacompute-0 | ACTIVE | - | Running | ctlplane=192.0.2.8 |
>>> +--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
>>> [stack at undercloud ~]$ ssh heat-admin at 192.0.2.10
>>> Last login: Fri Jun 3 10:01:44 2016 from gateway
>>> [heat-admin at overcloud-controller-0 ~]$ sudo su -
>>> Last login: Fri Jun 3 10:01:49 UTC 2016 on pts/0
>>> [root at overcloud-controller-0 ~]# . keystonerc_admin
>>>
>>> [root at overcloud-controller-0 ~]# pcs status
>>> Cluster name: tripleo_cluster
>>> Last updated: Fri Jun 3 10:07:22 2016 Last change: Fri Jun 3 08:50:59 2016 by root via cibadmin on overcloud-controller-0
>>> Stack: corosync
>>> Current DC: overcloud-controller-0 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
>>> 3 nodes and 123 resources configured
>>>
>>> Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>
>>> Full list of resources:
>>>
>>> ip-192.0.2.6 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
>>> Clone Set: haproxy-clone [haproxy]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> ip-192.0.2.7 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
>>> Master/Slave Set: galera-master [galera]
>>> Masters: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: memcached-clone [memcached]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: rabbitmq-clone [rabbitmq]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: openstack-core-clone [openstack-core]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Master/Slave Set: redis-master [redis]
>>> Masters: [ overcloud-controller-1 ]
>>> Slaves: [ overcloud-controller-0 overcloud-controller-2 ]
>>> Clone Set: mongod-clone [mongod]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: openstack-aodh-evaluator-clone [openstack-aodh-evaluator]
>>> Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: openstack-nova-scheduler-clone [openstack-nova-scheduler]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: neutron-l3-agent-clone [neutron-l3-agent]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: neutron-netns-cleanup-clone [neutron-netns-cleanup]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: neutron-ovs-cleanup-clone [neutron-ovs-cleanup]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> openstack-cinder-volume (systemd:openstack-cinder-volume): Started overcloud-controller-2
>>> Clone Set: openstack-heat-engine-clone [openstack-heat-engine]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: openstack-ceilometer-api-clone [openstack-ceilometer-api]
>>> Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: openstack-aodh-listener-clone [openstack-aodh-listener]
>>> Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: neutron-metadata-agent-clone [neutron-metadata-agent]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: openstack-gnocchi-metricd-clone [openstack-gnocchi-metricd]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: openstack-aodh-notifier-clone [openstack-aodh-notifier]
>>> Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: openstack-heat-api-clone [openstack-heat-api]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: openstack-ceilometer-collector-clone [openstack-ceilometer-collector]
>>> Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: openstack-glance-api-clone [openstack-glance-api]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: openstack-cinder-scheduler-clone [openstack-cinder-scheduler]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: openstack-nova-api-clone [openstack-nova-api]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: openstack-nova-consoleauth-clone [openstack-nova-consoleauth]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: openstack-sahara-api-clone [openstack-sahara-api]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: openstack-sahara-engine-clone [openstack-sahara-engine]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: openstack-glance-registry-clone [openstack-glance-registry]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: openstack-gnocchi-statsd-clone [openstack-gnocchi-statsd]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: openstack-ceilometer-notification-clone [openstack-ceilometer-notification]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: openstack-cinder-api-clone [openstack-cinder-api]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: neutron-dhcp-agent-clone [neutron-dhcp-agent]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: neutron-openvswitch-agent-clone [neutron-openvswitch-agent]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: delay-clone [delay]
>>> Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: neutron-server-clone [neutron-server]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: openstack-ceilometer-central-clone [openstack-ceilometer-central]
>>> Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: httpd-clone [httpd]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>> Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor]
>>> Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>>>
>>> Failed Actions:
>>> * openstack-aodh-evaluator_monitor_60000 on overcloud-controller-1 'not running' (7): call=76, status=complete, exitreason='none',
>>> last-rc-change='Fri Jun 3 08:47:22 2016', queued=0ms, exec=0ms
>>> * openstack-ceilometer-central_start_0 on overcloud-controller-1 'not running' (7): call=290, status=complete, exitreason='none',
>>> last-rc-change='Fri Jun 3 08:51:18 2016', queued=0ms, exec=2132ms
>>> * openstack-aodh-evaluator_monitor_60000 on overcloud-controller-2 'not running' (7): call=76, status=complete, exitreason='none',
>>> last-rc-change='Fri Jun 3 08:47:16 2016', queued=0ms, exec=0ms
>>> * openstack-ceilometer-central_start_0 on overcloud-controller-2 'not running' (7): call=292, status=complete, exitreason='none',
>>> last-rc-change='Fri Jun 3 08:51:31 2016', queued=0ms, exec=2102ms
>>> * openstack-aodh-evaluator_monitor_60000 on overcloud-controller-0 'not running' (7): call=77, status=complete, exitreason='none',
>>> last-rc-change='Fri Jun 3 08:47:19 2016', queued=0ms, exec=0ms
>>> * openstack-ceilometer-central_start_0 on overcloud-controller-0 'not running' (7): call=270, status=complete, exitreason='none',
>>> last-rc-change='Fri Jun 3 08:50:02 2016', queued=0ms, exec=2199ms
>>>
>>>
>>> PCSD Status:
>>> overcloud-controller-0: Online
>>> overcloud-controller-1: Online
>>> overcloud-controller-2: Online
>>>
>>> Daemon Status:
>>> corosync: active/enabled
>>> pacemaker: active/enabled
>>> pcsd: active/enabled
>>>
>>>
>>> ________________________________
>>> From: rdo-list-bounces at redhat.com <rdo-list-bounces at redhat.com> on behalf of Boris Derzhavets <bderzhavets at hotmail.com>
>>> Sent: Monday, May 30, 2016 4:56 AM
>>> To: John Trowbridge; Lars Kellogg-Stedman
>>> Cc: rdo-list
>>> Subject: Re: [rdo-list] Tripleo QuickStart HA deployment attempts constantly crash
>>>
>>>
>>> Done one more time :-
>>>
>>>
>>> [stack at undercloud ~]$ heat deployment-show 9cc8087a-6d82-4261-8a13-ee8c46e3a02d
>>>
>>> Uploaded here :-
>>>
>>> http://textuploader.com/5bm5v
>>> ________________________________
>>> From: rdo-list-bounces at redhat.com <rdo-list-bounces at redhat.com> on behalf of Boris Derzhavets <bderzhavets at hotmail.com>
>>> Sent: Sunday, May 29, 2016 3:39 AM
>>> To: John Trowbridge; Lars Kellogg-Stedman
>>> Cc: rdo-list
>>> Subject: [rdo-list] Tripleo QuickStart HA deployment attempts constantly crash
>>>
>>>
>>> Error every time is the same :-
>>>
>>>
>>> 2016-05-29 07:20:17 [0]: CREATE_FAILED Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
>>> 2016-05-29 07:20:18 [0]: SIGNAL_COMPLETE Unknown
>>> 2016-05-29 07:20:18 [overcloud-ControllerNodesPostDeployment-dzawjmjyaidt-ControllerServicesBaseDeployment_Step2-ufz2ccs5egd7]: CREATE_FAILED Resource CREATE failed: Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
>>> 2016-05-29 07:20:18 [0]: SIGNAL_COMPLETE Unknown
>>> 2016-05-29 07:20:19 [ControllerServicesBaseDeployment_Step2]: CREATE_FAILED Error: resources.ControllerServicesBaseDeployment_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
>>> 2016-05-29 07:20:19 [0]: SIGNAL_COMPLETE Unknown
>>> 2016-05-29 07:20:19 [0]: SIGNAL_COMPLETE Unknown
>>> 2016-05-29 07:20:20 [ControllerDeployment]: SIGNAL_COMPLETE Unknown
>>> 2016-05-29 07:20:20 [overcloud-ControllerNodesPostDeployment-dzawjmjyaidt]: CREATE_FAILED Resource CREATE failed: Error: resources.ControllerServicesBaseDeployment_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
>>> 2016-05-29 07:20:21 [ControllerNodesPostDeployment]: CREATE_FAILED Error: resources.ControllerNodesPostDeployment.resources.ControllerServicesBaseDeployment_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
>>> 2016-05-29 07:20:21 [0]: SIGNAL_COMPLETE Unknown
>>> 2016-05-29 07:20:22 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
>>> 2016-05-29 07:20:22 [0]: SIGNAL_COMPLETE Unknown
>>> 2016-05-29 07:24:22 [ComputeNodesPostDeployment]: CREATE_FAILED CREATE aborted
>>> 2016-05-29 07:24:22 [overcloud]: CREATE_FAILED Resource CREATE failed: Error: resources.ControllerNodesPostDeployment.resources.ControllerServicesBaseDeployment_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
>>> Stack overcloud CREATE_FAILED
>>> Deployment failed: Heat Stack create failed.
>>> + heat stack-list
>>> + grep -q CREATE_FAILED
>>> + deploy_status=1
>>> ++ heat resource-list --nested-depth 5 overcloud
>>> ++ grep FAILED
>>> ++ grep 'StructuredDeployment '
>>> ++ cut -d '|' -f3
>>> + for failed in '$(heat resource-list --nested-depth 5 overcloud | grep FAILED |
>>> grep '\''StructuredDeployment '\'' | cut -d '\''|'\'' -f3)'
>>> + heat deployment-show 66bd3fbe-296b-4f88-87a7-5ceafd05c1d3
>>> + exit 1
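For readability, the xtrace output above corresponds to roughly this failure-handling logic in overcloud-deploy.sh (a reconstruction from the trace, not the verbatim script):

```shell
# After a failed create, dump 'heat deployment-show' for every failed
# software deployment, then exit non-zero.
if heat stack-list | grep -q CREATE_FAILED; then
    deploy_status=1
    for failed in $(heat resource-list --nested-depth 5 overcloud \
            | grep FAILED | grep 'StructuredDeployment ' | cut -d '|' -f3); do
        heat deployment-show "$failed"
    done
    exit 1
fi
```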
>>>
>>>
>>> Minimal configuration deployments run with no errors and build a completely functional environment.
>>>
>>>
>>> However, template :-
>>>
>>>
>>> #################################
>>> # Test Controller + 2*Compute nodes
>>> #################################
>>> control_memory: 6144
>>> compute_memory: 6144
>>>
>>> undercloud_memory: 8192
>>>
>>> # Giving the undercloud additional CPUs can greatly improve heat's
>>> # performance (and result in a shorter deploy time).
>>> undercloud_vcpu: 4
>>>
>>> # We set introspection to true and use only the minimal amount of nodes
>>> # for this job, but test all defaults otherwise.
>>> step_introspect: true
>>>
>>> # Define a single controller node and two compute nodes.
>>> overcloud_nodes:
>>>   - name: control_0
>>>     flavor: control
>>>
>>>   - name: compute_0
>>>     flavor: compute
>>>
>>>   - name: compute_1
>>>     flavor: compute
>>>
>>> # Tell tripleo how we want things done.
>>> extra_args: >-
>>>   --neutron-network-type vxlan
>>>   --neutron-tunnel-types vxlan
>>>   --ntp-server pool.ntp.org
>>>
>>> network_isolation: true
>>>
>>>
>>> It picks up the new memory settings but doesn't create the second compute node.
>>>
>>> Every time I get just the Controller and one Compute.
>>>
>>>
>>> HW: i7-4790, 32 GB RAM
>>>
>>>
>>> Thanks.
>>>
>>> Boris
>>>
>>> ________________________________
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> rdo-list mailing list
>>> rdo-list at redhat.com
>>> https://www.redhat.com/mailman/listinfo/rdo-list
>>>
>>> To unsubscribe: rdo-list-unsubscribe at redhat.com
>>>
>