Oh, that explains it. It looks like the overcloud image doesn't
contain the patch.
I'm not familiar with the image build process, but according to the
docs[1] I believe the packages get installed from the repo specified by
export DELOREAN_TRUNK_REPO, so maybe you should try to rebuild the
image using the same repo as the one set on the undercloud.
[1]
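Something along these lines might work (just a sketch, untested; the repo URL
below is only an example, point it at whatever repo your undercloud was
installed from, and the exact flags may differ slightly between releases):

  # on the undercloud, as the stack user
  export DELOREAN_TRUNK_REPO="http://trunk.rdoproject.org/centos7/current/"  # example URL only
  openstack overcloud image build --all
  openstack overcloud image upload --update-existing

Re-uploading with --update-existing should make the next deploy pick up the
rebuilt overcloud-full image.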
Hi,
We noticed something that may be the reason this patch isn't working, and it
may be related to the way the undercloud built the image.
This is the difference between the puppet-tripleo files on the undercloud and
the ones in the image:
[stack@hh-extcl05-undercloud ~]$ grep mysql_short_node_names
/usr/share/openstack-puppet/modules/tripleo/manifests/profile/pacemaker/database/mysql.pp
$galera_node_names_lookup = hiera('mysql_short_node_names',
hiera('mysql_node_names', $::hostname))
[root@overcloud-controller-1 puppet]# grep mysql_short_node_names
/etc/puppet/modules/tripleo/manifests/profile/pacemaker/database/mysql.pp
(nothing found)
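For what it's worth, here is a quick way we could double-check whether the
overcloud-full image we upload actually contains the patched manifest before
deploying (just a sketch; it assumes libguestfs-tools is installed on the
undercloud and that the image file sits in the stack user's home directory):

  virt-cat -a overcloud-full.qcow2 \
      /usr/share/openstack-puppet/modules/tripleo/manifests/profile/pacemaker/database/mysql.pp \
      | grep mysql_short_node_names

If that prints nothing, the image was built from a repo without the patch.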
On 12/10/2016 09:09, Marius Cornea wrote:
>
> That's odd. I encountered the same issue and it was caused by missing
> this patch. What do you get if you do sudo hiera
> mysql_short_node_names on the controller node?
>
> On Wed, Oct 12, 2016 at 12:48 AM, Charles Short <cems(a)ebi.ac.uk> wrote:
>>
>> Hi,
>>
>> puppet-tripleo-5.2.0-0.20161007035759.f32e484.el7.centos.noarch
>>
>> The patch seems to be already present -
>>
>> grep short
>>
>>
>> /usr/share/openstack-puppet/modules/tripleo/manifests/profile/pacemaker/database/mysql.pp
>>
>> # short name which is already registered in pacemaker until we get
>> around
>> $galera_node_names_lookup = hiera('mysql_short_node_names',
>> hiera('mysql_node_names', $::hostname))
>>
>> Charles
>>
>> On 11/10/2016 19:39, Marius Cornea wrote:
>>>
>>> I think the issue is caused by the addresses in wsrep_cluster_address
>>> not matching the pacemaker node names:
>>>
>>>
>>>
>>> wsrep_cluster_address=gcomm://overcloud-controller-0.internalapi.localdomain,overcloud-controller-1.internalapi.localdomain,overcloud-controller-2.internalapi.localdomain
>>>
>>> Could you please confirm what version of puppet-tripleo you've got
>>> installed on the overcloud nodes and if it contains the following
>>> patch:
>>>
>>>
>>> https://review.openstack.org/#/c/382883/1/manifests/profile/pacemaker/dat...
>>>
>>> Thanks,
>>> Marius
>>>
>>> On Tue, Oct 11, 2016 at 7:42 PM, Charles Short <cems(a)ebi.ac.uk> wrote:
>>>>
>>>> Ok install finished with same error
>>>> The latest pcs status etc
>>>>
>>>>
>>>> http://pastebin.com/ZK683gZe
>>>>
>>>>
>>>> On 11/10/2016 17:35, Charles Short wrote:
>>>>>
>>>>> Deployment almost finished...so
>>>>>
>>>>>
>>>>>
>>>>> http://pastebin.com/zE9B19XB
>>>>>
>>>>> This shows the pcs status as the deployment nears the end, and pcs
>>>>> resource show galera
>>>>>
>>>>> Charles
>>>>>
>>>>> On 11/10/2016 16:59, Marius Cornea wrote:
>>>>>>
>>>>>> Great, thanks for checking this.
>>>>>>
>>>>>> On Tue, Oct 11, 2016 at 5:58 PM, Charles Short <cems(a)ebi.ac.uk> wrote:
>>>>>>>
>>>>>>> Currently having more generic deployment issues (no valid host found etc).
>>>>>>> I can work around/solve these.
>>>>>>> I don't yet have another stack to analyse, but will do soon.
>>>>>>>
>>>>>>> Charles
>>>>>>>
>>>>>>>
>>>>>>> On 11/10/2016 16:35, Marius Cornea wrote:
>>>>>>>>
>>>>>>>> Did it succeed in bringing the Galera nodes to Master? You can ssh to
>>>>>>>> the nodes and run 'pcs resource show galera' even though the
>>>>>>>> deployment hasn't finished. I'm interested to see how the
>>>>>>>> wsrep_cluster_address is set to see if it's affected by the resource
>>>>>>>> agent issue described in
>>>>>>>> https://bugs.launchpad.net/tripleo/+bug/1628521
>>>>>>>>
>>>>>>>> On Tue, Oct 11, 2016 at 5:18 PM, Charles Short <cems(a)ebi.ac.uk> wrote:
>>>>>>>>>
>>>>>>>>> Looks similar to this bug (still waiting on deployment to finish)
>>>>>>>>>
>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1368214
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 11/10/2016 15:25, Charles Short wrote:
>>>>>>>>>>
>>>>>>>>>> Sorry for the delay.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Just redeploying to make sure I can repeat the same error. Should
>>>>>>>>>> not be long.
>>>>>>>>>>
>>>>>>>>>> Charles
>>>>>>>>>>
>>>>>>>>>> On 11/10/2016 14:24, Marius Cornea wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Could you also please paste the output for 'pcs resource show
>>>>>>>>>>> galera', it looks like all the galera nodes show up as slaves?
>>>>>>>>>>>
>>>>>>>>>>> Master/Slave Set: galera-master [galera]
>>>>>>>>>>>      Slaves: [ overcloud-controller-0 overcloud-controller-1
>>>>>>>>>>>                overcloud-controller-2 ]
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 11, 2016 at 2:16 PM, Charles Short <cems(a)ebi.ac.uk> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Here you are -
>>>>>>>>>>>>
>>>>>>>>>>>> - Heat stack error - http://pastebin.com/E8KZa2vE
>>>>>>>>>>>> - PCS status - http://pastebin.com/z34gSLq6
>>>>>>>>>>>> - mariadb.log - http://pastebin.com/APFXPBLc
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>> Charles
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 11/10/2016 12:07, Marius Cornea wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Charles,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Could you please paste the output of 'pcs status'? The log in
>>>>>>>>>>>>> /var/log/mariadb/mariadb.log might also be a good indicator.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 11, 2016 at 11:16 AM, Charles Short <cems(a)ebi.ac.uk> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To add I built my own image from
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> as the images in
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://buildlogs.centos.org/centos/7/cloud/x86_64/tripleo_images/newton/d...
>>>>>>>>>>>>>> caused sporadic ramdisk loading errors (hung at x% loaded on
>>>>>>>>>>>>>> boot)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does my image now need to be customised in any way for HA to
>>>>>>>>>>>>>> work?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Charles
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 11/10/2016 09:55, Charles Short wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am installing Newton with TripleO on baremetal HP blades.
>>>>>>>>>>>>>>> I can deploy a single controller stack overcloud no problem,
>>>>>>>>>>>>>>> however when I choose three controllers the deployment fails
>>>>>>>>>>>>>>> (including
>>>>>>>>>>>>>>> /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The heat stack error first complains "Dependency Exec[galera-ready]
>>>>>>>>>>>>>>> has failures" which in turn causes lots of other errors.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have deployed Liberty and Mitaka successfully in the past on
>>>>>>>>>>>>>>> baremetal with three controllers, and this is the first time I
>>>>>>>>>>>>>>> have seen this error.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Charles
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Charles Short
>>>>>>>>>>>>>> Cloud Engineer
>>>>>>>>>>>>>> Virtualization and Cloud Team
>>>>>>>>>>>>>> European Bioinformatics Institute (EMBL-EBI)
>>>>>>>>>>>>>> Tel: +44 (0)1223 494205
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> rdo-list mailing list
>>>>>>>>>>>>>> rdo-list(a)redhat.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://www.redhat.com/mailman/listinfo/rdo-list
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To unsubscribe: rdo-list-unsubscribe(a)redhat.com
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Charles Short
>>>>>>>>>>>> Cloud Engineer
>>>>>>>>>>>> Virtualization and Cloud Team
>>>>>>>>>>>> European Bioinformatics Institute (EMBL-EBI)
>>>>>>>>>>>> Tel: +44 (0)1223 494205
>>>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Charles Short
>>>>>>>>> Cloud Engineer
>>>>>>>>> Virtualization and Cloud Team
>>>>>>>>> European Bioinformatics Institute (EMBL-EBI)
>>>>>>>>> Tel: +44 (0)1223 494205
>>>>>>>>>
>>>>>>> --
>>>>>>> Charles Short
>>>>>>> Cloud Engineer
>>>>>>> Virtualization and Cloud Team
>>>>>>> European Bioinformatics Institute (EMBL-EBI)
>>>>>>> Tel: +44 (0)1223 494205
>>>>>>>
>>>> --
>>>> Charles Short
>>>> Cloud Engineer
>>>> Virtualization and Cloud Team
>>>> European Bioinformatics Institute (EMBL-EBI)
>>>> Tel: +44 (0)1223 494205
>>>>
>> --
>> Charles Short
>> Cloud Engineer
>> Virtualization and Cloud Team
>> European Bioinformatics Institute (EMBL-EBI)
>> Tel: +44 (0)1223 494205
>>
--
Charles Short
Cloud Engineer
Virtualization and Cloud Team
European Bioinformatics Institute (EMBL-EBI)
Tel: +44 (0)1223 494205