[rdo-list] Newton HA galera-ready dependency error

Damien Ciabrini dciabrin at redhat.com
Wed Oct 12 08:53:26 UTC 2016


Hi Charles,

From http://pastebin.com/ZK683gZe, I can see that the gcomm address
doesn't match pacemaker's node name.

Could it be that the initial deployment was done prior to having
this version of puppet-tripleo installed?

If you want to unblock the galera resource without reinstalling the stack
you could do:

  pcs resource update galera additional_parameters=--open-files-limit=16384 enable_creation=true wsrep_cluster_address=gcomm://overcloud-controller-0,overcloud-controller-1,overcloud-controller-2
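
Before running the update, it can help to confirm which hosts in the gcomm address pacemaker does not know about. The following is only a minimal sketch: gcomm_hosts and unmatched_hosts are hypothetical helper names, and the address and node names are typed in by hand from the outputs quoted in this thread rather than read from pcs.

```shell
#!/bin/bash
# Sketch only: gcomm_hosts and unmatched_hosts are hypothetical helpers,
# not part of pcs, pacemaker or galera.

# Print each host of a gcomm://a,b,c address on its own line.
gcomm_hosts() {
    local addr="${1#gcomm://}"
    printf '%s\n' "$addr" | tr ',' '\n'
}

# Print the hosts of the gcomm address ($1) that do not appear in the
# space-separated list of pacemaker node names ($2).
unmatched_hosts() {
    local host
    for host in $(gcomm_hosts "$1"); do
        case " $2 " in
            *" $host "*) ;;                  # known node name, nothing to report
            *) printf '%s\n' "$host" ;;      # not a pacemaker node name
        esac
    done
}

# Example values copied by hand; the first host deliberately uses the
# long internalapi name seen earlier in the thread.
nodes="overcloud-controller-0 overcloud-controller-1 overcloud-controller-2"
unmatched_hosts "gcomm://overcloud-controller-0.internalapi.localdomain,overcloud-controller-1,overcloud-controller-2" "$nodes"
# prints: overcloud-controller-0.internalapi.localdomain
```

An empty result would suggest that every host in wsrep_cluster_address is already a pacemaker node name.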

----- Original Message -----
> [heat-admin at overcloud-controller-0 ~]$ sudo hiera mysql_short_node_names
> ["overcloud-controller-0", "overcloud-controller-1",
> "overcloud-controller-2"]
> 
> On 12/10/2016 09:09, Marius Cornea wrote:
> > That's odd. I encountered the same issue and it was caused by missing
> > this patch.  What do you get if you do sudo hiera
> > mysql_short_node_names on the controller node?
> >
> > On Wed, Oct 12, 2016 at 12:48 AM, Charles Short <cems at ebi.ac.uk> wrote:
> >> Hi,
> >>
> >> puppet-tripleo-5.2.0-0.20161007035759.f32e484.el7.centos.noarch
> >>
> >> The patch seems to be already present -
> >>
> >>    grep short /usr/share/openstack-puppet/modules/tripleo/manifests/profile/pacemaker/database/mysql.pp
> >>
> >>    # short name which is already registered in pacemaker until we get around
> >>    $galera_node_names_lookup = hiera('mysql_short_node_names', hiera('mysql_node_names', $::hostname))
> >>
> >> Charles
> >>
> >> On 11/10/2016 19:39, Marius Cornea wrote:
> >>> I think the issue is caused by the addresses in wsrep_cluster_address
> >>> not matching the pacemaker node names:
> >>>
> >>> wsrep_cluster_address=gcomm://overcloud-controller-0.internalapi.localdomain,overcloud-controller-1.internalapi.localdomain,overcloud-controller-2.internalapi.localdomain
> >>>
> >>> Could you please confirm what version of puppet-tripleo you've got
> >>> installed on the overcloud nodes and if it contains the following
> >>> patch:
> >>> https://review.openstack.org/#/c/382883/1/manifests/profile/pacemaker/database/mysql.pp
> >>>
> >>> Thanks,
> >>> Marius
> >>>
> >>> On Tue, Oct 11, 2016 at 7:42 PM, Charles Short <cems at ebi.ac.uk> wrote:
> >>>> Ok install finished with same error
> >>>> The latest pcs status etc
> >>>>
> >>>> http://pastebin.com/ZK683gZe
> >>>>
> >>>>
> >>>> On 11/10/2016 17:35, Charles Short wrote:
> >>>>> Deployment almost finished...so
> >>>>>
> >>>>>
> >>>>> http://pastebin.com/zE9B19XB
> >>>>>
> >>>>> This shows the pcs status as the deployment nears the end, and pcs
> >>>>> resource show galera
> >>>>>
> >>>>> Charles
> >>>>>
> >>>>> On 11/10/2016 16:59, Marius Cornea wrote:
> >>>>>> Great, thanks for checking this.
> >>>>>>
> >>>>>> On Tue, Oct 11, 2016 at 5:58 PM, Charles Short <cems at ebi.ac.uk> wrote:
> >>>>>>> Currently having more generic deployment issues (no valid host found,
> >>>>>>> etc.). I can work around/solve these.
> >>>>>>> I don't yet have another stack to analyse, but will do soon.
> >>>>>>>
> >>>>>>> Charles
> >>>>>>>
> >>>>>>>
> >>>>>>> On 11/10/2016 16:35, Marius Cornea wrote:
> >>>>>>>> Did it succeed in bringing the Galera nodes to Master? You can ssh to
> >>>>>>>> the nodes and run 'pcs resource show galera' even though the
> >>>>>>>> deployment hasn't finished. I'm interested to see how the
> >>>>>>>> wsrep_cluster_address is set to see if it's affected by the resource
> >>>>>>>> agent issue described in
> >>>>>>>> https://bugs.launchpad.net/tripleo/+bug/1628521
> >>>>>>>>
> >>>>>>>> On Tue, Oct 11, 2016 at 5:18 PM, Charles Short <cems at ebi.ac.uk>
> >>>>>>>> wrote:
> >>>>>>>>> Looks similar to this bug (still waiting on deployment to finish)
> >>>>>>>>>
> >>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1368214
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 11/10/2016 15:25, Charles Short wrote:
> >>>>>>>>>> Sorry for the delay.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Just redeploying to make sure I can repeat the same error. Should
> >>>>>>>>>> not be long.
> >>>>>>>>>>
> >>>>>>>>>> Charles
> >>>>>>>>>>
> >>>>>>>>>> On 11/10/2016 14:24, Marius Cornea wrote:
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> Could you also please paste the output of 'pcs resource show galera'?
> >>>>>>>>>>> It looks like all the galera nodes show up as slaves:
> >>>>>>>>>>>
> >>>>>>>>>>>       Master/Slave Set: galera-master [galera]
> >>>>>>>>>>>           Slaves: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, Oct 11, 2016 at 2:16 PM, Charles Short <cems at ebi.ac.uk>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Here you are -
> >>>>>>>>>>>>
> >>>>>>>>>>>>       - Heat stack error  - http://pastebin.com/E8KZa2vE
> >>>>>>>>>>>>       - PCS status  - http://pastebin.com/z34gSLq6
> >>>>>>>>>>>>       - mariadb.log - http://pastebin.com/APFXPBLc
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks
> >>>>>>>>>>>>
> >>>>>>>>>>>> Charles
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 11/10/2016 12:07, Marius Cornea wrote:
> >>>>>>>>>>>>> Hi Charles,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Could you please paste the output of 'pcs status' ? The log in
> >>>>>>>>>>>>> /var/log/mariadb/mariadb.log might also be a good indicator.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Tue, Oct 11, 2016 at 11:16 AM, Charles Short
> >>>>>>>>>>>>> <cems at ebi.ac.uk>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>> To add, I built my own image from
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> http://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> as the images in
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> http://buildlogs.centos.org/centos/7/cloud/x86_64/tripleo_images/newton/delorean/
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> caused sporadic ramdisk loading errors (hung at x% loaded on boot).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Does my image now need to be customised in any way for HA to
> >>>>>>>>>>>>>> work?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Charles
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 11/10/2016 09:55, Charles Short wrote:
> >>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I am installing Newton with TripleO on baremetal HP blades.
> >>>>>>>>>>>>>>> I can deploy a single-controller overcloud stack with no problem;
> >>>>>>>>>>>>>>> however, when I choose three controllers (including
> >>>>>>>>>>>>>>> /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml)
> >>>>>>>>>>>>>>> the deployment fails.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The heat stack error first complains "Dependency Exec[galera-ready]
> >>>>>>>>>>>>>>> has failures", which in turn causes lots of other errors.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I have deployed Liberty and Mitaka successfully in the past on
> >>>>>>>>>>>>>>> baremetal with three controllers, and this is the first time I have
> >>>>>>>>>>>>>>> seen this error.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Charles
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> Charles Short
> >>>>>>>>>>>>>> Cloud Engineer
> >>>>>>>>>>>>>> Virtualization and Cloud Team
> >>>>>>>>>>>>>> European Bioinformatics Institute (EMBL-EBI)
> >>>>>>>>>>>>>> Tel: +44 (0)1223 494205
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> _______________________________________________
> >>>>>>>>>>>>>> rdo-list mailing list
> >>>>>>>>>>>>>> rdo-list at redhat.com
> >>>>>>>>>>>>>> https://www.redhat.com/mailman/listinfo/rdo-list
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> To unsubscribe: rdo-list-unsubscribe at redhat.com



