[rdo-list] Overcloud pacemaker services restart behavior causes downtime
Pedro Sousa
pgsousa at gmail.com
Thu Aug 4 16:02:26 UTC 2016
Hi,
I've deleted the nova and neutron services but the issue persists, so I
guess it's not related.
Filing the sosreport.
Thanks
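
For reference, the cleanup discussed in the quoted messages below (removing
services registered under short hostnames once FQDN-named duplicates are up)
can be sketched as a small Python helper. The record layout and the sample
hostnames are hypothetical, chosen only for illustration. The helper only
selects candidates; it does not delete anything.

```python
def stale_short_host_services(services):
    """Return 'down' records whose short hostname has an 'up' FQDN twin.

    `services` is a list of dicts with the hypothetical keys
    'binary', 'host' and 'state'; this layout is an assumption.
    """
    # Index every service that is up under a fully qualified hostname,
    # keyed by (binary, first label of the hostname).
    up_fqdn = {
        (s["binary"], s["host"].split(".", 1)[0])
        for s in services
        if s["state"] == "up" and "." in s["host"]
    }
    # A record is a cleanup candidate if it is down, uses a short
    # hostname, and the same binary is up under the matching FQDN.
    return [
        s for s in services
        if s["state"] == "down"
        and "." not in s["host"]
        and (s["binary"], s["host"]) in up_fqdn
    ]


# Hypothetical sample data, not taken from a real deployment.
SAMPLE = [
    {"binary": "nova-scheduler", "host": "overcloud-controller-0", "state": "down"},
    {"binary": "nova-scheduler", "host": "overcloud-controller-0.localdomain", "state": "up"},
    {"binary": "nova-conductor", "host": "overcloud-controller-1", "state": "down"},
]

if __name__ == "__main__":
    for record in stale_short_host_services(SAMPLE):
        print(record["binary"], record["host"])
```

In this sample only the short-hostname nova-scheduler record is selected,
because nova-conductor has no FQDN-named replacement that is up.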
On Thu, Aug 4, 2016 at 4:32 PM, Ilja Maslov <imaslov at dispersivegroup.com>
wrote:
> Not on this fresh install, but a few weeks back I saw that when the
> controller nodes restarted, new services were created with FQDN names and
> were up, and I was able to safely clean up the original services with short
> host names. I haven’t re-tested controller restarts since, though.
>
>
>
> With my fresh install, rabbitmq is not coming up upon reboot (‘unknown
> error’ (1)), so I need to fix this first before I’m able to proceed with
> testing. I’ll let you know how it goes.
>
>
>
> Ilja
>
>
>
> *From:* Pedro Sousa [mailto:pgsousa at gmail.com]
> *Sent:* Thursday, August 04, 2016 11:23 AM
> *To:* Ilja Maslov <imaslov at dispersivegroup.com>
> *Cc:* Raoul Scarazzini <rasca at redhat.com>; rdo-list <rdo-list at redhat.com>
>
> *Subject:* Re: [rdo-list] Overcloud pacemaker services restart behavior
> causes downtime
>
>
>
> Hi Ilja,
>
>
>
> I noticed that too. Did you try to delete the services that are marked
> down and retest?
>
>
>
> Thanks
>
>
>
> On Thu, Aug 4, 2016 at 4:12 PM, Ilja Maslov <imaslov at dispersivegroup.com>
> wrote:
>
> Hi,
>
> I've noticed similar behavior on Mitaka installed from
> trunk/mitaka/passed-ci. I'd appreciate it if you could put me in CC.
>
> An additional detail: during initial deployment, nova services, neutron
> agents and heat engines are registered with the short hostnames, and upon
> controller node restart these all show state=down. This is probably
> because the hosts files are rewritten, with the FQDN as the first entry,
> after the services have already been started. I do not know to what extent
> pacemaker resources are monitored, but it could be related to the problem
> you are reporting.
>
> Cheers,
> Ilja
>
>
>
> -----Original Message-----
> From: rdo-list-bounces at redhat.com [mailto:rdo-list-bounces at redhat.com] On
> Behalf Of Raoul Scarazzini
> Sent: Thursday, August 04, 2016 9:31 AM
> To: Pedro Sousa <pgsousa at gmail.com>
> Cc: rdo-list <rdo-list at redhat.com>
> Subject: Re: [rdo-list] Overcloud pacemaker services restart behavior
> causes downtime
>
> That would be great, thank you. Please put me in CC so I can follow this.
>
> Thanks,
>
> --
> Raoul Scarazzini
> rasca at redhat.com
>
> On 04/08/2016 15:29, Pedro Sousa wrote:
> > Hi Raoul,
> >
> > this only happens when the node comes back online after booting. When I
> > stop the node with "pcs cluster stop", everything works fine, even if
> > the VIP is active on that node.
> >
> > Anyway I will file a bugzilla.
> >
> > Thanks
> >
> >
> >
> >
> > On Thu, Aug 4, 2016 at 1:51 PM, Raoul Scarazzini <rasca at redhat.com> wrote:
> >
> > Ok, so we are on Mitaka. Here we have VIPs that are an (Optional)
> > dependency for haproxy, which is a (Mandatory) dependency for
> > openstack-core, on which all the others (nova, neutron, cinder and so
> > on) depend.
> > This means that if you reboot a controller on which a VIP is
> > active, you will NOT have a restart of openstack-core, since haproxy
> > will not be restarted, because of the OPTIONAL constraint.
> > So the behavior you're describing is quite strange.
> > Maybe other components are in play here. Can you open a bugzilla
> > with the exact steps you're using to reproduce the problem and share
> > the sosreports of your systems?
> >
> > Thanks,
> >
> > --
> > Raoul Scarazzini
> > rasca at redhat.com
> >
> > On 04/08/2016 12:34, Pedro Sousa wrote:
> > > Hi,
> > >
> > > I use Mitaka from the CentOS SIG repos:
> > >
> > > CentOS 7.2
> > > centos-release-openstack-mitaka-1-3.el7.noarch
> > > pacemaker-cli-1.1.13-10.el7_2.2.x86_64
> > > pacemaker-1.1.13-10.el7_2.2.x86_64
> > > pacemaker-remote-1.1.13-10.el7_2.2.x86_64
> > > pacemaker-cluster-libs-1.1.13-10.el7_2.2.x86_64
> > > pacemaker-libs-1.1.13-10.el7_2.2.x86_64
> > > corosynclib-2.3.4-7.el7_2.3.x86_64
> > > corosync-2.3.4-7.el7_2.3.x86_64
> > > resource-agents-3.9.5-54.el7_2.10.x86_64
> > >
> > > Let me know if you need more info.
> > >
> > > Thanks
> > >
> > >
> > >
> > > On Thu, Aug 4, 2016 at 11:21 AM, Raoul Scarazzini <rasca at redhat.com> wrote:
> > >
> > > Hi,
> > > can you please give us more information about the environment you are
> > > using? Release, package versions and so on.
> > >
> > > --
> > > Raoul Scarazzini
> > > rasca at redhat.com
> > >
> > > On 04/08/2016 11:34, Pedro Sousa wrote:
> > > > Hi all,
> > > >
> > > > I have an overcloud with 3 controller nodes, everything is working fine,
> > > > the problem is when I reboot one of the controllers. When the node comes
> > > > online, all the services (nova-api, neutron-server) on the other nodes
> > > > are also restarted, causing a couple of minutes of downtime until
> > > > everything is recovered.
> > > >
> > > > In the example below I restarted controller2 and I see these messages on
> > > > controller0. My question is if this is the expected behavior, because in
> > > > my opinion it shouldn't happen.
> > > >
> > > > Authorization Failed: Service Unavailable (HTTP 503)
> > > > == Glance images ==
> > > > Service Unavailable (HTTP 503)
> > > > == Nova managed services ==
> > > > No handlers could be found for logger "keystoneauth.identity.generic.base"
> > > > ERROR (ServiceUnavailable): Service Unavailable (HTTP 503)
> > > > == Nova networks ==
> > > > No handlers could be found for logger "keystoneauth.identity.generic.base"
> > > > ERROR (ServiceUnavailable): Service Unavailable (HTTP 503)
> > > > == Nova instance flavors ==
> > > > No handlers could be found for logger "keystoneauth.identity.generic.base"
> > > > ERROR (ServiceUnavailable): Service Unavailable (HTTP 503)
> > > > == Nova instances ==
> > > > No handlers could be found for logger "keystoneauth.identity.generic.base"
> > > > ERROR (ServiceUnavailable): Service Unavailable (HTTP 503)
> > > > [root at overcloud-controller-0 ~]# openstack-status
> > > > Broadcast message from systemd-journald at overcloud-controller-0.localdomain (Thu 2016-08-04 09:22:31 UTC):
> > > >
> > > > haproxy[2816]: proxy neutron has no server available!
> > > >
> > > > Thanks,
> > > > Pedro Sousa
> > > >
> > > >
> > > >
> > > >
> > > > _______________________________________________
> > > > rdo-list mailing list
> > > > rdo-list at redhat.com
> > > > https://www.redhat.com/mailman/listinfo/rdo-list
> > > >
> > > > To unsubscribe: rdo-list-unsubscribe at redhat.com
> > > >
> > >
> > >
> >
> >
>
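
The constraint chain Raoul describes above (VIP before haproxy as Optional,
haproxy before openstack-core as Mandatory, openstack-core before the other
services) can be reasoned about with a small toy model. This is only a
sketch in Python, not Pacemaker's actual scheduler: the resource names and
the rule "a restart propagates only across Mandatory ordering constraints"
are simplifying assumptions made for illustration.

```python
from collections import deque

# Hypothetical ordering constraints modelled on the description in this
# thread: (first, then, kind). The names are illustrative, not a real CIB.
CONSTRAINTS = [
    ("vip", "haproxy", "Optional"),
    ("haproxy", "openstack-core", "Mandatory"),
    ("openstack-core", "nova-api", "Mandatory"),
    ("openstack-core", "neutron-server", "Mandatory"),
]


def restarted_by(resource, constraints=CONSTRAINTS):
    """Return the set of resources restarted when `resource` restarts.

    Simplifying assumption: a restart propagates from `first` to `then`
    only across Mandatory constraints; Optional ones are ignored.
    """
    affected = {resource}
    queue = deque([resource])
    while queue:
        current = queue.popleft()
        for first, then, kind in constraints:
            if first == current and kind == "Mandatory" and then not in affected:
                affected.add(then)
                queue.append(then)
    return affected


if __name__ == "__main__":
    print(sorted(restarted_by("vip")))
    print(sorted(restarted_by("haproxy")))
```

Under these assumptions, restarting the VIP affects only the VIP itself,
while a haproxy restart would cascade to openstack-core and everything
ordered after it. That is why a full service restart after a controller
reboot looks unexpected with this constraint layout.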