Not on this fresh install, but what I saw a few weeks back was that when controller nodes
restart, services created with FQDN names come up, and I was able to safely clean up the
original services with the short host names. I haven't re-tested controller restarts
since then.
With my fresh install, rabbitmq is not coming up after a reboot ('unknown error' (1)), so
I need to fix that first before I can proceed with testing. I'll let you know how it
goes.
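For reference, the cleanup described above can be sketched like this. The service table below is made-up sample output in the shape of `nova service-list`; on a live cloud the printed IDs would be fed to `nova service-delete` (not run here):

```shell
# Made-up sample of `nova service-list` output: after a controller restart,
# duplicate FQDN entries are up and the original short-hostname rows are down.
cat > /tmp/service-list.txt <<'EOF'
| 5  | nova-conductor | overcloud-controller-0             | internal | enabled | down |
| 8  | nova-conductor | overcloud-controller-0.localdomain | internal | enabled | up   |
EOF

# Pick the IDs of the rows whose state is "down"; each ID would then be
# removed with `nova service-delete <id>` against the real cloud.
awk -F'|' '$7 ~ /down/ {gsub(/ /, "", $2); print $2}' /tmp/service-list.txt
# → 5
```

The same filter works for `neutron agent-list` output, with the column index adjusted.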
Ilja
From: Pedro Sousa [mailto:pgsousa@gmail.com]
Sent: Thursday, August 04, 2016 11:23 AM
To: Ilja Maslov <imaslov@dispersivegroup.com>
Cc: Raoul Scarazzini <rasca@redhat.com>; rdo-list <rdo-list@redhat.com>
Subject: Re: [rdo-list] Overcloud pacemaker services restart behavior causes downtime
Hi Ilja,
I noticed that too. Did you try to delete the services that are marked down and retest?
Thanks
On Thu, Aug 4, 2016 at 4:12 PM, Ilja Maslov <imaslov@dispersivegroup.com> wrote:
Hi,
I've noticed similar behavior on Mitaka installed from trunk/mitaka/passed-ci.
I'd appreciate it if you could put me in CC.
An additional detail: during the initial deployment, nova services, neutron agents and
heat engines are registered with the short hostnames, and after a controller node restart
these all show state=down. This is probably because the hosts files are re-written, with
the FQDN as the first entry, after the services had already been started. I don't know to
what extent pacemaker resources are monitored, but it could be related to the problem you
are reporting.
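The hosts-file effect suspected above can be seen directly: the name a service registers under is usually the machine's canonical hostname, which glibc takes from the first name after the IP in /etc/hosts. The entry below is a made-up example of what the rewritten file would look like (IP and names are illustrative):

```shell
# Hypothetical /etc/hosts line after the deployment tooling rewrites it
# with the FQDN first (IP and names are illustrative):
line="172.16.0.10 overcloud-controller-0.localdomain overcloud-controller-0"

# The canonical name is the first name after the IP; `hostname -f` and
# reverse lookups resolve to it, so restarted services register under it
# instead of the short name they were originally registered with.
canonical=$(echo "$line" | awk '{print $2}')
echo "$canonical"
# → overcloud-controller-0.localdomain
```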
Cheers,
Ilja
-----Original Message-----
From: rdo-list-bounces@redhat.com [mailto:rdo-list-bounces@redhat.com] On Behalf
Of Raoul Scarazzini
Sent: Thursday, August 04, 2016 9:31 AM
To: Pedro Sousa <pgsousa@gmail.com>
Cc: rdo-list <rdo-list@redhat.com>
Subject: Re: [rdo-list] Overcloud pacemaker services restart behavior causes downtime
That would be great, thank you. Put me in CC so I can follow this.
Thanks,
--
Raoul Scarazzini
rasca@redhat.com
On 04/08/2016 15:29, Pedro Sousa wrote:
Hi Raoul,
This only happens when the node comes back online after booting. When I stop the
node with "pcs cluster stop", everything works fine, even if the VIP is active on
that node.
Anyway, I will file a bugzilla.
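The graceful path mentioned above can be sketched as the usual drain-then-reboot procedure (stock pcs CLI commands; shown as an ops fragment only, since they need a live cluster to run):

```shell
# Drain the controller before rebooting, instead of a cold reboot:
pcs cluster stop        # stop pacemaker/corosync on this node only;
                        # resources (VIPs, haproxy) fail over to the peers
reboot

# After the node is back up:
pcs cluster start       # rejoin the cluster
pcs status              # confirm all resources have recovered
```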
Thanks
On Thu, Aug 4, 2016 at 1:51 PM, Raoul Scarazzini <rasca@redhat.com> wrote:
Ok, so we are on Mitaka. Here we have VIPs that are an (Optional) dependency for
haproxy, which in turn is a (Mandatory) dependency for openstack-core, on which all
the others (nova, neutron, cinder and so on) depend.
This means that if you are rebooting a controller on which a VIP is active, you
will NOT get a restart of openstack-core, since haproxy will not be restarted,
because of the Optional constraint.
So the behavior you're describing is quite strange.
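The constraint chain described above can be checked on a controller with pcs (read-only commands; the resource names in the comment are assumptions based on a stock TripleO Mitaka overcloud, so verify against your own `pcs status` output):

```shell
# Show the ordering and colocation constraints on a live controller.
# On a stock Mitaka overcloud you would expect roughly:
#   start ip-<vip> then start haproxy-clone              (kind:Optional)
#   start haproxy-clone then start openstack-core-clone  (kind:Mandatory)
# Optional  = haproxy is NOT restarted when the VIP moves;
# Mandatory = openstack-core (and everything ordered after it) restarts
#             whenever haproxy does.
pcs constraint order show --full
pcs constraint colocation show
```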
Maybe other components are in the game here. Can you open a bugzilla
with the exact steps you're using to reproduce the problem and share the
sosreports of your systems?
Thanks,
--
Raoul Scarazzini
rasca@redhat.com
On 04/08/2016 12:34, Pedro Sousa wrote:
> Hi,
>
> I use mitaka from centos sig repos:
>
> Centos 7.2
> centos-release-openstack-mitaka-1-3.el7.noarch
> pacemaker-cli-1.1.13-10.el7_2.2.x86_64
> pacemaker-1.1.13-10.el7_2.2.x86_64
> pacemaker-remote-1.1.13-10.el7_2.2.x86_64
> pacemaker-cluster-libs-1.1.13-10.el7_2.2.x86_64
> pacemaker-libs-1.1.13-10.el7_2.2.x86_64
> corosynclib-2.3.4-7.el7_2.3.x86_64
> corosync-2.3.4-7.el7_2.3.x86_64
> resource-agents-3.9.5-54.el7_2.10.x86_64
>
> Let me know if you need more info.
>
> Thanks
>
>
>
> On Thu, Aug 4, 2016 at 11:21 AM, Raoul Scarazzini <rasca@redhat.com> wrote:
>
> Hi,
> can you please give us more information about the environment you are
> using? Release, package versions and so on.
>
> --
> Raoul Scarazzini
> rasca@redhat.com
>
> On 04/08/2016 11:34, Pedro Sousa wrote:
> > Hi all,
> >
> > I have an overcloud with 3 controller nodes; everything is working fine. The
> > problem is when I reboot one of the controllers: when the node comes back
> > online, all the services (nova-api, neutron-server) on the other nodes are
> > also restarted, causing a couple of minutes of downtime until everything is
> > recovered.
> >
> > In the example below I restarted controller2 and I see these messages on
> > controller0. My question is whether this is the expected behavior, because in
> > my opinion it shouldn't happen.
> > Authorization Failed: Service Unavailable (HTTP 503)
> > == Glance images ==
> > Service Unavailable (HTTP 503)
> > == Nova managed services ==
> > No handlers could be found for logger "keystoneauth.identity.generic.base"
> > ERROR (ServiceUnavailable): Service Unavailable (HTTP 503)
> > == Nova networks ==
> > No handlers could be found for logger "keystoneauth.identity.generic.base"
> > ERROR (ServiceUnavailable): Service Unavailable (HTTP 503)
> > == Nova instance flavors ==
> > No handlers could be found for logger "keystoneauth.identity.generic.base"
> > ERROR (ServiceUnavailable): Service Unavailable (HTTP 503)
> > == Nova instances ==
> > No handlers could be found for logger "keystoneauth.identity.generic.base"
> > ERROR (ServiceUnavailable): Service Unavailable (HTTP 503)
> > [root@overcloud-controller-0 ~]# openstack-status
> >
> > Broadcast message from systemd-journald@overcloud-controller-0.localdomain
> > (Thu 2016-08-04 09:22:31 UTC):
> >
> > haproxy[2816]: proxy neutron has no server available!
> >
> > Thanks,
> > Pedro Sousa
> >
> >
> >
> >
> > _______________________________________________
> > rdo-list mailing list
> > rdo-list@redhat.com
> > https://www.redhat.com/mailman/listinfo/rdo-list
> >
> > To unsubscribe: rdo-list-unsubscribe@redhat.com
> >
>
>
_______________________________________________
rdo-list mailing list
rdo-list@redhat.com
https://www.redhat.com/mailman/listinfo/rdo-list
To unsubscribe:
rdo-list-unsubscribe@redhat.com