[rdo-list] Instack-virt-setup vs TripleO QuickStart in regards of managing HA PCS/Corosync cluster via pcs CLI

Tue Aug 23 17:42:32 UTC 2016

________________________________
From: Raoul Scarazzini <rasca at redhat.com>
Sent: Tuesday, August 23, 2016 11:29 AM
To: Boris Derzhavets; Wesley Hayutin; Attila Darazs
Cc: rdo-list
Subject: Re: [rdo-list] Instack-virt-setup vs TripleO QuickStart in regards of managing HA PCS/Corosync cluster via pcs CLI

Hi Boris,
so, from what I see, the pcs commands that stop and start the cluster on
the rebooted node should not be used. It can happen that a service fails
to start, but then we need to investigate why from the logs.

Remember that cleaning up resources can be useful if we know what
happened, but using it repeatedly makes no sense. In addition, remember
that you can use just "pcs resource cleanup" to clean up the status of
the entire cluster and, in a way, "start from the beginning".
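
A minimal sketch of that advice, not taken from the thread: run one cluster-wide cleanup and then print the cluster state, instead of cleaning each clone in turn. The `PCS` variable and the `cleanup_all` function name are assumptions introduced here so the command can be swapped out for a dry run.

```shell
#!/bin/bash
# Hypothetical helper: PCS and cleanup_all are illustrative names,
# they do not come from the thread.
PCS="${PCS:-pcs}"

cleanup_all() {
    # Without a resource ID, "pcs resource cleanup" clears the failure
    # history of every resource in the cluster.
    $PCS resource cleanup || return 1
    # Print the resulting cluster state so remaining failures can be reviewed.
    $PCS status
}
```

Run as root on one controller, `cleanup_all` issues the two commands; `PCS=echo cleanup_all` only prints them.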

Now, about this specific problem we need to understand what is happening
here. Correct me if I'm wrong:

1) We have a clean env in which we reboot a node;
    That is correct

2) The node comes up, but some resources fail;
    All resources fail

3) After some cleanups the env becomes clean again;

    a)   If the VENV is set up by instack-virt-setup (official guide),
          the mentioned start.sh script works right away. It also comes from the official guide.

     b) If the VENV is set up by TripleO QuickStart (where undercloud.qcow2 gets uploaded
          to the libvirt pool with the overcloud images already integrated, per Jon's video
          explaining QuickStart CI vs TripleO CI),
          then (in my experience) before attempting start.sh I MUST restart the PCS cluster
          on the bounced Controller-X, then invoke `. ./start.sh` (not simply ./start.sh).
          Pretty often a second run of start.sh is required from another Controller-Y.
          Sometimes I cannot fix it in script mode and have to run the commands manually with
          a delay of more than 10 sec. So finally (about 25 tests passed) I get `pcs status` OK.
          In other words, all services are up and running on every Controller-X,Y,Z.

  Details :-
  http://bderzhavets.blogspot.ru/2016/08/emulation-rdo-triple0-quickstart-ha.html

Is this the sequence of operations you are using? Is the problem
systematic and can we reproduce it?
>
YES
>
Can we grab sosreports from the
machine involved?
>
Please instruct me how to do this.
>
Most important question: which OpenStack version are
you testing?
>
Mitaka stable  :-

[tripleo-quickstart at stack] $ bash quickstart --config ./ha.yml $VIRTHOST
By default, with no --release specified, the Mitaka Delorean trunk repos get selected.
Just check /etc/yum.repos.d/ on the undercloud for the delorean repo files that
quickstart places there; when it exits, it asks you to connect to the undercloud.
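
One possible way to do that check, as a sketch: list the delorean repo files and their baseurl lines. The function name `list_delorean_repos` and its optional directory argument are assumptions added for illustration, and the idea that the baseurl path names the release (for example a "mitaka" path component) is an assumption about how the trunk repos are usually laid out.

```shell
#!/bin/bash
# Hypothetical helper: list_delorean_repos is an illustrative name.
# The optional argument only exists so the function can be pointed at
# another directory; on the undercloud the default applies.
list_delorean_repos() {
    local repo_dir="${1:-/etc/yum.repos.d}"
    # Print each delorean repo file name together with its baseurl lines.
    grep -H '^baseurl' "$repo_dir"/delorean*.repo
}
```

On the undercloud, calling `list_delorean_repos` with no argument inspects /etc/yum.repos.d.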
>
Boris
--
Raoul Scarazzini
rasca at redhat.com

On 22/08/2016 13:49, Boris Derzhavets wrote:
>
> Sorry , for my English
>
> I was also keeping (not kept ) track on Galera DB via `clustercheck`
>
> either I just kept.
>
>
> Boris
>
>
> ------------------------------------------------------------------------
> *From:* rdo-list-bounces at redhat.com <rdo-list-bounces at redhat.com> on
> behalf of Boris Derzhavets <bderzhavets at hotmail.com>
> *Sent:* Monday, August 22, 2016 7:29 AM
> *To:* Raoul Scarazzini; Wesley Hayutin; Attila Darazs
> *Cc:* rdo-list
> *Subject:* Re: [rdo-list] Instack-virt-setup vs TripleO QuickStart in
> regards of managing HA PCS/Corosync cluster via pcs CLI
>
>
>
>
>
> ------------------------------------------------------------------------
> *From:* Raoul Scarazzini <rasca at redhat.com>
> *Sent:* Monday, August 22, 2016 3:51 AM
> *To:* Wesley Hayutin; Boris Derzhavets; Attila Darazs
> *Cc:* David Moreau Simard; rdo-list
> *Subject:* Re: [rdo-list] Instack-virt-setup vs TripleO QuickStart in
> regards of managing HA PCS/Corosync cluster via pcs CLI
>
> Hi everybody,
> sorry for the late response but I was on PTO. I don't understand the
> meaning of the cleanup commands, but maybe it's just because I'm not
> getting the whole picture.
>
>>
> I have to confirm that the fault was mine. The PCS CLI does work on TripleO
> QuickStart, but it requires a pcs cluster restart on the particular node that
> went down via `nova stop controller-X` and was brought back up via
> `nova start controller-X`.
> Details here :-
>
> http://bderzhavets.blogspot.ru/2016/08/emulation-rdo-triple0-quickstart-ha.html

>
> A VENV set up with instack-virt-setup doesn't require (on the bounced
> Controller node)
>
> # pcs cluster stop
> # pcs cluster start
>
> before issuing start.sh:
>
> #!/bin/bash -x
> pcs resource cleanup rabbitmq-clone ;
> sleep 10
> pcs resource cleanup neutron-server-clone ;
> sleep 10
> pcs resource cleanup openstack-nova-api-clone ;
> sleep 10
> pcs resource cleanup openstack-nova-consoleauth-clone ;
> sleep 10
> pcs resource cleanup openstack-heat-engine-clone ;
> sleep 10
> pcs resource cleanup openstack-cinder-api-clone ;
> sleep 10
> pcs resource cleanup openstack-glance-registry-clone ;
> sleep 10
> pcs resource cleanup httpd-clone ;
>
> # .  ./start.sh
>
> In the worst-case scenario I have to run start.sh twice, from different
> Controllers.
> `pcs resource cleanup openstack-nova-api-clone` attempts to start the
> corresponding service, which is down at the moment. In fact, the two Nova
> cleanups above start all Nova services, and the one Neutron cleanup starts
> all Neutron agents as well.
> I was also keeping track of the Galera DB via `clustercheck`.
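
The start.sh script quoted above could also be written as a loop, which makes the delay easy to raise when 10 seconds is not enough. This is only a sketch: `PCS`, `DELAY` and `cleanup_clones` are names invented here, and the resource list is copied from the script in the message.

```shell
#!/bin/bash
# Sketch of start.sh as a loop. PCS, DELAY and cleanup_clones are
# illustrative names; the resource IDs are those from the original script.
PCS="${PCS:-pcs}"
DELAY="${DELAY:-10}"   # seconds to wait after each cleanup

RESOURCES="rabbitmq-clone
neutron-server-clone
openstack-nova-api-clone
openstack-nova-consoleauth-clone
openstack-heat-engine-clone
openstack-cinder-api-clone
openstack-glance-registry-clone
httpd-clone"

cleanup_clones() {
    local res
    # Clean up one clone at a time, in the same order as the original script.
    # Unlike the original, this also sleeps after the last cleanup.
    for res in $RESOURCES; do
        $PCS resource cleanup "$res"
        sleep "$DELAY"
    done
}
```

For example, `DELAY=20 cleanup_clones` would double the pause, and `PCS=echo DELAY=0 cleanup_clones` prints the commands without touching the cluster.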
>
> Thanks.
> Boris
>>
>
>
> I guess we're hitting a version problem here: if you deploy the current
> master (i.e. with quickstart) you'll get an environment with the
> constraints limited to the core services because of [1] and [2] (so none
> of the mentioned services exist in the cluster configuration).
>
> Hope this helps,
>
> [1] https://review.openstack.org/#/c/314208/
> [2] https://review.openstack.org/#/c/342650/
>
> --
> Raoul Scarazzini
> rasca at redhat.com
>
> On 08/08/2016 14:43, Wesley Hayutin wrote:
>> Attila, Raoul
>> Can you please investigate this issue.
>>
>> Thanks!
>>
>> On Sun, Aug 7, 2016 at 3:52 AM, Boris Derzhavets
>> <bderzhavets at hotmail.com <mailto:bderzhavets at hotmail.com>> wrote:
>>
>>     A TripleO HA Controller installed via instack-virt-setup has PCS
>>     CLI commands like :-
>>
>>     pcs resource cleanup neutron-server-clone
>>     pcs resource cleanup openstack-nova-api-clone
>>     pcs resource cleanup openstack-nova-consoleauth-clone
>>     pcs resource cleanup openstack-heat-engine-clone
>>     pcs resource cleanup openstack-cinder-api-clone
>>     pcs resource cleanup openstack-glance-registry-clone
>>     pcs resource cleanup httpd-clone
>>
>>     which have been working as expected on bare metal.
>>
>>
>>     The same cluster set up via QuickStart (virtual ENV), after bouncing
>>     one of the controllers included in the cluster, ignores the PCS CLI,
>>     at least in my experience (which is obviously limited; or else the
>>     format of these particular commands is wrong for QuickStart).
>>
>>     I believe that dropping (completely replacing) instack-virt-setup is
>>     not a good idea in general. Personally, I believe that, as with
>>     packstack, it is always good to have a VENV configuration tested
>>     before going to a bare metal deployment.
>>
>>     My major concern is maintenance and disaster recovery tests, rather
>>     than the deployment itself. What good is TripleO Quickstart running
>>     on bare metal to me if I cannot replace a crashed VM Controller,
>>     being limited just to services HA (all 3 cluster VMs running on a
>>     single bare metal node)?
>>
>>
>>     Thanks
>>
>>     Boris.
>>
>>
>>
>>
>>
>>     ------------------------------------------------------------------------
>>
>>
>>
>>     _______________________________________________
>>     rdo-list mailing list
>>     rdo-list at redhat.com <mailto:rdo-list at redhat.com>
>>     https://www.redhat.com/mailman/listinfo/rdo-list
>>     <https://www.redhat.com/mailman/listinfo/rdo-list>
>>
>>     To unsubscribe: rdo-list-unsubscribe at redhat.com
>>     <mailto:rdo-list-unsubscribe at redhat.com>
>>
>>