[rdo-dev] tripleo cluster failure

Rahul Pathak rpathak at i2k2.com
Thu Jul 23 11:37:53 UTC 2020


Hi, 

When I create new projects with one network per project in the overcloud (OC), controller high availability fails once there are about 70 to 80 projects and networks, with the error below. 

[stack@director LogTool_Python2]$ ssh heat-admin@192.168.100.28 "sudo pcs status" 
Cluster name: tripleo_cluster 
Stack: corosync 
Current DC: overcloud-controller-1 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum 
Last updated: Thu Jul 23 17:00:22 2020 
Last change: Wed Jul 22 14:35:34 2020 by hacluster via crmd on overcloud-controller-2 

12 nodes configured 
37 resources configured 

Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ] 
GuestOnline: [ galera-bundle-0@overcloud-controller-0 galera-bundle-1@overcloud-controller-1 galera-bundle-2@overcloud-controller-2 rabbitmq-bundle-0@overcloud-controller-0 rabbitmq-bundle-1@overcloud-controller-1 rabbitmq-bundle-2@overcloud-controller-2 redis-bundle-0@overcloud-controller-0 redis-bundle-1@overcloud-controller-1 redis-bundle-2@overcloud-controller-2 ] 

Full list of resources: 

Docker container set: rabbitmq-bundle [192.168.100.1:8787/tripleorocky/centos-binary-rabbitmq:pcmklatest] 
rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-0 
rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-1 
rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-2 
Docker container set: galera-bundle [192.168.100.1:8787/tripleorocky/centos-binary-mariadb:pcmklatest] 
galera-bundle-0 (ocf::heartbeat:galera): Master overcloud-controller-0 
galera-bundle-1 (ocf::heartbeat:galera): Master overcloud-controller-1 
galera-bundle-2 (ocf::heartbeat:galera): FAILED Master overcloud-controller-2 (blocked) 
Docker container set: redis-bundle [192.168.100.1:8787/tripleorocky/centos-binary-redis:pcmklatest] 
redis-bundle-0 (ocf::heartbeat:redis): Master overcloud-controller-0 
redis-bundle-1 (ocf::heartbeat:redis): Slave overcloud-controller-1 
redis-bundle-2 (ocf::heartbeat:redis): Slave overcloud-controller-2 
ip-192.168.100.98 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0 
ip-10.10.0.11 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1 
ip-192.168.102.185 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2 
ip-192.168.102.116 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0 
ip-192.168.103.187 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1 
ip-192.168.104.127 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2 
Docker container set: haproxy-bundle [192.168.100.1:8787/tripleorocky/centos-binary-haproxy:pcmklatest] 
haproxy-bundle-docker-0 (ocf::heartbeat:docker): Started overcloud-controller-0 
haproxy-bundle-docker-1 (ocf::heartbeat:docker): Started overcloud-controller-1 
haproxy-bundle-docker-2 (ocf::heartbeat:docker): Started overcloud-controller-2 
Docker container: openstack-cinder-volume [192.168.100.1:8787/tripleorocky/centos-binary-cinder-volume:pcmklatest] 
openstack-cinder-volume-docker-0 (ocf::heartbeat:docker): Started overcloud-controller-0 

Failed Resource Actions: 
* redis-bundle-docker-1_monitor_60000 on overcloud-controller-1 'unknown error' (1): call=132, status=Timed Out, exitreason='', 
last-rc-change='Thu Jul 23 16:42:15 2020', queued=0ms, exec=0ms 
* galera-bundle-docker-2_monitor_60000 on overcloud-controller-2 'unknown error' (1): call=41, status=Timed Out, exitreason='', 
last-rc-change='Thu Jul 23 16:48:39 2020', queued=0ms, exec=0ms 
* redis-bundle-docker-2_monitor_60000 on overcloud-controller-2 'unknown error' (1): call=62, status=Timed Out, exitreason='', 
last-rc-change='Thu Jul 23 16:48:39 2020', queued=0ms, exec=0ms 
* haproxy-bundle-docker-2_monitor_60000 on overcloud-controller-2 'unknown error' (1): call=106, status=Timed Out, exitreason='', 
last-rc-change='Thu Jul 23 16:48:39 2020', queued=0ms, exec=0ms 
* rabbitmq-bundle-docker-2_monitor_60000 on overcloud-controller-2 'unknown error' (1): call=121, status=Timed Out, exitreason='', 
last-rc-change='Thu Jul 23 16:48:39 2020', queued=0ms, exec=0ms 
* galera_promote_0 on galera-bundle-2 'unknown error' (1): call=43, status=complete, exitreason='MySQL server failed to start (pid=646) (rc=0), please check your installation', 
last-rc-change='Thu Jul 23 16:49:14 2020', queued=0ms, exec=12193ms 

Daemon Status: 
corosync: active/enabled 
pacemaker: active/enabled 
pcsd: active/enabled 


This happens every time the total number of networks in the overcloud goes above 70. I am also attaching the overcloud error logs. 
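
To make the "Failed Resource Actions" section of the output above easier to read, here is a minimal Python sketch that groups the failed actions by node. It is not part of TripleO, Pacemaker, or pcs; the function name `summarize_failures` and the regular expression are assumptions based only on the output format pasted above, and `SAMPLE` is a shortened copy of that output.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the pasted output, e.g.:
# * galera_promote_0 on galera-bundle-2 'unknown error' (1): call=43, status=complete, ...
FAILED_ACTION = re.compile(
    r"^\*\s+(?P<action>\S+)\s+on\s+(?P<node>\S+)\s+'(?P<error>[^']*)'\s+\((?P<rc>\d+)\):"
    r".*?status=(?P<status>[^,]+),"
)


def summarize_failures(pcs_status_text):
    """Group failed resource actions from `pcs status` text by node (hypothetical helper)."""
    failures = defaultdict(list)
    in_section = False
    for line in pcs_status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Failed Resource Actions:"):
            in_section = True
            continue
        if in_section and stripped.startswith("Daemon Status:"):
            break  # the failed-actions section ends here
        if not in_section:
            continue
        match = FAILED_ACTION.match(stripped)
        if match:
            failures[match.group("node")].append(
                (match.group("action"), match.group("status").strip())
            )
    return dict(failures)


# Shortened copy of the output above, used only for illustration.
SAMPLE = """\
Failed Resource Actions:
* redis-bundle-docker-1_monitor_60000 on overcloud-controller-1 'unknown error' (1): call=132, status=Timed Out, exitreason='',
last-rc-change='Thu Jul 23 16:42:15 2020', queued=0ms, exec=0ms
* galera_promote_0 on galera-bundle-2 'unknown error' (1): call=43, status=complete, exitreason='MySQL server failed to start (pid=646) (rc=0), please check your installation',
last-rc-change='Thu Jul 23 16:49:14 2020', queued=0ms, exec=12193ms

Daemon Status:
corosync: active/enabled
"""

if __name__ == "__main__":
    for node, actions in sorted(summarize_failures(SAMPLE).items()):
        print(node, actions)
```

In the full output above, four of the monitor timeouts (galera, redis, haproxy and rabbitmq) are on overcloud-controller-2 and share the same last-rc-change time of 16:48:39.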


Regards 
Rahul Pathak 
i2k2 Networks (P) Ltd. | Spring Meadows Business Park 
A61-B4 & 4A First Floor, Sector 63, Noida - 201 301 
ISO/IEC 27001:2005 & ISO 9001:2008 Certified 

----- Original Message -----

From: "Alfredo Moralejo Alonso" <amoralej at redhat.com> 
To: "Rahul Pathak" <rpathak at i2k2.com> 
Cc: "RDO Development List" <dev at lists.rdoproject.org> 
Sent: Thursday, July 23, 2020 3:34:45 PM 
Subject: Re: [rdo-dev] tripleo cluster failure 



On Wed, Jul 22, 2020 at 3:23 PM Rahul Pathak < rpathak at i2k2.com > wrote: 



Hi, 


I have installed containerized TripleO OpenStack (Rocky release), with the undercloud on a virtual platform and 3 controller nodes and 2 compute nodes on bare metal. 


My whole setup is running on CentOS 7. 


The overcloud cluster starts failing once the number of networks in the overcloud exceeds 70. Many resource failures are reported. I don't know why the HA cluster fails after 70 networks in the OC. 




What kind of errors are you seeing? Which "resource failures"? 

<blockquote>



Is there some kind of threshold in the TripleO configuration that prevents creating more than 70 or 80 networks? How can I fix this? 

I did not see this issue when using the Red Hat platform and its repos. It only appears with the open-source repos on CentOS 7. Please help me fix this so that I can scale my OpenStack deployment up to 2000 VMs, which is not possible in the current situation. 
Regards 
Rahul Pathak 
i2k2 Networks (P) Ltd. | Spring Meadows Business Park 
A61-B4 & 4A First Floor, Sector 63, Noida - 201 301 
ISO/IEC 27001:2005 & ISO 9001:2008 Certified 



</blockquote>


-------------- next part --------------
A non-text attachment was scrubbed...
Name: Overcloud_ERROR.rar
Type: application/x-rar
Size: 6285 bytes
Desc: not available
URL: <http://lists.rdoproject.org/pipermail/dev/attachments/20200723/2a156f18/attachment-0001.bin>

