Hi,

When I create new projects with one network per project, controller HA availability fails with the error below after about 70 or 80 projects and networks exist in the overcloud (OC).

[stack@director LogTool_Python2]$ ssh heat-admin@192.168.100.28 "sudo pcs status"
Cluster name: tripleo_cluster
Stack: corosync
Current DC: overcloud-controller-1 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Thu Jul 23 17:00:22 2020
Last change: Wed Jul 22 14:35:34 2020 by hacluster via crmd on overcloud-controller-2

12 nodes configured
37 resources configured

Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
GuestOnline: [ galera-bundle-0@overcloud-controller-0 galera-bundle-1@overcloud-controller-1 galera-bundle-2@overcloud-controller-2 rabbitmq-bundle-0@overcloud-controller-0 rabbitmq-bundle-1@overcloud-controller-1 rabbitmq-bundle-2@overcloud-controller-2 redis-bundle-0@overcloud-controller-0 redis-bundle-1@overcloud-controller-1 redis-bundle-2@overcloud-controller-2 ]

Full list of resources:

 Docker container set: rabbitmq-bundle [192.168.100.1:8787/tripleorocky/centos-binary-rabbitmq:pcmklatest]
   rabbitmq-bundle-0    (ocf::heartbeat:rabbitmq-cluster):      Started overcloud-controller-0
   rabbitmq-bundle-1    (ocf::heartbeat:rabbitmq-cluster):      Started overcloud-controller-1
   rabbitmq-bundle-2    (ocf::heartbeat:rabbitmq-cluster):      Started overcloud-controller-2
 Docker container set: galera-bundle [192.168.100.1:8787/tripleorocky/centos-binary-mariadb:pcmklatest]
   galera-bundle-0      (ocf::heartbeat:galera):        Master overcloud-controller-0
   galera-bundle-1      (ocf::heartbeat:galera):        Master overcloud-controller-1
   galera-bundle-2      (ocf::heartbeat:galera):        FAILED Master overcloud-controller-2 (blocked)
 Docker container set: redis-bundle [192.168.100.1:8787/tripleorocky/centos-binary-redis:pcmklatest]
   redis-bundle-0       (ocf::heartbeat:redis): Master overcloud-controller-0
   redis-bundle-1       (ocf::heartbeat:redis): Slave overcloud-controller-1
   redis-bundle-2       (ocf::heartbeat:redis): Slave overcloud-controller-2
 ip-192.168.100.98      (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0
 ip-10.10.0.11  (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1
 ip-192.168.102.185     (ocf::heartbeat:IPaddr2):       Started overcloud-controller-2
 ip-192.168.102.116     (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0
 ip-192.168.103.187     (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1
 ip-192.168.104.127     (ocf::heartbeat:IPaddr2):       Started overcloud-controller-2
 Docker container set: haproxy-bundle [192.168.100.1:8787/tripleorocky/centos-binary-haproxy:pcmklatest]
   haproxy-bundle-docker-0      (ocf::heartbeat:docker):        Started overcloud-controller-0
   haproxy-bundle-docker-1      (ocf::heartbeat:docker):        Started overcloud-controller-1
   haproxy-bundle-docker-2      (ocf::heartbeat:docker):        Started overcloud-controller-2
 Docker container: openstack-cinder-volume [192.168.100.1:8787/tripleorocky/centos-binary-cinder-volume:pcmklatest]
   openstack-cinder-volume-docker-0     (ocf::heartbeat:docker):        Started overcloud-controller-0

Failed Resource Actions:
* redis-bundle-docker-1_monitor_60000 on overcloud-controller-1 'unknown error' (1): call=132, status=Timed Out, exitreason='',
    last-rc-change='Thu Jul 23 16:42:15 2020', queued=0ms, exec=0ms
* galera-bundle-docker-2_monitor_60000 on overcloud-controller-2 'unknown error' (1): call=41, status=Timed Out, exitreason='',
    last-rc-change='Thu Jul 23 16:48:39 2020', queued=0ms, exec=0ms
* redis-bundle-docker-2_monitor_60000 on overcloud-controller-2 'unknown error' (1): call=62, status=Timed Out, exitreason='',
    last-rc-change='Thu Jul 23 16:48:39 2020', queued=0ms, exec=0ms
* haproxy-bundle-docker-2_monitor_60000 on overcloud-controller-2 'unknown error' (1): call=106, status=Timed Out, exitreason='',
    last-rc-change='Thu Jul 23 16:48:39 2020', queued=0ms, exec=0ms
* rabbitmq-bundle-docker-2_monitor_60000 on overcloud-controller-2 'unknown error' (1): call=121, status=Timed Out, exitreason='',
    last-rc-change='Thu Jul 23 16:48:39 2020', queued=0ms, exec=0ms
* galera_promote_0 on galera-bundle-2 'unknown error' (1): call=43, status=complete, exitreason='MySQL server failed to start (pid=646) (rc=0), please check your installation',
    last-rc-change='Thu Jul 23 16:49:14 2020', queued=0ms, exec=12193ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


It happens every time the total number of networks in the OC goes above 70. I am also attaching the overcloud error logs.
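
To make the "Failed Resource Actions" section of the output above easier to compare between runs, here is a minimal Python sketch that groups the failed actions by node. The function name `failed_actions_by_node`, the regular expression, and the shortened `SAMPLE` text are my own illustration, not part of pcs or TripleO. The pattern only assumes the line layout shown in the paste above and may need adjusting for other Pacemaker versions.

```python
import re
from collections import defaultdict

# Matches lines such as:
# * redis-bundle-docker-1_monitor_60000 on overcloud-controller-1 'unknown error' (1): call=132, status=Timed Out, exitreason='',
# The layout is assumed from the pasted output only.
FAILED_ACTION = re.compile(
    r"^\*\s+(?P<action>\S+) on (?P<node>\S+) '(?P<error>[^']*)' \((?P<rc>\d+)\): "
    r"call=(?P<call>\d+), status=(?P<status>[^,]+), exitreason='(?P<reason>[^']*)'"
)


def failed_actions_by_node(pcs_status_text):
    """Return {node: [(action, status, exitreason), ...]} for failed actions."""
    by_node = defaultdict(list)
    for line in pcs_status_text.splitlines():
        match = FAILED_ACTION.match(line.strip())
        if match:
            by_node[match.group("node")].append(
                (match.group("action"), match.group("status"), match.group("reason"))
            )
    return dict(by_node)


# Shortened excerpt of the output pasted above, used only as sample input.
SAMPLE = """\
Failed Resource Actions:
* redis-bundle-docker-1_monitor_60000 on overcloud-controller-1 'unknown error' (1): call=132, status=Timed Out, exitreason='',
    last-rc-change='Thu Jul 23 16:42:15 2020', queued=0ms, exec=0ms
* galera-bundle-docker-2_monitor_60000 on overcloud-controller-2 'unknown error' (1): call=41, status=Timed Out, exitreason='',
    last-rc-change='Thu Jul 23 16:48:39 2020', queued=0ms, exec=0ms
* galera_promote_0 on galera-bundle-2 'unknown error' (1): call=43, status=complete, exitreason='MySQL server failed to start (pid=646) (rc=0), please check your installation',
    last-rc-change='Thu Jul 23 16:49:14 2020', queued=0ms, exec=12193ms
"""

if __name__ == "__main__":
    for node, actions in failed_actions_by_node(SAMPLE).items():
        print(node)
        for action, status, reason in actions:
            print("  ", action, "|", status, "|", reason or "(no exitreason)")
```

The full text could be fed in by saving the output of the `ssh ... "sudo pcs status"` command above to a file and passing its contents to the function. In the paste above, four of the timed-out monitor actions are on overcloud-controller-2 with the same last-rc-change time, which is the kind of pattern this grouping is meant to surface.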


Regards
Rahul Pathak
i2k2 Networks (P) Ltd. | Spring Meadows Business Park
A61-B4 & 4A First Floor, Sector 63, Noida - 201 301
ISO/IEC 27001:2005 & ISO 9001:2008 Certified


From: "Alfredo Moralejo Alonso" <amoralej@redhat.com>
To: "Rahul Pathak" <rpathak@i2k2.com>
Cc: "RDO Development List" <dev@lists.rdoproject.org>
Sent: Thursday, July 23, 2020 3:34:45 PM
Subject: Re: [rdo-dev] tripleo cluster failure



On Wed, Jul 22, 2020 at 3:23 PM Rahul Pathak <rpathak@i2k2.com> wrote:
Hi,

I have installed containerized TripleO OpenStack (Rocky release), with the undercloud on a virtual platform and 3 controllers and 2 compute nodes on bare metal.

My whole setup is running on CentOS 7.

The overcloud cluster starts failing once the number of networks in the overcloud exceeds 70. Many resource failures are reported. I don't know why the HA cluster fails after 70 networks in the OC.


What kind of errors are you seeing? What "resource failures"?
 

Is there some kind of threshold in the TripleO configuration that prevents creating more than 70 or 80 networks? How can I fix this?

I did not see this issue when using the Red Hat platform and its repos. It occurs with the open-source repos on CentOS 7. Please help me fix this so that I can scale my OpenStack deployment up to 2000 VMs; in the current situation that is not possible.



