[rdo-dev] tripleo cluster failure
Rahul Pathak
rpathak at i2k2.com
Thu Jul 23 11:37:53 UTC 2020
Hi,
When I create new projects, each with one network, controller HA availability fails with the error below once the overcloud (OC) reaches 70 or 80 projects and networks.
[stack@director LogTool_Python2]$ ssh heat-admin@192.168.100.28 "sudo pcs status"
Cluster name: tripleo_cluster
Stack: corosync
Current DC: overcloud-controller-1 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Thu Jul 23 17:00:22 2020
Last change: Wed Jul 22 14:35:34 2020 by hacluster via crmd on overcloud-controller-2

12 nodes configured
37 resources configured

Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
GuestOnline: [ galera-bundle-0@overcloud-controller-0 galera-bundle-1@overcloud-controller-1 galera-bundle-2@overcloud-controller-2 rabbitmq-bundle-0@overcloud-controller-0 rabbitmq-bundle-1@overcloud-controller-1 rabbitmq-bundle-2@overcloud-controller-2 redis-bundle-0@overcloud-controller-0 redis-bundle-1@overcloud-controller-1 redis-bundle-2@overcloud-controller-2 ]

Full list of resources:

 Docker container set: rabbitmq-bundle [192.168.100.1:8787/tripleorocky/centos-binary-rabbitmq:pcmklatest]
   rabbitmq-bundle-0    (ocf::heartbeat:rabbitmq-cluster):    Started overcloud-controller-0
   rabbitmq-bundle-1    (ocf::heartbeat:rabbitmq-cluster):    Started overcloud-controller-1
   rabbitmq-bundle-2    (ocf::heartbeat:rabbitmq-cluster):    Started overcloud-controller-2
 Docker container set: galera-bundle [192.168.100.1:8787/tripleorocky/centos-binary-mariadb:pcmklatest]
   galera-bundle-0    (ocf::heartbeat:galera):    Master overcloud-controller-0
   galera-bundle-1    (ocf::heartbeat:galera):    Master overcloud-controller-1
   galera-bundle-2    (ocf::heartbeat:galera):    FAILED Master overcloud-controller-2 (blocked)
 Docker container set: redis-bundle [192.168.100.1:8787/tripleorocky/centos-binary-redis:pcmklatest]
   redis-bundle-0    (ocf::heartbeat:redis):    Master overcloud-controller-0
   redis-bundle-1    (ocf::heartbeat:redis):    Slave overcloud-controller-1
   redis-bundle-2    (ocf::heartbeat:redis):    Slave overcloud-controller-2
 ip-192.168.100.98    (ocf::heartbeat:IPaddr2):    Started overcloud-controller-0
 ip-10.10.0.11    (ocf::heartbeat:IPaddr2):    Started overcloud-controller-1
 ip-192.168.102.185    (ocf::heartbeat:IPaddr2):    Started overcloud-controller-2
 ip-192.168.102.116    (ocf::heartbeat:IPaddr2):    Started overcloud-controller-0
 ip-192.168.103.187    (ocf::heartbeat:IPaddr2):    Started overcloud-controller-1
 ip-192.168.104.127    (ocf::heartbeat:IPaddr2):    Started overcloud-controller-2
 Docker container set: haproxy-bundle [192.168.100.1:8787/tripleorocky/centos-binary-haproxy:pcmklatest]
   haproxy-bundle-docker-0    (ocf::heartbeat:docker):    Started overcloud-controller-0
   haproxy-bundle-docker-1    (ocf::heartbeat:docker):    Started overcloud-controller-1
   haproxy-bundle-docker-2    (ocf::heartbeat:docker):    Started overcloud-controller-2
 Docker container: openstack-cinder-volume [192.168.100.1:8787/tripleorocky/centos-binary-cinder-volume:pcmklatest]
   openstack-cinder-volume-docker-0    (ocf::heartbeat:docker):    Started overcloud-controller-0

Failed Resource Actions:
* redis-bundle-docker-1_monitor_60000 on overcloud-controller-1 'unknown error' (1): call=132, status=Timed Out, exitreason='',
    last-rc-change='Thu Jul 23 16:42:15 2020', queued=0ms, exec=0ms
* galera-bundle-docker-2_monitor_60000 on overcloud-controller-2 'unknown error' (1): call=41, status=Timed Out, exitreason='',
    last-rc-change='Thu Jul 23 16:48:39 2020', queued=0ms, exec=0ms
* redis-bundle-docker-2_monitor_60000 on overcloud-controller-2 'unknown error' (1): call=62, status=Timed Out, exitreason='',
    last-rc-change='Thu Jul 23 16:48:39 2020', queued=0ms, exec=0ms
* haproxy-bundle-docker-2_monitor_60000 on overcloud-controller-2 'unknown error' (1): call=106, status=Timed Out, exitreason='',
    last-rc-change='Thu Jul 23 16:48:39 2020', queued=0ms, exec=0ms
* rabbitmq-bundle-docker-2_monitor_60000 on overcloud-controller-2 'unknown error' (1): call=121, status=Timed Out, exitreason='',
    last-rc-change='Thu Jul 23 16:48:39 2020', queued=0ms, exec=0ms
* galera_promote_0 on galera-bundle-2 'unknown error' (1): call=43, status=complete, exitreason='MySQL server failed to start (pid=646) (rc=0), please check your installation',
    last-rc-change='Thu Jul 23 16:49:14 2020', queued=0ms, exec=12193ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
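The "Failed Resource Actions" entries above are easier to compare when they are pulled out and grouped by node. The following is a minimal Python 3 sketch, not part of pcs or TripleO: `parse_failed_actions`, `failures_by_node` and both regular expressions are hypothetical helpers fitted to the lines quoted in this post, and other Pacemaker versions may format this section differently.

```python
import re
from collections import defaultdict

# First line of a failed-action entry (format assumed from the output above), e.g.
# * galera-bundle-docker-2_monitor_60000 on overcloud-controller-2 'unknown error' (1): call=41, status=Timed Out, exitreason='',
ACTION_RE = re.compile(
    r"^\*\s+(?P<resource>\S+?)_(?P<op>[a-z]+)_(?P<interval>\d+)\s+on\s+(?P<node>\S+)\s+"
    r"'(?P<error>[^']*)'\s+\((?P<rc>\d+)\):.*?status=(?P<status>[^,]+),"
    r"\s*exitreason='(?P<reason>[^']*)'"
)
# Continuation line, e.g. last-rc-change='Thu Jul 23 16:48:39 2020', queued=0ms, exec=0ms
RC_CHANGE_RE = re.compile(r"^last-rc-change='(?P<when>[^']*)'")


def parse_failed_actions(text):
    """Return one dict per entry in the 'Failed Resource Actions' section."""
    actions = []
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()  # tolerate indented or flattened output
        if line.startswith("Failed Resource Actions:"):
            in_section = True
        elif in_section and line.startswith("Daemon Status:"):
            break  # end of the section we care about
        elif in_section:
            m = ACTION_RE.match(line)
            if m:
                actions.append(dict(m.groupdict(), last_rc_change=None))
                continue
            m = RC_CHANGE_RE.match(line)
            if m and actions:
                # Attach the timestamp to the entry this line continues.
                actions[-1]["last_rc_change"] = m.group("when")
    return actions


def failures_by_node(actions):
    """Group failed actions by the node (or bundle guest) they ran on."""
    grouped = defaultdict(list)
    for a in actions:
        grouped[a["node"]].append(f"{a['resource']} {a['op']}: {a['status']}")
    return dict(grouped)


# Excerpt copied from the pcs status output above.
SAMPLE = """\
Failed Resource Actions:
* galera-bundle-docker-2_monitor_60000 on overcloud-controller-2 'unknown error' (1): call=41, status=Timed Out, exitreason='',
    last-rc-change='Thu Jul 23 16:48:39 2020', queued=0ms, exec=0ms
* redis-bundle-docker-2_monitor_60000 on overcloud-controller-2 'unknown error' (1): call=62, status=Timed Out, exitreason='',
    last-rc-change='Thu Jul 23 16:48:39 2020', queued=0ms, exec=0ms
* galera_promote_0 on galera-bundle-2 'unknown error' (1): call=43, status=complete, exitreason='MySQL server failed to start (pid=646) (rc=0), please check your installation',
    last-rc-change='Thu Jul 23 16:49:14 2020', queued=0ms, exec=12193ms
Daemon Status:
  corosync: active/enabled
"""

if __name__ == "__main__":
    for node, items in failures_by_node(parse_failed_actions(SAMPLE)).items():
        print(node)
        for item in items:
            print("  " + item)
```

Applied to the full output above, the grouping shows that four of the five timed-out monitors (galera, redis, haproxy and rabbitmq) are on overcloud-controller-2 and share the same last-rc-change time, 16:48:39, shortly before the galera promote failure at 16:49:14. That points at controller-2 as the first place to look, although the sketch only reorganizes the text and does not establish a cause.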
This happens every time the total number of networks in the overcloud goes above 70. I am also attaching the overcloud error logs.
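The workload described here (one new project with one network each, repeated until roughly 70 to 80 networks exist) can be scripted so it is easy to rerun while watching `pcs status`. This is a minimal sketch, assuming Python 3 with openstacksdk installed, admin credentials, and a `clouds.yaml` entry named `overcloud` (an assumption); the function `create_projects_with_networks` and the `scale-test` prefix are hypothetical. It only reproduces the load and does not diagnose the failure.

```python
"""Hypothetical reproduction: create N projects, each with one network."""


def create_projects_with_networks(conn, count, prefix="scale-test"):
    """Create `count` projects with one network each; return the network IDs.

    `conn` is assumed to be an openstacksdk Connection with admin rights,
    so that a network can be created on behalf of another project.
    """
    network_ids = []
    for i in range(count):
        name = f"{prefix}-{i:03d}"
        # Keystone project in the default domain (assumed domain ID).
        project = conn.identity.create_project(name=name, domain_id="default")
        # Neutron network owned by the new project.
        network = conn.network.create_network(name=f"{name}-net", project_id=project.id)
        network_ids.append(network.id)
    return network_ids


if __name__ == "__main__":
    import openstack  # openstacksdk; only needed for a real run

    # "overcloud" is an assumed clouds.yaml entry name.
    conn = openstack.connect(cloud="overcloud")
    ids = create_projects_with_networks(conn, count=80)
    print(f"created {len(ids)} networks")
```

Taking `conn` as a parameter keeps the loop testable with a stub and lets the count be raised in small steps, so the point at which the cluster starts failing can be narrowed down.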
Regards
Rahul Pathak
i2k2 Networks (P) Ltd. | Spring Meadows Business Park
A61-B4 & 4A First Floor, Sector 63, Noida - 201 301
ISO/IEC 27001:2005 & ISO 9001:2008 Certified
----- Original Message -----
From: "Alfredo Moralejo Alonso" <amoralej at redhat.com>
To: "Rahul Pathak" <rpathak at i2k2.com>
Cc: "RDO Development List" <dev at lists.rdoproject.org>
Sent: Thursday, July 23, 2020 3:34:45 PM
Subject: Re: [rdo-dev] tripleo cluster failure
On Wed, Jul 22, 2020 at 3:23 PM Rahul Pathak < rpathak at i2k2.com > wrote:
Hi,
I have installed a containerized TripleO OpenStack deployment (Rocky release), with the undercloud on a virtual machine and 3 controllers and 2 compute nodes on bare metal.
My whole setup runs on CentOS 7.
The overcloud cluster starts failing once the number of networks in the overcloud exceeds 70, and many resource failures are reported. I don't know why the HA cluster fails after 70 networks in the OC.
What kind of errors are you seeing? What "resource failures"?
Is there some kind of threshold in the TripleO configuration that prevents creating more than 70 or 80 networks? How could I fix this?
I did not see this issue when using the Red Hat platform and its repos; it occurs with the open-source repos on CentOS 7. Please help me fix it: I need to scale my OpenStack deployment up to 2000 VMs, which is not possible in this situation.
Regards
Rahul Pathak
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Overcloud_ERROR.rar
Type: application/x-rar
Size: 6285 bytes
Desc: not available
URL: <http://lists.rdoproject.org/pipermail/dev/attachments/20200723/2a156f18/attachment-0001.bin>