[rdo-users] Single Controller Environment in Victoria

James Hirst jdhirst12 at gmail.com
Mon Dec 28 10:58:30 UTC 2020


Hi Yatin,

Thank you for the confirmation! I re-enabled the pacemaker and haproxy
roles and have since been digging into why HA has been failing. I am
seeing the following:

1. pacemaker.service won't start because Corosync is not running.
2. Corosync fails to start because /etc/corosync/corosync.conf does not
exist.
3. The pcsd log shows the following errors:
---
Config files sync started
Config files sync skipped, this host does not seem to be in a cluster of at
least 2 nodes
---
This is what originally led me to believe that it wouldn't work without a
proper three-node HA environment.

The overcloud deployment itself simply times out at "Wait for puppet host
configuration to finish". step_1 appears to be where things fail (due to
pacemaker), and when I run it manually I see the following messages:

---
Debug: Executing: '/bin/systemctl is-enabled -- corosync'
Debug: Executing: '/bin/systemctl is-enabled -- pacemaker'
Debug: Executing: '/bin/systemctl is-active -- pcsd'
Debug: Executing: '/bin/systemctl is-enabled -- pcsd'
Debug: Exec[check-for-local-authentication](provider=posix): Executing
check '/sbin/pcs status pcsd controller 2>&1 | grep 'Unable to
authenticate''
Debug: Executing: '/sbin/pcs status pcsd controller 2>&1 | grep 'Unable to
authenticate''
Debug:
/Stage[main]/Pacemaker::Corosync/Exec[check-for-local-authentication]:
'/bin/echo 'local pcsd auth failed, triggering a reauthentication'' won't
be executed because of failed check 'onlyif'
Debug:
/Stage[main]/Pacemaker::Corosync/Exec[reauthenticate-across-all-nodes]:
'/sbin/pcs host auth controller.cloud.hirstgroup.net -u hacluster -p
oaJOCgGDxRfJ1dLK' won't be executed because of failed check 'refreshonly'
Debug:
/Stage[main]/Pacemaker::Corosync/Exec[auth-successful-across-all-nodes]:
'/sbin/pcs host auth controller.cloud.hirstgroup.net -u hacluster -p
oaJOCgGDxRfJ1dLK' won't be executed because of failed check 'refreshonly'
Debug: Exec[wait-for-settle](provider=posix): Executing check '/sbin/pcs
status | grep -q 'partition with quorum' > /dev/null 2>&1'
Debug: Executing: '/sbin/pcs status | grep -q 'partition with quorum' >
/dev/null 2>&1'
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/unless:
Error: error running crm_mon, is pacemaker running?
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/unless:
Could not connect to the CIB: Transport endpoint is not connected
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/unless:
crm_mon: Error: cluster is not available on this node
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/returns: Exec
try 1/360
Debug: Exec[wait-for-settle](provider=posix): Executing '/sbin/pcs status |
grep -q 'partition with quorum' > /dev/null 2>&1'
Debug: Executing: '/sbin/pcs status | grep -q 'partition with quorum' >
/dev/null 2>&1'
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/returns:
Sleeping for 10.0 seconds between tries
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/returns: Exec
try 2/360
Debug: Exec[wait-for-settle](provider=posix): Executing '/sbin/pcs status |
grep -q 'partition with quorum' > /dev/null 2>&1'
Debug: Executing: '/sbin/pcs status | grep -q 'partition with quorum' >
/dev/null 2>&1'
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/returns:
Sleeping for 10.0 seconds between tries
---

How does the corosync.conf file get created? Is it related to the pcsd
error saying that config sync was skipped because the cluster has fewer
than two nodes?
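
For reference, this is roughly the file I would expect pcs to generate for
a single-node cluster (reconstructed from the corosync/pcs documentation,
not from my node, since the file is missing; the cluster name and node name
are my guesses):
---
totem {
    version: 2
    cluster_name: tripleo_cluster
    transport: knet
}

nodelist {
    node {
        ring0_addr: controller.cloud.hirstgroup.net
        name: controller
        nodeid: 1
    }
}

quorum {
    provider: corosync_votequorum
}
---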

Thanks,
James H

On Mon, 28 Dec 2020 at 11:26, YATIN KAREL <yatinkarel at gmail.com> wrote:

> Hi James,
>
> On Sun, Dec 27, 2020 at 4:04 PM James Hirst <jdhirst12 at gmail.com> wrote:
>
>> HI All,
>>
>> I am attempting to set up a single-controller overcloud with TripleO
>> Victoria. I keep running into issues where pcsd fails to start during
>> puppet step 1 on the controller. I tried to work around this by removing
>> the pacemaker service from my roles_data.yaml file, but then I ran into
>> other errors requiring that the pacemaker service be enabled.
>>
> HA deployment has been enabled by default since the Ussuri release[1],
> via [2], so pacemaker will be deployed by default whether you set up one
> or more controller nodes. Deploying without pacemaker is possible, but it
> needs more changes beyond removing pacemaker from the roles_data.yaml
> file, such as adjusting the resource_registry to use non-pacemaker
> resources. HA with one controller works fine; we have green CI jobs[3][4]
> running with both one and three controllers. So I would recommend looking
> into why pcsd is failing for you and proceeding with HA. If you still
> want to go without pacemaker, you can try adjusting the resource_registry
> to disable the pacemaker resources.
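>
> e.g. an environment file along these lines (a sketch only; the exact
> template paths vary by release, so check the environments/ directory in
> tripleo-heat-templates for the current non-pacemaker mappings):
> ---
> resource_registry:
>   # Disable the pacemaker cluster services entirely
>   OS::TripleO::Services::Pacemaker: OS::Heat::None
>   OS::TripleO::Services::PacemakerRemote: OS::Heat::None
>   # Map HA-managed services back to their non-pacemaker variants,
>   # for example:
>   OS::TripleO::Services::HAproxy: ../deployment/haproxy/haproxy-container-puppet.yaml
> ---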
>
>
>> I have ControllerCount set to 1, which according to the docs is all I
>> need to do to tell TripleO that I'm not using HA.
>>
> The docs might be outdated if they say that setting ControllerCount to 1
> is enough to deploy without pacemaker; you can report a bug or send a
> patch to fix that, along with the docs link you are using.
>
>
>> Thanks,
>> James H
>> _______________________________________________
>> users mailing list
>> users at lists.rdoproject.org
>> http://lists.rdoproject.org/mailman/listinfo/users
>>
>> To unsubscribe: users-unsubscribe at lists.rdoproject.org
>>
>
>
> [1]
> https://docs.openstack.org/releasenotes/tripleo-heat-templates/ussuri.html#relnotes-12-3-0-stable-ussuri-other-notes
> [2] https://review.opendev.org/c/openstack/tripleo-heat-templates/+/359060
> [3]
> https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-1ctlr_1comp-featureset002-victoria/0bccbf6/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
> [4]
> https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-victoria/a5dd4bc/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
>
>
> Thanks and regards
> Yatin Karel
>