Hi Yatin,
Thank you for the confirmation! I re-enabled the pacemaker and haproxy
roles and have since been digging into why HA has been failing. I am
seeing the following:
1. pacemaker.service won't start because Corosync is not running.
2. Corosync fails to start because /etc/corosync/corosync.conf does not
exist.
3. The pcsd log file shows the following errors:
---
Config files sync started
Config files sync skipped, this host does not seem to be in a cluster of at
least 2 nodes
---
This is what originally led me to believe that it wouldn't work without a
proper HA environment with 3 nodes.
The overcloud deployment itself simply times out at "Wait for puppet host
configuration to finish". step_1 seems to be where things fail (due to
pacemaker), and when I run it manually, I see the following messages:
---
Debug: Executing: '/bin/systemctl is-enabled -- corosync'
Debug: Executing: '/bin/systemctl is-enabled -- pacemaker'
Debug: Executing: '/bin/systemctl is-active -- pcsd'
Debug: Executing: '/bin/systemctl is-enabled -- pcsd'
Debug: Exec[check-for-local-authentication](provider=posix): Executing
check '/sbin/pcs status pcsd controller 2>&1 | grep 'Unable to
authenticate''
Debug: Executing: '/sbin/pcs status pcsd controller 2>&1 | grep 'Unable to
authenticate''
Debug:
/Stage[main]/Pacemaker::Corosync/Exec[check-for-local-authentication]:
'/bin/echo 'local pcsd auth failed, triggering a reauthentication''
won't
be executed because of failed check 'onlyif'
Debug:
/Stage[main]/Pacemaker::Corosync/Exec[reauthenticate-across-all-nodes]:
'/sbin/pcs host auth
controller.cloud.hirstgroup.net -u hacluster -p
oaJOCgGDxRfJ1dLK' won't be executed because of failed check 'refreshonly'
Debug:
/Stage[main]/Pacemaker::Corosync/Exec[auth-successful-across-all-nodes]:
'/sbin/pcs host auth
controller.cloud.hirstgroup.net -u hacluster -p
oaJOCgGDxRfJ1dLK' won't be executed because of failed check 'refreshonly'
Debug: Exec[wait-for-settle](provider=posix): Executing check '/sbin/pcs
status | grep -q 'partition with quorum' > /dev/null 2>&1'
Debug: Executing: '/sbin/pcs status | grep -q 'partition with quorum' >
/dev/null 2>&1'
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/unless:
Error: error running crm_mon, is pacemaker running?
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/unless:
Could not connect to the CIB: Transport endpoint is not connected
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/unless:
crm_mon: Error: cluster is not available on this node
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/returns: Exec
try 1/360
Debug: Exec[wait-for-settle](provider=posix): Executing '/sbin/pcs status |
grep -q 'partition with quorum' > /dev/null 2>&1'
Debug: Executing: '/sbin/pcs status | grep -q 'partition with quorum' >
/dev/null 2>&1'
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/returns:
Sleeping for 10.0 seconds between tries
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/returns: Exec
try 2/360
Debug: Exec[wait-for-settle](provider=posix): Executing '/sbin/pcs status |
grep -q 'partition with quorum' > /dev/null 2>&1'
Debug: Executing: '/sbin/pcs status | grep -q 'partition with quorum' >
/dev/null 2>&1'
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/returns:
Sleeping for 10.0 seconds between tries
---
How does the corosync.conf file get created? Is it related to the pcsd
error saying that config sync can't proceed due to the cluster not having a
minimum of two members?
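To tentatively answer my own question (this is an assumption on my part, not
something I have confirmed): I believe corosync.conf is not shipped by the
corosync package at all, but is written out by pcs when the cluster is
created. Roughly, the manual sequence would be something like the following
("controller" is the node name from the logs above; the cluster name here is
just a placeholder):

```shell
# My (unverified) understanding of how corosync.conf comes to exist: pcs
# writes it during cluster creation rather than it being package-installed.
# "controller" is from the logs above; "mycluster" is a placeholder name.
setup_steps='
pcs host auth controller -u hacluster    # authenticate pcsd on the node first
pcs cluster setup mycluster controller   # this step writes /etc/corosync/corosync.conf
pcs cluster start --all                  # then start corosync and pacemaker
'
printf '%s\n' "$setup_steps"
```

If that is right, the real question is why the TripleO puppet step never gets
as far as running the cluster setup on this node.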
Thanks,
James H
On Mon, 28 Dec 2020 at 11:26, YATIN KAREL <yatinkarel(a)gmail.com> wrote:
Hi James,
On Sun, Dec 27, 2020 at 4:04 PM James Hirst <jdhirst12(a)gmail.com> wrote:
> HI All,
>
> I am attempting to set up a single controller overcloud with tripleo
> Victoria. I keep running into issues where pcsd is attempting to be started
> in puppet step 1 on the controller and it fails. I attempted to solve this
> by simply removing the pacemaker service from my roles_data.yaml file, but
> then I ran into other errors requiring that the pacemaker service be
> enabled.
>
HA deployment has been enabled by default since the Ussuri release[1] (via
[2]), so pacemaker will be deployed by default whether you set up one
controller node or more. Deploying without pacemaker is possible, but it
needs more changes than just removing pacemaker from the roles_data.yaml
file, such as adjusting the resource_registry to use non-pacemaker
resources. HA with 1 controller works fine; we have green jobs[3][4] running
with both 1 controller and 3 controllers, so I would recommend looking into
why pcsd is failing for you and proceeding with HA. But if you still want to
go without pacemaker, you can try adjusting the resource_registry to disable
the pacemaker resources.
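For illustration only, such an override could look roughly like the sketch
below. The service names and template paths here are assumptions; verify
them against the tripleo-heat-templates shipped with your release before
using anything like this:

```yaml
# Hypothetical resource_registry sketch for a non-pacemaker deployment.
# Check your release's tripleo-heat-templates for the exact names/paths.
resource_registry:
  # Drop the pacemaker services themselves
  OS::TripleO::Services::Pacemaker: OS::Heat::None
  OS::TripleO::Services::PacemakerRemote: OS::Heat::None
  # Point clustered services at their non-pacemaker templates, e.g.:
  OS::TripleO::Services::HAproxy: /usr/share/openstack-tripleo-heat-templates/deployment/haproxy/haproxy-container-puppet.yaml
  OS::TripleO::Services::MySQL: /usr/share/openstack-tripleo-heat-templates/deployment/database/mysql-container-puppet.yaml
```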
> I have ControllerCount set to 1, which according to the docs is all I
> need to do to tell tripleo that I'm not using HA.
>
Docs might be outdated if they say that just setting ControllerCount to 1 is
enough to deploy without pacemaker. You can report a bug, or send a patch to
fix that, with the link to the docs you are using.
> Thanks,
> James H
[1]
https://docs.openstack.org/releasenotes/tripleo-heat-templates/ussuri.htm...
[2]
https://review.opendev.org/c/openstack/tripleo-heat-templates/+/359060
[3]
https://logserver.rdoproject.org/openstack-periodic-integration-stable1/o...
[4]
https://logserver.rdoproject.org/openstack-periodic-integration-stable1/o...
Thanks and regards
Yatin Karel