Thank you for the confirmation! I re-enabled the pacemaker and haproxy roles and have since been digging into why HA has been failing. I am seeing the following:
1. pacemaker.service won't start because Corosync is not running.
2. Corosync fails to start because /etc/corosync/corosync.conf does not exist.
3. The pcsd log file shows the following errors:
This is what originally led me to believe that it wouldn't work without a proper HA environment with 3 nodes.
The overcloud deployment itself simply times out at "Wait for puppet host configuration to finish". step_1 appears to be where things fail (due to pacemaker), and when I run it manually I see the following messages:
Debug: Executing: '/bin/systemctl is-enabled -- corosync'
Debug: Executing: '/bin/systemctl is-enabled -- pacemaker'
Debug: Executing: '/bin/systemctl is-active -- pcsd'
Debug: Executing: '/bin/systemctl is-enabled -- pcsd'
Debug: Exec[check-for-local-authentication](provider=posix): Executing check '/sbin/pcs status pcsd controller 2>&1 | grep 'Unable to authenticate''
Debug: Executing: '/sbin/pcs status pcsd controller 2>&1 | grep 'Unable to authenticate''
Debug: /Stage[main]/Pacemaker::Corosync/Exec[check-for-local-authentication]: '/bin/echo 'local pcsd auth failed, triggering a reauthentication'' won't be executed because of failed check 'onlyif'
Debug: /Stage[main]/Pacemaker::Corosync/Exec[reauthenticate-across-all-nodes]: '/sbin/pcs host auth controller.cloud.hirstgroup.net -u hacluster -p oaJOCgGDxRfJ1dLK' won't be executed because of failed check 'refreshonly'
Debug: /Stage[main]/Pacemaker::Corosync/Exec[auth-successful-across-all-nodes]: '/sbin/pcs host auth controller.cloud.hirstgroup.net -u hacluster -p oaJOCgGDxRfJ1dLK' won't be executed because of failed check 'refreshonly'
Debug: Exec[wait-for-settle](provider=posix): Executing check '/sbin/pcs status | grep -q 'partition with quorum' > /dev/null 2>&1'
Debug: Executing: '/sbin/pcs status | grep -q 'partition with quorum' > /dev/null 2>&1'
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/unless: Error: error running crm_mon, is pacemaker running?
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/unless: Could not connect to the CIB: Transport endpoint is not connected
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/unless: crm_mon: Error: cluster is not available on this node
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/returns: Exec try 1/360
Debug: Exec[wait-for-settle](provider=posix): Executing '/sbin/pcs status | grep -q 'partition with quorum' > /dev/null 2>&1'
Debug: Executing: '/sbin/pcs status | grep -q 'partition with quorum' > /dev/null 2>&1'
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/returns: Sleeping for 10.0 seconds between tries
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/returns: Exec try 2/360
Debug: Exec[wait-for-settle](provider=posix): Executing '/sbin/pcs status | grep -q 'partition with quorum' > /dev/null 2>&1'
Debug: Executing: '/sbin/pcs status | grep -q 'partition with quorum' > /dev/null 2>&1'
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/returns: Sleeping for 10.0 seconds between tries
How does the corosync.conf file get created? Is it related to the pcsd error saying that the config sync can't proceed because the cluster doesn't have a minimum of two members?
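From what I can tell so far (and please correct me if I'm wrong), with the newer pcs versions that use the `pcs host auth` syntax shown in the log above, /etc/corosync/corosync.conf is not shipped by any package; it is generated on the nodes when `pcs cluster setup` runs. For reference, a minimal single-node file would look roughly like the sketch below. The cluster name and node name here are just placeholders based on my environment, not anything I've confirmed the deployment would actually write:

```
totem {
    version: 2
    cluster_name: overcloud
    transport: knet
}

nodelist {
    node {
        ring0_addr: controller.cloud.hirstgroup.net
        name: controller
        nodeid: 1
    }
}

quorum {
    provider: corosync_votequorum
}
```

So my working assumption is that the puppet step is supposed to invoke `pcs cluster setup` (which would create this file) before corosync/pacemaker are started, and that step is what's not happening on my single-node setup.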