[rdo-list] [tripleo] Troubles deploying Liberty with HA setup

Luca 'remix_tj' Lorenzetto lorenzetto.luca at gmail.com
Wed Aug 3 08:17:47 UTC 2016


On Wed, Aug 3, 2016 at 6:36 AM, Michele Baldessari <michele at acksyn.org> wrote:
> Hi Luca,
[cut]

Hi Michele,

> What happens at this step is that "pcs cluster auth opsctrl0 opsctrl1
> opsctrl2..." will set up a secret key between the three nodes and then
> configure corosync (/etc/corosync/corosync.conf) and pacemaker and then
> start both services on all three nodes.

Thank you for the explanation.


> What you need to verify is why
> this command is stuck. It is likely either due to networking issues, dns
> issues or firewalling issues. You can quickly try and strace the pcs
> process and see on which network connections it waits for replies that
> never arrive.
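
To rule out the DNS and firewall hypotheses quickly, a check along these lines could be run from each controller. This is only a sketch under assumptions: the node names are the ones passed to pcs, 2224 is the pcsd port that shows up in the netstat output, and check_node is a hypothetical helper name, not something pcs provides.

```python
import socket

PCSD_PORT = 2224  # assumption: pcsd listens on 2224, as in the netstat output


def check_node(host, port=PCSD_PORT, timeout=3.0):
    """Resolve `host`, then try a TCP connect to `port`.

    Returns (ip_or_None, status), where status is "open", "timeout",
    or a string starting with "dns-failure" or "error".
    """
    try:
        ip = socket.gethostbyname(host)
    except socket.gaierror as exc:
        return None, "dns-failure: %s" % exc
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, "open"
    except socket.timeout:
        # Typical of a firewall silently dropping packets.
        return ip, "timeout"
    except OSError as exc:
        # For example "connection refused" when nothing listens on the port.
        return ip, "error: %s" % exc


# Usage on a controller (not executed here):
#   for node in ("opsctrl0", "opsctrl1", "opsctrl2"):
#       print(node, check_node(node))
```

A "timeout" status would point towards firewalling, a "dns-failure" towards name resolution. An "open" status only shows that the TCP handshake works; it says nothing about whether pcsd actually answers.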

I see this on netstat:

[heat-admin@opsctrl0 ~]$ sudo netstat -alptn | grep 2224
tcp        0      0 172.25.122.13:44378     172.25.122.12:2224      ESTABLISHED 26473/ruby
tcp        0      0 172.25.122.13:55286     172.25.122.13:2224      ESTABLISHED 26473/ruby
tcp        0      0 172.25.122.13:43808     172.25.122.14:2224      ESTABLISHED 26473/ruby
tcp6       2      0 :::2224                 :::*                    LISTEN      26276/ruby
tcp6     200      0 172.25.122.13:2224      172.25.122.13:55286     ESTABLISHED -
tcp6     200      0 172.25.122.13:2224      172.25.122.12:59155     ESTABLISHED -
tcp6       0   1457 172.25.122.13:2224      172.25.122.14:39918     ESTABLISHED 26276/ruby

(similar on the other two nodes)
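
The interesting columns in that output are Recv-Q and Send-Q (second and third). On an ESTABLISHED socket, Recv-Q is normally the number of bytes received but not yet read by the application, and Send-Q the number of bytes not yet acknowledged by the peer. A small filter like the following can pick out the sockets with queued data. It is a sketch only: stuck_sockets is a hypothetical name, and it assumes one socket per line, as netstat prints it on a terminal (not re-wrapped by a mail client).

```python
def stuck_sockets(netstat_text, port=2224):
    """Return (local, remote, recv_q, send_q) for ESTABLISHED TCP sockets
    on `port` that have a non-zero receive or send queue."""
    suffix = ":%d" % port
    rows = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected layout: proto recv-q send-q local remote state [pid/program]
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        recv_q, send_q, local, remote, state = fields[1:6]
        if state != "ESTABLISHED":
            continue
        if not (local.endswith(suffix) or remote.endswith(suffix)):
            continue
        if int(recv_q) or int(send_q):
            rows.append((local, remote, int(recv_q), int(send_q)))
    return rows


# Usage (not executed here): pipe the netstat output into the script, e.g.
#   import sys
#   for row in stuck_sockets(sys.stdin.read()):
#       print(row)
```

Applied to the output above, this would single out the three tcp6 connections on 172.25.122.13:2224, two with 200 bytes waiting in Recv-Q and one with 1457 bytes in Send-Q.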

PID 26276 is /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb

PID 26473 is /usr/bin/ruby -I/usr/lib/pcsd/ /usr/lib/pcsd/pcsd-cli.rb auth,
which is a child process of:

/usr/bin/python2 /usr/sbin/pcs cluster auth opsctrl0 opsctrl1 opsctrl2 -u hacluster -p password --force


Stracing 26276, I keep getting:

select(11, [10], NULL, NULL, {2, 0})    = 0 (Timeout)
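
For reference, that strace line is a select() call waiting up to 2 seconds for fd 10 to become readable and returning 0, meaning nothing became readable in time. The same behaviour can be reproduced with a throwaway listening socket. This is just an illustration on the loopback interface, and wait_for_connection is a hypothetical helper name, not pcsd code.

```python
import select
import socket


def wait_for_connection(listener, timeout):
    """Return True if `listener` has a pending connection within `timeout` seconds."""
    readable, _, _ = select.select([listener], [], [], timeout)
    return listener in readable


# Throwaway listener on an ephemeral loopback port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(5)

# Nobody has connected yet: select() times out, like "= 0 (Timeout)" in strace.
idle = wait_for_connection(listener, 0.2)

# Once a client connects, select() reports the listening fd as readable.
client = socket.create_connection(listener.getsockname())
busy = wait_for_connection(listener, 2.0)

client.close()
listener.close()
print("idle:", idle, "busy:", busy)
```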


With lsof I see that these fds are open:

ruby    26276 root    0r   CHR    1,3       0t0     1028 /dev/null
ruby    26276 root    1u   REG    8,2     32181 20889837 /var/log/pcsd/pcsd.log
ruby    26276 root    2u   REG    8,2     32181 20889837 /var/log/pcsd/pcsd.log
ruby    26276 root    3r  FIFO    0,8       0t0    72146 pipe
ruby    26276 root    4w  FIFO    0,8       0t0    72146 pipe
ruby    26276 root    5r  FIFO    0,8       0t0    72147 pipe
ruby    26276 root    6w  FIFO    0,8       0t0    72147 pipe
ruby    26276 root    7u  sock    0,6       0t0    64861 protocol: TCPv6
ruby    26276 root    9w   REG    8,2     32181 20889837 /var/log/pcsd/pcsd.log
ruby    26276 root   10u  IPv6  23096       0t0      TCP *:efi-mg (LISTEN)


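I suppose the one involved is the last one: fd 10 is the descriptor the
select() call above waits on, and efi-mg is the service name lsof shows
for port 2224.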

Additionally, since I've seen that a log file is open, I can report
that every minute the log gets these lines:

I, [2016-08-03T04:14:05.213788 #26276]  INFO -- : Running: /usr/sbin/corosync-cmapctl totem.cluster_name
I, [2016-08-03T04:14:05.214140 #26276]  INFO -- : CIB USER: hacluster, groups:
I, [2016-08-03T04:14:05.219656 #26276]  INFO -- : Return Value: 1
W, [2016-08-03T04:14:05.219834 #26276]  WARN -- : Cannot read config 'corosync.conf' from '/etc/corosync/corosync.conf': No such file or directory - /etc/corosync/corosync.conf

I'm not able to understand what's happening.


-- 
"It is absurd to employ men of excellent intelligence to perform
calculations that could be entrusted to anyone if machines were
used"
Gottfried Wilhelm von Leibnitz, Philosopher and Mathematician (1646-1716)

"The Internet is the largest library in the world.
But the problem is that the books are all scattered on the floor"
John Allen Paulos, Mathematician (1945-living)

Luca 'remix_tj' Lorenzetto, http://www.remixtj.net , <lorenzetto.luca at gmail.com>



