Hi!
I installed Munin on two of our machines. The master runs in OpenStack; the first node I
configured is the controller node of another OpenStack installation. Because that node has
a large number of disks, the "diskstats" plugin generates a lot of output (6154 lines,
218698 bytes). This much data kills the TCP connection between the master and the node.
I can reproduce this with a plain telnet session to port 4949 on the node:
# telnet 192.168.104.61 4949
Trying 192.168.104.61...
Connected to 192.168.104.61.
Escape character is '^]'.
# munin node at CNT64IB003.example.com
config diskstats
... lots of config data ...
graph_info This graph shows the number of IO operations pr second and the average size of these requests. Lots of small requests should result in in lower throughput (separate graph) and higher service time (separate graph). Please note
Connection closed by foreign host.
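To take telnet out of the picture, the same test can be scripted. This is a minimal sketch in Python. It assumes only the munin-node text protocol as seen in the session above: a banner line, one command per line, and multi-line replies terminated by a line containing a single ".". The function name `fetch` is made up for this example.

```python
import socket
import sys
from typing import Tuple


def fetch(host: str, command: str, port: int = 4949, timeout: float = 30.0) -> Tuple[int, bool]:
    """Send one command to a munin-node and read the reply.

    Returns (bytes_read, complete): the number of bytes in the reply lines
    that were read, and whether the terminating "." line arrived before
    the connection ended.
    """
    bytes_read = 0
    complete = False
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as stream:
            banner = stream.readline()  # expected: "# munin node at <hostname>"
            if not banner.startswith(b"# munin node at"):
                raise RuntimeError(f"unexpected banner: {banner!r}")
            sock.sendall(command.encode("ascii") + b"\n")
            try:
                for line in stream:
                    bytes_read += len(line)
                    if line.rstrip(b"\r\n") == b".":  # end of a multi-line reply
                        complete = True
                        break
            except ConnectionResetError:
                pass  # a TCP RST ends the reply early; report what arrived
    return bytes_read, complete


if __name__ == "__main__" and len(sys.argv) > 1:
    nbytes, done = fetch(sys.argv[1], "config diskstats")
    print(f"{nbytes} bytes read, reply {'complete' if done else 'TRUNCATED'}")
```

Run it with the node's address as the only argument (e.g. `python3 munin_fetch.py 192.168.104.61`, where the file name is hypothetical). A TRUNCATED result would reproduce the problem without an interactive session, and the byte count shows how far the reply got.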
The connection goes from the VM on the internal network through an OpenStack router to the
external network, and from there over a "real" router to the node. Using a different
hardware machine, I have confirmed that the connection works fine outside of OpenStack.
Using an OpenStack VM as another Munin node, I have also confirmed that the internal
networking is fine, so the OpenStack L3 agent/router is to blame. For completeness, this is
an up-to-date Grizzly RDO installation on up-to-date CentOS 6.4.
Networks:
* 192.168.163.0/24 OpenStack internal
* 192.168.142.0/24 OpenStack external
* 192.168.104.0/24 outside of OpenStack
Here is the tcpdump output from the Munin master:
13:35:02.464812 IP 192.168.163.12.46630 > 192.168.104.61.munin: Flags [.], ack 94081, win 143, options [nop,nop,TS val 83989877 ecr 338498688], length 0
13:35:02.465086 IP 192.168.104.61.munin > 192.168.163.12.46630: Flags [.], seq 94081:95529, ack 727, win 114, options [nop,nop,TS val 338498690 ecr 83989877], length 1448
13:35:02.465282 IP 192.168.104.61.munin > 192.168.163.12.46630: Flags [.], seq 95529:98425, ack 727, win 114, options [nop,nop,TS val 338498690 ecr 83989877], length 2896
13:35:02.469397 IP 192.168.163.12.46630 > 192.168.104.61.munin: Flags [.], ack 98425, win 205, options [nop,nop,TS val 83989881 ecr 338498690], length 0
13:35:02.469686 IP 192.168.104.61.munin > 192.168.163.12.46630: Flags [.], seq 98425:99873, ack 727, win 114, options [nop,nop,TS val 338498694 ecr 83989881], length 1448
13:35:02.469881 IP 192.168.104.61.munin > 192.168.163.12.46630: Flags [.], seq 99873:104217, ack 727, win 114, options [nop,nop,TS val 338498694 ecr 83989881], length 4344
13:35:02.474029 IP 192.168.163.12.46630 > 192.168.104.61.munin: Flags [.], ack 104217, win 189, options [nop,nop,TS val 83989885 ecr 338498694], length 0
13:35:02.474292 IP 192.168.104.61.munin > 192.168.163.12.46630: Flags [R], seq 4001796938, win 0, length 0
13:35:02.605976 IP 192.168.104.61.munin > 192.168.163.12.46630: Flags [.], seq 39738:41186, ack 727, win 114, options [nop,nop,TS val 338498680 ecr 83989867], length 1448
13:35:02.606042 IP 192.168.163.12.46630 > 192.168.104.61.munin: Flags [R], seq 3552918741, win 0, length 0
13:35:02.615483 IP 192.168.104.61.munin > 192.168.163.12.46630: Flags [P.], seq 41186:41197, ack 727, win 114, options [nop,nop,TS val 338498680 ecr 83989867], length 11
13:35:02.615512 IP 192.168.163.12.46630 > 192.168.104.61.munin: Flags [R], seq 3552918741, win 0, length 0
13:35:02.618052 IP 192.168.104.61.munin > 192.168.163.12.46630: Flags [.], seq 41197:42645, ack 727, win 114, options [nop,nop,TS val 338498680 ecr 83989867], length 1448
You can see that the Munin node terminates the connection with a RESET packet
(13:35:02.474292). The Munin master isn't running NTP, so its clock is not in sync with the
node's; please don't compare timestamps between the two traces.
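Picking the RST out of a longer capture by eye is error-prone. Here is a small Python sketch that does it. It assumes tcpdump's one-line-per-packet text output in the layout of these traces; the pattern is derived from the lines above, not from tcpdump documentation. `packets` and `resets` are hypothetical helper names.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple

# Matches the start of a tcpdump TCP record, e.g.
# 13:35:02.474292 IP 192.168.104.61.munin > 192.168.163.12.46630: Flags [R], ...
RECORD = re.compile(
    r"^(?P<time>\d\d:\d\d:\d\d\.\d+) IP (?P<src>\S+) > (?P<dst>\S+?): Flags \[(?P<flags>[^\]]*)\]"
)


class Packet(NamedTuple):
    time: str   # capture time as printed (local clock of the capturing host)
    src: str    # address.port; tcpdump may print a service name such as "munin"
    dst: str
    flags: str  # tcpdump flag string: "." = ACK, "P" = PUSH, "R" = RST, ...


def packets(lines: Iterable[str]) -> Iterator[Packet]:
    """Yield one Packet per matching line; anything else is skipped."""
    for line in lines:
        m = RECORD.match(line)
        if m:
            yield Packet(m["time"], m["src"], m["dst"], m["flags"])


def resets(lines: Iterable[str]) -> List[Packet]:
    """Return every packet with the RST flag set, in capture order."""
    return [p for p in packets(lines) if "R" in p.flags]


# Three records copied from the master's capture.
master_capture = """\
13:35:02.474029 IP 192.168.163.12.46630 > 192.168.104.61.munin: Flags [.], ack 104217, win 189, options [nop,nop,TS val 83989885 ecr 338498694], length 0
13:35:02.474292 IP 192.168.104.61.munin > 192.168.163.12.46630: Flags [R], seq 4001796938, win 0, length 0
13:35:02.606042 IP 192.168.163.12.46630 > 192.168.104.61.munin: Flags [R], seq 3552918741, win 0, length 0
""".splitlines()

first = resets(master_capture)[0]
print(first.time, first.src, ">", first.dst)
# prints: 13:35:02.474292 192.168.104.61.munin > 192.168.163.12.46630
```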
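And here is the tcpdump output from the node. The master shows up as 192.168.142.11 here,
because the OpenStack router NATs its address onto the external network: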
13:35:17.485792 IP 192.168.142.11.46630 > 192.168.104.61.munin: Flags [.], ack 94081, win 143, options [nop,nop,TS val 83989877 ecr 338498688], length 0
13:35:17.485812 IP 192.168.104.61.munin > 192.168.142.11.46630: Flags [.], seq 94081:98425, ack 727, win 114, options [nop,nop,TS val 338498690 ecr 83989877], length 4344
13:35:17.490390 IP 192.168.142.11.46630 > 192.168.104.61.munin: Flags [.], ack 98425, win 205, options [nop,nop,TS val 83989881 ecr 338498690], length 0
13:35:17.490406 IP 192.168.104.61.munin > 192.168.142.11.46630: Flags [.], seq 98425:104217, ack 727, win 114, options [nop,nop,TS val 338498694 ecr 83989881], length 5792
13:35:17.492061 IP 192.168.142.11.46630 > 192.168.104.61.munin: Flags [R], seq 3552918741, win 0, length 0
13:35:17.495034 IP 192.168.142.11.46630 > 192.168.104.61.munin: Flags [.], ack 104217, win 189, options [nop,nop,TS val 83989885 ecr 338498694], length 0
13:35:17.495055 IP 192.168.104.61.munin > 192.168.142.11.46630: Flags [R], seq 4001796938, win 0, length 0
13:35:17.501622 IP 192.168.142.11.46630 > 192.168.104.61.munin: Flags [R], seq 3552918741, win 0, length 0
13:35:17.511234 IP 192.168.142.11.46630 > 192.168.104.61.munin: Flags [R], seq 3552918741, win 0, length 0
13:35:17.520809 IP 192.168.142.11.46630 > 192.168.104.61.munin: Flags [R], seq 3552918741, win 0, length 0
Here, in contrast, it is the master that appears to send the RESET packet
(13:35:17.492061). So each side sees the other one resetting the connection.
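The two traces can be lined up to check where the node's first RST comes from. Below is a minimal sketch in Python. It assumes that the ACK for 94081 (TCP timestamp option TS val 83989877) is the same packet in both captures, and that the clock offset stays constant over these few milliseconds. All timestamps are copied from the traces above.

```python
from datetime import datetime, timedelta


def ts(clock: str) -> datetime:
    """Parse a tcpdump wall-clock stamp such as '13:35:02.464812'."""
    return datetime.strptime(clock, "%H:%M:%S.%f")


# The ACK for 94081 (TS val 83989877) appears in both captures, so its two
# capture times give the offset between the clocks (node minus master).
# The offset includes the master-to-node transit time, which is what we want
# when mapping a packet that supposedly travelled master-to-node.
offset: timedelta = ts("13:35:17.485792") - ts("13:35:02.464812")

# First RST the node receives "from the master" (192.168.142.11), in master time.
rst_seen_by_node = ts("13:35:17.492061") - offset

# What the master itself sends around that time (master capture).
master_last_ack = ts("13:35:02.474029")   # ack 104217, an ordinary ACK
master_first_rst = ts("13:35:02.606042")  # the master's first RST

print("offset:", offset)  # offset: 0:00:15.020980
print("node gets RST at master time", rst_seen_by_node.strftime("%H:%M:%S.%f"))
# node gets RST at master time 13:35:02.471081
print("master still ACKing afterwards:", master_last_ack > rst_seen_by_node)  # True
print("master's first RST is later:", master_first_rst > rst_seen_by_node)    # True
```

If that alignment holds, the node receives its first RST about 3 ms before the master sends its last ordinary ACK (ack 104217), and about 135 ms before the master sends any RST of its own. That would mean the RST is generated somewhere in between, presumably by the OpenStack router.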
I turned on debugging for the L3 agent, but I can't spot anything relevant in its output.
Without debugging, nothing new is written to the log file.
Please advise.
Best regards
Lutz Christoph
--
Lutz Christoph
arago Institut für komplexes Datenmanagement AG
Eschersheimer Landstraße 526 - 532
60433 Frankfurt am Main
eMail: lchristoph(a)arago.de - www: http://www.arago.de
Tel: 0172/6301004
Mobile: 0172/6301004