Re: [rdo-users] TripleO Monitoring Tool/Method

Friday, 23 October 2020

Hi,

Yes, of course I'm using STF, and it's not complicated.
It's always a good idea to separate your monitoring stack from the
monitored infrastructure. How would you know your stack is down, if
notifications are also sent from that stack?

With the tripleo-heat-templates you linked, you basically enable legacy
telemetry (ceilometer, aodh, gnocchi).

If you are running 40 computes, that is not a small stack anymore. I
would recommend using Ceph as the storage backend for Gnocchi.

Also, depending on your use case and your collectd settings, you may
want to raise the polling interval, i.e. poll less often. The parameter
is CollectdDefaultPollingInterval. I have set it here to something like
5 seconds, but in your case I would suggest 600 (the same as for
Ceilometer).
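
To get a feeling for why the interval matters, here is a rough
back-of-the-envelope sketch. The node count and the two intervals come
from this thread; the number of metrics per node is purely an assumption
and depends on which collectd plugins are enabled.

```python
# Rough estimate of how the collectd polling interval scales the number of
# samples the metrics backend has to ingest. Everything except the node
# count and the two intervals is hypothetical.

NODES = 40                # node count mentioned in this thread
METRICS_PER_NODE = 100    # assumption: depends on the enabled plugins


def samples_per_second(interval_s: int, nodes: int = NODES,
                       metrics_per_node: int = METRICS_PER_NODE) -> float:
    """Average number of samples emitted per second across all nodes."""
    return nodes * metrics_per_node / interval_s


for interval in (5, 600):
    rate = samples_per_second(interval)
    print(f"interval={interval:>3}s -> {rate:8.1f} samples/s")

# The ratio does not depend on the assumed metrics-per-node figure.
ratio = samples_per_second(5) / samples_per_second(600)
print(f"5s polling produces about {ratio:.0f}x the samples of 600s polling")
```

Whatever the real per-node metric count is, the relative difference
between the two intervals stays the same.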

Matthias

On 23/10/2020 11:09, Khodayar Doustar wrote:
...
 Matthias,

 Thanks a lot for your answer.
 Yes, you win the bet :) I used Swift and am currently struggling to
 disable collectd to make my cloud usable again! :))

 I've seen STF (Service Telemetry Framework), but it seems a little
 bit too complicated. I would have to set up an OKD cluster to monitor my
 OpenStack; isn't that too much work?
 Have you tried it yourself?

 If I understand correctly, your first and main suggestion means
 adding these files to my overcloud deploy command:

 /usr/share/openstack-tripleo-heat-templates/environments/enable-legacy-telemetry.yaml
 /usr/share/openstack-tripleo-heat-templates/environments/services/collectd.yaml

 and for performance tuning I've checked this page:

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/...

 Is that what you mean?
 If so, I should make my cloud usable again and just change GnocchiBackend
 to a path on a shared file system (e.g. NFS), since I have 4 controller
 nodes. The rest is exactly what I've done up to now.

 Thanks a lot,
 Khodayar

 On Fri, Oct 23, 2020 at 10:01 AM Matthias Runge <mrunge(a)redhat.com>
 wrote:

     On 22/10/2020 17:46, Khodayar Doustar wrote:
     > Hi everybody,
     >
     > I am searching for a good and useful method to monitor my 40-node
     > cloud.
     >
     > I have tried
     >
     > - Prometheus + Grafana (with
     > https://github.com/openstack-exporter/openstack-exporter), but it
     > cannot monitor node load, CPU usage, etc.
     > and
     > - Gnocchi + Collectd + Grafana, but it imposes an unbelievable load
     > on the nodes and makes the whole cloud completely unusable!
     >
     > I've tried to use Graphite + Grafana but I failed.
     >
     > Do you have any suggestions?

     Hi,

     yes, I have some opinions here.

     My proposal here is:

     - use collectd to collect low level metrics from your baremetal machines
     - use ceilometer to collect OpenStack related info, like project usage,
     etc. That is nothing you'd get by using node-exporter
     - hook them both together and send metrics over to something called
     Service Telemetry Framework. The configuration *is* included in tripleo.
     The website has documentation available:
     https://infrawatch.github.io/documentation
     - graphite + grafana (plus collectd) is also a single node setup and
     won't provide you reliability.
     - collectd also provides the ability to send events, which can be acted
     on. That is not included if you use node-exporter, openstack-exporter
     etc. Prometheus monitoring creates events from metrics, but will be slow
     to detect failed components.

     Since Prometheus is meant to run as a single server, there is no HA
     per se in Prometheus. That makes handling Prometheus on standalone
     machines a bit awkward, or you'd need an infrastructure taking care of
     that.

     In your tests with gnocchi, collectd and grafana, I bet you used swift
     as backend for gnocchi storage. That is not a good idea and may lead to
     bad performance.

     Matthias

     -- 
     Matthias Runge <mrunge(a)redhat.com>

     Red Hat GmbH, http://www.de.redhat.com/,
     Registered seat: Grasbrunn,
     Commercial register: Amtsgericht Muenchen, HRB 153243,
     Man.Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael
     O'Neil


-- 
Matthias Runge <mrunge(a)redhat.com>

Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Man.Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neil
