[rdo-users] TripleO Monitoring Tool/Method

Khodayar Doustar khodayard at gmail.com
Fri Oct 23 09:09:07 UTC 2020


Matthias,

Thanks a lot for your answer.
Yes, you win the bet :) I used swift, and I am currently struggling to disable
collectd to make my cloud usable again! :))

I've looked at STF (Service Telemetry Framework), but it seems a bit too
complicated. I would have to deploy an OKD cluster just to monitor my
OpenStack. Isn't that too much work?
Have you tried it yourself?

If I understand correctly, your first and main suggestion means adding
these files to my overcloud deploy command:

/usr/share/openstack-tripleo-heat-templates/environments/enable-legacy-telemetry.yaml
/usr/share/openstack-tripleo-heat-templates/environments/services/collectd.yaml

and for performance tuning I've checked this page:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html/deployment_recommendations_for_specific_red_hat_openstack_platform_services/config-recommend-telemetry_config-recommend-telemetry#config_telemetry-small-overcloud_config-recommend-telemetry

Is that what you mean?
If so, I should make my cloud usable again and just change GnocchiBackend to
file, with its storage path on a shared file system (i.e. NFS), because I have
4 controller nodes. The rest is exactly what I've done up to now.
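
To make the question concrete, here is a rough sketch of what I have in mind.
It is only a dry run: it writes a small extra environment file and prints the
deploy command instead of running it. The file name gnocchi-file-backend.yaml
is a placeholder I made up, and the -e files I already pass today are left out.

```shell
# Dry-run sketch: nothing is deployed, the command is only printed.
THT=/usr/share/openstack-tripleo-heat-templates/environments
EXTRA_ENV="${TMPDIR:-/tmp}/gnocchi-file-backend.yaml"   # hypothetical file name

# GnocchiBackend selects the storage driver; "file" replaces swift here.
# Pointing the file driver at the NFS mount is a separate setting that
# this sketch does not cover.
cat > "$EXTRA_ENV" <<'EOF'
parameter_defaults:
  GnocchiBackend: file
EOF

# The extra file goes last so that it overrides earlier defaults.
DEPLOY_CMD="openstack overcloud deploy --templates \
-e $THT/enable-legacy-telemetry.yaml \
-e $THT/services/collectd.yaml \
-e $EXTRA_ENV"

echo "$DEPLOY_CMD"
```

In a real run the printed command would also carry all the environment files
already used for this overcloud, in their current order.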

Thanks a lot,
Khodayar

On Fri, Oct 23, 2020 at 10:01 AM Matthias Runge <mrunge at redhat.com> wrote:

> On 22/10/2020 17:46, Khodayar Doustar wrote:
> > Hi everybody,
> >
> > I am searching for a good and useful method to monitor my 40-node cloud.
> >
> > I have tried
> >
> > - Prometheus + Grafana (with
> > https://github.com/openstack-exporter/openstack-exporter) but it
> > cannot monitor node load, CPU usage, etc.
> > and
> > - Gnocchi + Collectd + Grafana but it puts an unbelievable load on the
> > nodes and makes the whole cloud completely unusable!
> >
> > I've tried to use Graphite + Grafana but I failed.
> >
> > Do you have any suggestions?
>
>
> Hi,
>
> yes, I have some opinions here.
>
> My proposal here is:
>
> - use collectd to collect low level metrics from your baremetal machines
> - use ceilometer to collect OpenStack related info, like project usage,
> etc. That is nothing you'd get by using node-exporter
> - hook them both together and send metrics over to something called
> Service Telemetry Framework. The configuration *is* included in tripleo.
> The website has documentation available
> https://infrawatch.github.io/documentation
> - graphite + grafana (plus collectd) is also a single node setup and
> won't provide you reliability.
> - collectd also provides the ability to send events, which can be acted
> on. That is not included if you use node-exporter, openstack-exporter
> etc. Prometheus monitoring creates events from metrics, but will be slow
> to detect failed components.
>
> Since Prometheus is meant to be a single server, there is no HA per se in
> Prometheus. That makes handling Prometheus on standalone machines a bit
> awkward, or you'd need an infrastructure taking care of that.
>
> In your tests with gnocchi, collectd and grafana, I bet you used swift
> as backend for gnocchi storage. That is not a good idea and may lead to
> bad performance.
>
> Matthias
>
> --
> Matthias Runge <mrunge at redhat.com>