[rdo-users] TripleO Monitoring Tool/Method
Matthias Runge
mrunge at redhat.com
Fri Oct 23 13:36:56 UTC 2020
Hi,
yes, of course I'm using STF, and it's not complicated.
It's always a good idea to separate your monitoring stack from the
monitored infrastructure. How would you know your stack is down, if
notifications are also sent from that stack?
With the tripleo-heat-templates you linked, you basically enable legacy
telemetry (ceilometer, aodh, gnocchi).
If you are running 40 computes, that is not a small stack anymore. I
would recommend using ceph as the storage backend for gnocchi.
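A minimal sketch of how that could look in a custom environment file passed
with -e to the overcloud deploy command. GnocchiBackend is the parameter named
in the quoted mail below; the value rbd for ceph and the file name are
assumptions, and the sketch presumes a ceph cluster is already deployed with
(or integrated into) the overcloud. Please check it against the
tripleo-heat-templates of your release first.

```yaml
# gnocchi-ceph.yaml (hypothetical file name)
parameter_defaults:
  # Assumption: "rbd" selects ceph as gnocchi storage instead of swift.
  GnocchiBackend: rbd
```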
Also, depending on your use-case and your settings (for collectd) you
may want to raise the polling interval, i.e. poll less often. The
parameter is CollectdDefaultPollingInterval. I have set it here to
something like 5 secs, but in your case I would suggest using 600 secs
(same as for Ceilometer).
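As a sketch, the interval can be set in a small environment file. The
parameter name is the one given above; the file name is hypothetical and the
value is only the suggestion from this mail, not a tested recommendation.

```yaml
# collectd-interval.yaml (hypothetical file name)
parameter_defaults:
  # collectd polling interval in seconds; 600 means one poll every 10 minutes.
  CollectdDefaultPollingInterval: 600
```

Environment files given later on the command line override earlier ones, so
such a file would normally go after the collectd.yaml environment in the list
of -e arguments.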
Matthias
On 23/10/2020 11:09, Khodayar Doustar wrote:
> Matthias,
>
> Thanks a lot for your answer.
> Yes, you win the bet :) I've used swift and am currently struggling to
> disable collectd to make my cloud usable again! :))
>
> I've seen this STF (Service Telemetry Framework) but it seems a little
> bit too complicated. I should implement an OKD cluster to monitor my
> openstack, isn't it too much work?
> Have you tried it yourself?
>
> If I understand correctly, with your first and main opinion you mean
> adding these files to my overcloud deploy command:
>
> /usr/share/openstack-tripleo-heat-templates/environments/enable-legacy-telemetry.yaml
> /usr/share/openstack-tripleo-heat-templates/environments/services/collectd.yaml
>
> and for performance tuning I've checked this page:
> https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html/deployment_recommendations_for_specific_red_hat_openstack_platform_services/config-recommend-telemetry_config-recommend-telemetry#config_telemetry-small-overcloud_config-recommend-telemetry
>
> Is that what you mean?
> If so, I should make my cloud usable again and just change GnocchiBackend
> to a path to a file on a shared file system (i.e. NFS), because I have 4
> controller nodes. The rest is exactly what I've done up to now.
>
> Thanks a lot,
> Khodayar
>
> On Fri, Oct 23, 2020 at 10:01 AM Matthias Runge <mrunge at redhat.com> wrote:
>
> On 22/10/2020 17:46, Khodayar Doustar wrote:
> > Hi everybody,
> >
> > I am searching for a good and useful method to monitor my 40-node
> > cloud.
> >
> > I have tried
> >
> > - Prometheus + Grafana (with
> > https://github.com/openstack-exporter/openstack-exporter) but it
> > cannot monitor node load, CPU usage, etc.
> > and
> > - Gnocchi + Collectd + Grafana but it puts an unbelievable load on the
> > nodes and makes the whole cloud completely unusable!
> >
> > I've tried to use Graphite + Grafana but I failed.
> >
> > Do you have any suggestions?
>
>
> Hi,
>
> yes, I have some opinions here.
>
> My proposal here is:
>
> - use collectd to collect low level metrics from your baremetal machines
> - use ceilometer to collect OpenStack related info, like project usage,
> etc. That is nothing you'd get by using node-exporter
> - hook them both together and send metrics over to something called
> Service Telemetry Framework. The configuration *is* included in tripleo.
> The website has documentation available
> https://infrawatch.github.io/documentation
> - graphite + grafana (plus collectd) is also a single node setup and
> won't provide you reliability.
> - collectd also provides the ability to send events, which can be acted
> on. That is not included if you use node-exporter, openstack-exporter
> etc. Prometheus monitoring creates events from metrics, but will be slow
> to detect failed components.
>
> Since prometheus is meant to be single server, there is no HA per se in
> prometheus. That makes handling prometheus on standalone machines a bit
> awkward, or you'd need an infrastructure taking care of that.
>
> In your tests with gnocchi, collectd and grafana, I bet you used swift
> as backend for gnocchi storage. That is not a good idea and may lead to
> bad performance.
>
> Matthias
>
> --
> Matthias Runge <mrunge at redhat.com>
>
> Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
> Commercial register: Amtsgericht Muenchen, HRB 153243,
> Man.Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neil
>
--
Matthias Runge <mrunge at redhat.com>
Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Man.Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neil