<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">Matthias,<div><br></div><div>Thanks a lot for your answer.</div><div>Yes, you win the bet :) I did use Swift, and I'm currently struggling to disable collectd to make my cloud usable again! :))</div><div><br></div><div>I've looked at STF (Service Telemetry Framework), but it seems a bit too complicated. I would have to deploy an OKD cluster just to monitor my OpenStack; isn't that too much work?</div><div>Have you tried it yourself?</div><div><br></div><div>If I understand correctly, your first and main proposal means adding these files to my overcloud deploy command:</div><div><br></div><div>/usr/share/openstack-tripleo-heat-templates/environments/enable-legacy-telemetry.yaml<br></div><div>/usr/share/openstack-tripleo-heat-templates/environments/services/collectd.yaml<br></div><div><br></div><div>and for performance tuning I've checked this page:</div><div><a href="https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html/deployment_recommendations_for_specific_red_hat_openstack_platform_services/config-recommend-telemetry_config-recommend-telemetry#config_telemetry-small-overcloud_config-recommend-telemetry">https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html/deployment_recommendations_for_specific_red_hat_openstack_platform_services/config-recommend-telemetry_config-recommend-telemetry#config_telemetry-small-overcloud_config-recommend-telemetry</a><br></div><div><br></div><div>Is that what you mean?</div><div>If so, I should first make my cloud usable again and then just change GnocchiBackend to a path to a file on a shared file system (e.g. 
NFS), since I have 4 controller nodes. The rest is exactly what I've done up to now.</div><div><br></div><div>Thanks a lot,</div><div>Khodayar</div></div></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Oct 23, 2020 at 10:01 AM Matthias Runge <<a href="mailto:mrunge@redhat.com">mrunge@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 22/10/2020 17:46, Khodayar Doustar wrote:<br>

> Hi everybody,<br>

> <br>

> I am searching for a good and useful method to monitor my 40-node cloud.<br>

> <br>

> I have tried<br>

> <br>

> - Prometheus + Grafana (with<br>

> <a href="https://github.com/openstack-exporter/openstack-exporter" rel="noreferrer" target="_blank">https://github.com/openstack-exporter/openstack-exporter</a>), but it<br>

> cannot monitor node load, CPU usage, etc.<br>

> and <br>

> - Gnocchi + Collectd + Grafana, but it imposes an unbelievable load on the nodes<br>

> and makes the whole cloud completely unusable!<br>

> <br>

> I've tried to use Graphite + Grafana but I failed.<br>

> <br>

> Do you have any suggestions?<br>

<br>

<br>

Hi,<br>

<br>

yes, I have some opinions here.<br>

<br>

My proposal here is:<br>

<br>

- use collectd to collect low level metrics from your baremetal machines<br>

- use ceilometer to collect OpenStack related info, like project usage,<br>

etc. That is nothing you'd get by using node-exporter<br>

- hook them both together and send metrics over to something called<br>

Service Telemetry Framework. The configuration *is* included in tripleo.<br>

The website has documentation available<br>

<a href="https://infrawatch.github.io/documentation" rel="noreferrer" target="_blank">https://infrawatch.github.io/documentation</a><br>

- graphite + grafana (plus collectd) is also a single node setup and<br>

won't provide you reliability.<br>

- collectd also provides the ability to send events, which can be acted<br>

on. That is not included if you use node-exporter, openstack-exporter<br>

etc. Prometheus monitoring creates events from metrics, but will be slow<br>

to detect failed components.<br>

<br>

Since prometheus is meant to be single server, there is no HA per se in<br>

prometheus. That makes handling prometheus on standalone machines a bit<br>

awkward, or you'd need an infrastructure taking care of that.<br>

<br>

In your tests with gnocchi, collectd and grafana, I bet you used swift<br>

as backend for gnocchi storage. That is not a good idea and may lead to<br>

bad performance.<br>

<br>

Matthias<br>

<br>

-- <br>

Matthias Runge <<a href="mailto:mrunge@redhat.com" target="_blank">mrunge@redhat.com</a>><br>

<br>

Red Hat GmbH, <a href="http://www.de.redhat.com/" rel="noreferrer" target="_blank">http://www.de.redhat.com/</a>, Registered seat: Grasbrunn,<br>

Commercial register: Amtsgericht Muenchen, HRB 153243,<br>

Man.Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neil<br>

<br>


</blockquote></div>