[rdo-dev] Graphite and Grafana in RDO-Cloud

Mon Jan 8 04:24:08 UTC 2018

Hello David and Wesley, please find some comments inline below.

On January 5, 2018 6:39 pm, Wesley Hayutin wrote:
> On Fri, Jan 5, 2018 at 12:36 PM, David Moreau Simard <dms at redhat.com> wrote:
> 
>> There are already plans [1] to add the software factory implementation of
>> Grafana on review.rdoproject.org, you can see what it looks like on
>> softwarefactory-project.io [2].
>>
>> The backend to this grafana implementation is currently influxdb, not
>> graphite.
>> However, there are ongoing discussions to either support both graphite and
>> influxdb simultaneously or optionally either.
>>
>> We're interested in leveraging this influxdb (or graphite) and grafana
>> implementation for monitoring data in general (uptime, resources, disk
>> space, load, etc.) so our goals align here.
>> We both agree that using graphite would be a plus in order to re-use the
>> same queries in the grafana dashboard but at the same time, influxdb is
>> more "modern" and easier to work with -- this is why we might end up
>> deploying both, we'll see.
>>
>> [1]: https://bugzilla.redhat.com/show_bug.cgi?id=1514086
>> [2]: https://softwarefactory-project.io/grafana/
>>

Note that the current influxdb/grafana integration covers instance system
metrics (cpu, mem, network and i/o). We are working on getting zuul and
nodepool metrics, but the upstream queries need to be adapted for influxdb,
hence we may look at integrating graphite/carbon too so that this is easier.
There is also this tool that can make influxdb a backend for graphite:
https://github.com/InfluxGraph/influxgraph
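
To illustrate why queries do not carry over directly between the two
backends, here is a minimal sketch of the same sample written for each one.
Graphite addresses a series by a single dotted path, while influxdb uses a
measurement plus tags and fields. The metric name and the path-to-tag
mapping are assumptions made for this example only, not what zuul or
nodepool actually emit.

```python
def graphite_line(path, value, timestamp):
    """Format one sample in the Graphite plaintext protocol: 'path value timestamp'."""
    return f"{path} {value} {int(timestamp)}"


def influx_line(path, value, timestamp):
    """Format the same sample in InfluxDB line protocol.

    Assumption: the first dotted component becomes the measurement and the
    remainder is kept in a 'path' tag. A real mapping would be chosen per metric.
    """
    measurement, _, rest = path.partition(".")
    tags = f",path={rest}" if rest else ""
    # InfluxDB line protocol timestamps default to nanosecond precision.
    return f"{measurement}{tags} value={value} {int(timestamp) * 1_000_000_000}"


# Hypothetical metric name, for illustration only.
print(graphite_line("nodepool.nodes.ready", 4, 1515385448))
print(influx_line("nodepool.nodes.ready", 4, 1515385448))
```

A dashboard query written against the dotted path has to be rewritten in
terms of the measurement and tags, which is the adaptation work mentioned
above.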

Also note that we are integrating grafyaml into the config repo so that
grafana dashboards can be proposed and updated by regular users too.

>>
> This is great news David, thank you for sharing.
> Given that this is already planned for software factory and we have an
> immediate need I'm wondering how to proceed.
> Does the RDO Infra team have an estimate when graphite/influxdb/grafana
> will be moved to production?

While we could set up the grafana/influxdb service, and we should
in the near future, it seems like this CI use case needs some more
tinkering and I think it would be easier to start with another
dedicated setup until the requirements are better defined.

> Some possibilities come to mind, depending on when it moves to prod
> 
> 1.  The TripleO-CI team waits for prod
> 2.  TripleO CI would stand up a test instance of graphite/influxdb and
> grafana and start to work out what we need to send and how to send data
> 3.  Is it possible to use the stage instance RDO SF as a testbed for
> TripleO-CI's work?  Meaning we send metrics and use the stage instance with
> backing up the data in mind?
> 
> What do you think?
> Thanks
> 
> 

I think 1. will happen shortly, and this will bring a grafana setup
accessible from the top-menu.

Though I think 2. is probably easier to begin with, and we could
configure the new graphite/influxdb backend in the existing grafana.

Not sure what you mean by 3. If there is a graphite/influxdb service in
the rdo-prod tenant, then you could use it for tripleo-ci work of course.
The backup of RDO SF is managed by this playbook:
https://softwarefactory-project.io/r/gitweb?p=software-factory/sf-ops.git;a=blob;f=backup/ansible/backup.yml
We could add_host the new backend and back up its data similarly.

Here are some more thoughts:

Depending on how the metrics are pushed, we may need some kind of
authorization mechanism and a job secret to allow external clients to
push new metrics.

It seems like we could set up a post-run step to push job metrics. Perhaps
we could leverage ara sqldump to extract per task duration.
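
As a rough illustration of pushing per-task durations after a job, here is
a minimal sketch using the Graphite plaintext protocol (one
"path value timestamp" line per sample, sent to carbon, which listens on
port 2003 by default). The "ci.<job>.tasks.<task>.duration" naming scheme
and the shape of the input dictionary are assumptions; how the durations
would be extracted from ara is not shown.

```python
import socket
import time


def _safe(name):
    """Dots separate path components in Graphite, so replace dots and spaces."""
    return name.replace(".", "_").replace(" ", "_")


def build_payload(job, task_durations, now=None):
    """Return Graphite plaintext lines, one per task, each newline terminated.

    The 'ci.<job>.tasks.<task>.duration' path is a hypothetical naming scheme.
    """
    now = int(now if now is not None else time.time())
    return "".join(
        f"ci.{_safe(job)}.tasks.{_safe(task)}.duration {seconds} {now}\n"
        for task, seconds in task_durations.items()
    )


def push(payload, host, port=2003):
    """Send the payload to a carbon plaintext listener over TCP."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(payload.encode("utf-8"))
```

For example, build_payload("job-a", {"Deploy overcloud": 12.5}, now=100)
gives the single line "ci.job-a.tasks.Deploy_overcloud.duration 12.5 100".
The plaintext listener has no authentication of its own, so any
authorization would have to sit in front of it.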

Software Factory may also automatically set up a job duration graph
dashboard per project; here is a new user story to track this work:
https://tree.taiga.io/project/morucci-software-factory/us/897

Alternatively, we could also use the zuul sql reporter database, which
already records the start/end time of each job. Here is a gnuplot of that
data:
https://fedorapeople.org/~tdecacqu/tripleo-ci/periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset018-pike.png
This could probably be integrated in the zuul-web dashboard upstream.
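
A minimal sketch of deriving job durations from start/end times is shown
below. It uses an in-memory sqlite database with a hypothetical "builds"
table (job_name, start_time, end_time) and made-up sample rows; the real
zuul sql reporter schema and database engine may differ.

```python
import sqlite3
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def job_durations(conn, job_name):
    """Return (start_time, duration_in_seconds) pairs for one job, oldest first."""
    rows = conn.execute(
        "SELECT start_time, end_time FROM builds "
        "WHERE job_name = ? ORDER BY start_time",
        (job_name,),
    ).fetchall()
    return [
        (
            start,
            (
                datetime.strptime(end, TIME_FORMAT)
                - datetime.strptime(start, TIME_FORMAT)
            ).total_seconds(),
        )
        for start, end in rows
    ]


# Hypothetical table and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE builds (job_name TEXT, start_time TEXT, end_time TEXT)")
conn.executemany(
    "INSERT INTO builds VALUES (?, ?, ?)",
    [
        ("job-a", "2018-01-05 10:00:00", "2018-01-05 11:30:00"),
        ("job-a", "2018-01-06 10:00:00", "2018-01-06 12:00:00"),
        ("job-b", "2018-01-05 09:00:00", "2018-01-05 09:10:00"),
    ],
)

for start, seconds in job_durations(conn, "job-a"):
    print(start, seconds)
```

The resulting (time, duration) pairs are the kind of series that can be
plotted directly, whether with gnuplot or a dashboard.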

Alternatively, the elasticsearch data could also be used to construct a
similar graph in kibana, though it seems like it's missing a duration field.

Regards,
-Tristan

> 
> 
>>
>>
>> David Moreau Simard
>> Senior Software Engineer | OpenStack RDO
>>
>> dmsimard = [irc, github, twitter]
>>
>> On Fri, Jan 5, 2018 at 12:13 PM, Wesley Hayutin <whayutin at redhat.com>
>> wrote:
>>
>>> Greetings,
>>>
>>> At the end of 2017, a number of the upstream multinode scenario jobs
>>> started to run over our required deployment times [1].  In an effort to
>>> better understand the performance of the deployment and CI the tripleo
>>> cores requested that a Graphite and Grafana server be stood up such that we
>>> can analyze the core issues more effectively.
>>>
>>> There is a certain amount of urgency with the issue as our upstream
>>> coverage is impacted.  The TripleO-CI team is working on the deployment of
>>> both tools in a dev-ops style in RDO-Cloud this sprint.  Nothing yet has
>>> been deployed.
>>>
>>> The TripleO CI team is also working with upstream infra to send metrics
>>> and data to the upstream Graphite and Grafana servers.  It is not clear yet
>>> if we have permission or access to the upstream tools.
>>>
>>> I wanted to publicly announce this work to the RDO infra community to
>>> inform and to gather any feedback anyone may have.  There are two scopes of
>>> work here, the initial tooling to stand up the infra and the longer term
>>> maintenance of the tools.  Perhaps there are plans to build these into RDO
>>> SF already.. etc.
>>>
>>> Please reply with your comments and concerns.
>>> Thank you!
>>>
>>>
>>> [1] https://github.com/openstack-infra/tripleo-ci/commit/7a2edf70eccfc7002d26fd1ce1eef803ce8d0ba8
>>>
>>>
>>>
>>> _______________________________________________
>>> dev mailing list
>>> dev at lists.rdoproject.org
>>> http://lists.rdoproject.org/mailman/listinfo/dev
>>>
>>> To unsubscribe: dev-unsubscribe at lists.rdoproject.org
>>>
>>>
>>