[rdo-dev] Graphite and Grafana in RDO-Cloud

Wesley Hayutin whayutin at redhat.com
Mon Jan 8 21:09:43 UTC 2018


On Sun, Jan 7, 2018 at 11:24 PM, Tristan Cacqueray <tdecacqu at redhat.com>
wrote:

> Hello David and Wesley, please find some comments inlined below.
>
> On January 5, 2018 6:39 pm, Wesley Hayutin wrote:
>
>> On Fri, Jan 5, 2018 at 12:36 PM, David Moreau Simard <dms at redhat.com>
>> wrote:
>>
>> There are already plans [1] to add the software factory implementation of
>>> Grafana on review.rdoproject.org, you can see what it looks like on
>>> softwarefactory-project.io [2].
>>>
>>> The backend to this grafana implementation is currently influxdb, not
>>> graphite.
>>> However, there are ongoing discussions to either support both graphite and
>>> influxdb simultaneously or optionally either.
>>>
>>> We're interested in leveraging this influxdb (or graphite) and grafana
>>> implementation for monitoring data in general (uptime, resources, disk
>>> space, load, etc.) so our goals align here.
>>> We both agree that using graphite would be a plus in order to re-use the
>>> same queries in the grafana dashboard but at the same time, influxdb is
>>> more "modern" and easier to work with -- this is why we might end up
>>> deploying both, we'll see.
>>>
>>> [1]: https://bugzilla.redhat.com/show_bug.cgi?id=1514086
>>> [2]: https://softwarefactory-project.io/grafana/
>>>
>>>
> Note that the current influxdb/grafana integration is for instance system
> metrics (cpu, mem, network and i/o). We are working on getting zuul and
> nodepool metrics, but the upstream queries need to be adapted for influxdb,
> hence we may look at integrating graphite/carbon too so that it is easier.
> There is also this tool that can make influxdb a backend for graphite:
> https://github.com/InfluxGraph/influxgraph
>
> Also note that we are integrating grafyaml into the config repo so that
> grafana dashboards can be proposed and updated by regular users too.
>
>
>>> This is great news David, thank you for sharing.
>> Given that this is already planned for Software Factory and we have an
>> immediate need, I'm wondering how to proceed.
>> Does the RDO Infra team have an estimate when graphite/influxdb/grafana
>> will be moved to production?
>>
>
> While we could set up the grafana/influxdb service, and we should
> in the near future, it seems like this ci use-case needs some more
> tinkering and I think it would be easier to start with another
> dedicated setup until the requirements are better defined.
>
>
> Some possibilities come to mind, depending on when it moves to prod
>>
>> 1.  The TripleO-CI team waits for prod
>> 2.  TripleO CI would stand up a test instance of graphite/influxdb and
>> grafana and start to work out what we need to send and how to send data
>> 3.  Is it possible to use the stage instance of RDO SF as a testbed for
>> TripleO-CI's work?  Meaning we send metrics and use the stage instance
>> with backing up the data in mind?
>>
>> What do you think?
>> Thanks
>>
>>
>>
> I think 1. will happen shortly, and this will bring a grafana setup
> accessible from the top-menu.
>
> Though I think 2. is probably easier to begin with, and we could
> configure the new graphite/influxdb backend in the existing grafana.
>
> Not sure what you mean by 3. If there is a graphite/influxdb service in
> rdo-prod tenant, then you could use it for tripleo-ci work of course.
> The backup of RDO SF is managed by this playbook:
> https://softwarefactory-project.io/r/gitweb?p=software-factory/sf-ops.git;a=blob;f=backup/ansible/backup.yml
> We could add_host the new backend and back up its data similarly.
>
>
> Here are some more thoughts:
>
> Depending on how the metrics are pushed, we may need some kind of
> authorization mechanism and a job secret to allow external clients to
> push new metrics.
>
> It seems like we could set up post-run to push job metrics. Perhaps
> we could leverage ara sqldump to extract per task duration.
>
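
To make the "push job metrics from post-run" idea concrete, here is a minimal
sketch in Python of what a push to graphite/carbon could look like. It uses
carbon's plaintext protocol (one "<path> <value> <timestamp>" line per metric,
port 2003 by default). The "tripleo_ci" prefix, the metric naming scheme, the
task names and the carbon host are assumptions for illustration only; nothing
like this has been deployed, and the per-task durations would have to come
from somewhere such as the ara data mentioned above.

```python
import socket
import time


def sanitize(name):
    """Replace characters that graphite treats as separators in a metric path."""
    return name.replace(" ", "_").replace(".", "_")


def format_metric(path, value, timestamp=None):
    """Return one carbon plaintext line: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def job_metrics(job_name, task_durations, timestamp=None, prefix="tripleo_ci"):
    """Build one metric line per task duration (in seconds).

    The prefix and the '<prefix>.<job>.<task>.duration' layout are
    hypothetical; pick whatever hierarchy suits the dashboards.
    """
    lines = []
    for task, seconds in task_durations.items():
        path = f"{prefix}.{sanitize(job_name)}.{sanitize(task)}.duration"
        lines.append(format_metric(path, seconds, timestamp))
    return lines


def send_metrics(lines, host, port=2003):
    """Send the lines to a carbon plaintext listener over TCP."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall("".join(lines).encode("ascii"))
```

A post-run step would then call something like
send_metrics(job_metrics(job, durations), "carbon.example.org"), where the
host name is a placeholder. Note that the plaintext protocol has no
authentication of its own, which is why the authorization question above
matters for external clients.
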
> Software Factory may also automatically set up a job duration graph
> dashboard per project, here is a new user-story to track this work:
> https://tree.taiga.io/project/morucci-software-factory/us/897
>
>
> Alternatively, we could also use the zuul sql reporter database, which
> already records the start/end time of each job. Here is a gnuplot of that
> data:
> https://fedorapeople.org/~tdecacqu/tripleo-ci/periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset018-pike.png
> This could probably be integrated in the zuul-web dashboard upstream.
>
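
As a rough illustration of the sql reporter approach, the sketch below derives
per-build durations from start/end times. It runs against an in-memory sqlite
table so it is self-contained; the table name "zuul_build", the column names
and the sample rows are assumptions made for the example and should be checked
against the actual reporter schema before reuse.

```python
import sqlite3
from datetime import datetime


def job_durations(conn, job_name):
    """Return (start_time, duration_in_seconds) for each finished build of a job."""
    rows = conn.execute(
        "SELECT start_time, end_time FROM zuul_build "
        "WHERE job_name = ? AND end_time IS NOT NULL "
        "ORDER BY start_time",
        (job_name,),
    ).fetchall()
    durations = []
    for start, end in rows:
        started = datetime.fromisoformat(start)
        ended = datetime.fromisoformat(end)
        durations.append((start, (ended - started).total_seconds()))
    return durations


def demo_connection():
    """Create an in-memory database with made-up rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE zuul_build "
        "(job_name TEXT, start_time TEXT, end_time TEXT, result TEXT)"
    )
    conn.executemany(
        "INSERT INTO zuul_build VALUES (?, ?, ?, ?)",
        [
            ("job-a", "2018-01-05 10:00:00", "2018-01-05 11:30:00", "SUCCESS"),
            ("job-a", "2018-01-06 10:00:00", "2018-01-06 12:00:00", "FAILURE"),
            ("job-b", "2018-01-05 09:00:00", "2018-01-05 09:20:00", "SUCCESS"),
            ("job-a", "2018-01-07 10:00:00", None, None),  # still running
        ],
    )
    return conn
```

The resulting (start_time, seconds) pairs are the same kind of series that the
gnuplot graph above shows, and they could be fed to any plotting or metrics
backend.
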
> Alternatively, the elasticsearch data could also be used to construct a
> similar graph in kibana, though it seems like it's missing a duration
> field.
>
>
> Regards,
> -Tristan


Thanks for the feedback David, Tristan.
We will be discussing your feedback tomorrow directly after the tripleo
meeting on #tripleo.

You guys are always welcome to join, just ping on #oooq / #tripleo for
details about the meeting.
We're going to spend about 20min in a Q&A session about the tools.

We'll follow up with our plans on this thread.

Thanks all!


>
>
>
>>
>>
>>>
>>> David Moreau Simard
>>> Senior Software Engineer | OpenStack RDO
>>>
>>> dmsimard = [irc, github, twitter]
>>>
>>> On Fri, Jan 5, 2018 at 12:13 PM, Wesley Hayutin <whayutin at redhat.com>
>>> wrote:
>>>
>>> Greetings,
>>>>
>>>> At the end of 2017, a number of the upstream multinode scenario jobs
>>>> started to run over our required deployment times [1].  In an effort to
>>>> better understand the performance of the deployment and CI the tripleo
>>>> cores requested that a Graphite and Grafana server be stood up such
>>>> that we
>>>> can analyze the core issues more effectively.
>>>>
>>>> There is a certain amount of urgency with the issue as our upstream
>>>> coverage is impacted.  The TripleO-CI team is working on the deployment
>>>> of
>>>> both tools in a dev-ops style in RDO-Cloud this sprint.  Nothing yet has
>>>> been deployed.
>>>>
>>>> The TripleO CI team is also working with upstream infra to send metrics
>>>> and data to the upstream Graphite and Grafana servers.  It is not clear
>>>> yet
>>>> if we have permission or access to the upstream tools.
>>>>
>>>> I wanted to publicly announce this work to the RDO infra community to
>>>> inform and to gather any feedback anyone may have.  There are two
>>>> scopes of
>>>> work here, the initial tooling to stand up the infra and the longer term
>>>> maintenance of the tools.  Perhaps there are plans to build these into
>>>> RDO
>>>> SF already.. etc.
>>>>
>>>> Please reply with your comments and concerns.
>>>> Thank you!
>>>>
>>>>
>>>> [1] https://github.com/openstack-infra/tripleo-ci/commit/7a2edf70eccfc7002d26fd1ce1eef803ce8d0ba8
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> dev mailing list
>>>> dev at lists.rdoproject.org
>>>> http://lists.rdoproject.org/mailman/listinfo/dev
>>>>
>>>> To unsubscribe: dev-unsubscribe at lists.rdoproject.org
>>>>
>>>>
>>>>