[rdo-list] CentOS OpsTools (logging, monitoring, etc.) SIG proposal

Fri May 20 23:55:23 UTC 2016

Great!

I'm currently working on how to bring log and application performance
monitoring under the same roof for cloud-native and highly distributed
applications on top of OpenStack with Cloud Foundry or OpenShift and
Kubernetes add-ons. I also want to define some best practices (needs) for
building a simple yet effective cloud-native application monitoring
solution for BizDevOps (yet another buzzword :-)).

My 10 BizDevOps needs are:

   1. Bring log and performance monitoring under the same roof, by
   providing a seamless correlation between log and performance metrics.
   2. Provide intuitive pre-built monitoring interfaces and dashboards for
   everybody and for different roles and organizations (BizDevOps) (note:
   people lack the time and sometimes the skills to configure a monitoring
   tool).
   3. Build dedicated dashboards for transaction and correlation analysis
   to track down the usual suspects, such as memory leaks, garbage
   collection and saturated thread pools, as well as the hundreds of unusual
   suspects that might be the root cause of problems.
   4. Enhance the quality of logs (at the PaaS and application level),
   define custom metrics that are specific to our cloud-native applications,
   and visualize these metrics on custom dashboards for tenants with
   different roles.
   5. Analyze long-term trends, such as: how big is my database and how
   fast is it growing? How quickly is my daily active user count growing?
   6. Implement innovative ideas such as data mining, forecasting and
   advanced analytics support to provide added value to the monitoring
   solution.
   7. Get alerts on issues before customers notice, use the monitoring tool
   as an early warning system, and analyze application performance before and
   after new code deployments.
   8. If remediation actions are triggered through the monitoring
   solution, require human approval before the script is executed (this
   provides a better understanding of the root cause of the problem and how
   to eliminate it in the long term).
   9. Implement a simple yet effective alerting system with a clear
   escalation path and low noise (rules that generate alerts for developers
   or operators should be simple to understand and represent a clear
   failure).
   10. Combine heavy use of white-box monitoring with modest but critical
   uses of black-box monitoring and learn from others like Google about how
   they are monitoring their highly distributed systems:
   https://www.oreilly.com/ideas/monitoring-distributed-systems
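
To make need 1 a bit more concrete, here is a minimal Python sketch of
what I mean by correlating logs with performance metrics: for every error
log event, collect the metric samples recorded within a time window around
it. This is only an illustration and is not tied to any particular tool.
The names (MetricSample, LogEvent, correlate) and the 30-second default
window are assumptions I made up for the example.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSample:
    """One performance measurement (hypothetical shape, for illustration)."""
    timestamp: float  # seconds since the epoch
    name: str
    value: float


@dataclass(frozen=True)
class LogEvent:
    """One parsed log line (hypothetical shape, for illustration)."""
    timestamp: float  # seconds since the epoch
    level: str
    message: str


def correlate(events, samples, window=30.0, level="ERROR"):
    """Pair each log event of the given level with nearby metric samples.

    Returns a list of (event, samples_within_window) tuples, where a sample
    is "nearby" if its timestamp is within +/- `window` seconds of the event.
    """
    # Sort once so each lookup is a binary search instead of a full scan.
    ordered = sorted(samples, key=lambda s: s.timestamp)
    times = [s.timestamp for s in ordered]

    result = []
    for event in events:
        if event.level != level:
            continue
        lo = bisect_left(times, event.timestamp - window)
        hi = bisect_right(times, event.timestamp + window)
        result.append((event, ordered[lo:hi]))
    return result


if __name__ == "__main__":
    samples = [
        MetricSample(0.0, "heap_used_mb", 512.0),
        MetricSample(20.0, "heap_used_mb", 900.0),
        MetricSample(100.0, "heap_used_mb", 300.0),
    ]
    events = [
        LogEvent(10.0, "ERROR", "OutOfMemoryError"),
        LogEvent(10.0, "INFO", "request served"),
    ]
    for event, nearby in correlate(events, samples, window=15.0):
        print(event.message, [(s.name, s.value) for s in nearby])
```

A real setup would of course pull the events and samples from the log and
metric stores rather than from in-memory lists, and would probably join on
host or request identifiers as well as on time.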

To meet the needs above, I'm investigating the following tools for bringing
log and performance monitoring under the same roof:

   - ELK / EFK Stack
   - Hawkular: http://www.hawkular.org/
   - Stagemonitor: http://www.stagemonitor.org/
   - cAdvisor: https://github.com/google/cadvisor

I think these BizDevOps tools might be the right choice to start with,
and I'd be happy to help.

Cheers,
Arash

On Fri, May 20, 2016 at 9:08 PM, Matthias Runge <mrunge at redhat.com> wrote:

> On 20/05/16 16:12, Rich Megginson wrote:
> > We are trying to start up a CentOS OpsTools SIG
> > https://wiki.centos.org/SpecialInterestGroup for logging, monitoring,
> > etc.
> >
> > The intention is that this would be the upstream for development and
> > packaging of tools related to logging (EFK stack, etc.), monitoring, and
> > other opstools, as a single place where packages can be consumed by RDO,
> > OpenShift Origin, and other upstream projects - pool our resources,
> > share the lessons learned, and enable cross project log aggregation and
> > correlation (e.g. running OpenShift on top of OpenStack on top of
> > Ceph/Gluster - do my OpenShift application errors correlate with Nova
> > errors?  file system errors?).  This would also be a place for
> > installers (puppet manifests, ansible playbooks), and possibly
> > testing/CI and containers.
> >
> > If you are interested, please chime in on the email thread:
> > https://lists.centos.org/pipermail/centos-devel/2016-May/014777.html
> >
> Thank you for the reminder, Rich.
>
> We already have quite a few interested people. The reason I didn't
> mention this here was that it has a broader focus than just RDO.
>
> On the other hand, it clearly will be usable with RDO, and it will help
> RDO operators to get to the root of issues as they occur.
>
> If any of you are interested or can help, please join us on the
> centos-devel mailing list and express your interest there. It will help
> us to speed things up.
> --
> Matthias Runge <mrunge at redhat.com>
>
> Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
> Commercial register: Amtsgericht Muenchen, HRB 153243,
> Managing Directors: Paul Argiry, Charles Cachera, Michael Cunningham,
>                     Michael O'Neill