[rdo-list] CentOS OpsTools (logging, monitoring, etc.) SIG proposal

Sat May 21 00:01:07 UTC 2016

On 05/20/2016 05:55 PM, Arash Kaffamanesh wrote:
> Great!
>
> I'm currently working on how to bring log monitoring and application
> performance monitoring under the same roof for cloud-native, highly
> distributed applications on top of OpenStack with Cloud Foundry or
> OpenShift and Kubernetes add-ons. I also want to define some best
> practices (needs) for building a simple yet effective cloud-native
> application monitoring solution for BizDevOps (yet another buzzword :-)).
>
> My 10 BizDevOps needs are:
>
>  1. Bring log and performance monitoring under the same roof, by
>     providing a seamless correlation between log and performance metrics.
>  2. Provide intuitive pre-built monitoring interfaces and dashboards
>     for everybody and for different roles and organizations
>     (BizDevOps) (note: people lack the time and sometimes the skills
>     to configure a monitoring tool).
>  3. Build dedicated dashboards for transaction and correlation
>     analysis to identify the usual suspects, such as memory leaks,
>     garbage collection, and saturated thread pools, as well as the
>     hundreds of unusual suspects that might be the root cause of
>     problems.
>  4. Enhance the quality of logs (at the PaaS and application level)
>     and define custom metrics that are specific to our cloud-native
>     applications, then visualize these metrics on custom dashboards
>     for tenants with different roles.
>  5. Analyze long-term trends, such as: How big is my database and how
>     fast is it growing? How quickly is my daily active user count
>     growing?
>  6. Implement innovative ideas such as data mining, forecasting and
>     advanced analytics support to provide added value to the
>     monitoring solution.
>  7. Get alerts on issues before customers notice, use the monitoring
>     tool as an early warning system, and analyze application
>     performance before and after new code deployments.
>  8. If remediation actions are triggered through the monitoring
>     solution, require human approval before the script is executed
>     (this provides a better understanding of the root cause of the
>     problem and how to eliminate it in the long term).
>  9. Implement a simple yet effective alerting system with a clear
>     escalation path and low noise (rules that generate alerts for
>     developers or operators should be simple to understand and
>     represent a clear failure).
> 10. Combine heavy use of white-box monitoring with modest but critical
>     uses of black-box monitoring and learn from others like Google
>     about how they are monitoring their highly distributed systems:
>     https://www.oreilly.com/ideas/monitoring-distributed-systems
>

These are good.

> To meet the above needs, I'm investigating the following tools, which
> could bring log and performance monitoring under the same roof:
>
>   * ELK / EFK Stack
>   * Hawkular: http://www.hawkular.org/
>

EFK is already being used by OpenShift and RDO, and Hawkular is already 
being used by OpenShift - these will be among our first packages to support.

>   * Stagemonitor http://www.stagemonitor.org/
>   * cAdvisor https://github.com/google/cadvisor
>
>
> I think these BizDevOps tools might be the right choice to start
> with, and I'd be happy to help.
>
> Cheers,
> Arash
>
>
> On Fri, May 20, 2016 at 9:08 PM, Matthias Runge <mrunge at redhat.com>
> wrote:
>
>     On 20/05/16 16:12, Rich Megginson wrote:
>     > We are trying to start up a CentOS OpsTools SIG
>     > https://wiki.centos.org/SpecialInterestGroup for logging,
>     > monitoring, etc.
>     >
>     > The intention is that this would be the upstream for development and
>     > packaging of tools related to logging (EFK stack, etc.), monitoring,
>     > and other opstools, as a single place where packages can be consumed
>     > by RDO, OpenShift Origin, and other upstream projects - pool our
>     > resources, share the lessons learned, and enable cross project log
>     > aggregation and correlation (e.g. running OpenShift on top of
>     > OpenStack on top of Ceph/Gluster - do my OpenShift application errors
>     > correlate with Nova errors?  file system errors?).  This would also
>     > be a place for installers (puppet manifests, ansible playbooks), and
>     > possibly testing/CI and containers.
>     >
>     > If you are interested, please chime in in the email thread:
>     > https://lists.centos.org/pipermail/centos-devel/2016-May/014777.html
>     >
>     Thank you for the reminder, Rich.
>
>     We already have quite a few interested people. The reason I didn't
>     mention this here was that it has a broader focus than just RDO.
>
>     On the other hand, it will clearly be usable with RDO, and it will
>     help RDO operators get to the root of issues as they occur.
>
>     If any of you are interested or can help, please join us on the
>     centos-devel mailing list and express your interest there. It will
>     help us speed things up.
>     --
>     Matthias Runge <mrunge at redhat.com>
>
>     Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
>     Commercial register: Amtsgericht Muenchen, HRB 153243,
>     Managing Directors: Paul Argiry, Charles Cachera, Michael Cunningham,
>                         Michael O'Neill
>
>     _______________________________________________
>     rdo-list mailing list
>     rdo-list at redhat.com <mailto:rdo-list at redhat.com>
>     https://www.redhat.com/mailman/listinfo/rdo-list
>
>     To unsubscribe: rdo-list-unsubscribe at redhat.com
>     <mailto:rdo-list-unsubscribe at redhat.com>
>
