[rdo-list] [rdo cloud] Ceph issues seem to be solved

David Manchado Cuesta dmanchad at redhat.com
Thu Oct 19 13:16:02 UTC 2017


All,

We are glad to let you know that the issues we have been having on
the Ceph cluster (latency, performance) for the last six weeks have
been resolved as of yesterday, ~17:00 UTC.
We will keep an eye on this until Monday before considering it
completely solved, but it looks promising.

The root cause was that the cache policy for the OSD disks was not
the expected one, so we were not taking any advantage of the cache on
the RAID controller.
The change might be due to the servers having been unplugged from the
PDU for some time: the BBU (battery backup unit) might have completely
discharged, reverting the settings to their defaults (just a theory).

We have confirmed that the cache policy change is persistent across
server reboots, so we should not hit this problem again (fingers
crossed).
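
For anyone who wants to watch for this kind of drift, here is a
minimal sketch, not what we actually ran. It assumes the kernel's view
of each block device's write cache (the sysfs attribute
/sys/block/<dev>/queue/write_cache) is a useful proxy; that may not
reflect the policy set on the RAID controller itself, which is best
checked with the controller vendor's own tool. The function names and
the expected mode "write back" are assumptions made for illustration.

```python
from pathlib import Path


def read_kernel_write_cache(sys_block="/sys/block"):
    """Return {device: mode} read from <sys_block>/<dev>/queue/write_cache.

    The kernel reports "write back" or "write through" in this attribute.
    """
    modes = {}
    for attr in sorted(Path(sys_block).glob("*/queue/write_cache")):
        # attr is <sys_block>/<dev>/queue/write_cache, so <dev> is two levels up
        modes[attr.parent.parent.name] = attr.read_text().strip()
    return modes


def find_policy_drift(observed, expected="write back"):
    """Return only the devices whose observed mode differs from the expected one."""
    return {dev: mode for dev, mode in observed.items() if mode != expected}


if __name__ == "__main__":
    drift = find_policy_drift(read_kernel_write_cache())
    for dev, mode in sorted(drift.items()):
        print(f"{dev}: cache mode is '{mode}'")
```

Run after a reboot or a power event, an empty result would suggest the
expected mode is still in place on every device the kernel reports.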

Thanks to all the colleagues who have given us a hand!

David Manchado
Senior Software Engineer - SysOps Team
Red Hat