[rdo-users] Poor Ceph Performance

John Fulton johfulto at redhat.com
Mon Nov 26 14:13:41 UTC 2018


On Sun, Nov 25, 2018 at 11:29 PM Cody <codeology.lab at gmail.com> wrote:
>
> Hello,
>
> My TripleO cluster is deployed with Ceph. Both Cinder and Nova use RBD
> as their backend. While all essential functions work, services involving
> Ceph are getting very poor performance. E.g., it takes several hours
> to upload an 8GB image into Cinder and about 20 minutes to completely
> boot up an instance (from launch to ssh ready).
>
> Running 'ceph -s' shows a top write speed of 600-700 KiB/s during image
> upload and a read speed of 2 MiB/s during instance launch.
>
> I used the default scheme for network isolation and a single 1G port
> for all VLAN traffic on each overcloud node. I haven't set jumbo
> frames on the storage network VLAN yet, but I think the performance
> should not be this bad with MTU 1500. Something must be wrong. Any
> suggestions for debugging?

Hi Cody,

If you're using Queens or Rocky, then Ceph Luminous was deployed in
containers. Though TripleO did the overall deployment, ceph-ansible
would have done the actual Ceph deployment and configuration, and you
can determine the ceph-ansible version via 'rpm -q ceph-ansible' on
your undercloud. It probably makes sense to pass along what you
mentioned above, plus some other info that I'll note below, to the
ceph-users list
(http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com), which is
focused on Ceph itself. When you contact them (I'm on the list too),
also let them know the following:

1. How many OSD servers you have and how many OSDs per server
2. What type of disks you're using per OSD and how you set up journaling
3. Specs of the servers themselves (CPU and RAM of the OpenStack
controller servers that host the Ceph monitors, and CPU and RAM of the
Ceph Storage servers)
4. Did you override the RAM/CPU for the Mon, Mgr, and OSD containers?
If so, what did you override them to?
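
For item 1, one way to get the per-server OSD count is to summarize the
JSON form of 'ceph osd tree'. The following is a minimal sketch, not a
definitive tool: it assumes the output of 'ceph osd tree --format json'
has a top-level "nodes" list whose entries carry "id", "name", "type",
and (for host buckets) "children". The sample data and host names are
hypothetical.

```python
import json


def osds_per_host(osd_tree_json):
    """Count the OSDs under each host bucket of an OSD tree in JSON form.

    Assumes the layout described above: a "nodes" list where host entries
    list their child ids and OSD entries have type "osd".
    """
    nodes = json.loads(osd_tree_json)["nodes"]
    osd_ids = {n["id"] for n in nodes if n.get("type") == "osd"}
    counts = {}
    for node in nodes:
        if node.get("type") == "host":
            children = node.get("children", [])
            counts[node["name"]] = sum(1 for c in children if c in osd_ids)
    return counts


# Hypothetical sample standing in for real command output.
SAMPLE = json.dumps({
    "nodes": [
        {"id": -1, "name": "default", "type": "root", "children": [-2, -3]},
        {"id": -2, "name": "ceph-0", "type": "host", "children": [0, 1]},
        {"id": -3, "name": "ceph-1", "type": "host", "children": [2]},
        {"id": 0, "name": "osd.0", "type": "osd"},
        {"id": 1, "name": "osd.1", "type": "osd"},
        {"id": 2, "name": "osd.2", "type": "osd"},
    ],
    "stray": [],
})

if __name__ == "__main__":
    for host, count in sorted(osds_per_host(SAMPLE).items()):
        print(f"{host}: {count} OSD(s)")
```

In a real cluster you would feed the function the command's actual
output instead of SAMPLE.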

TripleO can pass any parameter you would normally pass to ceph-ansible
as described in the following:

 https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/ceph_config.html#customizing-ceph-conf-with-ceph-ansible
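
As a rough illustration of the shape such an override takes, the sketch
below writes out a small Heat environment file that sets ceph.conf
options through the CephConfigOverrides parameter. The option names and
values are placeholders chosen only to show the structure, not tuning
advice for this problem, and render_yaml is a hypothetical helper that
handles only nested dicts of plain scalars.

```python
def render_yaml(mapping, indent=0):
    """Render a nested dict of scalars as block-style YAML lines."""
    lines = []
    pad = " " * indent
    for key, value in mapping.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(render_yaml(value, indent + 2))
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


# Placeholder overrides: replace with whatever ceph-users suggests.
ENVIRONMENT = {
    "parameter_defaults": {
        "CephConfigOverrides": {
            "osd_recovery_op_priority": 3,
            "osd_recovery_max_active": 3,
            "osd_max_backfills": 1,
        },
    },
}


def render_environment(env=ENVIRONMENT):
    """Return the environment file contents as a single string."""
    return "\n".join(render_yaml(env)) + "\n"


if __name__ == "__main__":
    print(render_environment(), end="")
```

The resulting file would then be passed to the overcloud deploy command
with an additional -e argument alongside your existing environment
files.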

So if you describe your setup to them as a containerized ceph-ansible
Luminous deployment, include your ceph.conf, and they have
suggestions, then you can apply those suggestions back to ceph-ansible
through TripleO as described above. If you start troubleshooting the
cluster as per this troubleshooting guide [2] and share the results,
that would also help.

I've gotten better performance than you describe on a completely
virtualized deployment on my PC [1], using quickstart with the
defaults that TripleO passes on Queens and Rocky (TripleO tends to
favor the defaults that ceph-ansible uses). That said, with a single
1G port for all network traffic I wouldn't expect great performance.
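
To put the numbers from this thread in perspective, here is a
back-of-the-envelope calculation. It is only a sketch under stated
assumptions: the 8GB image is read as 8 GiB, the 1G port is treated as
an ideal 1 Gbit/s link, and replication traffic, protocol overhead, and
the other VLANs sharing the port are all ignored, so the real ceiling
is lower than the line rate used here.

```python
KIB = 1024
GIB = 1024 ** 3


def transfer_seconds(size_bytes, rate_bytes_per_s):
    """Time to move size_bytes at a constant rate, in seconds."""
    return size_bytes / rate_bytes_per_s


IMAGE_BYTES = 8 * GIB                  # the 8GB image, read as 8 GiB
OBSERVED_WRITE = 700 * KIB             # top write speed reported by 'ceph -s'
LINK_BYTES_PER_S = 1_000_000_000 / 8   # ideal 1 Gbit/s link = 125 MB/s

# About 3.3 hours at the observed rate, matching "several hours".
observed_hours = transfer_seconds(IMAGE_BYTES, OBSERVED_WRITE) / 3600

# About 69 seconds if the link could be driven at full line rate.
line_rate_seconds = transfer_seconds(IMAGE_BYTES, LINK_BYTES_PER_S)

# Fraction of the ideal link that the observed write speed represents.
link_fraction = OBSERVED_WRITE / LINK_BYTES_PER_S

if __name__ == "__main__":
    print(f"upload at observed rate: {observed_hours:.1f} h")
    print(f"upload at line rate:     {line_rate_seconds:.0f} s")
    print(f"observed / line rate:    {link_fraction:.2%}")
```

Under these assumptions the observed write speed is well under 1% of
what the port could carry, so the shared 1G link limits the best case
but does not by itself account for the numbers reported.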

Feel free to CC me when you email ceph-users and feel free to share on
rdo-users a link to the thread you started there in case anyone else
on this list is interested.

  John

[1] http://blog.johnlikesopenstack.com/2018/08/pc-for-tripleo-quickstart.html
[2] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/pdf/troubleshooting_guide/Red_Hat_Ceph_Storage-3-Troubleshooting_Guide-en-US.pdf

> Thank you very much.
>
> Best regards,
> Cody