[rdo-users] Poor Ceph Performance

Cody codeology.lab at gmail.com
Tue Nov 27 18:47:56 UTC 2018


Hi there,

The performance issue was caused by a failed OS drive on one of the
storage nodes. Here is a link [1] to the thread on the ceph-users ML
with useful tips on using 'fio' to test storage devices, in case
anyone is interested.
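
As a rough example, the kind of single-threaded synchronous write test
suggested there looks like this (a sketch; /dev/sdX is a placeholder
for the device under test, and this WILL destroy any data on it):

  fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
      --numjobs=1 --iodepth=1 --runtime=60 --time_based \
      --group_reporting --name=journal-test

Numbers vary widely by device, but a failing drive tends to stand out
immediately in this test.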

Thank you very much to all.

Best regards,
Cody

[1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-November/031547.html


On Mon, Nov 26, 2018 at 1:13 PM Cody <codeology.lab at gmail.com> wrote:
>
> Hi Donny,
>
> Thank you for the reply.
>
> > What kind of images are you using?
> The image used is a CentOS 7 cloud image in RAW format (approx. 8 GB in size).
>
> >Also how are you uploading the images?
> I was uploading the image file from the undercloud node.
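>
> Roughly along these lines (a sketch; the file name is a placeholder
> for my local copy of the image):
>
>   openstack image create --disk-format raw --container-format bare \
>     --file CentOS-7-x86_64-GenericCloud.raw centos7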
>
> Thank you very much.
>
> Best regards,
> Cody
> On Mon, Nov 26, 2018 at 10:57 AM Donny Davis <donny at fortnebula.com> wrote:
> >
> > Also how are you uploading the images?
> >
> > On Mon, Nov 26, 2018 at 10:54 AM Donny Davis <donny at fortnebula.com> wrote:
> >>
> >> What kind of images are you using?
> >>
> >> On Mon, Nov 26, 2018 at 9:14 AM John Fulton <johfulto at redhat.com> wrote:
> >>>
> >>> On Sun, Nov 25, 2018 at 11:29 PM Cody <codeology.lab at gmail.com> wrote:
> >>> >
> >>> > Hello,
> >>> >
> >>> > My TripleO cluster is deployed with Ceph. Both Cinder and Nova use
> >>> > RBD as the backend. While all essential functions work, services
> >>> > involving Ceph are getting very poor performance. E.g., it takes
> >>> > several hours to upload an 8GB image into Cinder and about 20 minutes
> >>> > to completely boot up an instance (from launch to ssh ready).
> >>> >
> >>> > Running 'ceph -s' shows a top write speed of 600~700 KiB/s during
> >>> > image upload and a read speed of 2 MiB/s during instance launch.
> >>> >
> >>> > I used the default scheme for network isolation and a single 1G
> >>> > port for all VLAN traffic on each overcloud node. I haven't set
> >>> > jumbo frames on the storage network VLAN yet, but I think the
> >>> > performance should not be this bad even with MTU 1500. Something
> >>> > must be wrong. Any suggestions for debugging?
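> >>> >
> >>> > For reference, here is a quick way to confirm the MTU in use end
> >>> > to end (the interface name and peer IP are placeholders for the
> >>> > storage VLAN interface and another storage node):
> >>> >
> >>> >   ip link show vlan30 | grep mtu
> >>> >   # an 8972-byte DF ping only goes through end to end with
> >>> >   # jumbo frames (MTU 9000); with MTU 1500 it should fail
> >>> >   ping -M do -s 8972 -c 3 192.0.2.10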
> >>>
> >>> Hi Cody,
> >>>
> >>> If you're using Queens or Rocky, then Ceph Luminous was deployed in
> >>> containers. Though TripleO did the overall deployment, ceph-ansible
> >>> did the actual Ceph deployment and configuration; you can determine
> >>> the ceph-ansible version via 'rpm -q ceph-ansible' on your undercloud.
> >>> It probably makes sense for you to pass along what you mentioned
> >>> above, plus some other info I'll note below, to the ceph-users list
> >>> (http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com), which is
> >>> focused on Ceph itself. When you contact them (I'm on that list too),
> >>> also let them know the following:
> >>>
> >>> 1. How many OSD servers you have and how many OSDs per server
> >>> 2. What type of disks you're using per OSD and how you set up journaling
> >>> 3. Specs of the servers themselves (CPU and RAM of the OpenStack
> >>> controllers hosting the Ceph monitors, and of the Ceph storage
> >>> servers)
> >>> 4. Did you override the RAM/CPU for the Mon, Mgr, and OSD containers?
> >>> If so, what did you override them to?
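> >>>
> >>> A quick way to collect most of that (standard commands, nothing
> >>> deployment-specific; run on a mon node and on a storage node):
> >>>
> >>>   ceph osd tree                # OSD count and placement per server
> >>>   lsblk -d -o NAME,ROTA,SIZE   # disk type (ROTA=1 is spinning) and size
> >>>   lscpu | grep '^CPU(s)'       # CPU count
> >>>   free -h                      # RAM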
> >>>
> >>> TripleO can pass any parameter you would normally pass to ceph-ansible
> >>> as described in the following:
> >>>
> >>>  https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/ceph_config.html#customizing-ceph-conf-with-ceph-ansible
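> >>>
> >>> For example, a minimal environment file along these lines (the
> >>> values are placeholders, not recommendations) can be passed to
> >>> 'openstack overcloud deploy' with -e:
> >>>
> >>>   parameter_defaults:
> >>>     CephConfigOverrides:
> >>>       osd_recovery_op_priority: 3
> >>>     CephAnsibleExtraConfig:
> >>>       ceph_osd_docker_memory_limit: 5g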
> >>>
> >>> So describe your setup to them in terms of a containerized
> >>> ceph-ansible Luminous deployment, share your ceph.conf, and if they
> >>> have suggestions you can apply them back to ceph-ansible through
> >>> TripleO as described above. If you also work through the
> >>> troubleshooting guide [2] and share the results, that would help too.
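> >>>
> >>> As a starting point, the usual first checks go along these lines
> >>> (the pool name is a placeholder; bench writes test objects, so use
> >>> a scratch pool):
> >>>
> >>>   ceph health detail
> >>>   ceph osd perf                # per-OSD commit/apply latency
> >>>   rados bench -p scratch 30 write --no-cleanup
> >>>   rados bench -p scratch 30 seq
> >>>   rados -p scratch cleanup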
> >>>
> >>> I've gotten better performance than you describe on a completely
> >>> virtualized deployment on my PC [1], using quickstart with the
> >>> defaults that TripleO passes on Queens and Rocky (TripleO tends to
> >>> favor the defaults which ceph-ansible uses). However, with a single
> >>> 1G port for all network traffic I wouldn't expect great performance.
> >>>
> >>> Feel free to CC me when you email ceph-users and feel free to share on
> >>> rdo-users a link to the thread you started there in case anyone else
> >>> on this list is interested.
> >>>
> >>>   John
> >>>
> >>> [1] http://blog.johnlikesopenstack.com/2018/08/pc-for-tripleo-quickstart.html
> >>> [2] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/pdf/troubleshooting_guide/Red_Hat_Ceph_Storage-3-Troubleshooting_Guide-en-US.pdf
> >>>
> >>> > Thank you very much.
> >>> >
> >>> > Best regards,
> >>> > Cody

