[rdo-users] Poor Ceph Performance

Cody codeology.lab at gmail.com
Mon Nov 26 18:09:43 UTC 2018


Hi John,

Thank you so much for the reply. I will post to the ceph-users mailing
list and share a link to that thread here.

For reference, the cluster specs are below. The cluster uses 9
bare-metal nodes (1 Undercloud, 3 Controllers (HA), 3 Ceph, 2 Compute)
with the following details:

Undercloud & Compute nodes:
CPU: E3-1230V2 @3.7GHz
RAM: 16GB
Ports: 1Gbps for provisioning; 1Gbps for external/VLANs

Controller nodes (with Ceph mon & mgr) :
CPU: 2 x E5-2603 @1.8GHz
RAM: 16GB
Ports: 1Gbps for provisioning; 1Gbps for VLANs

Ceph nodes:
CPU: 2 x E5-2603 @1.8GHz
RAM: 16GB
Ports: 1Gbps for provisioning; 1Gbps for VLANs
Journaling: 1 SSD (SATA3, consumer grade)
OSDs: 2 x 2TB @ 7200rpm (SATA3, consumer grade)

Switch:
HUAWEI S1700 Series (24 x 1Gbps ports, 56Gbps switching capacity)

The hardware is old and under-provisioned, especially in RAM. However,
this is only a PoC with minimal usage, and I saw no sign of CPU/RAM
starvation during the test.
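
As a rough sanity check on the numbers from my original post (quoted
below), here is a small Python sketch comparing the observed write speed
with what a single 1Gbps port could carry in the best case. It assumes
"8GB" means 8 GiB, IPv4/TCP headers without options, and VLAN-tagged
frames. It ignores Ceph replication traffic and disk speed entirely, so
it only gives an upper bound, not an expected result.

```python
# Rough network ceiling vs. observed Ceph throughput (a sketch, not a benchmark).

GIB = 1024 ** 3
KIB = 1024


def transfer_seconds(size_bytes, rate_bytes_per_s):
    """Time to move size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_s


def tcp_payload_efficiency(mtu, vlan_tagged=True):
    """Fraction of wire bytes that is TCP payload for full-size frames.

    Assumes IPv4 and TCP headers without options (20 + 20 bytes).
    """
    payload = mtu - 40
    # Ethernet header 14 + FCS 4 + preamble/SFD 8 + inter-frame gap 12,
    # plus 4 bytes for an 802.1Q VLAN tag if present.
    overhead = 14 + 4 + 8 + 12 + (4 if vlan_tagged else 0)
    return payload / (mtu + overhead)


image_bytes = 8 * GIB              # assumption: "8GB" means 8 GiB
observed_rate = 700 * KIB          # upper end of the observed write speed
line_rate = 1_000_000_000 / 8      # 1 Gbps expressed in bytes per second

eff_1500 = tcp_payload_efficiency(1500)
eff_9000 = tcp_payload_efficiency(9000)

hours_observed = transfer_seconds(image_bytes, observed_rate) / 3600
secs_at_line_rate = transfer_seconds(image_bytes, line_rate * eff_1500)

print(f"upload at observed speed : {hours_observed:.1f} hours")
print(f"upload at 1Gbps, MTU 1500: {secs_at_line_rate:.0f} seconds")
print(f"payload efficiency       : MTU 1500 {eff_1500:.1%}, MTU 9000 {eff_9000:.1%}")
```

Under these assumptions the observed speed works out to roughly 3.3
hours for the image, against a little over a minute at line rate, and
jumbo frames would only move payload efficiency from about 95% to about
99%. So neither the 1Gbps port nor MTU 1500 alone seems to account for a
gap of this size.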

On the software side, it is running the Queens release. The
ceph-ansible version is 3.1.6, and Ceph uses filestore in a
non-collocated setup.
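
Since filestore sends every write through the journal before it reaches
the data disk, the synchronous write speed of the consumer-grade journal
SSD may be one place to look. Below is a minimal Python sketch, assuming
Linux (it relies on os.O_DSYNC), that times small synchronous writes to
a scratch file. The function name and the scratch path are my own
placeholders, and this is only a crude stand-in for a proper tool such
as fio. It should be pointed at a test file, never at a device or
partition that holds a live journal.

```python
import os
import tempfile
import time


def dsync_write_rate(path, block_size=4096, count=256):
    """Write `count` blocks with O_DSYNC; return (MiB/s, avg latency in ms).

    Each os.write() returns only after the data has reached stable storage,
    which loosely resembles how a journal write must be acknowledged.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DSYNC
    buf = b"\0" * block_size
    fd = os.open(path, flags, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, buf)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    mib_per_s = (block_size * count) / (1024 * 1024) / elapsed
    avg_latency_ms = elapsed / count * 1000
    return mib_per_s, avg_latency_ms


if __name__ == "__main__":
    # Hypothetical location: replace the temporary directory with a scratch
    # directory on a filesystem backed by the SSD under test.
    with tempfile.TemporaryDirectory() as scratch_dir:
        rate, latency = dsync_write_rate(os.path.join(scratch_dir, "dsync.bin"))
        print(f"{rate:.2f} MiB/s, {latency:.3f} ms per 4 KiB synchronous write")
```

If the result on the journal SSD is far below what the same test gives
on other hardware, that would be worth mentioning when posting to
ceph-users. A good number here does not rule out other causes.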


Best regards,
Cody



On Mon, Nov 26, 2018 at 9:13 AM John Fulton <johfulto at redhat.com> wrote:
>
> On Sun, Nov 25, 2018 at 11:29 PM Cody <codeology.lab at gmail.com> wrote:
> >
> > Hello,
> >
> > My tripleO cluster is deployed with Ceph. Both Cinder and Nova use RBD
> > as backend. While all essential functions work, services involving
> > Ceph are getting very poor performance. E.g., it takes several hours
> > to upload an 8GB image into Cinder and about 20 minutes to completely
> > boot up an instance (from launch to ssh ready).
> >
> > Running 'ceph -s' shows a peak write speed of 600~700 KiB/s during
> > image upload and a read speed of 2 MiB/s during instance launch.
> >
> > I used the default scheme for network isolation and a single 1G port
> > for all VLAN traffic on each overcloud node. I haven't set jumbo
> > frames on the storage network VLAN yet, but I think the performance
> > should not be this bad with MTU 1500. Something must be wrong. Any
> > suggestions for debugging?
>
> Hi Cody,
>
> If you're using queens or rocky, then ceph luminous was deployed in
> containers. Though tripleo did the overall deployment, ceph-ansible
> would have done the actual ceph deployment and configuration and you
> can determine the ceph-ansible version via 'rpm -q ceph-ansible' on
> your undercloud. It probably makes sense for you to pass along what
> you mentioned above in addition to some other info, which I'll note
> below, to the ceph-users list
> (http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com), who will be
> focused on ceph itself. When you contact them (I'm on the list too)
> also let them know the following:
>
> 1. How many OSD servers you have and how many OSDs per server
> 2. What type of disks you're using per OSD and how you set up journaling
> 3. Specs of your servers themselves (OpenStack controller servers w/
> CPU X and Ram Y for Ceph monitors and Ceph Storage servers RAM/CPU
> info)
> 4. Did you override the RAM/CPU for the Mon, Mgr, and OSD containers?
> If so, what did you override them to?
>
> TripleO can pass any parameter you would normally pass to ceph-ansible
> as described in the following:
>
>  https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/ceph_config.html#customizing-ceph-conf-with-ceph-ansible
>
> So if you let them know things in terms of a containerized
> ceph-ansible luminous deployment and the ceph.conf and they have
> suggestions, then you can apply the suggestions back to ceph-ansible
> through tripleo as described above. If you start troubleshooting the
> cluster as per this troubleshooting guide [2] and share the results
> that would also help.
>
> I've gotten better performance than you describe on a completely
> virtualized deployment on my PC [1], using quickstart with the
> defaults that TripleO passes on queens and rocky (TripleO tends to
> favor the defaults which ceph-ansible uses). However, with a single
> 1G port for all network traffic I don't expect great performance.
>
> Feel free to CC me when you email ceph-users and feel free to share on
> rdo-users a link to the thread you started there in case anyone else
> on this list is interested.
>
>   John
>
> [1] http://blog.johnlikesopenstack.com/2018/08/pc-for-tripleo-quickstart.html
> [2] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/pdf/troubleshooting_guide/Red_Hat_Ceph_Storage-3-Troubleshooting_Guide-en-US.pdf
>
> > Thank you very much.
> >
> > Best regards,
> > Cody

