[ceph-users] Bluestore performance 50% of filestore

David Turner drakonstein at gmail.com
Tue Nov 14 14:37:58 PST 2017


You have to configure the size of the db partition in the config file for
the cluster.  If your db partition is 1GB, then I can all but guarantee
that your block.db spilled over onto your HDD very early in your testing.
There have been multiple threads recently about what size the db partition
should be, and it seems to depend on how many objects the OSD is likely to
hold.  The recommendation has been to err on the side of bigger.  If
you're running 10TB OSDs and anticipate filling them up, then you probably
want closer to an 80GB+ db partition.  That's why I asked how full your
cluster was and how large your HDDs are.
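
For example, something along these lines in ceph.conf before the OSDs are
created should do it (a rough sketch - the value is in bytes, the 80GB
figure is just illustrative, and it only applies to OSDs prepared after
the setting is in place, not to existing ones):

[osd]
# ~80 GB block.db partition for newly prepared bluestore OSDs
bluestore_block_db_size = 85899345920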

Here's a link to one of the recent ML threads on this topic.
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020822.html
On Tue, Nov 14, 2017 at 4:44 PM Milanov, Radoslav Nikiforov <radonm at bu.edu>
wrote:

> Block-db partition is the default 1GB (is there a way to modify this?
> Journals are 5GB in the filestore case) and usage is low:
>
>
>
> [root@kumo-ceph02 ~]# ceph df
>
> GLOBAL:
>     SIZE        AVAIL      RAW USED     %RAW USED
>     100602G     99146G        1455G          1.45
>
> POOLS:
>     NAME              ID     USED       %USED     MAX AVAIL     OBJECTS
>     kumo-vms          1      19757M      0.02        31147G        5067
>     kumo-volumes      2        214G      0.18        31147G       55248
>     kumo-images       3        203G      0.17        31147G       66486
>     kumo-vms3         11     45824M      0.04        31147G       11643
>     kumo-volumes3     13     10837M         0        31147G        2724
>     kumo-images3      15     82450M      0.09        31147G       10320
>
>
>
> - Rado
>
>
>
> *From:* David Turner [mailto:drakonstein at gmail.com]
> *Sent:* Tuesday, November 14, 2017 4:40 PM
> *To:* Mark Nelson <mnelson at redhat.com>
> *Cc:* Milanov, Radoslav Nikiforov <radonm at bu.edu>;
> ceph-users at lists.ceph.com
>
>
> *Subject:* Re: [ceph-users] Bluestore performance 50% of filestore
>
>
>
> How big was your block.db partition for each OSD, and what size are your
> HDDs?  Also, how full is your cluster?  It's possible that your block.db
> partition wasn't large enough to hold the entire db and it had to spill
> over onto the HDD, which would definitely impact performance.
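>
> You can check whether that's happening on a given OSD with something like
> this (the bluefs counters should be present on Luminous bluestore OSDs; a
> non-zero slow_used_bytes means the db has spilled onto the HDD):
>
> ceph daemon osd.0 perf dump | grep -E 'db_total_bytes|db_used_bytes|slow_used_bytes'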
>
>
>
> On Tue, Nov 14, 2017 at 4:36 PM Mark Nelson <mnelson at redhat.com> wrote:
>
> How big were the writes in the Windows test and how much concurrency was
> there?
>
> Historically bluestore does pretty well for us with small random writes
> so your write results surprise me a bit.  I suspect it's the low queue
> depth.  Sometimes bluestore does worse with reads, especially if
> readahead isn't enabled on the client.
>
> Mark
>
> On 11/14/2017 03:14 PM, Milanov, Radoslav Nikiforov wrote:
> > Hi Mark,
> > Yes, RBD cache is in writeback mode, and the only thing that changed was
> converting the OSDs to bluestore. These are 7200 rpm drives with triple
> replication. I also get the same results (bluestore 2 times slower) testing
> continuous writes on a 40GB partition in a Windows VM, with a completely
> different tool.
> >
> > Right now I'm going back to filestore for the OSDs so additional tests
> are possible if that helps.
> >
> > - Rado
> >
> > -----Original Message-----
> > From: ceph-users [mailto:ceph-users-bounces at lists.ceph.com] On Behalf
> Of Mark Nelson
> > Sent: Tuesday, November 14, 2017 4:04 PM
> > To: ceph-users at lists.ceph.com
> > Subject: Re: [ceph-users] Bluestore performance 50% of filestore
> >
> > Hi Radoslav,
> >
> > Is RBD cache enabled and in writeback mode?  Do you have client side
> readahead?
> >
> > Both are doing better for writes than you'd expect from the native
> performance of the disks assuming they are typical 7200RPM drives and you
> are using 3X replication (~150IOPS * 27 / 3 = ~1350 IOPS).  Given the small
> file size, I'd expect that you might be getting better journal coalescing
> in filestore.
> >
> > Sadly I imagine you can't do a comparison test at this point, but I'd be
> curious how it would look if you used libaio with a high iodepth and a much
> bigger partition to do random writes over.
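> >
> > Something along these lines, for example (parameters are just
> > illustrative, not a tested recommendation):
> >
> > fio --name fio_test_file --ioengine=libaio --direct=1 --rw=randwrite \
> >     --bs=4k --size=40G --iodepth=32 --numjobs=2 --time_based \
> >     --runtime=180 --group_reporting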
> >
> > Mark
> >
> > On 11/14/2017 01:54 PM, Milanov, Radoslav Nikiforov wrote:
> >> Hi
> >>
> >> We have a 3-node, 27-OSD cluster running Luminous 12.2.1.
> >>
> >> In the filestore configuration there are 3 SSDs used for the journals of
> >> the 9 OSDs on each host (1 SSD holds 3 journal partitions for 3 OSDs).
> >>
> >> I've converted filestore to bluestore by wiping 1 host at a time and
> >> waiting for recovery. The SSDs now contain block.db - again one SSD
> >> serving 3 OSDs.
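> >>
> >> For reference, each OSD was recreated with its block.db on the SSD,
> >> roughly along these lines (device names here are placeholders):
> >>
> >> ceph-disk prepare --bluestore /dev/sdX --block.db /dev/ssdY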
> >>
> >>
> >>
> >> The cluster is used as storage for OpenStack.
> >>
> >> Running fio in a VM on that OpenStack cloud shows bluestore performance
> >> almost twice as slow as filestore.
> >>
> >> fio --name fio_test_file --direct=1 --rw=randwrite --bs=4k --size=1G
> >> --numjobs=2 --time_based --runtime=180 --group_reporting
> >>
> >> fio --name fio_test_file --direct=1 --rw=randread --bs=4k --size=1G
> >> --numjobs=2 --time_based --runtime=180 --group_reporting
> >>
> >>
> >>
> >>
> >>
> >> Filestore
> >>
> >>   write: io=3511.9MB, bw=19978KB/s, iops=4994, runt=180001msec
> >>
> >>   write: io=3525.6MB, bw=20057KB/s, iops=5014, runt=180001msec
> >>
> >>   write: io=3554.1MB, bw=20222KB/s, iops=5055, runt=180016msec
> >>
> >>
> >>
> >>   read : io=1995.7MB, bw=11353KB/s, iops=2838, runt=180001msec
> >>
> >>   read : io=1824.5MB, bw=10379KB/s, iops=2594, runt=180001msec
> >>
> >>   read : io=1966.5MB, bw=11187KB/s, iops=2796, runt=180001msec
> >>
> >>
> >>
> >> Bluestore
> >>
> >>   write: io=1621.2MB, bw=9222.3KB/s, iops=2305, runt=180002msec
> >>
> >>   write: io=1576.3MB, bw=8965.6KB/s, iops=2241, runt=180029msec
> >>
> >>   write: io=1531.9MB, bw=8714.3KB/s, iops=2178, runt=180001msec
> >>
> >>
> >>
> >>   read : io=1279.4MB, bw=7276.5KB/s, iops=1819, runt=180006msec
> >>
> >>   read : io=773824KB, bw=4298.9KB/s, iops=1074, runt=180010msec
> >>
> >>   read : io=1018.5MB, bw=5793.7KB/s, iops=1448, runt=180001msec
> >>
> >>
> >>
> >>
> >>
> >> - Rado
> >>
> >>
> >>
> >>
> >>