[ceph-users] Bluestore performance 50% of filestore

David Turner drakonstein at gmail.com
Tue Nov 14 15:13:11 PST 2017


I'd probably say 50GB to leave some extra space over-provisioned.  50GB
should definitely prevent any DB operations from spilling over to the HDD.
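
For reference, a minimal sketch of how that size could be set before the OSDs
are (re)created, assuming your deployment tool reads it from the cluster config
when it partitions the db device (value in bytes; 53687091200 = 50 GiB).
Existing OSDs keep whatever db partition they were created with:

  # ceph.conf, [global] or [osd] section on the OSD hosts
  bluestore_block_db_size = 53687091200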

On Tue, Nov 14, 2017, 5:43 PM Milanov, Radoslav Nikiforov <radonm at bu.edu>
wrote:

> Thank you,
>
> They are 4TB OSDs and they might become full someday, so I’ll try a 60GB db
> partition – sized for the maximum capacity of the OSD.
>
>
>
> - Rado
>
>
>
> *From:* David Turner [mailto:drakonstein at gmail.com]
> *Sent:* Tuesday, November 14, 2017 5:38 PM
>
>
> *To:* Milanov, Radoslav Nikiforov <radonm at bu.edu>
>
> *Cc:* Mark Nelson <mnelson at redhat.com>; ceph-users at lists.ceph.com
>
>
> *Subject:* Re: [ceph-users] Bluestore performance 50% of filestore
>
>
>
> You have to configure the size of the db partition in the config file for
> the cluster.  If your db partition is 1GB, then I can all but guarantee
> that you're using your HDD for your blocks.db very quickly into your
> testing.  There have been multiple threads recently about what size the db
> partition should be and it seems to be based on how many objects your OSD
> is likely to have on it.  The recommendation has been to err on the side of
> bigger.  If you're running 10TB OSDs and anticipate filling them up, then
> you probably want closer to an 80GB+ db partition.  That's why I asked how
> full your cluster was and how large your HDDs are.
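>
> A quick way to check whether that spillover is already happening, assuming
> you can run this on an OSD host (osd.0 is only an example id), is to look at
> the bluefs counters - a non-zero slow_used_bytes means the db has overflowed
> onto the slow (HDD) part of the OSD:
>
> ceph daemon osd.0 perf dump | grep -E 'db_used_bytes|slow_used_bytes'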
>
>
>
> Here's a link to one of the recent ML threads on this topic.
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020822.html
>
> On Tue, Nov 14, 2017 at 4:44 PM Milanov, Radoslav Nikiforov <radonm at bu.edu>
> wrote:
>
> The block-db partition is the default 1GB (is there a way to modify this?
> journals are 5GB in the filestore case) and usage is low:
>
>
>
> [root at kumo-ceph02 ~]# ceph df
>
> GLOBAL:
>
>     SIZE        AVAIL      RAW USED     %RAW USED
>
>     100602G     99146G        1455G          1.45
>
> POOLS:
>
>     NAME              ID     USED       %USED     MAX AVAIL     OBJECTS
>
>     kumo-vms          1      19757M      0.02        31147G        5067
>
>     kumo-volumes      2        214G      0.18        31147G       55248
>
>     kumo-images       3        203G      0.17        31147G       66486
>
>     kumo-vms3         11     45824M      0.04        31147G       11643
>
>     kumo-volumes3     13     10837M         0        31147G        2724
>
>     kumo-images3      15     82450M      0.09        31147G       10320
>
>
>
> - Rado
>
>
>
> *From:* David Turner [mailto:drakonstein at gmail.com]
> *Sent:* Tuesday, November 14, 2017 4:40 PM
> *To:* Mark Nelson <mnelson at redhat.com>
> *Cc:* Milanov, Radoslav Nikiforov <radonm at bu.edu>;
> ceph-users at lists.ceph.com
>
>
> *Subject:* Re: [ceph-users] Bluestore performance 50% of filestore
>
>
>
> How big was your blocks.db partition for each OSD and what size are your
> HDDs?  Also how full is your cluster?  It's possible that your blocks.db
> partition wasn't large enough to hold the entire db and it had to spill
> over onto the HDD which would definitely impact performance.
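>
> A rough way to answer those questions on an OSD host, with device names here
> being only examples:
>
> lsblk -o NAME,SIZE,TYPE /dev/sdb /dev/sdd   # HDD and the SSD carrying block.db
> ceph osd df                                 # per-OSD size and utilisation
> ceph df                                     # overall cluster fullness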
>
>
>
> On Tue, Nov 14, 2017 at 4:36 PM Mark Nelson <mnelson at redhat.com> wrote:
>
> How big were the writes in the Windows test and how much concurrency was
> there?
>
> Historically bluestore does pretty well for us with small random writes
> so your write results surprise me a bit.  I suspect it's the low queue
> depth.  Sometimes bluestore does worse with reads, especially if
> readahead isn't enabled on the client.
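>
> As a sketch, the librbd readahead knobs live in the [client] section of
> ceph.conf on the hypervisors; the values below are illustrative only, not a
> recommendation:
>
> [client]
> rbd readahead trigger requests = 10
> rbd readahead max bytes = 4194304          # 4 MiB
> rbd readahead disable after bytes = 0      # 0 = don't switch readahead off
>
> Readahead can also be raised inside the guest instead, e.g. via
> /sys/block/vdb/queue/read_ahead_kb.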
>
> Mark
>
> On 11/14/2017 03:14 PM, Milanov, Radoslav Nikiforov wrote:
> > Hi Mark,
> > Yes, RBD is in writeback mode, and the only thing that changed was converting
> the OSDs to bluestore. These are 7200 rpm drives with triple replication. I also
> get the same results (bluestore about 2 times slower) testing continuous writes
> on a 40GB partition in a Windows VM with a completely different tool.
> >
> > Right now I'm going back to filestore for the OSDs so additional tests
> are possible if that helps.
> >
> > - Rado
> >
> > -----Original Message-----
> > From: ceph-users [mailto:ceph-users-bounces at lists.ceph.com] On Behalf
> Of Mark Nelson
> > Sent: Tuesday, November 14, 2017 4:04 PM
> > To: ceph-users at lists.ceph.com
> > Subject: Re: [ceph-users] Bluestore performance 50% of filestore
> >
> > Hi Radoslav,
> >
> > Is RBD cache enabled and in writeback mode?  Do you have client side
> readahead?
> >
> > Both are doing better for writes than you'd expect from the native
> performance of the disks assuming they are typical 7200RPM drives and you
> are using 3X replication (~150IOPS * 27 / 3 = ~1350 IOPS).  Given the small
> file size, I'd expect that you might be getting better journal coalescing
> in filestore.
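> >
> > If the hypervisors expose an admin socket for the librbd clients, the
> > effective cache settings can be read back at runtime; the socket path below
> > is only an example:
> >
> > ceph --admin-daemon /var/run/ceph/ceph-client.cinder.12345.asok config show \
> >     | grep -E 'rbd_cache|rbd_readahead'
> >
> > rbd_cache = true together with rbd_cache_max_dirty > 0 means writeback;
> > rbd_cache_max_dirty = 0 forces writethrough.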
> >
> > Sadly I imagine you can't do a comparison test at this point, but I'd be
> curious how it would look if you used libaio with a high iodepth and a much
> bigger partition to do random writes over.
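> >
> > Something along these lines, as a sketch - the 100G size and iodepth=32 are
> > only examples meant to give a larger working set and more outstanding IO:
> >
> > fio --name fio_test_file --ioengine=libaio --direct=1 --rw=randwrite --bs=4k \
> >     --size=100G --iodepth=32 --numjobs=2 --time_based --runtime=180 \
> >     --group_reporting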
> >
> > Mark
> >
> > On 11/14/2017 01:54 PM, Milanov, Radoslav Nikiforov wrote:
> >> Hi
> >>
> >> We have 3 node, 27 OSDs cluster running Luminous 12.2.1
> >>
> >> In the filestore configuration there are 3 SSDs used for journals of 9
> >> OSDs on each host (1 SSD has 3 journal partitions for 3 OSDs).
> >>
> >> I've converted filestore to bluestore by wiping 1 host at a time and
> >> waiting for recovery. The SSDs now contain the block-db - again one SSD
> >> serving
> >> 3 OSDs.
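> >>
> >> For each OSD on the host being redone, the recreation step could look
> >> roughly like this sketch (the osd id and device names are placeholders,
> >> not necessarily the exact commands used here):
> >>
> >> ceph osd out 0
> >> systemctl stop ceph-osd@0
> >> ceph osd purge 0 --yes-i-really-mean-it
> >> ceph-disk prepare --bluestore /dev/sdb --block.db /dev/sdd
> >>
> >> followed by waiting for backfill to finish before moving on to the next
> >> host.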
> >>
> >>
> >>
> >> Cluster is used as storage for Openstack.
> >>
> >> Running fio on a VM in that Openstack shows bluestore performance at
> >> roughly half that of filestore.
> >>
> >> fio --name fio_test_file --direct=1 --rw=randwrite --bs=4k --size=1G
> >> --numjobs=2 --time_based --runtime=180 --group_reporting
> >>
> >> fio --name fio_test_file --direct=1 --rw=randread --bs=4k --size=1G
> >> --numjobs=2 --time_based --runtime=180 --group_reporting
> >>
> >>
> >>
> >>
> >>
> >> Filestore
> >>
> >>   write: io=3511.9MB, bw=19978KB/s, iops=4994, runt=180001msec
> >>
> >>   write: io=3525.6MB, bw=20057KB/s, iops=5014, runt=180001msec
> >>
> >>   write: io=3554.1MB, bw=20222KB/s, iops=5055, runt=180016msec
> >>
> >>
> >>
> >>   read : io=1995.7MB, bw=11353KB/s, iops=2838, runt=180001msec
> >>
> >>   read : io=1824.5MB, bw=10379KB/s, iops=2594, runt=180001msec
> >>
> >>   read : io=1966.5MB, bw=11187KB/s, iops=2796, runt=180001msec
> >>
> >>
> >>
> >> Bluestore
> >>
> >>   write: io=1621.2MB, bw=9222.3KB/s, iops=2305, runt=180002msec
> >>
> >>   write: io=1576.3MB, bw=8965.6KB/s, iops=2241, runt=180029msec
> >>
> >>   write: io=1531.9MB, bw=8714.3KB/s, iops=2178, runt=180001msec
> >>
> >>
> >>
> >>   read : io=1279.4MB, bw=7276.5KB/s, iops=1819, runt=180006msec
> >>
> >>   read : io=773824KB, bw=4298.9KB/s, iops=1074, runt=180010msec
> >>
> >>   read : io=1018.5MB, bw=5793.7KB/s, iops=1448, runt=180001msec
> >>
> >>
> >>
> >>
> >>
> >> - Rado
> >>
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users at lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users at lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>

