[ceph-users] BlueStore questions about workflow and performance

Alex Gorbachev ag at iss-integration.com
Tue Oct 3 06:48:28 PDT 2017


Hi Mark, great to hear from you!

On Tue, Oct 3, 2017 at 9:16 AM Mark Nelson <mnelson at redhat.com> wrote:

>
>
> On 10/03/2017 07:59 AM, Alex Gorbachev wrote:
> > Hi Sam,
> >
> > On Mon, Oct 2, 2017 at 6:01 PM Sam Huracan <nowitzki.sammy at gmail.com> wrote:
> >
> >     Can anyone help me?
> >
> >     On Oct 2, 2017 17:56, "Sam Huracan" <nowitzki.sammy at gmail.com> wrote:
> >
> >         Hi,
> >
> >         I'm reading this document:
> >
> >         http://storageconference.us/2017/Presentations/CephObjectStore-slides.pdf
> >
> >         I have 3 questions:
> >
> >         1. Does BlueStore write data (to the raw block device) and metadata
> >         (to RocksDB) simultaneously or sequentially?
> >
> >         2. In my opinion, BlueStore performance cannot match FileStore
> >         with an SSD journal, because writing to a raw disk is slower
> >         than writing through a buffer (that is the buffer's purpose).
> >         What do you think?
> >
> >         3. Does placing the RocksDB DB and WAL on SSD improve only write
> >         performance, only read performance, or both?
> >
> >         Hoping for your answer,
> >
> >
> > I am researching the same thing, but recommend you look
> > at http://ceph.com/community/new-luminous-bluestore
> >
> > And also search for Bluestore cache to answer some questions.  My test
> > Luminous cluster so far is not as performant as I would like, but I have
> > not yet put a serious effort into tuning it, and it does seem stable.
> >
> > Hth, Alex
>
> Hi Alex,
>
> If you see anything specific please let us know.  There are a couple of
> corner cases where bluestore is likely to be slower than filestore
> (specifically small sequential reads/writes with no client side cache or
> read ahead).  I've also seen some cases where filestore has higher read
> throughput potential (4MB seq reads with multiple NVMe drives per OSD
> node).  In many other cases bluestore is faster (and sometimes much
> faster) than filestore in our tests.  Writes in general tend to be
> faster and high volume object creation is much faster with much lower
> tail latencies (filestore really suffers in this test due to PG splitting).


I have two pretty well tuned filestore Jewel clusters running SATA HDDs on
dedicated hardware.  For the Luminous cluster, I wanted to do a POC on a
VMware fully meshed (trendy moniker: hyperconverged) setup, using only
SSDs, Luminous and Bluestore.  Our workloads are unusual in that RBDs are
exported via iSCSI or NFS back to VMware and consumed by e.g. Windows VMs
(we support healthcare and corporate business systems), or Linux VMs direct
from Ceph.

What I have done so far is dedicate a hardware JBOD with an Areca HBA (you
turned me on to those a few years ago :) to each OSD VM. With 6 Smartstorage
SSD OSDs per OSD VM, 3 of these VMs in total, and 2x 20 Gb shared network
uplinks, I am getting about a third of the performance of my hardware Jewel
cluster with 24 Lenovo enterprise SATA drives, measured with 4k block reads
and writes in a single stream and in 32 parallel streams.
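
For reference, the 4k test matrix described above could be scripted roughly
as follows. This is only a sketch: it assumes fio as the benchmark tool,
random I/O, a queue depth of 1 per stream, and a hypothetical mapped RBD
device at /dev/rbd0, none of which is stated above.

```python
import shlex


def fio_command(rw, numjobs, target="/dev/rbd0", runtime_s=60):
    """Build a fio argv list for a 4k test with `numjobs` parallel streams.

    `target` is a hypothetical device path. Write tests against it are
    destructive, so point it at a scratch image only.
    """
    return [
        "fio",
        f"--name=4k-{rw}-{numjobs}streams",
        f"--filename={target}",
        f"--rw={rw}",            # randread or randwrite (assumed access pattern)
        "--bs=4k",               # 4k blocks, as in the measurement described
        "--direct=1",            # bypass the page cache on the client
        "--ioengine=libaio",
        "--iodepth=1",           # one outstanding I/O per stream (assumption)
        f"--numjobs={numjobs}",  # 1 = single stream, 32 = parallel streams
        f"--runtime={runtime_s}",
        "--time_based",
        "--group_reporting",     # aggregate results across the streams
    ]


def build_matrix(target="/dev/rbd0"):
    """Reads and writes, each with a single stream and with 32 streams."""
    return [
        fio_command(rw, n, target)
        for rw in ("randread", "randwrite")
        for n in (1, 32)
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for argv in build_matrix():
        print(shlex.join(argv))
```

Running the same four commands against both clusters would at least keep the
client side of the comparison identical.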

Definitely not an apples-to-apples comparison, so I plan to play with Bluestore cache.
One question: does Bluestore distinguish between SSD and HDD based on CRUSH
class assignment?
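
One way to see how each OSD disk presents itself inside the VM is to read
the kernel's rotational flag from sysfs. The sketch below assumes that the
SSD/HDD distinction would come from that flag rather than from the CRUSH
class, which is an assumption and not something confirmed in this thread;
the device names passed on the command line are hypothetical.

```python
from pathlib import Path


def is_rotational(device, sysfs_root="/sys"):
    """Return True if the kernel reports `device` (e.g. "sdb") as rotational.

    Reads <sysfs_root>/block/<device>/queue/rotational, which contains "1"
    for spinning disks and "0" for non-rotational devices such as SSDs.
    """
    flag = Path(sysfs_root) / "block" / device / "queue" / "rotational"
    return flag.read_text().strip() == "1"


def classify(device, sysfs_root="/sys"):
    """Map the rotational flag to the labels "hdd" or "ssd"."""
    return "hdd" if is_rotational(device, sysfs_root) else "ssd"


if __name__ == "__main__":
    import sys

    # Usage: python rotational.py sdb sdc sdd
    for dev in sys.argv[1:]:
        print(dev, classify(dev))
```

Disks passed through to a VM do not always report this flag the way the
physical hardware would, so it is worth checking on the OSD VMs themselves.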

I will check the effect of giving a lot of RAM and CPU cores to OSD VMs, as
well as increasing spindles and using different JBODs.

Thank you for reaching out.

Regards,
Alex


>
> Mark
>
> >
> >
> >
> >
> >     _______________________________________________
> >     ceph-users mailing list
> >     ceph-users at lists.ceph.com <mailto:ceph-users at lists.ceph.com>
> >     http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > --
> > --
> > Alex Gorbachev
> > Storcium
-- 
--
Alex Gorbachev
Storcium