[ceph-users] bluestore - wal,db on faster devices?
mnelson at redhat.com
Wed Nov 8 11:45:46 PST 2017
You've got the right idea. RBD is probably going to benefit less since
you have a small number of large objects and little extra OMAP data.
Having the allocation and object metadata on flash certainly shouldn't
hurt, and you should still have less overhead for small (<64k) writes.
With RGW, however, you also have to worry about bucket index updates
during writes and that's a big potential bottleneck that you don't need
to worry about with RBD.
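A rough way to picture the small-write point above is the toy model below.
It is only a sketch, not BlueStore code: the 64 KiB cutoff is simply the
"<64k" figure from this message, and first_commit_device() and its
parameters are hypothetical names made up for illustration. The real
cutoff depends on how BlueStore is configured.

```python
# Toy model only. The cutoff is the "<64k" figure from the message above,
# not a value read from BlueStore's source or its configuration defaults.
SMALL_WRITE_CUTOFF = 64 * 1024  # bytes (assumption)


def first_commit_device(write_size, wal_on_flash):
    """Return the device class assumed to absorb a write's initial commit.

    write_size   -- size of the client write in bytes
    wal_on_flash -- True if the WAL/DB live on a separate flash device
    """
    if write_size < SMALL_WRITE_CUTOFF:
        # Assumption: small writes are committed through the WAL first,
        # so they land wherever the WAL lives.
        return "flash" if wal_on_flash else "hdd"
    # Large writes do not depend on the WAL for durability, so in this
    # model they go to the HDD data device either way.
    return "hdd"


if __name__ == "__main__":
    for size in (4 * 1024, 32 * 1024, 4 * 1024 * 1024):
        print(size, first_commit_device(size, wal_on_flash=True))
```

In this model only the writes under the cutoff change device when the WAL
moves to flash, which is why a workload of mostly large RBD writes would
see less of a difference.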
On 11/08/2017 01:01 PM, Wolfgang Lendl wrote:
> Hi Mark,
> thanks for your reply!
> I'm a big fan of keeping things simple - this means that there has to be
> a very good reason to put the WAL and DB on a separate device; otherwise
> I'll keep it collocated (and simpler).
> As far as I understand, putting the WAL and DB on a faster (than HDD)
> device makes more sense in CephFS and RGW environments (more metadata)
> and less sense in RBD environments - correct?
> On 11/08/2017 02:21 PM, Mark Nelson wrote:
>> Hi Wolfgang,
>> In bluestore the WAL serves sort of a similar purpose to filestore's
>> journal, but bluestore isn't dependent on it for guaranteeing
>> durability of large writes. With bluestore you can often get higher
>> large-write throughput than with filestore when using HDD-only or
>> flash-only OSDs.
>> Bluestore also stores allocation, object, and cluster metadata in the
>> DB. That, in combination with the way bluestore stores objects,
>> dramatically improves behavior during certain workloads. A big one is
>> creating millions of small objects as quickly as possible. In
>> filestore, PG splitting has a huge impact on performance and tail
>> latency. Bluestore is much better just on HDD, and putting the DB and
>> WAL on flash makes it better still since metadata is no longer a
>> bottleneck.
>> Bluestore does have a couple of shortcomings vs filestore currently.
>> The allocator is not as good as XFS's and can fragment more over time.
>> There is no server-side readahead so small sequential read performance
>> is very dependent on client-side readahead. There are still a number of
>> optimizations to various things ranging from threading and locking in
>> the shardedopwq to pglog and dup_ops that potentially could improve
>> performance.
>> I have a blog post that we've been working on that explores some of
>> these things but I'm still waiting on review before I publish it.
>> On 11/08/2017 05:53 AM, Wolfgang Lendl wrote:
>>> It's clear to me that you get a performance gain from putting the
>>> journal on a fast device (SSD, NVMe) when using the filestore backend.
>>> It's less clear when it comes to bluestore - are there any resources,
>>> performance tests, etc. out there on how a fast wal,db device impacts
>>> performance?