[ceph-users] bluestore - wal,db on faster devices?

Mark Nelson mnelson at redhat.com
Wed Nov 8 05:21:28 PST 2017

Hi Wolfgang,

In bluestore the WAL serves sort of a similar purpose to filestore's 
journal, but bluestore isn't dependent on it for guaranteeing durability 
of large writes.  With bluestore you can often get higher large-write 
throughput than with filestore when using HDD-only or flash-only OSDs.
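
To make the durability point concrete, here is a toy Python model of how many bytes reach disk per client write. It assumes that filestore journals every write in full before applying it to the data filesystem. It also assumes that bluestore writes anything at or above a deferral cutoff directly to the data device, committing only metadata to RocksDB, while smaller writes are deferred through the WAL. DEFERRED_THRESHOLD is a hypothetical constant loosely modeled on bluestore's deferred-write cutoff, not its actual default. Metadata bytes are ignored, and none of this is a measurement.

```python
# Toy model: bytes hitting disk per client write, ignoring metadata.
# DEFERRED_THRESHOLD is a hypothetical stand-in for bluestore's deferral cutoff.
DEFERRED_THRESHOLD = 32 * 1024  # bytes (assumption, not a tuned value)


def filestore_bytes(write_size: int) -> int:
    """Filestore journals the full payload, then writes it again to the data fs."""
    return 2 * write_size


def bluestore_bytes(write_size: int) -> int:
    """Small writes go through the WAL (written twice); large writes go
    straight to the data device once, with only metadata committed to RocksDB."""
    if write_size < DEFERRED_THRESHOLD:
        return 2 * write_size
    return write_size


def amplification(write_size: int) -> tuple[float, float]:
    """Return (filestore, bluestore) device bytes per client byte."""
    return (filestore_bytes(write_size) / write_size,
            bluestore_bytes(write_size) / write_size)


if __name__ == "__main__":
    for size in (4 * 1024, 64 * 1024, 4 * 1024 * 1024):
        fs, bs = amplification(size)
        print(f"{size // 1024:>5} KiB write: filestore x{fs:.1f}, bluestore x{bs:.1f}")
```

Under these assumptions a 64 KiB or 4 MiB write costs filestore twice its size on disk but bluestore only once. This is why large-write throughput can come out ahead even without a separate WAL device.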

Bluestore also stores allocation, object, and cluster metadata in the 
DB.  That, in combination with the way bluestore stores objects, 
dramatically improves behavior during certain workloads.  A big one is 
creating millions of small objects as quickly as possible.  In 
filestore, PG splitting has a huge impact on performance and tail 
latency.  Bluestore is much better even on HDD alone, and putting the DB and
WAL on flash makes it better still, since metadata is no longer a bottleneck.
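
Assuming the splitting meant here is filestore's per-PG directory splitting, the cost comes from its on-disk layout. Each PG is a directory tree, and once a leaf directory holds more than filestore_split_multiple * abs(filestore_merge_threshold) * 16 files, it is split into 16 subdirectories. That split moves files on a live OSD. The rough sketch below assumes uniformly hashed object names and the defaults of that era (split multiple 2, merge threshold 10, so 320 files per directory). split_depth and files_moved are hypothetical helpers for illustration. Real OSDs split directory by directory rather than all at once.

```python
# Rough model of filestore's per-PG directory splitting.
# Assumes uniformly hashed object names, so all leaf directories fill evenly
# and split at about the same time; real OSDs split directory by directory.
FILESTORE_SPLIT_MULTIPLE = 2    # era default (assumption for this sketch)
FILESTORE_MERGE_THRESHOLD = 10  # era default (assumption for this sketch)
FANOUT = 16                     # one subdirectory per hex digit of the hash

SPLIT_LIMIT = FILESTORE_SPLIT_MULTIPLE * abs(FILESTORE_MERGE_THRESHOLD) * FANOUT


def split_depth(objects_per_pg: int) -> int:
    """Number of split levels needed so no leaf exceeds SPLIT_LIMIT files."""
    depth = 0
    while objects_per_pg > SPLIT_LIMIT * FANOUT ** depth:
        depth += 1
    return depth


def files_moved(objects_per_pg: int) -> int:
    """Approximate files relocated by splits while filling a PG to this size.

    Going from depth d-1 to d happens once the PG holds about
    SPLIT_LIMIT * FANOUT**(d-1) objects, and every one of them is moved.
    """
    return sum(SPLIT_LIMIT * FANOUT ** (d - 1)
               for d in range(1, split_depth(objects_per_pg) + 1))


if __name__ == "__main__":
    for n in (300, 10_000, 1_000_000):
        print(f"{n:>9} objects/PG: depth {split_depth(n)}, ~{files_moved(n)} file moves")
```

In this model, filling a PG to 10,000 objects triggers two rounds of splitting and moves about 5,440 files while client I/O is in flight. Bluestore avoids this work because object metadata lives in RocksDB rather than in a directory hierarchy.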

Bluestore does have a couple of shortcomings vs filestore currently. 
The allocator is not as good as XFS's and can fragment more over time. 
There is no server-side readahead so small sequential read performance 
is very dependent on client-side readahead.  There are still a number of
optimizations to various things, ranging from threading and locking in
the shardedopwq to pglog and dup_ops, that could potentially improve performance.
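
Because there is no server-side readahead, every small sequential read reaches the OSD as its own request unless the client prefetches. The toy model below counts OSD requests for reading an object sequentially in small chunks, with and without a simple client-side readahead window. osd_requests is a hypothetical helper, not a librbd or kernel API. The 4 KiB read size and 512 KiB window in the example are arbitrary illustrative values.

```python
def osd_requests(total_bytes: int, read_size: int, readahead: int = 0) -> int:
    """Count OSD requests for a sequential read of total_bytes in read_size chunks.

    With readahead == 0 every client read goes to the OSD. Otherwise, on a
    cache miss the client fetches max(read_size, readahead) bytes at once and
    serves the following reads from its local buffer.
    """
    requests = 0
    cached_until = 0  # offset up to which data is already buffered client-side
    for offset in range(0, total_bytes, read_size):
        end = min(offset + read_size, total_bytes)
        if end > cached_until:
            fetch = max(read_size, readahead)
            cached_until = min(offset + fetch, total_bytes)
            requests += 1
    return requests


if __name__ == "__main__":
    size = 4 * 1024 * 1024  # a 4 MiB object read in 4 KiB chunks
    print("no readahead:   ", osd_requests(size, 4096))
    print("512K readahead: ", osd_requests(size, 4096, 512 * 1024))
```

In this model the 4 MiB read drops from 1024 OSD requests to 8 once the client reads ahead. That is why small sequential read performance hinges on the client's readahead settings.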

I have a blog post that we've been working on that explores some of 
these things but I'm still waiting on review before I publish it.


On 11/08/2017 05:53 AM, Wolfgang Lendl wrote:
> Hello,
> it's clear to me that putting the journal on a fast device (SSD, NVMe)
> gives a performance gain when using the filestore backend.
> It's not clear when it comes to bluestore: are there any resources,
> performance tests, etc. out there on how a fast WAL/DB device impacts
> performance?
> br
> wolfgang
