[ceph-users] Bluestore vs. Filestore

Sage Weil sage at newdream.net
Wed Oct 3 13:23:14 PDT 2018

On Tue, 2 Oct 2018, jesper at krogh.cc wrote:
> Hi.
> Based on some recommendations we have setup our CephFS installation using
> bluestore*. We're trying to get a strong replacement for "huge" xfs+NFS
> server - 100TB-ish size.
> Current setup is - a sizeable Linux host with 512GB of memory - one large
> Dell MD1200 or MD1220 - 100TB + a Linux kernel NFS server.
> Since our "hot" dataset is < 400GB we can actually serve the hot data
> directly out of the host page-cache and never really touch the "slow"
> underlying drives. Except when new bulk data are written where a Perc with
> BBWC is consuming the data.
> In the CephFS + Bluestore world, Ceph is "deliberately" bypassing the host
> OS page-cache, so even when we have 4-5 x 256GB memory** in the OSD hosts
> it is really hard to create a synthetic test where the hot data does not
> end up being read out of the underlying disks. Yes, the
> client side page cache works very well, but in our scenario we have 30+
> hosts pulling the same data over NFS.
> Is bluestore just a "bad fit" .. Filestore "should" do the right thing? Is
> the recommendation to make an SSD "overlay" on the slow drives?
> Thoughts?

1. This sounds like it is primarily a matter of configuring the bluestore 
cache size.  This is the main downside of bluestore: it doesn't magically 
use any available RAM as a cache (like the OS page cache).
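For example, the per-OSD cache size can be raised in ceph.conf. A minimal sketch, assuming HDD-backed OSDs and spare RAM on the hosts (the 8 GiB figure is illustrative, not a recommendation; tune it to your memory budget):

```ini
[osd]
# Bluestore manages its own cache instead of using the OS page cache,
# so spare host RAM is only used if these limits are raised.
# Values below are examples, not tuned recommendations.
bluestore_cache_size_hdd = 8589934592
bluestore_cache_size_ssd = 8589934592
```

With several OSDs per host, remember the cache is per OSD daemon, so the host's total cache footprint is this value times the OSD count.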

2. There are two other important options that control the bluestore cache:

 bluestore_default_buffered_read (default true)
 bluestore_default_buffered_write (default false)

Given your description it sounds like the defaults are fine: newly written 
data won't land in the cache, but once it is read it will be there.  If you 
want recent writes to land in the cache, you can change the second option to 
true.
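If you do want newly written data to stay cached, a minimal ceph.conf sketch (the OSDs need a restart, or the option injected at runtime, for it to take effect):

```ini
[osd]
# Also keep newly written data in bluestore's cache (default is false,
# meaning writes are not cached until they are read back).
bluestore_default_buffered_write = true
```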

3. Because we don't use the page cache, an OSD restart also drops the 
cache, so be sure to allow things to warm up after a restart before 
drawing conclusions about steady-state performance.
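One way to warm the cache after a restart is simply to read the hot dataset once through the CephFS mount, since bluestore_default_buffered_read=true caches data on read. A rough sketch, assuming a hypothetical mount point (adjust HOT_DIR to wherever your hot data lives):

```shell
# HOT_DIR is a placeholder; point it at your hot dataset on the CephFS mount.
HOT_DIR=/mnt/cephfs/hot

# Read every file once and discard the output; the reads populate the
# bluestore caches on the OSDs (and the client page cache as a side effect).
find "$HOT_DIR" -type f -exec cat {} + > /dev/null
```

This only warms whatever data the clients actually read, so driving it from one representative client is usually enough.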

Hope that helps!
