[ceph-users] Bluestore OSD_DATA, WAL & DB

Mark Nelson mnelson at redhat.com
Fri Nov 3 06:22:59 PDT 2017

On 11/03/2017 04:08 AM, Jorge Pinilla López wrote:
> Well, I haven't found any recommendation either, but I think that
> sometimes the SSD space is being wasted.

If someone wanted to implement it, bluefs could share some of the space
on the drive with hot object data and release that space as needed for
the DB.  I'd very much recommend keeping the promotion rate incredibly low.
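To make "keep the promotion rate incredibly low" concrete, here is a minimal Python sketch of one possible admission gate for such a cache. Nothing like this is described as existing in bluefs; `PromotionGate`, `min_hits`, and `max_promotions_per_sec` are hypothetical names. An object is promoted only after repeated reads, and a token bucket caps how many promotions per second can hit the flash.

```python
import time
from collections import defaultdict


class PromotionGate:
    """Hypothetical admission policy for a flash read cache (not a Ceph API).

    An object is promoted only after it has been read `min_hits` times, and a
    token bucket caps promotions at `max_promotions_per_sec` so that cache
    fills cannot eat much of the SSD's write endurance.
    """

    def __init__(self, min_hits=3, max_promotions_per_sec=0.5, clock=time.monotonic):
        self.min_hits = min_hits
        self.rate = max_promotions_per_sec
        # Bucket holds at least one token so rates below 1/s still work.
        self.capacity = max(1.0, max_promotions_per_sec)
        self.tokens = self.capacity
        self.clock = clock
        self.last = clock()
        # Read counts for not-yet-promoted objects (unbounded in this sketch).
        self.hits = defaultdict(int)

    def _refill(self):
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def on_read(self, obj_id):
        """Record a read of obj_id; return True if it should be promoted now."""
        self.hits[obj_id] += 1
        if self.hits[obj_id] < self.min_hits:
            return False  # not hot enough yet
        self._refill()
        if self.tokens < 1.0:
            return False  # rate limit reached; retry on a later read
        self.tokens -= 1.0
        del self.hits[obj_id]  # promoted; stop tracking it here
        return True
```

Raising `min_hits` or lowering `max_promotions_per_sec` trades cache hit rate for fewer writes to the SSD, which is the trade-off being recommended here.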

> I was thinking about making an OSD from the rest of my SSD space, but it
> wouldn't scale if more speed is needed.

I think there's a temptation to try to shove more stuff onto the SSD, but
honestly I'm not sure it's a great idea.  These drives are already
handling WAL and DB traffic, potentially for multiple OSDs.  If you have
a very read-centric workload or are using drives with high write
endurance, that's one thing.  From a monetary perspective, think
carefully about how much drive endurance and MTTF matter to you.
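A rough way to put numbers on that: a minimal Python sketch, assuming you know the SSD's rated TBW (terabytes written) and can estimate how many GB per day each OSD pushes through its WAL/DB, plus some write-amplification factor. The function name and every input value are assumptions for illustration, not measured Ceph figures.

```python
def years_until_rated_tbw(rated_tbw, osds_per_ssd, gb_per_osd_per_day, write_amp=1.0):
    """Estimate years before a shared WAL/DB SSD reaches its rated TBW.

    rated_tbw:          vendor endurance rating, in terabytes written
    osds_per_ssd:       how many OSDs put their WAL/DB on this one SSD
    gb_per_osd_per_day: estimated WAL/DB writes per OSD, in GB/day
    write_amp:          extra amplification inside the SSD (assumed)
    """
    tb_per_day = osds_per_ssd * gb_per_osd_per_day * write_amp / 1000.0
    if tb_per_day <= 0:
        raise ValueError("expected a positive daily write volume")
    return rated_tbw / tb_per_day / 365.0


if __name__ == "__main__":
    # Hypothetical numbers: the same drive shared by 1 vs. 5 OSDs.
    for n in (1, 5):
        years = years_until_rated_tbw(rated_tbw=3650, osds_per_ssd=n,
                                      gb_per_osd_per_day=100, write_amp=2.0)
        print(f"{n} OSD(s) per SSD: ~{years:.1f} years")
```

Every OSD you add to a shared WAL/DB device divides its expected lifetime accordingly, so the dollar question is how many years of service you actually need.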

> Another option I asked about was to use bcache, or a mix of bcache and
> small DB partitions, but the only replies I got were about corruption
> problems, so I decided not to do it.
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021535.html
> I think a good idea would be to use the space needed to store the hot DB
> and use the rest as a cache (at least a read cache).

Given that bluestore is already storing all of the metadata in rocksdb, 
putting the DB partition on flash is already going to buy you a lot. 
Having said that, something that could let the DB and a cache 
share/reclaim space on the SSD could be interesting.  It won't be a
cure-all, but it could at least provide a small improvement as long as
the promotion overhead is kept very low.
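One way such sharing could work, sketched in Python purely as a model: the DB always has priority, and a hypothetical LRU read cache lives in whatever space the DB is not using and gives that space back as the DB grows. `SharedFlash` and its methods are illustrative names, not anything in BlueStore or BlueFS; sizes are in GB.

```python
from collections import OrderedDict


class SharedFlash:
    """Hypothetical model of one SSD partition shared by a DB and a read cache."""

    def __init__(self, capacity_gb):
        self.capacity = capacity_gb
        self.db_used = 0.0
        self.cache = OrderedDict()  # obj_id -> size in GB, oldest first
        self.cache_used = 0.0

    def free(self):
        return self.capacity - self.db_used - self.cache_used

    def _evict_until(self, needed):
        # Drop least recently used cache entries until `needed` GB are free.
        while self.free() < needed and self.cache:
            _, size = self.cache.popitem(last=False)
            self.cache_used -= size

    def grow_db(self, gb):
        """DB growth reclaims space from the cache first."""
        self._evict_until(gb)
        if self.free() < gb:
            # In this model the DB would have to spill to the slow device.
            raise RuntimeError("DB no longer fits on flash")
        self.db_used += gb

    def cache_insert(self, obj_id, gb):
        """Cache an object using only space the DB is not using."""
        if obj_id in self.cache:
            self.cache.move_to_end(obj_id)
            return True
        if gb > self.capacity - self.db_used:
            return False  # could never fit without displacing the DB
        self._evict_until(gb)
        self.cache[obj_id] = gb
        self.cache_used += gb
        return True

    def cache_get(self, obj_id):
        """Return True on a cache hit and mark the entry recently used."""
        if obj_id in self.cache:
            self.cache.move_to_end(obj_id)
            return True
        return False
```

The point of the model is the priority order: the cache never pushes the DB off flash, and every insert is a write to the SSD, which is why the promotion overhead has to stay very low.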

> I don't really know a lot about this topic, but I think that maybe giving
> 50GB of a really expensive SSD is pointless when it's only using 10GB.

Think of it less as "space" and more as cells of write endurance.
That's really what you are buying, whether that's a small drive with
high write endurance or a big drive with low write endurance.  Some may
have better properties for reads; some may have power-loss protection
that allows O_DSYNC writes to go much faster.  As far as the WAL and DB
go, it's all about how many writes you can get out of the drive before
it goes kaput.
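To compare drives on that basis, a DWPD (drive writes per day) rating can be converted to total terabytes written over the warranty period: TBW ≈ capacity (TB) × DWPD × 365 × warranty years. A minimal Python sketch; the two drives below are hypothetical examples, not specific products, and a 5-year warranty is assumed.

```python
def rated_tbw(capacity_tb, dwpd, warranty_years=5):
    """Approximate total terabytes written implied by a DWPD rating."""
    return capacity_tb * dwpd * 365 * warranty_years


# Hypothetical drives: small/high-endurance vs. large/low-endurance.
drives = {
    "400 GB, 10 DWPD": rated_tbw(0.4, 10),
    "2 TB, 0.5 DWPD": rated_tbw(2.0, 0.5),
}
for name, tbw in drives.items():
    print(f"{name}: ~{tbw:.0f} TB written over 5 years")
```

With these numbers the smaller drive can absorb four times as many writes as the larger one, which matters far more for WAL and DB traffic than the extra capacity.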

> On 02/11/2017 at 21:45, Martin Overgaard Hansen wrote:
>> Hi, it seems like I’m in the same boat as everyone else in
>> this particular thread.
>> I’m also unable to find any guidelines or recommendations regarding
>> sizing of the WAL and/or DB.
>> I want to bring this subject back into the light and hope someone can
>> provide insight regarding the issue, thanks.
>> Best Regards,
>> Martin Overgaard Hansen
>> MultiHouse IT Partner A/S
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> --
> ------------------------------------------------------------------------
> *Jorge Pinilla López*
> jorpilo at unizar.es
> Computer Engineering student
> Systems area intern (SICUZ)
> Universidad de Zaragoza
> PGP-KeyID: A34331932EBC715A
> <http://pgp.rediris.es:11371/pks/lookup?op=get&search=0xA34331932EBC715A>
> ------------------------------------------------------------------------
