[ceph-users] [Bluestore] Some of my osd's uses BlueFS slow storage for db - why?

Nick Fisk nick at fisk.me.uk
Fri Feb 22 03:22:03 PST 2019

>On 2/16/19 12:33 AM, David Turner wrote:
>> The answer is probably going to be in how big your DB partition is vs 
>> how big your HDD disk is.  From your output it looks like you have a 
>> 6TB HDD with a 28GB Blocks.DB partition.  Even though the DB used 
>> size isn't currently full, I would guess that at some point since 
>> this OSD was created that it did fill up and what you're seeing is 
>> the part of the DB that spilled over to the data disk. This is why 
>> the official recommendation (that is quite cautious, but cautious 
>> because some use cases will use this up) for a blocks.db partition is 
>> 4% of the data drive.  For your 6TB disks that's a recommendation of 
>> 240GB per DB partition.  Of course the actual size of the DB needed 
>> is dependent on your use case.  But pretty much every use case for a 
>> 6TB disk needs a bigger partition than 28GB.
>My current db size of osd.33 is 7910457344 bytes, and osd.73 is
>2013265920+4685037568 bytes. 7544Mbyte (24.56% of db_total_bytes) vs
>6388Mbyte (6.69% of db_total_bytes).
>Why is osd.33 not using slow storage in this case?

BlueStore/RocksDB will only put the next level of the DB on flash if that whole level fits.
The resulting useful sizes are roughly 3GB, 30GB and 300GB; anything in between is pointless. Only ~3GB of SSD will ever be used
out of a 28GB partition. Likewise, a 240GB partition is also pointless, as only ~30GB will be used.
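
As a rough illustration of the rule above, here is a minimal Python sketch. The function name usable_db_gb is hypothetical, and the
3/30/300GB steps are just the approximate figures from this mail, not exact RocksDB values. It picks the largest step that fits
entirely inside a given DB partition.

```python
# Approximate useful DB sizes in GB, taken from the rough figures in this thread.
# These are assumptions for illustration, not exact RocksDB level sizes.
ROUGH_LEVEL_SIZES_GB = (3, 30, 300)


def usable_db_gb(partition_gb):
    """Return the largest rough step that fits entirely in the partition, or 0 if none does."""
    usable = 0
    for size in ROUGH_LEVEL_SIZES_GB:
        if size <= partition_gb:
            usable = size
    return usable


if __name__ == "__main__":
    for partition in (28, 30, 240):
        print(partition, "GB partition ->", usable_db_gb(partition), "GB usable on flash")
```

Under these assumptions a 28GB partition behaves like a 3GB one, and a 240GB partition behaves like a 30GB one.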

I'm currently running 30GB partitions on my cluster with a mix of 6, 8 and 10TB disks. The 10TB disks are about 75% full and use
around 14GB of DB each. This is on mainly 3x replica RBD (4MB objects).


