[ceph-users] Ceph block storage - block.db useless?

vitalif at yourcmc.ru vitalif at yourcmc.ru
Tue Mar 12 05:46:22 PDT 2019

block.db is very unlikely to ever grow to 250GB with a 6TB data device.

However, there seems to be a funny "issue": all block.db sizes other
than about 4, 30, and 286GB are effectively useless, because RocksDB
puts data on the fast storage only if it thinks the whole LSM level
will fit there. Space above the nearest lower threshold is never used.
Ceph's RocksDB options set the WAL to 1GB and leave the default
max_bytes_for_level_base unchanged, so it's 256MB. The multiplier is
also left at 10. So WAL=1GB, L1=256MB, L2=2560MB, L3=25600MB. RocksDB
will therefore put L2 on the block.db only if the block.db's size
exceeds 1GB+256MB+2560MB (which rounds up to 4GB), and it will put L3
on the block.db only if its size exceeds 1GB+256MB+2560MB+25600MB =
almost 30GB. The same arithmetic with L4=256000MB gives the ~286GB
threshold.
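A minimal Python sketch of the arithmetic above, assuming the model described in this reply (a level lands on block.db only if the WAL plus every level up to and including it fit there) and the quoted figures: 1GB WAL, 256MB max_bytes_for_level_base, multiplier 10. The function names are hypothetical. Treating 1GB as 1000MB and rounding up reproduces the 4, 30 and 286GB figures.

```python
# Sketch of the block.db sizing model described above. Assumption: RocksDB
# places a level on block.db only if the WAL plus every level up to and
# including that one fit on it.
WAL_MB = 1024                  # ~1GB WAL, per the figure above
LEVEL_BASE_MB = 256            # default max_bytes_for_level_base
LEVEL_MULTIPLIER = 10          # default level size multiplier


def level_sizes_mb(levels: int) -> list[int]:
    """Target sizes of L1..L<levels> in MB: base * multiplier ** (n - 1)."""
    return [LEVEL_BASE_MB * LEVEL_MULTIPLIER ** i for i in range(levels)]


def placement_thresholds_mb(levels: int) -> list[int]:
    """Minimum block.db size in MB to hold the WAL plus L1..Ln, n = 1..levels."""
    thresholds = []
    total = WAL_MB
    for size in level_sizes_mb(levels):
        total += size
        thresholds.append(total)
    return thresholds


def round_up_gb(mb: int) -> int:
    """Round MB up to whole GB, treating 1GB as 1000MB."""
    return (mb + 999) // 1000


if __name__ == "__main__":
    for n, mb in enumerate(placement_thresholds_mb(4), start=1):
        print(f"WAL + L1..L{n}: {mb} MB -> needs about {round_up_gb(mb)}GB")
```

The third output line, for example, reads "WAL + L1..L3: 29440 MB -> needs about 30GB".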

> Hello,
> I was wondering why Ceph's block.db is nearly empty, so I started
> to investigate.
> Ceph recommends that block.db be at least 4% of the size of
> block, so my OSD configuration looks like this:
> wal.db   - not explicitly specified
> block.db - 250GB of SSD storage
> block    - 6TB
> Since the WAL is written to block.db if no separate WAL device is
> available, I didn't configure one. With a size of 250GB we are
> slightly above 4%.
> So everything should be "fine". But the block.db only contains
> about 10GB of data.
> I figured out that an object in block.db gets "amplified", so
> the space consumption is much higher than the object itself
> would need.
> I'm using Ceph as the storage backend for OpenStack, and raw images
> with a size of 10GB and more are common. So if I understand
> this correctly, I have to consider that a 10GB image may
> consume 100GB of block.db.
> Besides the fact that the images may have a size of 100GB and
> are only used for initial reads until all changed blocks get
> written to an SSD-only pool, I was asking myself whether I need
> a block.db at all, and whether it would be better to save the SSD
> space used for block.db and just create a 10GB wal.db.
> Has anyone done this before? Has anyone had sufficient SSD space
> but stuck with wal.db to save SSD space?
> If I'm correct, the block.db will never be used for huge images.
> And even if it is used for one or two images, does this make
> sense? The images are used initially to read all unchanged blocks
> from them. After a while each VM should access the images pool less
> and less due to the changes made in the VM.
> Any thoughts about this?
> Best regards
> --
> Benjamin Zapiec <benjamin.zapiec at gonicus.de> (System Engineer)
> * GONICUS GmbH * Moehnestrasse 55 (Kaiserhaus) * D-59755 Arnsberg
> * Tel.: +49 2932 916-0 * Fax: +49 2932 916-245
> * http://www.GONICUS.de
> * Registered office: Moehnestrasse 55 * D-59755 Arnsberg
> * Managing directors: Rainer Luelsdorf, Alfred Schroeder
> * Chairman of the advisory board: Juergen Michels
> * District court Arnsberg * HRB 1968
> We fulfil our data protection information obligations under
> Articles 13 and 14 GDPR (DS-GVO) by publishing them on our website at:
> https://www.gonicus.de/datenschutz or by sending them to you upon
> informal request.
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
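Applying the same level-placement model to the configuration in the quoted message (6TB block, 250GB block.db) gives a rough idea of how much of that block.db could ever hold complete levels. This is a hedged sketch: it treats TB and GB as decimal multiples of MB, reuses the 1GB WAL / 256MB / x10 figures from the reply, and the helper names are hypothetical.

```python
# Hypothetical check of the quoted OSD layout against the level-placement
# model from the reply (1GB WAL, 256MB level base, x10 multiplier).
WAL_MB = 1024
LEVEL_BASE_MB = 256
LEVEL_MULTIPLIER = 10


def placement_thresholds_mb(levels: int = 5) -> list[int]:
    """Cumulative WAL + L1..Ln sizes in MB for n = 1..levels."""
    total, thresholds = WAL_MB, []
    for i in range(levels):
        total += LEVEL_BASE_MB * LEVEL_MULTIPLIER ** i
        thresholds.append(total)
    return thresholds


def usable_db_mb(db_mb: int) -> int:
    """Largest cumulative level total that fits entirely in db_mb (0 if none)."""
    fitting = [t for t in placement_thresholds_mb() if t <= db_mb]
    return fitting[-1] if fitting else 0


BLOCK_MB = 6_000_000                      # 6TB data device (decimal units)
DB_MB = 250_000                           # 250GB block.db on SSD
RECOMMENDED_MB = BLOCK_MB * 4 // 100      # the "at least 4%" guideline

if __name__ == "__main__":
    print(f"4% guideline: {RECOMMENDED_MB} MB, configured: {DB_MB} MB")
    print(f"usable for whole levels under the model: {usable_db_mb(DB_MB)} MB")
```

Under these assumptions the 250GB block.db clears the 240GB guideline, but only 29440MB (roughly 30GB) of it would hold whole levels, since L4 would need about 286GB.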
