[ceph-users] Sharing Bluestore WAL

Richard Hesketh richard.hesketh at rd.bbc.co.uk
Fri Nov 24 02:39:33 PST 2017


On 23/11/17 17:19, meike.talbach at women-at-work.org wrote:
> Hello,
> 
> in our present Ceph cluster we used to have 12 HDD OSDs per host.
> All OSDs shared a common SSD for journaling.
> The SSD was used as root device and the 12 journals were files in the /usr/share directory, like this:
> 
> OSD 1 - data /dev/sda - journal /usr/share/sda
> OSD 2 - data /dev/sdb - journal /usr/share/sdb
> ...
> 
> We now want to migrate to Bluestore and continue to use this approach.
> I tried to use "ceph-deploy osd prepare test04:sdc --bluestore --block-db /var/local/sdc-block --block-wal /var/local/sdc-wal" to set up an OSD, which essentially works.
> 
> However, I'm wondering whether this is correct at all.
> And how can I make sure that sdc-block and sdc-wal do not fill up the SSD?
> Is there any option to limit the file size, and what is the recommended value for such an option?
> 
> Thank you
> 
> Meike

The maximum size of the WAL is dependent on cluster configuration values, but it will always be relatively small. There is no maximum DB size or, as it stands, good estimates for how large a DB may realistically grow. The expected behaviour is that if the DB outgrows its device it will spill over onto the data device. I don't believe there is any option that would let you effectively limit the size of files if you're using flat files to back your devices.

Using files for your DB/WAL is not recommended practice - you have the space problems that you mention and you'll also be suffering a performance hit by sticking a filesystem layer in the middle of things. Realistically, you should partition your SSD and provide entire partitions as the devices on which to store your OSD DBs. There is no point in specifying the WAL as a separate device unless you're doing something advanced; it will be stored alongside the DB on the DB device if not otherwise specified, and since you're putting them on the same device anyway you get no advantage from splitting them. With everything partitioned off correctly, you don't have to worry about Ceph data encroaching on your root FS space.

I would also worry that unless that one SSD is very large, 12 HDDs : 1 SSD could be overdoing it. Filestore journals sustained a lot of writing but didn't need to be very large, comparatively; a Bluestore database with WAL is a lot lighter on the I/O but does need considerably more standing space, since it's actually permanently storing metadata rather than just write journalling. If it's the case that you've only got a few GB of space you can spare for each DB, you're probably going to outgrow that very quickly and you won't see much actual benefit from using the SSD.
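
To get a feel for how little space each DB ends up with at a 12:1 ratio, here is a rough back-of-the-envelope sketch. The function name, the 240 GiB SSD and the 40 GiB kept back for the root filesystem are all made-up illustrative numbers, not recommendations; it only divides whatever is left evenly between the OSDs.

```python
def db_partition_gib(ssd_gib: float, root_reserve_gib: float, num_osds: int) -> float:
    """Return the space in GiB left for each OSD's DB partition.

    Hypothetical helper: splits the SSD evenly between OSDs after
    setting aside space for the root filesystem.
    """
    if num_osds <= 0:
        raise ValueError("num_osds must be positive")
    usable = ssd_gib - root_reserve_gib
    if usable <= 0:
        raise ValueError("no space left after reserving the root filesystem")
    return usable / num_osds


# Assumed example values: 240 GiB SSD, 40 GiB for root, 12 OSDs per host.
per_osd = db_partition_gib(240, 40, 12)
print(f"{per_osd:.1f} GiB per OSD DB partition")
```

If the figure that comes out for your hardware is only a handful of GiB per OSD, that is the situation described above where the DB is likely to spill over onto the data device.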

Rich
