[ceph-users] Filestore to Bluestore migration question

Alfredo Deza adeza at redhat.com
Wed Oct 31 04:03:22 PDT 2018


On Wed, Oct 31, 2018 at 5:22 AM Hector Martin <hector at marcansoft.com> wrote:
>
> On 31/10/2018 05:55, Hayashida, Mami wrote:
> > I am relatively new to Ceph and need some advice on Bluestore migration.
> > I tried migrating a few of our test cluster nodes from Filestore to
> > Bluestore by following this
> > (http://docs.ceph.com/docs/luminous/rados/operations/bluestore-migration/)
> > as the cluster is currently running 12.2.9. The cluster, originally set
> > up by my predecessors, was running Jewel until I upgraded it recently to
> > Luminous.
> >
> > OSDs in each OSD host are set up in such a way that for every 10 data HDD
> > disks, there is one SSD drive that is holding their journals.  For
> > example, osd.0 data is on /dev/sdh and its Filestore journal is on a
> > partitioned part of /dev/sda. So, lsblk shows something like
> >
> > sda       8:0    0 447.1G  0 disk
> > ├─sda1    8:1    0    40G  0 part # journal for osd.0
> >
> > sdh       8:112  0   3.7T  0 disk
> > └─sdh1    8:113  0   3.7T  0 part /var/lib/ceph/osd/ceph-0
> >
>
> The BlueStore documentation states that the wal will automatically use
> the db volume if it fits, so if you're using a single SSD I think
> there's no good reason to split out the wal, if I'm understanding it
> correctly.

This is correct, no need for wal in this case.

>
> You should be using ceph-volume, since ceph-disk is deprecated. If
> you're sharing the SSD as wal/db for a bunch of OSDs, I think you're
> going to have to create the LVs yourself first. The data HDDs should be
> PVs (I don't think it matters if they're partitions or whole disk PVs as
> long as LVM discovers them) each part of a separate VG (e.g. hdd0-hdd9)
> containing a single LV. Then the SSD should itself be a PV for a
> separate shared SSD VG (e.g. ssd).
>
> So something like (assuming sda is your db SSD and sdb and onwards are
> your OSD HDDs):
> pvcreate /dev/sda
> pvcreate /dev/sdb
> pvcreate /dev/sdc
> ...
>
> vgcreate ssd /dev/sda
> vgcreate hdd0 /dev/sdb
> vgcreate hdd1 /dev/sdc
> ...
>
> lvcreate -L 40G -n db0 ssd
> lvcreate -L 40G -n db1 ssd
> ...
>
> lvcreate -l 100%VG -n data0 hdd0
> lvcreate -l 100%VG -n data1 hdd1
> ...
>
> ceph-volume lvm prepare --bluestore --data hdd0/data0 --block.db ssd/db0
> ceph-volume lvm prepare --bluestore --data hdd1/data1 --block.db ssd/db1
> ...
>
> ceph-volume lvm activate --all
>
> I think it might be possible to just let ceph-volume create the PV/VG/LV
> for the data disks and only manually create the DB LVs, but it shouldn't
> hurt to do it on your own and just give ready-made LVs to ceph-volume
> for everything.
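
For more than a couple of disks, the manual sequence quoted above could be
generated rather than typed. The following is only a dry-run sketch: it
prints the commands and executes nothing. The function name
plan_bluestore_lvs and the DB_SIZE variable are made up for illustration;
the VG/LV names (ssd, hddN, dbN, dataN) follow the quoted example. Note the
lowercase -l on the data LV, which is the lvcreate option that accepts a
percentage such as 100%VG.

```shell
#!/bin/sh
# Dry-run sketch: print the LVM and ceph-volume commands for one shared
# SSD and any number of data HDDs. Nothing is executed here.
# DB_SIZE is a hypothetical knob; 40G matches the quoted example.
DB_SIZE="${DB_SIZE:-40G}"

plan_bluestore_lvs() {
    ssd="$1"; shift
    # The shared SSD becomes a single PV in the VG named "ssd".
    echo "pvcreate $ssd"
    echo "vgcreate ssd $ssd"
    i=0
    for hdd in "$@"; do
        # One VG per HDD, one data LV filling it, one db LV on the SSD.
        echo "pvcreate $hdd"
        echo "vgcreate hdd$i $hdd"
        echo "lvcreate -L $DB_SIZE -n db$i ssd"
        echo "lvcreate -l 100%VG -n data$i hdd$i"
        echo "ceph-volume lvm prepare --bluestore --data hdd$i/data$i --block.db ssd/db$i"
        i=$((i + 1))
    done
    echo "ceph-volume lvm activate --all"
}

# Example: sda is the shared SSD, sdb and sdc are data HDDs.
plan_bluestore_lvs /dev/sda /dev/sdb /dev/sdc
```

Review the printed commands before running any of them, since pvcreate and
vgcreate on the wrong device are destructive.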

Another alternative here is to use the new `lvm batch` subcommand to
do all of this in one go:

ceph-volume lvm batch /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde \
    /dev/sdf /dev/sdg /dev/sdh

This will detect that sda is an SSD and will create the block.db LVs on
it for you (one for each spinning disk). Each spinning disk will hold
the data for one OSD.
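
Since the result depends on which devices are seen as SSDs, it may be worth
checking how the kernel classifies each disk before running batch. The
sketch below reads the standard sysfs rotational flag (0 for
non-rotational, 1 for rotational). That ceph-volume bases its decision on
this same flag is an assumption here, and disk_kind and SYSFS_ROOT are
names made up for this example.

```shell
#!/bin/sh
# Sketch: report whether the kernel considers a block device rotational.
# SYSFS_ROOT is a hypothetical override so the function can be pointed
# at a fake tree; it defaults to the real /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

disk_kind() {
    f="$SYSFS_ROOT/block/$1/queue/rotational"
    # Missing or unreadable flag: make no claim about the device.
    [ -r "$f" ] || { echo unknown; return; }
    case "$(cat "$f")" in
        0) echo ssd ;;
        1) echo hdd ;;
        *) echo unknown ;;
    esac
}

# Example: classify the first few devices from the batch command.
for d in sda sdb sdc; do
    echo "$d: $(disk_kind "$d")"
done
```

Some RAID controllers and virtual disks report the flag incorrectly, so a
surprising answer here is a reason to fall back to creating the LVs by
hand.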

The one caveat is that you no longer control OSD IDs, and they are
created with whatever the monitors are giving out.

This operation is not supported from ceph-deploy either.
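
If keeping the existing OSD IDs matters, the per-OSD prepare route can be
given an ID explicitly instead of using batch. The sketch below only prints
the two relevant commands. It assumes the old OSD has already been drained
and stopped as described in the migration document, and that your
ceph-volume build accepts --osd-id (worth confirming with
`ceph-volume lvm prepare --help`). plan_reuse_osd_id is a name made up for
this example.

```shell
#!/bin/sh
# Dry-run sketch: print the commands that would recreate one OSD as
# BlueStore while keeping its ID. Nothing is executed here.
plan_reuse_osd_id() {
    id="$1"; data_lv="$2"; db_lv="$3"
    # Mark the old OSD as destroyed so that its ID becomes reusable.
    echo "ceph osd destroy $id --yes-i-really-mean-it"
    # Prepare the new BlueStore OSD and ask for the same ID.
    echo "ceph-volume lvm prepare --bluestore --data $data_lv --block.db $db_lv --osd-id $id"
}

# Example: osd.0 with data on hdd0/data0 and block.db on ssd/db0.
plan_reuse_osd_id 0 hdd0/data0 ssd/db0
```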
>
> --
> Hector Martin (hector at marcansoft.com)
> Public Key: https://marcan.st/marcan.asc
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
