[ceph-users] ceph-volume lvm batch OSD replacement

Alfredo Deza adeza at redhat.com
Tue Mar 19 04:00:44 PDT 2019


On Tue, Mar 19, 2019 at 6:47 AM Dan van der Ster <dan at vanderster.com> wrote:
>
> Hi all,
>
> We've just hit our first OSD replacement on a host created with
> `ceph-volume lvm batch` with mixed hdds+ssds.
>
> The hdd /dev/sdq was prepared like this:
>    # ceph-volume lvm batch /dev/sd[m-r] /dev/sdac --yes
>
> Then /dev/sdq failed and was then zapped like this:
>   # ceph-volume lvm zap /dev/sdq --destroy
>
> The zap removed the pv/vg/lv from sdq, but left behind the db on
> /dev/sdac (see P.S.)

That is correct behavior for the zap command used.

>
> Now we've replaced /dev/sdq and we're wondering how to proceed. We see
> two options:
>   1. reuse the existing db lv from osd.240 (Though the osd fsid will
> change when we re-create, right?)

This is possible, but you are right that in its current state the LV
metadata still holds the OSD FSID and other cluster data. To reuse this
LV for a new (replacement) OSD, you would need to zap the LV *without*
the --destroy flag, which clears all metadata on the LV and runs wipefs
but keeps the LV itself. The command needs the full path to the LV
associated with osd.240, something like:

ceph-volume lvm zap /dev/ceph-osd-lvs/db-lv-240
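As a rough sketch of option 1 (not a tested procedure): the LV name below
is copied from the listing in your P.S., and passing the kept LV to
`ceph-volume lvm create` through --block.db is an assumption about how
the new OSD would then be built. The `run` helper and the DRY_RUN
variable are hypothetical names for this example; by default the script
only prints the commands.

```shell
#!/bin/sh
# Sketch: reuse the existing db LV for the replacement OSD (option 1).
# Values are taken from the listing in this thread; adjust for your host.
DB_LV="ceph-094c06db-98dc-47f6-a7e5-1092b099b372/osd-block-db-fa0e7927-dc3e-44d0-a8ce-1d8202fa75dd"
NEW_DATA_DEV="/dev/sdq"

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reuse_db_lv() {
    # Zap *without* --destroy: wipes metadata on the LV but keeps the LV.
    run ceph-volume lvm zap "/dev/$DB_LV"
    # Assumption: build the new OSD on the replaced disk and point
    # block.db at the kept LV (given as vg/lv).
    run ceph-volume lvm create --bluestore --data "$NEW_DATA_DEV" --block.db "$DB_LV"
}

reuse_db_lv
```

Run it once as-is to review the printed commands, then with DRY_RUN=0
if they match what you expect.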

>   2. remove the db lv from sdac then run
>         # ceph-volume lvm batch /dev/sdq /dev/sdac
>      which should do the correct thing.

This would also work, provided the db LV is fully removed with --destroy.
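A minimal sketch of option 2, under assumptions: it supposes that
`zap --destroy` given the LV path removes the db LV on your release (if
it does not, plain `lvremove` is the LVM-level fallback), and it adds a
`batch --report` pass to preview the layout before anything is created.
The `run` helper and DRY_RUN variable are hypothetical names for this
example; by default nothing is executed.

```shell
#!/bin/sh
# Sketch: remove the leftover db LV, then let batch recreate the OSD (option 2).
# The LV path is copied from the listing in this thread.
DB_LV_PATH="/dev/ceph-094c06db-98dc-47f6-a7e5-1092b099b372/osd-block-db-fa0e7927-dc3e-44d0-a8ce-1d8202fa75dd"

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recreate_with_batch() {
    # Assumption: --destroy on an LV path removes the LV itself.
    run ceph-volume lvm zap --destroy "$DB_LV_PATH"
    # Preview what batch would create before committing to it.
    run ceph-volume lvm batch --report /dev/sdq /dev/sdac
    # Without --yes, batch asks for confirmation before making changes.
    run ceph-volume lvm batch /dev/sdq /dev/sdac
}

recreate_with_batch
```

If the report does not show a single new OSD on /dev/sdq with its db on
/dev/sdac, stop there rather than confirming the batch run.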

>
> This is all v12.2.11 btw.
> If (2) is the preferred approach, then it looks like a bug that the
> db lv was not destroyed by lvm zap --destroy.

Since /dev/sdq was passed in to zap, just that one device was removed,
so this is working as expected.
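To see which LVs still belong to an OSD after zapping only one of its
devices, you can filter on the tags ceph-volume stores on each LV. The
following is a sketch: `lvs_for_osd` is a hypothetical helper name, and
the sample input stands in for real `lvs` output.

```shell
#!/bin/sh
# Sketch: list the LVs that carry a given OSD id in their LVM tags.
# Reads "lv_path lv_tags" lines on stdin and prints the matching paths.
lvs_for_osd() {
    # Match the exact id so that 240 does not also match 2401.
    grep -E "ceph\.osd_id=$1(,|[[:space:]]|\$)" | awk '{print $1}'
}

# Demo with made-up sample lines. On a real host (as root) use instead:
#   lvs --noheadings -o lv_path,lv_tags | lvs_for_osd 240
printf '%s\n' \
    '  /dev/vgA/osd-block-db-x ceph.osd_id=240,ceph.type=db' \
    '  /dev/vgB/osd-block-db-y ceph.osd_id=2401,ceph.type=db' \
    | lvs_for_osd 240
```

`ceph-volume lvm list /dev/sdac` gives the same information in the
format shown in the P.S. below.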

Alternatively, zap can destroy or zap all LVs associated with an OSD
ID, which seems to be what you want. I think this is not released yet
for Luminous, but it should be in the next release.
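For releases that include it, zapping by OSD ID would look roughly like
the sketch below. The `--osd-id` flag is the one I expect the feature to
use, so the script checks the help output for it first;
`zap_by_osd_id_cmd` is a hypothetical helper that only builds the
command string and runs nothing.

```shell
#!/bin/sh
# Sketch: zap every LV that belongs to one OSD, where the release supports it.
OSD_ID=240

# Hypothetical helper: build the command string for a given OSD id.
zap_by_osd_id_cmd() {
    echo "ceph-volume lvm zap --destroy --osd-id $1"
}

# Only suggest the command if this ceph-volume advertises the flag.
if command -v ceph-volume >/dev/null 2>&1 &&
   ceph-volume lvm zap --help 2>/dev/null | grep -q -- '--osd-id'; then
    echo "supported here; would run: $(zap_by_osd_id_cmd "$OSD_ID")"
else
    echo "zap --osd-id not available here; zap each device or LV by path instead"
fi
```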

>
> Once we sort this out, we'd be happy to contribute to the ceph-volume
> lvm batch doc.
>
> Thanks!
>
> Dan
>
> P.S:
>
> ===== osd.240 ======
>
>   [  db]    /dev/ceph-094c06db-98dc-47f6-a7e5-1092b099b372/osd-block-db-fa0e7927-dc3e-44d0-a8ce-1d8202fa75dd
>
>       type                      db
>       osd id                    240
>       cluster fsid              b4f463a0-c671-43a8-bd36-e40ab8d233d2
>       cluster name              ceph
>       osd fsid                  d4d1fb15-a30a-4325-8628-706772ee4294
>       db device                 /dev/ceph-094c06db-98dc-47f6-a7e5-1092b099b372/osd-block-db-fa0e7927-dc3e-44d0-a8ce-1d8202fa75dd
>       encrypted                 0
>       db uuid                   iWWdyU-UhNu-b58z-ThSp-Bi3B-19iA-06iJIc
>       cephx lockbox secret
>       block uuid                u4326A-Q8bH-afPb-y7Y6-ftNf-TE1X-vjunBd
>       block device              /dev/ceph-f78ff8a3-803d-4b6d-823b-260b301109ac/osd-data-9e4bf34d-1aa3-4c0a-9655-5dba52dcfcd7
>       vdo                       0
>       crush device class        None
>       devices                   /dev/sdac

