[ceph-users] ceph-volume lvm batch OSD replacement
adeza at redhat.com
Tue Mar 19 04:00:44 PDT 2019
On Tue, Mar 19, 2019 at 6:47 AM Dan van der Ster <dan at vanderster.com> wrote:
> Hi all,
> We've just hit our first OSD replacement on a host created with
> `ceph-volume lvm batch` with mixed hdds+ssds.
> The hdd /dev/sdq was prepared like this:
> # ceph-volume lvm batch /dev/sd[m-r] /dev/sdac --yes
> /dev/sdq then failed and was zapped like this:
> # ceph-volume lvm zap /dev/sdq --destroy
> The zap removed the pv/vg/lv from sdq, but left behind the db on
> /dev/sdac (see P.S.)
That is correct behavior for the zap command used.
> Now we've replaced /dev/sdq and we're wondering how to proceed. We see
> two options:
> 1. reuse the existing db lv from osd.240 (Though the osd fsid will
> change when we re-create, right?)
This is possible, but you are right that in the current state the FSID
and other cluster data exist in the LV metadata. To reuse this LV for
a new (replacement) OSD, you would need to zap the LV *without* the
--destroy flag, which clears all metadata on the LV and does a wipefs.
The command needs the full path to the LV associated with osd.240,
something like:
ceph-volume lvm zap /dev/ceph-osd-lvs/db-lv-240
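After that zap, the LV can be handed back to a new OSD explicitly. A
sketch, assuming bluestore and the vg/lv path shown in your P.S. below
(double-check the names on your host):

# ceph-volume lvm create --bluestore --data /dev/sdq \
      --block.db ceph-094c06db-98dc-47f6-a7e5-1092b099b372/osd-block-db-fa0e7927-dc3e-44d0-a8ce-1d8202fa75dd

Note that the new OSD gets a fresh osd fsid either way; only the LV and
its space are reused.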
> 2. remove the db lv from sdac then run
> # ceph-volume lvm batch /dev/sdq /dev/sdac
> which should do the correct thing.
This would also work, provided the db LV is first fully removed with
--destroy.
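For example (path taken from the P.S. below; zap accepts an LV path and
--destroy will remove it):

# ceph-volume lvm zap --destroy /dev/ceph-094c06db-98dc-47f6-a7e5-1092b099b372/osd-block-db-fa0e7927-dc3e-44d0-a8ce-1d8202fa75dd

and then re-running the batch command should lay out both devices fresh.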
> This is all v12.2.11 btw.
> If (2) is the preferred approach, then it looks like a bug that the
> db lv was not destroyed by lvm zap --destroy.
Since /dev/sdq was passed in to zap, just that one device was removed,
so this is working as expected.
Alternatively, zap has the ability to destroy or zap all LVs associated
with an OSD ID. I don't think this has been released for Luminous yet,
but it should be in the next release (which seems to be what you want).
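Once that is available, the idea (the exact flag name here is my
reading of the pending change, so check it against the release notes)
is something like:

# ceph-volume lvm zap --destroy --osd-id 240

which would find and destroy every LV belonging to osd.240, data and
db alike, without you having to locate the LV paths yourself.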
> Once we sort this out, we'd be happy to contribute to the ceph-volume
> lvm batch doc.
> ===== osd.240 ======
> [ db] /dev/ceph-094c06db-98dc-47f6-a7e5-1092b099b372/osd-block-db-fa0e7927-dc3e-44d0-a8ce-1d8202fa75dd
> type db
> osd id 240
> cluster fsid b4f463a0-c671-43a8-bd36-e40ab8d233d2
> cluster name ceph
> osd fsid d4d1fb15-a30a-4325-8628-706772ee4294
> db device
> encrypted 0
> db uuid iWWdyU-UhNu-b58z-ThSp-Bi3B-19iA-06iJIc
> cephx lockbox secret
> block uuid u4326A-Q8bH-afPb-y7Y6-ftNf-TE1X-vjunBd
> block device
> vdo 0
> crush device class None
> devices /dev/sdac