[ceph-users] Filestore to Bluestore migration question

Hayashida, Mami mami.hayashida at uky.edu
Fri Nov 2 14:03:27 PDT 2018


I followed all the steps Hector suggested, and almost everything seems to
have worked fine.  I say "almost" because one out of the 10 OSDs I was
migrating could not be activated, even though everything up to that point
had worked just as well for that OSD as for the others. Here is the output
for that particular failure:

*****
ceph-volume lvm activate --all
...
--> Activating OSD ID 67 FSID 17cd6755-76f9-4160-906c-XXXXXX
Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-67
--> Absolute path not found for executable: restorecon
--> Ensure $PATH environment variable contains common executable locations
Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev
/dev/hdd67/data67 --path /var/lib/ceph/osd/ceph-67
 stderr: failed to read label for /dev/hdd67/data67: (2) No such file or
directory
-->  RuntimeError: command returned non-zero exit status:

*******
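For what it's worth, the "failed to read label" error above means ceph-bluestore-tool could not open /dev/hdd67/data67 at all. A minimal pre-check sketch (the helper name is mine, and the vgchange hint assumes the hdd67 VG simply was not activated, which is only a guess):

```shell
# Hypothetical helper: confirm an LV's device node exists before activation.
check_dev() {
    # prints "ok" if the path is present, "missing" otherwise
    if [ -e "$1" ]; then echo "ok"; else echo "missing"; fi
}

# Run on the OSD node; "missing" would suggest the VG is not active
# (in that case "vgchange -ay hdd67" may bring the device node back).
check_dev /dev/hdd67/data67
```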
I then checked whether the rest of the migrated OSDs were back in by
running the ceph osd tree command from the admin node.  Since they were
not, I tried to restart the first of the 10 newly migrated Bluestore OSDs
by running

*******
systemctl start ceph-osd@60

At that point, not only could this particular service not be started, but
ALL the OSD daemons on the entire node shut down!

******
root@osd1:~# systemctl status ceph-osd@60
ceph-osd@60.service - Ceph object storage daemon osd.60
   Loaded: loaded (/lib/systemd/system/ceph-osd at .service; enabled-runtime;
vendor preset: enabled)
   Active: inactive (dead) since Fri 2018-11-02 15:47:20 EDT; 1h 9min ago
  Process: 3473621 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id
%i --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
  Process: 3473147 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh
--cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 3473621 (code=exited, status=0/SUCCESS)

Oct 29 15:57:53 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-10-29
15:57:53.868856 7f68adaece00 -1 osd.60 48106 log_to_monitors {default=true}
Oct 29 15:57:53 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-10-29
15:57:53.874373 7f68adaece00 -1 osd.60 48106 mon_cmd_maybe_osd_create fail:
'you must complete the upgrade and 'ceph osd require-osd-release luminous'
before using crush device classes': (1) Operation not permitted
Oct 30 06:25:01 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-10-30
06:25:01.961720 7f687feb3700 -1 received  signal: Hangup from  PID: 3485955
task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse
radosgw  UID: 0
Oct 31 06:25:02 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-10-31
06:25:02.110898 7f687feb3700 -1 received  signal: Hangup from  PID: 3500945
task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse
radosgw  UID: 0
Nov 01 06:25:02 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-11-01
06:25:02.101548 7f687feb3700 -1 received  signal: Hangup from  PID: 3514774
task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse
radosgw  UID: 0
Nov 02 06:25:02 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-11-02
06:25:01.997557 7f687feb3700 -1 received  signal: Hangup from  PID: 3528128
task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse
radosgw  UID: 0
Nov 02 15:47:16 osd1.oxxxxx.uky.edu ceph-osd[3473621]: 2018-11-02
15:47:16.322229 7f687feb3700 -1 received  signal: Terminated from  PID: 1
task name: /lib/systemd/systemd --system --deserialize 20  UID: 0
Nov 02 15:47:16 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-11-02
15:47:16.322253 7f687feb3700 -1 osd.60 48504 *** Got signal Terminated ***
Nov 02 15:47:16 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-11-02
15:47:16.676625 7f687feb3700 -1 osd.60 48504 shutdown
Nov 02 16:34:05 osd1.oxxxxx.uky.edu systemd[1]: Stopped Ceph object storage
daemon osd.60.

**********
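Incidentally, the mon_cmd_maybe_osd_create failure in the osd.60 journal above ("you must complete the upgrade and 'ceph osd require-osd-release luminous'") suggests the Jewel-to-Luminous upgrade was never finalized. A dry-run sketch that only prints the finishing command (the helper name is mine; run the printed command from an admin node only after verifying it applies to your cluster):

```shell
# Build (but do not execute) the upgrade-finalization command hinted at by
# the journal output; printing it keeps this sketch side-effect free.
finish_upgrade_cmd() {
    echo "ceph osd require-osd-release $1"
}

finish_upgrade_cmd luminous
```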
And here is the output for one of the OSDs (osd.70, still using Filestore)
that shut down right when I tried to start osd.60:

********

root@osd1:~# systemctl status ceph-osd@70
ceph-osd@70.service - Ceph object storage daemon osd.70
   Loaded: loaded (/lib/systemd/system/ceph-osd at .service; enabled-runtime;
vendor preset: enabled)
   Active: inactive (dead) since Fri 2018-11-02 16:34:08 EDT; 2min 6s ago
  Process: 3473629 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id
%i --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
  Process: 3473153 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh
--cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 3473629 (code=exited, status=0/SUCCESS)

Oct 29 15:57:51 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-10-29
15:57:51.300563 7f530eec2e00 -1 osd.70 pg_epoch: 48095 pg[68.ces1( empty
local-lis/les=47489/47489 n=0 ec=6030/6030 lis/c 47488/47488 les/c/f
47489/47489/0 47485/47488/47488) [138,70,203]p138(0) r=1 lpr=0 crt=0'0
unknown NO
Oct 30 06:25:01 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-10-30
06:25:01.961743 7f52d8e44700 -1 received  signal: Hangup from  PID: 3485955
task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse
radosgw  UID: 0
Oct 31 06:25:02 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-10-31
06:25:02.110920 7f52d8e44700 -1 received  signal: Hangup from  PID: 3500945
task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse
radosgw  UID: 0
Nov 01 06:25:02 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-01
06:25:02.101568 7f52d8e44700 -1 received  signal: Hangup from  PID: 3514774
task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse
radosgw  UID: 0
Nov 02 06:25:02 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-02
06:25:01.997633 7f52d8e44700 -1 received  signal: Hangup from  PID: 3528128
task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse
radosgw  UID: 0
Nov 02 16:34:05 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-02
16:34:05.607714 7f52d8e44700 -1 received  signal: Terminated from  PID: 1
task name: /lib/systemd/systemd --system --deserialize 20  UID: 0
Nov 02 16:34:05 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-02
16:34:05.607738 7f52d8e44700 -1 osd.70 48535 *** Got signal Terminated ***
Nov 02 16:34:05 osd1.xxxx.uky.edu systemd[1]: Stopping Ceph object storage
daemon osd.70...
Nov 02 16:34:05 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-02
16:34:05.677348 7f52d8e44700 -1 osd.70 48535 shutdown
Nov 02 16:34:08 osd1.xxxx.uky.edu systemd[1]: Stopped Ceph object storage
daemon osd.70.

**************

So, at this point, ALL the OSDs on that node have been shut down.

For your information, here is the output of the lsblk command (selection):
*****
root@osd1:~# lsblk
NAME           MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda              8:0    0 447.1G  0 disk
├─ssd0-db60    252:0    0    40G  0 lvm
├─ssd0-db61    252:1    0    40G  0 lvm
├─ssd0-db62    252:2    0    40G  0 lvm
├─ssd0-db63    252:3    0    40G  0 lvm
├─ssd0-db64    252:4    0    40G  0 lvm
├─ssd0-db65    252:5    0    40G  0 lvm
├─ssd0-db66    252:6    0    40G  0 lvm
├─ssd0-db67    252:7    0    40G  0 lvm
├─ssd0-db68    252:8    0    40G  0 lvm
└─ssd0-db69    252:9    0    40G  0 lvm
sdb              8:16   0 447.1G  0 disk
├─sdb1           8:17   0    40G  0 part
├─sdb2           8:18   0    40G  0 part

.....

sdh              8:112  0   3.7T  0 disk
└─hdd60-data60 252:10   0   3.7T  0 lvm
sdi              8:128  0   3.7T  0 disk
└─hdd61-data61 252:11   0   3.7T  0 lvm
sdj              8:144  0   3.7T  0 disk
└─hdd62-data62 252:12   0   3.7T  0 lvm
sdk              8:160  0   3.7T  0 disk
└─hdd63-data63 252:13   0   3.7T  0 lvm
sdl              8:176  0   3.7T  0 disk
└─hdd64-data64 252:14   0   3.7T  0 lvm
sdm              8:192  0   3.7T  0 disk
└─hdd65-data65 252:15   0   3.7T  0 lvm
sdn              8:208  0   3.7T  0 disk
└─hdd66-data66 252:16   0   3.7T  0 lvm
sdo              8:224  0   3.7T  0 disk
└─hdd67-data67 252:17   0   3.7T  0 lvm
sdp              8:240  0   3.7T  0 disk
└─hdd68-data68 252:18   0   3.7T  0 lvm
sdq             65:0    0   3.7T  0 disk
└─hdd69-data69 252:19   0   3.7T  0 lvm
sdr             65:16   0   3.7T  0 disk
└─sdr1          65:17   0   3.7T  0 part /var/lib/ceph/osd/ceph-70
.....
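As a sanity check on the naming convention shown above, the OSD id can be recovered from each "hddNN-dataNN" LV name; here is a self-contained sketch (the function name and sed pattern are mine, purely for illustration):

```shell
# Extract the numeric OSD id from LV names of the form hddNN-dataNN,
# as they appear in the lsblk output above.
osd_ids_from_lvs() {
    echo "$1" | sed -n 's/.*data\([0-9][0-9]*\)$/\1/p'
}

sample='hdd60-data60
hdd67-data67
hdd69-data69'

osd_ids_from_lvs "$sample"    # one id per line: 60, 67, 69
```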

As a Ceph novice, I am totally clueless about the next step at this point.
Any help would be appreciated.

On Thu, Nov 1, 2018 at 3:16 PM, Hayashida, Mami <mami.hayashida at uky.edu>
wrote:

> Thank you, both of you.  I will try this out very soon.
>
> On Wed, Oct 31, 2018 at 8:48 AM, Alfredo Deza <adeza at redhat.com> wrote:
>
>> On Wed, Oct 31, 2018 at 8:28 AM Hayashida, Mami <mami.hayashida at uky.edu>
>> wrote:
>> >
>> > Thank you for your replies. So, if I use the method Hector suggested
>> (by creating PVs, VGs.... etc. first), can I add the --osd-id parameter to
>> the command as in
>> >
>> > ceph-volume lvm prepare --bluestore --data hdd0/data0 --block.db
>> ssd/db0  --osd-id 0
>> > ceph-volume lvm prepare --bluestore --data hdd1/data1 --block.db
>> ssd/db1  --osd-id 1
>> >
>> > so that Filestore -> Bluestore migration will not change the osd ID on
>> each disk?
>>
>> That looks correct.
>>
>> >
>> > And one more question.  Are there any changes I need to make to the
>> ceph.conf file?  I did comment out this line that was probably used for
>> creating Filestore (using ceph-deploy):  osd journal size = 40960
>>
>> Since you've pre-created the LVs the commented out line will not
>> affect anything.
>>
>> >
>> >
>> >
>> > On Wed, Oct 31, 2018 at 7:03 AM, Alfredo Deza <adeza at redhat.com> wrote:
>> >>
>> >> On Wed, Oct 31, 2018 at 5:22 AM Hector Martin <hector at marcansoft.com>
>> wrote:
>> >> >
>> >> > On 31/10/2018 05:55, Hayashida, Mami wrote:
>> >> > > I am relatively new to Ceph and need some advice on Bluestore
>> migration.
>> >> > > I tried migrating a few of our test cluster nodes from Filestore to
>> >> > > Bluestore by following this
>> >> > > (http://docs.ceph.com/docs/luminous/rados/operations/bluesto
>> re-migration/)
>> >> > > as the cluster is currently running 12.2.9. The cluster,
>> originally set
>> >> > > up by my predecessors, was running Jewel until I upgraded it
>> recently to
>> >> > > Luminous.
>> >> > >
>> >> > > OSDs in each OSD host is set up in such a way that for ever 10
>> data HDD
>> >> > > disks, there is one SSD drive that is holding their journals.  For
>> >> > > example, osd.0 data is on /dev/sdh and its Filestore journal is on
>> a
>> >> > > partitioned part of /dev/sda. So, lsblk shows something like
>> >> > >
>> >> > > sda       8:0    0 447.1G  0 disk
>> >> > > ├─sda1    8:1    0    40G  0 part # journal for osd.0
>> >> > >
>> >> > > sdh       8:112  0   3.7T  0 disk
>> >> > > └─sdh1    8:113  0   3.7T  0 part /var/lib/ceph/osd/ceph-0
>> >> > >
>> >> >
>> >> > The BlueStore documentation states that the wal will automatically
>> use
>> >> > the db volume if it fits, so if you're using a single SSD I think
>> >> > there's no good reason to split out the wal, if I'm understanding it
>> >> > correctly.
>> >>
>> >> This is correct, no need for wal in this case.
>> >>
>> >> >
>> >> > You should be using ceph-volume, since ceph-disk is deprecated. If
>> >> > you're sharing the SSD as wal/db for a bunch of OSDs, I think you're
>> >> > going to have to create the LVs yourself first. The data HDDs should
>> be
>> >> > PVs (I don't think it matters if they're partitions or whole disk
>> PVs as
>> >> > long as LVM discovers them) each part of a separate VG (e.g.
>> hdd0-hdd9)
>> >> > containing a single LV. Then the SSD should itself be an LV for a
>> >> > separate shared SSD VG (e.g. ssd).
>> >> >
>> >> > So something like (assuming sda is your wal SSD and sdb and onwards
>> are
>> >> > your OSD HDDs):
>> >> > pvcreate /dev/sda
>> >> > pvcreate /dev/sdb
>> >> > pvcreate /dev/sdc
>> >> > ...
>> >> >
>> >> > vgcreate ssd /dev/sda
>> >> > vgcreate hdd0 /dev/sdb
>> >> > vgcreate hdd1 /dev/sdc
>> >> > ...
>> >> >
>> >> > lvcreate -L 40G -n db0 ssd
>> >> > lvcreate -L 40G -n db1 ssd
>> >> > ...
>> >> >
>> >> > lvcreate -l 100%FREE -n data0 hdd0
>> >> > lvcreate -l 100%FREE -n data1 hdd1
>> >> > ...
>> >> >
>> >> > ceph-volume lvm prepare --bluestore --data hdd0/data0 --block.db
>> ssd/db0
>> >> > ceph-volume lvm prepare --bluestore --data hdd1/data1 --block.db
>> ssd/db1
>> >> > ...
>> >> >
>> >> > ceph-volume lvm activate --all
>> >> >
>> >> > I think it might be possible to just let ceph-volume create the
>> PV/VG/LV
>> >> > for the data disks and only manually create the DB LVs, but it
>> shouldn't
>> >> > hurt to do it on your own and just give ready-made LVs to ceph-volume
>> >> > for everything.
>> >>
>> >> Another alternative here is to use the new `lvm batch` subcommand to
>> >> do all of this in one go:
>> >>
>> >> ceph-volume lvm batch /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde
>> >> /dev/sdf /dev/sdg /dev/sdh
>> >>
>> >> Will detect that sda is an SSD and will create the LVs for you for
>> >> block.db (one for each spinning disk). For each spinning disk, it will
>> >> place data on them.
>> >>
>> >> The one caveat is that you no longer control OSD IDs, and they are
>> >> created with whatever the monitors are giving out.
>> >>
>> >> This operation is not supported from ceph-deploy either.
>> >> >
>> >> > --
>> >> > Hector Martin (hector at marcansoft.com)
>> >> > Public Key: https://marcan.st/marcan.asc
>> >> > _______________________________________________
>> >> > ceph-users mailing list
>> >> > ceph-users at lists.ceph.com
>> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> >
>> >
>> >
>>
>
>
>



-- 
*Mami Hayashida*

*Research Computing Associate*
Research Computing Infrastructure
University of Kentucky Information Technology Services
301 Rose Street | 102 James F. Hardymon Building
Lexington, KY 40506-0495
mami.hayashida at uky.edu
(859)323-7521

