[ceph-users] killing ceph-disk [was Re: ceph-volume: migration and disk partition support]

Matthew Vernon mv3 at sanger.ac.uk
Thu Oct 12 04:39:39 PDT 2017


On 09/10/17 16:09, Sage Weil wrote:
> To put this in context, the goal here is to kill ceph-disk in mimic.
> One proposal is to make it so new OSDs can *only* be deployed with LVM,
> and old OSDs with the ceph-disk GPT partitions would be started via
> ceph-volume support that can only start (but not deploy new) OSDs in that
> style.
> Is the LVM-only-ness concerning to anyone?
> Looking further forward, NVMe OSDs will probably be handled a bit
> differently, as they'll eventually be using SPDK and kernel-bypass (hence,
> no LVM).  For the time being, though, they would use LVM.

This seems the best point to jump in on this thread. We have a Ceph
(Jewel / Ubuntu 16.04) cluster with around 3k OSDs, deployed with
ceph-ansible. They are plain-disk OSDs with journals on NVMe partitions.
I don't think this is an unusual configuration :)

I think to get rid of ceph-disk, we would want at least some of the
following:

* solid scripting for "move slowly through cluster migrating OSDs from 
disk to lvm" - 1 OSD at a time isn't going to produce unacceptable 
rebalance load, but it is going to take a long time, so such scripting 
would have to cope with being stopped and restarted and suchlike (and be 
able to use the correct journal partitions)

* ceph-ansible support for "some lvm, some plain disk" arrangements - 
presuming a "create new OSDs as lvm" approach when adding new OSDs or 
replacing failed disks

* support for plain disk (regardless of what provides it) that remains 
solid for some time yet
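
To make the first point concrete, here is a minimal sketch of the
"stop and restart" behaviour, assuming a JSON state file that records which
OSDs are already done. The names load_done, save_done, migrate_all and
migrate_one are hypothetical, as is the state file layout. The actual
per-OSD work (draining, destroying, recreating on LVM with the right journal
partition) is deliberately left to a callable supplied by the caller, since
the exact commands depend on the release and the deployment.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


def load_done(state_path: str) -> List[int]:
    """Return the OSD ids already migrated, or an empty list on first run."""
    if not os.path.exists(state_path):
        return []
    with open(state_path) as fh:
        return json.load(fh)["done"]


def save_done(state_path: str, done: List[int]) -> None:
    """Write the state file via a temp file and rename, so an interrupted
    run leaves either the old state or the new one, never a partial file."""
    directory = os.path.dirname(os.path.abspath(state_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump({"done": done}, fh)
    os.replace(tmp_path, state_path)


def migrate_all(
    journals: Dict[int, str],
    migrate_one: Callable[[int, str], None],
    state_path: str,
) -> List[int]:
    """Migrate OSDs one at a time, skipping any already recorded as done.

    journals maps OSD id -> journal partition (hypothetical input format).
    migrate_one is a placeholder for the real per-OSD migration step; it
    should return only once the cluster is healthy again.
    """
    done = load_done(state_path)
    for osd_id in sorted(journals):
        if osd_id in done:
            continue  # finished in an earlier run
        migrate_one(osd_id, journals[osd_id])
        done.append(osd_id)
        save_done(state_path, done)  # checkpoint after every OSD
    return done
```

If migrate_one raises or the process is killed partway through, rerunning
migrate_all with the same state file picks up at the first OSD not yet
recorded. An OSD that was interrupted mid-migration is retried from the
start, so the real migrate_one would need to tolerate that.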

> On Fri, 6 Oct 2017, Alfredo Deza wrote:

>> Bluestore support should be the next step for `ceph-volume lvm`, and
>> while that is planned we are thinking of ways to improve the current
>> caveats (like OSDs not coming up) for clusters that have deployed OSDs
>> with ceph-disk.

These issues seem mostly to be down to timeouts being too short and the 
single global lock for activating OSDs.

> IMO we can't require any kind of data migration in order to upgrade, which
> means we either have to (1) keep ceph-disk around indefinitely, or (2)
> teach ceph-volume to start existing GPT-style OSDs.  Given all of the
> flakiness around udev, I'm partial to #2.  The big question for me is
> whether #2 alone is sufficient, or whether ceph-volume should also know
> how to provision new OSDs using partitions and no LVM.  Hopefully not?

I think this depends on how well tools such as ceph-ansible can cope 
with mixed OSD types (my feeling at the moment is "not terribly well", 
but I may be being unfair).


