[ceph-users] killing ceph-disk [was Re: ceph-volume: migration and disk partition support]
mv3 at sanger.ac.uk
Thu Oct 12 04:39:39 PDT 2017
On 09/10/17 16:09, Sage Weil wrote:
> To put this in context, the goal here is to kill ceph-disk in mimic.
> One proposal is to make it so new OSDs can *only* be deployed with LVM,
> and old OSDs with the ceph-disk GPT partitions would be started via
> ceph-volume support that can only start (but not deploy new) OSDs in that
> style.
> Is the LVM-only-ness concerning to anyone?
> Looking further forward, NVMe OSDs will probably be handled a bit
> differently, as they'll eventually be using SPDK and kernel-bypass (hence,
> no LVM). For the time being, though, they would use LVM.
This seems the best point to jump in on this thread. We have a ceph
(Jewel / Ubuntu 16.04) cluster with around 3k OSDs, deployed with
ceph-ansible. They are plain-disk OSDs with journal on NVME partitions.
I don't think this is an unusual configuration :)
I think to get rid of ceph-disk, we would want at least some of the
following:
* solid scripting for "move slowly through the cluster migrating OSDs
from disk to LVM" - migrating 1 OSD at a time isn't going to produce
unacceptable rebalance load, but it is going to take a long time, so such
scripting would have to cope with being stopped and restarted and
suchlike (and be able to reuse the correct journal partitions)
* ceph-ansible support for "some LVM, some plain disk" arrangements -
presuming a "create new OSDs as LVM" approach when adding new OSDs or
replacing failed disks
* support for plain disk (regardless of what provides it) that remains
solid for some time yet
> On Fri, 6 Oct 2017, Alfredo Deza wrote:
>> Bluestore support should be the next step for `ceph-volume lvm`, and
>> while that is planned we are thinking of ways to improve the current
>> caveats (like OSDs not coming up) for clusters that have deployed OSDs
>> with ceph-disk.
These issues seem mostly to be down to timeouts being too short and the
single global lock for activating OSDs.
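As a stopgap on the ceph-disk side, the activation timeout can be raised
with a systemd drop-in; the exact unit name and environment variable vary
by release, so treat this as an assumption to check against your
installed units:

```ini
# /etc/systemd/system/ceph-disk@.service.d/timeout.conf
# Hypothetical drop-in: recent ceph-disk units wrap activation in
# "timeout $CEPH_DISK_TIMEOUT flock ...", so raising the value gives
# slow, lock-contended activations a chance to finish.
[Service]
Environment=CEPH_DISK_TIMEOUT=10000
```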
> IMO we can't require any kind of data migration in order to upgrade, which
> means we either have to (1) keep ceph-disk around indefinitely, or (2)
> teach ceph-volume to start existing GPT-style OSDs. Given all of the
> flakiness around udev, I'm partial to #2. The big question for me is
> whether #2 alone is sufficient, or whether ceph-volume should also know
> how to provision new OSDs using partitions and no LVM. Hopefully not?
I think this depends on how well tools such as ceph-ansible can cope
with mixed OSD types (my feeling at the moment is "not terribly well",
but I may be being unfair).
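One shape option #2 could take is a scan-once, activate-from-metadata
flow (roughly what later shipped as `ceph-volume simple`; the exact flags
here are assumptions). The cmd() wrapper echoes rather than executes, so
the flow is readable without a live cluster:

```shell
# Sketch of udev-free activation for existing GPT/ceph-disk OSDs.
cmd() { echo "+ $*"; }

# Capture the metadata of an existing ceph-disk data partition into a
# JSON file under /etc/ceph/osd/ (the scan itself is read-only).
cmd ceph-volume simple scan /dev/sdc1

# Enable systemd units driven by the captured JSON instead of udev rules.
cmd ceph-volume simple activate --all
```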
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.