[ceph-users] CRUSH rule for 3 replicas across 2 hosts

Robert LeBlanc robert at leblancnet.us
Mon Apr 20 11:02:27 PDT 2015


We have a similar issue, but we wanted three copies across two racks. We
ended up increasing size to 4 and leaving min_size at 2. We didn't want to
risk having fewer than two copies, and with only three copies, losing a
rack would block I/O. Once we expand to a third rack, we will adjust our
rule and go to size 3. Searching the mailing list and docs proved
difficult, so I'll include my rule so that you can use it as a basis. You
should be able to just change rack to host and host to osd (a rough sketch
of that variant follows the rule below). If you want to keep only three
copies, the "extra" OSD chosen just won't be used, as Gregory mentions.
Technically this rule should have "max_size 4", but I won't set a pool to
more than 4 copies, so I didn't change it here.

If anyone has a better way of writing this rule (or one that would work for
both a two-rack and a 3+ rack configuration, as mentioned above), I'd be open
to it. This is the first rule that I've really written on my own.

rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 2 type rack
        step chooseleaf firstn 2 type host
        step emit
}
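
For the two-host case in this thread, the same idea with rack swapped for
host and host swapped for osd would look roughly like this. This is an
untested sketch: the rule name and ruleset number are placeholders, and it
assumes the default root bucket. It picks 2 hosts and then 2 OSDs from each,
so with size 3 one of the four candidate OSDs simply goes unused.

rule replicated_2host {
        ruleset 1
        type replicated
        min_size 1
        max_size 4
        step take default
        step choose firstn 2 type host
        step choose firstn 2 type osd
        step emit
}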



On Mon, Apr 20, 2015 at 11:50 AM, Gregory Farnum <greg at gregs42.com> wrote:

> On Mon, Apr 20, 2015 at 10:46 AM, Colin Corr <colin at pc-doctor.com> wrote:
> > Greetings Cephers,
> >
> > I have hit a bit of a wall between the available documentation and my
> understanding of it with regards to CRUSH rules. I am trying to determine
> if it is possible to replicate 3 copies across 2 hosts, such that if one
> host is completely lost, at least 1 copy is available. The problem I am
> experiencing is that if I enable my host_rule for a data pool, the cluster
> never gets back to a clean state. All pgs in a pool with this rule will be
> stuck unclean.
> >
> > This is the rule:
> >
> > rule host_rule {
> >         ruleset 2
> >         type replicated
> >         min_size 1
> >         max_size 10
> >         step take default
> >         step chooseleaf firstn 0 type host
> >         step emit
> > }
> >
> > And if it's pertinent, all nodes are running 0.80.9 on Ubuntu 14.04. Pool
> pg/pgp set to 2048, replicas 3. Tunables set to optimal.
> >
> > I assume that is happening because of simple math: 3 copies on 2 hosts.
> And CRUSH is expecting a 3rd host to balance everything out since I defined the
> rule at the host level. This rule runs fine on another 3-host test cluster. So, it
> would seem that the potential solutions are to change replication to 2
> copies or add a 3rd OSD host. But, with all of the cool bucket types and
> rule options, I suspect I am missing something here. Alas, I am hoping
> there is some (not so obvious to me) CRUSH magic that could be applied here.
>
> It's actually pretty hacky: you configure your CRUSH rule to return
> two OSDs from each host, but set your size to 3. You'll want to test
> this carefully with your installed version to make sure that works,
> though — older CRUSH implementations would crash if you did that. :(
>
> In slightly more detail, you'll need to change it so that instead of
> using "chooseleaf" you "choose" 2 hosts, and then choose or chooseleaf
> 2 OSDs from each of those hosts. If you search the list archives for
> CRUSH threads you'll find some other discussions about doing precisely
> this, and I think the CRUSH documentation should cover the more
> general bits of how the language works.
> -Greg
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
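
Also, following up on Greg's point about testing carefully before relying on
a rule like this: you can check the mappings offline with crushtool before
injecting anything. Roughly (file names and the rule number are placeholders
for your own cluster):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt to add/adjust the rule, then recompile
crushtool -c crushmap.txt -o crushmap.new
# simulate placements for the edited rule at 3 and 4 replicas
crushtool -i crushmap.new --test --rule 1 --num-rep 3 --show-mappings
crushtool -i crushmap.new --test --rule 1 --num-rep 4 --show-bad-mappings
# only if the mappings look sane:
ceph osd setcrushmap -i crushmap.new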