[ceph-users] Erasure coded pools and ceph failure domain setup

Hector Martin hector at marcansoft.com
Mon Mar 4 00:37:23 PST 2019

On 02/03/2019 01:02, Ravi Patel wrote:
> Hello,
> My question is how crush distributes chunks throughout the cluster with 
> erasure coded pools. Currently, we have 4 OSD nodes with 36 drives(OSD 
> daemons) per node. If we use crush_failure_domain=host, then we are 
> necessarily limited to k=3,m=1, or k=2,m=2. We would like to explore 
> k>3, m>2 modes of coding but are unsure how the crush rule set will 
> distribute the chunks if we set the crush_failure_domain to OSD
> Ideally, we would like CRUSH to distribute the chunks hierarchically, 
> spreading them evenly across the nodes, so that, for example, not all 
> chunks end up on a single node.
> Are chunks evenly spread by default? If not, how might we go about 
> configuring them?
You can write your own CRUSH rules to distribute chunks hierarchically. 
For example, you can have a k=6, m=2 code together with a rule that 
guarantees that each node gets two chunks. This means that if you lose a 
node you do not lose data (though depending on your min_size setting 
your pool might be unavailable at that point until you replace the node 
or add a new one and the chunks can be recovered). You would accomplish 
this with a rule that looks like this:

rule ec8 {
         id <some free id>
         type erasure
         min_size 7
         max_size 8
         step set_chooseleaf_tries 5
         step set_choose_tries 100
         step take default
         step choose indep 4 type host
         step chooseleaf indep 2 type osd
         step emit
}

This means the rule will first pick 4 hosts, then pick 2 OSDs per host, 
resulting in a total of 8 OSDs. This is appropriate for k=6 m=2 codes as 
well as k=5 m=2 codes (which will just leave one random OSD unused), 
hence min_size 7 and max_size 8.
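As a quick sanity check on the failure math, here is a small sketch 
(the k/m values and the 2-chunks-per-host constraint come from the rule 
above; the helper function is made up for illustration):

```python
# With 2 chunks per host, losing one host destroys exactly 2 chunks.
# A k=6 m=2 code needs any k=6 of its k+m=8 chunks to recover data,
# so it tolerates exactly m=2 lost chunks.

K, M = 6, 2            # data chunks, coding chunks
CHUNKS_PER_HOST = 2    # enforced by the "chooseleaf indep 2 type osd" step

def survives_host_loss(k, m, chunks_per_host):
    """Data is recoverable as long as at least k chunks remain."""
    total = k + m
    remaining = total - chunks_per_host
    return remaining >= k

print(survives_host_loss(K, M, CHUNKS_PER_HOST))   # True: 6 chunks remain, 6 needed
print(survives_host_loss(K, M, 3))                 # False: only 5 would remain
```

The same check shows why 3 chunks on one host would be fatal for this 
code: a host failure would leave only 5 chunks, one short of k=6.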

If you just set crush_failure_domain to OSD, then the rule will pick 
random OSDs without regard for the hosts; you will be able to use 
effectively any EC widths you want, but there will be no guarantees of 
data durability if you lose a whole host.
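To see how likely that is, here is a rough simulation (the 4-host, 
36-OSDs-per-host layout is taken from the question; the rest is 
illustrative): picking 8 OSDs uniformly at random will usually place 
more than m=2 chunks on at least one host, so a single host failure 
would destroy data for most placement groups.

```python
import random

HOSTS = 4
OSDS_PER_HOST = 36
K, M = 6, 2
TOTAL_CHUNKS = K + M  # 8

# Label each OSD by index; OSD i lives on host i // OSDS_PER_HOST.
all_osds = range(HOSTS * OSDS_PER_HOST)

def max_chunks_on_one_host(osds):
    counts = {}
    for osd in osds:
        host = osd // OSDS_PER_HOST
        counts[host] = counts.get(host, 0) + 1
    return max(counts.values())

# Count how often a random placement puts more than m chunks on one
# host, i.e. how often losing that host would mean losing data.
trials = 10_000
risky = sum(
    max_chunks_on_one_host(random.sample(all_osds, TOTAL_CHUNKS)) > M
    for _ in range(trials)
)
print(f"{risky / trials:.0%} of random placements are vulnerable to a host failure")
```

Only placements that land exactly 2 chunks on every host are safe, and 
those are a small minority of random draws, which is exactly what the 
hierarchical rule above enforces by construction.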

Hector Martin (hector at marcansoft.com)
Public Key: https://mrcn.st/pub
