[ceph-users] A basic question on failure domain

Maged Mokhtar mmokhtar at petasan.org
Sat Oct 20 10:01:11 PDT 2018

On 20/10/18 05:28, Cody wrote:
> Hi folks,
> I have a rookie question. Does the number of the buckets chosen as the
> failure domain must be equal or greater than the number of replica (or
> k+m for erasure coding)?
> E.g., for an erasure code profile where k=4, m=2, failure domain=rack,
> does it only work when there are 6 or more racks in the CRUSH
> hierarchy? Or would it continue to iterate down the tree and
> eventually would work as long as there are 6 or more OSDs?
> Thank you very much.
> Best regards,
> Cody
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

The rule associated with the EC profile you mentioned will indeed try 
to select 6 rack buckets and then pick an OSD leaf from each. If you 
only had 5 racks, for example, it would return only 5 OSDs per PG; the 
pool would still function, but in a degraded state (assuming the pool's 
min_size was 5). The rule will not return more than 1 OSD per rack; if 
it did, it would not be achieving the failure domain you specified.
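For reference, the rule Ceph generates for such a profile looks roughly 
like this (a sketch only; the exact name, id, and tunable steps vary by 
release). "indep 0" means "pick as many independent items as the pool 
needs", i.e. k+m = 6 racks here, taking one OSD leaf from each:

```
rule ec42_rack {
    id 1
    type erasure
    min_size 3
    max_size 6
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default
    step chooseleaf indep 0 type rack
    step emit
}
```

With fewer than 6 racks in the hierarchy, this rule simply cannot emit 
6 OSDs, which is why the PGs end up degraded as described above.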
You can write a custom rule that uses 2 racks and selects 3 hosts from 
each, and associate it with the k=4, m=2 pool; CRUSH will not mind, it 
will do whatever you tell it. But if 1 rack fails, your pool goes down, 
so you would not be achieving a failure domain at rack level unless you 
do have 6 or more racks.
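A sketch of such a custom rule (hypothetical name and id; assumes a 
root bucket named "default"): it picks 2 racks, then 3 host leaves from 
each, yielding the 6 OSDs a k=4, m=2 pool needs:

```
rule ec42_2rack_3host {
    id 2
    type erasure
    min_size 6
    max_size 6
    step take default
    step choose indep 2 type rack
    step chooseleaf indep 3 type host
    step emit
}
```

You would compile this into the CRUSH map and point the pool at it with 
`ceph osd pool set <pool> crush_rule <rule-name>`. But as noted above, 
each rack then holds 3 of the 6 shards, and k=4, m=2 only tolerates 
losing 2 shards, so a single rack failure takes the pool down.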

