[ceph-users] A basic question on failure domain
mmokhtar at petasan.org
Sat Oct 20 10:01:11 PDT 2018
On 20/10/18 05:28, Cody wrote:
> Hi folks,
> I have a rookie question. Does the number of buckets chosen as the
> failure domain have to be equal to or greater than the number of replicas
> (or k+m for erasure coding)?
> E.g., for an erasure code profile where k=4, m=2, failure domain=rack,
> does it only work when there are 6 or more racks in the CRUSH
> hierarchy? Or would it continue to iterate down the tree and
> eventually work as long as there are 6 or more OSDs?
> Thank you very much.
> Best regards,
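For reference, a profile like the one you describe is normally created
along these lines (the profile and pool names below are just placeholders,
not anything your cluster already has):

    ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=rack
    ceph osd pool create ecpool 128 128 erasure ec42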
The rule associated with the EC profile you mentioned will indeed try
to select 6 rack buckets and then pick an OSD leaf from each. If you only
had 5 racks, for example, it would return only 5 OSDs per PG; the pool
would still function, but in a degraded state (assuming the pool min_size
is 5). The rule will not return more than 1 OSD per rack; if it did, it
would not be achieving the failure domain you specified.
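To illustrate, the rule generated for such a profile looks roughly like
this when you decompile the CRUSH map (the rule name, id and tunables here
are just examples and will differ on your cluster):

    rule ecpool {
        id 1
        type erasure
        min_size 3
        max_size 6
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        # pick <pool size> = 6 rack buckets, then descend to one OSD in each
        step chooseleaf indep 0 type rack
        step emit
    }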
You can write a custom rule that uses 2 racks and selects 3 hosts from
each, and associate it with the k=4, m=2 pool. CRUSH will not mind; it
will do whatever you tell it. But if 1 rack fails your pool goes down, so
you would not be achieving a failure domain at the rack level; for that
you do need 6 or more racks.
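As a sketch, such a custom rule could look roughly like this (the rule
name and id are just examples):

    rule ec_2rack_3host {
        id 2
        type erasure
        min_size 3
        max_size 6
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        # pick 2 rack buckets, then 3 distinct hosts (one OSD each) per rack
        step choose indep 2 type rack
        step chooseleaf indep 3 type host
        step emit
    }

You would then point the pool at it with
ceph osd pool set <pool> crush_rule ec_2rack_3host.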