[ceph-users] [EXTERNAL] Re: 2x replication: A BIG warning
Wido den Hollander
wido at 42on.com
Wed Dec 7 12:10:27 PST 2016
> On 7 December 2016 at 21:04, "Will.Boege" <Will.Boege at target.com> wrote:
> Hi Wido,
> Just curious how blocking IO to the final replica provides protection from data loss? I’ve never really understood why this is a Ceph best practice. In my head all 3 replicas would be on devices that have roughly the same odds of physically failing or getting logically corrupted in any given minute. Not sure how blocking IO prevents this.
Say disk #1 fails and you have #2 and #3 left. Now #2 fails, leaving only #3.
By blocking I/O you know that #2 and #3 still have the same data. Although #2 failed, it could be that only the host went down while the disk itself is just fine. Maybe the SATA cable broke, you never know.
If disk #3 now fails you can still continue operation by bringing #2 back. It has the same data on disk as #3 had before it failed, since you didn't allow any I/O on #3 after #2 went down.
If you had accepted writes on #3 while #1 and #2 were gone, you would have invalid/old data on #2 by the time it comes back.
Writes were made on #3, but that disk really broke down. You managed to get #2 back, but it doesn't have the changes that #3 had.
The result is corrupted data.
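To make it concrete, here is a toy sketch in Python. This is not Ceph code, just the scenario above with the three replicas modelled as a plain dict:

    # Toy model: three replicas of one object, all starting in sync.
    replicas = {1: "v1", 2: "v1", 3: "v1"}
    up = {1, 2, 3}

    up.discard(1)              # disk #1 fails
    up.discard(2)              # host of disk #2 goes down

    min_size = 1               # pretend we allow I/O with a single copy left
    if len(up) >= min_size:
        replicas[3] = "v2"     # a write is acknowledged, but lands only on #3

    up.discard(3)              # now #3 really breaks down
    up.add(2)                  # we manage to bring #2 back

    # #2 never saw "v2": clients got an ack for data no surviving copy holds.
    print(replicas[2])         # prints "v1"

With blocking (min_size = 2) the write in the middle never happens, so bringing #2 back gives you exactly the data #3 had.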
Does this make sense?
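As a practical note on the min_size switch I mention further down in the quoted mail: it is a single command per pool, where <pool-name> is just a placeholder for your pool:

    ceph osd pool set <pool-name> min_size 2    # normal operation
    ceph osd pool set <pool-name> min_size 1    # only when you consciously accept the risk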
> On 12/7/16, 9:11 AM, "ceph-users on behalf of LOIC DEVULDER" <ceph-users-bounces at lists.ceph.com on behalf of loic.devulder at mpsa.com> wrote:
> > -----Original Message-----
> > From: Wido den Hollander [mailto:wido at 42on.com]
> > Sent: Wednesday, 7 December 2016 16:01
> > To: ceph-users at ceph.com; LOIC DEVULDER - U329683 <loic.devulder at mpsa.com>
> > Subject: RE: [ceph-users] 2x replication: A BIG warning
> > > On 7 December 2016 at 15:54, LOIC DEVULDER <loic.devulder at mpsa.com> wrote:
> > >
> > >
> > > Hi Wido,
> > >
> > > > As a Ceph consultant I get numerous calls throughout the year to
> > > > help people with getting their broken Ceph clusters back online.
> > > >
> > > > The causes of downtime vary vastly, but one of the biggest causes is
> > > > that people use replication 2x. size = 2, min_size = 1.
> > >
> > > We are building a Ceph cluster for our OpenStack and for data integrity
> > > reasons we have chosen to set size=3. But we want to continue to access
> > > data if 2 of our 3 OSD servers are dead, so we decided to set min_size=1.
> > >
> > > Is it a (very) bad idea?
> > >
> > I would say so. Yes, downtime on your cloud is annoying, but data loss is
> > even worse, much worse.
> > I would always run with min_size = 2 and manually switch to min_size = 1
> > if the situation really requires it at that moment.
> > Losing two disks at the same time is something that doesn't happen that
> > often, but if it happens you don't want to modify any data on the only copy
> > you still have left.
> > Setting min_size to 1 should be a manual action, imho, when size = 3 and you
> > lose two copies. In that case YOU decide at that moment if it is the
> > right course of action.
> > Wido
> Thanks for your quick response!
> That makes sense, I will try to convince my colleagues :-)