[ceph-users] Disk Down Emergency

Caspar Smit casparsmit at supernas.eu
Thu Nov 16 05:46:22 PST 2017


2017-11-16 14:43 GMT+01:00 Wido den Hollander <wido at 42on.com>:

>
> > Op 16 november 2017 om 14:40 schreef Georgios Dimitrakakis <
> giorgis at acmac.uoc.gr>:
> >
> >
> >  @Sean Redmond: No I don't have any unfound objects. I only have "stuck
> >  unclean" with "active+degraded" status
> >  @Caspar Smit: The cluster is scrubbing ...
> >
> >  @All: My concern is because of one copy left for the data on the failed
> >  disk.
> >
>
> Let the Ceph recovery do its work. Don't do anything manually now.
>
>
@Wido, I think his cluster may have stopped recovering because of
non-optimal tunables in Firefly.
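For completeness, both theories can be checked from the CLI (standard Ceph commands; the commands need a live cluster, and any values they print here are illustrative, not from this cluster):

```shell
# Show the CRUSH tunables currently in effect; on Firefly the default
# profile may be legacy rather than optimal, which can stall rebalancing.
ceph osd crush show-tunables

# Watch recovery progress; the degraded object count in the status
# output should keep dropping while recovery is still making progress.
ceph -s

# List the PGs that are stuck unclean to see whether they are moving.
ceph pg dump_stuck unclean
```

If the degraded count has not moved for a long time, that supports the stalled-recovery theory; if it keeps dropping, just wait.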


> >  If I just remove osd.0 from the CRUSH map, will its data be
> >  re-replicated from the single remaining copy onto the unaffected disks,
> >  so that I again end up with two copies on two different hosts?
> >
>
> Do NOT copy the data from osd.0 to another OSD. Let the Ceph recovery
> handle this.
>
> It is already marked as out and within 24 hours or so recovery will have
> finished.
>
> But a few things:
>
> - Firefly 0.80.9 is old
> - Never, never, never run with size=2
>
> Not trying to scare you, but it's a reality.
>
> Now let Ceph handle the rebalance and wait.
>
> Wido
>
> >  Best,
> >
> >  G.
> >
> >
> > > 2017-11-16 14:05 GMT+01:00 Georgios Dimitrakakis :
> > >
> > >> Dear cephers,
> > >>
> > >> I have an emergency on a rather small ceph cluster.
> > >>
> > >> My cluster consists of 2 OSD nodes with 10 disks x4TB each and 3
> > >> monitor nodes.
> > >>
> > >> The version of ceph running is Firefly v.0.80.9
> > >> (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)
> > >>
> > >> The cluster was originally built with "Replicated size=2" and "Min
> > >> size=1" with the attached crush map, which, in my understanding,
> > >> replicates data across hosts.
> > >>
> > >> The emergency comes from the violation of the golden rule: "Never
> > >> use 2 replicas on a production cluster"
> > >>
> > >> Unfortunately the customers never really understood the risk well,
> > >> and now that one disk is down I am in the middle and must do
> > >> everything in my power not to lose any data, so I am requesting
> > >> your assistance.
> > >>
> > >> Here is the output of
> > >>
> > >> $ ceph osd tree
> > >> # id    weight  type name       up/down reweight
> > >> -1      72.6    root default
> > >> -2      36.3            host store1
> > >> 0       3.63                    osd.0   down
> > >> 0       ---> DISK DOWN
> > >> 1       3.63                    osd.1   up
> > >> 1
> > >> 2       3.63                    osd.2   up
> > >> 1
> > >> 3       3.63                    osd.3   up
> > >> 1
> > >> 4       3.63                    osd.4   up
> > >> 1
> > >> 5       3.63                    osd.5   up
> > >> 1
> > >> 6       3.63                    osd.6   up
> > >> 1
> > >> 7       3.63                    osd.7   up
> > >> 1
> > >> 8       3.63                    osd.8   up
> > >> 1
> > >> 9       3.63                    osd.9   up
> > >> 1
> > >> -3      36.3            host store2
> > >> 10      3.63                    osd.10  up      1
> > >> 11      3.63                    osd.11  up      1
> > >> 12      3.63                    osd.12  up      1
> > >> 13      3.63                    osd.13  up      1
> > >> 14      3.63                    osd.14  up      1
> > >> 15      3.63                    osd.15  up      1
> > >> 16      3.63                    osd.16  up      1
> > >> 17      3.63                    osd.17  up      1
> > >> 18      3.63                    osd.18  up      1
> > >> 19      3.63                    osd.19  up      1
> > >>
> > >> and here is the status of the cluster
> > >>
> > >> # ceph health
> > >> HEALTH_WARN 497 pgs degraded; 549 pgs stuck unclean; recovery
> > >> 51916/2552684 objects degraded (2.034%)
> > >>
> > >> Although osd.0 is shown as mounted, it cannot be started (probably
> > >> a failed disk controller problem)
> > >>
> > >> # df -h
> > >> Filesystem      Size  Used Avail Use% Mounted on
> > >> /dev/sda3       251G  4.1G  235G   2% /
> > >> tmpfs            24G     0   24G   0% /dev/shm
> > >> /dev/sda1       239M  100M  127M  44% /boot
> > >> /dev/sdj1       3.7T  223G  3.5T   6%
> > >> /var/lib/ceph/osd/ceph-8
> > >> /dev/sdh1       3.7T  205G  3.5T   6%
> > >> /var/lib/ceph/osd/ceph-6
> > >> /dev/sdg1       3.7T  199G  3.5T   6%
> > >> /var/lib/ceph/osd/ceph-5
> > >> /dev/sde1       3.7T  180G  3.5T   5%
> > >> /var/lib/ceph/osd/ceph-3
> > >> /dev/sdi1       3.7T  187G  3.5T   6%
> > >> /var/lib/ceph/osd/ceph-7
> > >> /dev/sdf1       3.7T  193G  3.5T   6%
> > >> /var/lib/ceph/osd/ceph-4
> > >> /dev/sdd1       3.7T  212G  3.5T   6%
> > >> /var/lib/ceph/osd/ceph-2
> > >> /dev/sdk1       3.7T  210G  3.5T   6%
> > >> /var/lib/ceph/osd/ceph-9
> > >> /dev/sdb1       3.7T  164G  3.5T   5%
> > >> /var/lib/ceph/osd/ceph-0    ---> This is the problematic OSD
> > >> /dev/sdc1       3.7T  183G  3.5T   5%
> > >> /var/lib/ceph/osd/ceph-1
> > >>
> > >> # service ceph start osd.0
> > >> find: `/var/lib/ceph/osd/ceph-0': Input/output error
> > >> /etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines
> > >> mon.store1 osd.6 osd.9 osd.1 osd.4 osd.3 osd.2 osd.8 osd.5 osd.7
> > >> mds.store1 mon.store3, /var/lib/ceph defines mon.store1 osd.6 osd.9
> > >> osd.1 osd.4 osd.3 osd.2 osd.8 osd.5 osd.7 mds.store1)
> > >>
> > >> I have found this:
> > >>
> > >
> > > http://ceph.com/geen-categorie/admin-guide-replacing-a-failed-disk-in-a-ceph-cluster/
> > >> [1]
> > >>
> > >> and I am looking for your guidance on how to properly perform all
> > >> the steps so as not to lose any data and to preserve the remaining
> > >> copy.
> > >
> > > What guidance are you looking for besides the steps to replace a
> > > failed disk (which you already found)?
> > > If I look at your situation, nothing is down in terms of
> > > availability of PGs, just a failed drive which needs to be replaced.
> > >
> > > Is the cluster still recovering? It should reach HEALTH_OK again
> > > after
> > > rebalancing the cluster when an OSD goes down.
> > >
> > > If it stopped recovering, it may have to do with the Ceph tunables,
> > > which are not set to optimal by default on Firefly; that can prevent
> > > further rebalancing.
> > > WARNING: Don't just set the tunables to optimal, because that will
> > > trigger a massive rebalance!
> > >
> > > Perhaps the second golden rule is to never run a Ceph production
> > > cluster without knowing (and having tested) how to replace a failed
> > > drive. (I'm not trying to be harsh here.)
> > >
> > > Kind regards,
> > > Caspar
> > >
> > >
> > >> Best regards,
> > >>
> > >> G.
> > >> _______________________________________________
> > >> ceph-users mailing list
> > >> ceph-users at lists.ceph.com [2]
> > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [3]
> > >
> > >
> > >
> > > Links:
> > > ------
> > > [1]
> > >
> > > http://ceph.com/geen-categorie/admin-guide-replacing-a-failed-disk-in-a-ceph-cluster/
> > > [2] mailto:ceph-users at lists.ceph.com
> > > [3] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > [4] mailto:giorgis at acmac.uoc.gr
>
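P.S. For reference, once recovery has finished and the cluster is back to HEALTH_OK, the replacement procedure from the linked guide boils down to roughly the following (a sketch assuming osd.0 on host store1, as in this cluster; double-check against the guide before running anything):

```shell
# Mark the OSD out (in this case it was already marked out automatically).
ceph osd out 0

# Remove the dead OSD from the CRUSH map so it no longer receives data.
ceph osd crush remove osd.0

# Delete its authentication key.
ceph auth del osd.0

# Remove the OSD entry from the cluster map.
ceph osd rm 0
```

After swapping the physical disk, a new OSD can be created in its place with the usual provisioning tooling, and it will backfill on its own.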


More information about the ceph-users mailing list