[ceph-users] Disk Down Emergency

Georgios Dimitrakakis giorgis at acmac.uoc.gr
Thu Nov 16 06:26:25 PST 2017


 Thank you all for your time and support.

 I don't see any backfilling in the logs, and the number of 
 "active+degraded", "active+remapped", and "active+clean" PGs has 
 stayed the same for some time now. The only activity I see is 
 "scrubbing".

 Wido, I cannot do anything with the data on osd.0: although the 
 failed disk still appears mounted, I cannot see anything on it and I 
 get an "Input/output error".

 So I guess the right action for now is to remove the OSD by issuing 
 "ceph osd crush remove osd.0", as Sean suggested, correct?

 G.


>> On 16 November 2017 at 14:46, Caspar Smit <casparsmit at supernas.eu> wrote:
>>
>>
>> 2017-11-16 14:43 GMT+01:00 Wido den Hollander <wido at 42on.com>:
>>
>> >
>> > > On 16 November 2017 at 14:40, Georgios Dimitrakakis <giorgis at acmac.uoc.gr> wrote:
>> > >
>> > >
>> > >  @Sean Redmond: No I don't have any unfound objects. I only have
>> > >  "stuck unclean" with "active+degraded" status.
>> > >  @Caspar Smit: The cluster is scrubbing ...
>> > >
>> > >  @All: My concern is that only one copy of the data on the failed
>> > >  disk is left.
>> > >
>> >
>> > Let the Ceph recovery do its work. Don't do anything manually now.
>> >
>> >
>> @Wido, I think his cluster might have stopped recovering because of
>> non-optimal tunables in Firefly.
>>
>
> Ah, darn. Yes, that was a long time ago. Could very well be the
> case.
>
> He could try to remove osd.0 from the CRUSH map and let recovery
> progress.
>
> I would however advise him not to fiddle with the data on osd.0. Do
> not try to copy the data somewhere else and try to fix the OSD.
>
> Wido
>
>>
>> > >  If I just remove osd.0 from the CRUSH map, will that copy all of
>> > >  its data from the only remaining copy to the other, unaffected
>> > >  disks, so that I end up again with two copies on two different
>> > >  hosts?
>> > >
>> >
>> > Do NOT copy the data from osd.0 to another OSD. Let the Ceph
>> > recovery handle this.
>> >
>> > It is already marked as out and within 24 hours or so recovery
>> > will have finished.
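>> >
>> > You can watch the recovery progress with, for example:
>> >
>> >   ceph -s
>> >   ceph health detail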
>> >
>> > But a few things:
>> >
>> > - Firefly 0.80.9 is old
>> > - Never, never, never run with size=2
>> >
>> > Not trying to scare you, but it's a reality.
>> >
>> > Now let Ceph handle the rebalance and wait.
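>> >
>> > Once the cluster is healthy again you can also raise the replica
>> > count per pool with something like (pool name is a placeholder):
>> >
>> >   ceph osd pool set <poolname> size 3
>> >   ceph osd pool set <poolname> min_size 2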
>> >
>> > Wido
>> >
>> > >  Best,
>> > >
>> > >  G.
>> > >
>> > >
>> > > > 2017-11-16 14:05 GMT+01:00 Georgios Dimitrakakis :
>> > > >
>> > > >> Dear cephers,
>> > > >>
>> > > >> I have an emergency on a rather small ceph cluster.
>> > > >>
>> > > >> My cluster consists of 2 OSD nodes with 10 x 4TB disks each
>> > > >> and 3 monitor nodes.
>> > > >>
>> > > >> The version of ceph running is Firefly v.0.80.9
>> > > >> (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)
>> > > >>
>> > > >> The cluster was originally built with "Replicated size=2" and
>> > > >> "Min size=1" with the attached crush map, which to my
>> > > >> understanding replicates data across hosts.
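>> > > >>
>> > > >> The replication settings can be confirmed with, e.g.:
>> > > >>
>> > > >> # ceph osd dump | grep 'replicated size'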
>> > > >>
>> > > >> The emergency comes from the violation of the golden rule:
>> > > >> "Never use 2 replicas on a production cluster".
>> > > >>
>> > > >> Unfortunately the customers never really understood the risk,
>> > > >> and now that one disk is down I am in the middle and must do
>> > > >> everything in my power not to lose any data, so I am requesting
>> > > >> your assistance.
>> > > >>
>> > > >> Here is the output of
>> > > >>
>> > > >> $ ceph osd tree
>> > > >> # id    weight  type name       up/down reweight
>> > > >> -1      72.6    root default
>> > > >> -2      36.3            host store1
>> > > >> 0       3.63                    osd.0   down    0     ---> DISK DOWN
>> > > >> 1       3.63                    osd.1   up      1
>> > > >> 2       3.63                    osd.2   up      1
>> > > >> 3       3.63                    osd.3   up      1
>> > > >> 4       3.63                    osd.4   up      1
>> > > >> 5       3.63                    osd.5   up      1
>> > > >> 6       3.63                    osd.6   up      1
>> > > >> 7       3.63                    osd.7   up      1
>> > > >> 8       3.63                    osd.8   up      1
>> > > >> 9       3.63                    osd.9   up      1
>> > > >> -3      36.3            host store2
>> > > >> 10      3.63                    osd.10  up      1
>> > > >> 11      3.63                    osd.11  up      1
>> > > >> 12      3.63                    osd.12  up      1
>> > > >> 13      3.63                    osd.13  up      1
>> > > >> 14      3.63                    osd.14  up      1
>> > > >> 15      3.63                    osd.15  up      1
>> > > >> 16      3.63                    osd.16  up      1
>> > > >> 17      3.63                    osd.17  up      1
>> > > >> 18      3.63                    osd.18  up      1
>> > > >> 19      3.63                    osd.19  up      1
>> > > >>
>> > > >> and here is the status of the cluster
>> > > >>
>> > > >> # ceph health
>> > > >> HEALTH_WARN 497 pgs degraded; 549 pgs stuck unclean; recovery 51916/2552684 objects degraded (2.034%)
>> > > >>
>> > > >> Although OSD.0 is shown as mounted it cannot be started
>> > > >> (probably a failed disk controller problem).
>> > > >>
>> > > >> # df -h
>> > > >> Filesystem      Size  Used Avail Use% Mounted on
>> > > >> /dev/sda3       251G  4.1G  235G   2% /
>> > > >> tmpfs            24G     0   24G   0% /dev/shm
>> > > >> /dev/sda1       239M  100M  127M  44% /boot
>> > > >> /dev/sdj1       3.7T  223G  3.5T   6% /var/lib/ceph/osd/ceph-8
>> > > >> /dev/sdh1       3.7T  205G  3.5T   6% /var/lib/ceph/osd/ceph-6
>> > > >> /dev/sdg1       3.7T  199G  3.5T   6% /var/lib/ceph/osd/ceph-5
>> > > >> /dev/sde1       3.7T  180G  3.5T   5% /var/lib/ceph/osd/ceph-3
>> > > >> /dev/sdi1       3.7T  187G  3.5T   6% /var/lib/ceph/osd/ceph-7
>> > > >> /dev/sdf1       3.7T  193G  3.5T   6% /var/lib/ceph/osd/ceph-4
>> > > >> /dev/sdd1       3.7T  212G  3.5T   6% /var/lib/ceph/osd/ceph-2
>> > > >> /dev/sdk1       3.7T  210G  3.5T   6% /var/lib/ceph/osd/ceph-9
>> > > >> /dev/sdb1       3.7T  164G  3.5T   5% /var/lib/ceph/osd/ceph-0    ---> This is the problematic OSD
>> > > >> /dev/sdc1       3.7T  183G  3.5T   5% /var/lib/ceph/osd/ceph-1
>> > > >>
>> > > >> # service ceph start osd.0
>> > > >> find: `/var/lib/ceph/osd/ceph-0: Input/output error
>> > > >> /etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines
>> > > >> mon.store1 osd.6 osd.9 osd.1 osd.4 osd.3 osd.2 osd.8 osd.5 osd.7
>> > > >> mds.store1 mon.store3, /var/lib/ceph defines mon.store1 osd.6 osd.9
>> > > >> osd.1 osd.4 osd.3 osd.2 osd.8 osd.5 osd.7 mds.store1)
>> > > >>
>> > > >> I have found this:
>> > > >>
>> > > >> http://ceph.com/geen-categorie/admin-guide-replacing-a-failed-disk-in-a-ceph-cluster/ [1]
>> > > >>
>> > > >> and I am looking for your guidance on how to properly perform
>> > > >> all the actions so that I do not lose any data and keep what is
>> > > >> on the second copy.
>> > > >
>> > > > What guidance are you looking for besides the steps to replace a
>> > > > failed disk (which you already found)?
>> > > > If I look at your situation, there is nothing down in terms of
>> > > > availability of PGs, just a failed drive which needs to be replaced.
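>> > > >
>> > > > (Roughly: once recovery is done and osd.0 is removed from the
>> > > > cluster, the new drive is added back with something like
>> > > > "ceph-disk prepare /dev/sdX" and "ceph-disk activate /dev/sdX1",
>> > > > where the device letter is whatever the replacement disk gets.)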
>> > > >
>> > > > Is the cluster still recovering? It should reach HEALTH_OK again
>> > > > after rebalancing the cluster when an OSD goes down.
>> > > >
>> > > > If it stopped recovering it might have to do with the Ceph
>> > > > tunables, which are not set to optimal by default on Firefly and
>> > > > that prevents further rebalancing.
>> > > > WARNING: Don't just set tunables to optimal because it will
>> > > > trigger a massive rebalance!
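>> > > >
>> > > > You can check the current profile without changing anything with:
>> > > >
>> > > >   ceph osd crush show-tunables
>> > > >
>> > > > and only switch (e.g. "ceph osd crush tunables optimal") once you
>> > > > are prepared for the data movement that follows.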
>> > > >
>> > > > Perhaps the second golden rule is to never run a Ceph production
>> > > > cluster without knowing (and testing) how to replace a failed
>> > > > drive. (I'm not trying to be harsh here.)
>> > > >
>> > > > Kind regards,
>> > > > Caspar
>> > > >
>> > > >
>> > > >> Best regards,
>> > > >>
>> > > >> G.
>> > > >> _______________________________________________
>> > > >> ceph-users mailing list
>> > > >> ceph-users at lists.ceph.com [2]
>> > > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [3]
>> > > >
>> > > >
>> > > >
>> > > > Links:
>> > > > ------
>> > > > [1] http://ceph.com/geen-categorie/admin-guide-replacing-a-failed-disk-in-a-ceph-cluster/
>> > > > [2] mailto:ceph-users at lists.ceph.com
>> > > > [3] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> > > > [4] mailto:giorgis at acmac.uoc.gr

-- 
 Dr. Dimitrakakis Georgios

 Networks and Systems Administrator

 Archimedes Center for Modeling, Analysis & Computation (ACMAC)
 School of Sciences and Engineering
 University of Crete
 P.O. Box 2208
 710 - 03 Heraklion
 Crete, Greece

 Tel: +30 2810 393717
 Fax: +30 2810 393660

 E-mail: giorgis at acmac.uoc.gr

