[ceph-users] ceph inconsistent pg missing ec object

Gregory Farnum gfarnum at redhat.com
Thu Nov 2 11:27:34 PDT 2017


Okay, after consulting with a colleague this appears to be an instance of
http://tracker.ceph.com/issues/21382. Assuming the object is one that
doesn't have snapshots, your easiest resolution is to use "rados get" to
retrieve the object (which, unlike recovery, should work) and then "rados
put" it back into place.
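For reference, a sketch of that workaround. The object name and pg id are taken from the thread below; the pool name is a placeholder you would substitute for your own, and you should first confirm the object has no snapshots as noted above:

```shell
# Confirm the object has no snapshots (the workaround assumes it does not).
rados -p <pool> listsnaps 10014d3184b.00000000

# Read the object out through the normal client IO path (which, unlike
# recovery, should work), then write it back into place.
rados -p <pool> get 10014d3184b.00000000 /tmp/10014d3184b.00000000
rados -p <pool> put 10014d3184b.00000000 /tmp/10014d3184b.00000000

# Re-run a deep scrub on the pg to verify the error clears.
ceph pg deep-scrub 5.5e3
```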

This fix might be backported to Jewel for a later release, but it's tricky
so wasn't done proactively.
-Greg

On Fri, Oct 20, 2017 at 12:27 AM Stijn De Weirdt <stijn.deweirdt at ugent.be>
wrote:

> hi gregory,
>
> we more or less followed the instructions on the site (famous last
> words, i know ;)
>
> grepping for the error in the osd logs of the osds of the pg, the
> primary logs had "5.5e3s0 shard 59(5) missing
> 5:c7ae919b:::10014d3184b.00000000:head"
>
> we looked for the object using the find command, we got
>
> > [root at osd003 ~]# find /var/lib/ceph/osd/ceph-35/current/5.5e3s0_head/ -name "*10014d3184b.00000000*"
> > /var/lib/ceph/osd/ceph-35/current/5.5e3s0_head/DIR_3/DIR_E/DIR_5/DIR_7/DIR_9/10014d3184b.00000000__head_D98975E3__5_ffffffffffffffff_0
>
> then we ran this find on all 11 osds of the pg, and 10 out of 11 osds
> gave a similar path (the suffix _[0-9a] matched the index of the osd in
> the list of osds reported by the pg, so we assumed that was the ec code
> splitting the data into 11 pieces)
>
> on one osd in the list of osds, there was no such object (the 6th one,
> index 5, so again assuming from our side that this was the 5 in the
> 5:... from the logfile). so we assumed this was the missing object that
> the error reported. we have absolutely no clue why it was missing or
> what happened, nothing in any logs.
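The per-osd check described above can be sketched as a loop over the acting set reported by "ceph health detail" later in the thread. This assumes the jewel-era FileStore layout under /var/lib/ceph/osd/, and that you run it on whichever host carries each osd (or wrap it in ssh):

```shell
# Acting set of pg 5.5e3 as reported by "ceph health detail":
#   [35,50,91,18,139,59,124,40,104,12,71]
# Each osd holds one EC shard; the pg directory is named 5.5e3s<shard>_head.
for osd in 35 50 91 18 139 59 124 40 104 12 71; do
  echo "== osd.$osd =="
  find /var/lib/ceph/osd/ceph-$osd/current/5.5e3s*_head/ \
       -name "*10014d3184b.00000000*" 2>/dev/null
done
```

An osd that prints nothing is the one missing its shard.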
>
> what we did then was stop the osd that was missing the object, flush
> the journal, start the osd again and run repair. (the guide mentions
> deleting an object; we did not delete anything, because we assumed the
> issue was the already missing object on the 6th osd)
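That repair sequence amounts to the following, assuming systemd unit names as on a jewel/CentOS 7 install; osd.59 is the one the primary's log reported as missing the shard ("shard 59(5) missing"):

```shell
# On the node hosting the affected osd:
systemctl stop ceph-osd@59
ceph-osd -i 59 --flush-journal   # this is the step that segfaulted for us
systemctl start ceph-osd@59

# Then from any admin node, ask the pg to repair itself:
ceph pg repair 5.5e3
```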
>
> flushing the journal segfaulted, but the osd started fine again.
>
> the scrub errors did not disappear, so we did the same again on the
> primary (no deleting of anything; and again, the flush segfaulted).
>
> wrt the segfault, i attached the output of a segfaulting flush with
> debug on another osd.
>
>
> stijn
>
>
> On 10/20/2017 02:56 AM, Gregory Farnum wrote:
> > Okay, you're going to need to explain in very clear terms exactly what
> > happened to your cluster, and *exactly* what operations you performed
> > manually.
> >
> > The PG shards seem to have different views of the PG in question. The
> > primary has a different log_tail, last_user_version, and last_epoch_clean
> > from the others. Plus different log sizes? It's not making a ton of sense
> > at first glance.
> > -Greg
> >
> > On Thu, Oct 19, 2017 at 1:08 AM Stijn De Weirdt <stijn.deweirdt at ugent.be>
> > wrote:
> >
> >> hi greg,
> >>
> >> i attached the gzip output of the query and some more info below. if you
> >> need more, let me know.
> >>
> >> stijn
> >>
> >>> [root at mds01 ~]# ceph -s
> >>>     cluster 92beef0a-1239-4000-bacf-4453ab630e47
> >>>      health HEALTH_ERR
> >>>             1 pgs inconsistent
> >>>             40 requests are blocked > 512 sec
> >>>             1 scrub errors
> >>>             mds0: Behind on trimming (2793/30)
> >>>      monmap e1: 3 mons at {mds01=
> >> 1.2.3.4:6789/0,mds02=1.2.3.5:6789/0,mds03=1.2.3.6:6789/0}
> >>>             election epoch 326, quorum 0,1,2 mds01,mds02,mds03
> >>>       fsmap e238677: 1/1/1 up {0=mds02=up:active}, 2 up:standby
> >>>      osdmap e79554: 156 osds: 156 up, 156 in
> >>>             flags sortbitwise,require_jewel_osds
> >>>       pgmap v51003893: 4096 pgs, 3 pools, 387 TB data, 243 Mobjects
> >>>             545 TB used, 329 TB / 874 TB avail
> >>>                 4091 active+clean
> >>>                    4 active+clean+scrubbing+deep
> >>>                    1 active+clean+inconsistent
> >>>   client io 284 kB/s rd, 146 MB/s wr, 145 op/s rd, 177 op/s wr
> >>>   cache io 115 MB/s flush, 153 MB/s evict, 14 op/s promote, 3 PG(s)
> >> flushing
> >>
> >>> [root at mds01 ~]# ceph health detail
> >>> HEALTH_ERR 1 pgs inconsistent; 52 requests are blocked > 512 sec; 5 osds have slow requests; 1 scrub errors; mds0: Behind on trimming (2782/30)
> >>> pg 5.5e3 is active+clean+inconsistent, acting
> >> [35,50,91,18,139,59,124,40,104,12,71]
> >>> 34 ops are blocked > 524.288 sec on osd.8
> >>> 6 ops are blocked > 524.288 sec on osd.67
> >>> 6 ops are blocked > 524.288 sec on osd.27
> >>> 1 ops are blocked > 524.288 sec on osd.107
> >>> 5 ops are blocked > 524.288 sec on osd.116
> >>> 5 osds have slow requests
> >>> 1 scrub errors
> >>> mds0: Behind on trimming (2782/30)(max_segments: 30, num_segments: 2782)
> >>
> >>> # zgrep -C 1 ERR ceph-osd.35.log.*.gz
> >>> ceph-osd.35.log.5.gz:2017-10-14 11:25:52.260668 7f34d6748700  0 --
> >> 10.141.16.13:6801/1001792 >> 1.2.3.11:6803/1951 pipe(0x56412da80800
> >> sd=273 :6801 s=2 pgs=3176 cs=31 l=0 c=0x564156e83b00).fault with
> nothing to
> >> send, going to standby
> >>> ceph-osd.35.log.5.gz:2017-10-14 11:26:06.071011 7f3511be4700 -1
> >> log_channel(cluster) log [ERR] : 5.5e3s0 shard 59(5) missing
> >> 5:c7ae919b:::10014d3184b.00000000:head
> >>> ceph-osd.35.log.5.gz:2017-10-14 11:28:36.465684 7f34ffdf5700  0 --
> >> 1.2.3.13:6801/1001792 >> 1.2.3.21:6829/1834 pipe(0x56414e2a2000 sd=37
> >> :6801 s=0 pgs=0 cs=0 l=0 c=0x5641470d2a00).accept connect_seq 33 vs
> >> existing 33 state standby
> >>> ceph-osd.35.log.5.gz:--
> >>> ceph-osd.35.log.5.gz:2017-10-14 11:43:35.570711 7f3508efd700  0 --
> >> 1.2.3.13:6801/1001792 >> 1.2.3.20:6825/1806 pipe(0x56413be34000 sd=138
> >> :6801 s=2 pgs=2763 cs=45 l=0 c=0x564132999480).fault with nothing to
> send,
> >> going to standby
> >>> ceph-osd.35.log.5.gz:2017-10-14 11:44:02.235548 7f3511be4700 -1
> >> log_channel(cluster) log [ERR] : 5.5e3s0 deep-scrub 1 missing, 0
> >> inconsistent objects
> >>> ceph-osd.35.log.5.gz:2017-10-14 11:44:02.235554 7f3511be4700 -1
> >> log_channel(cluster) log [ERR] : 5.5e3 deep-scrub 1 errors
> >>> ceph-osd.35.log.5.gz:2017-10-14 11:59:02.331454 7f34d6d4e700  0 --
> >> 1.2.3.13:6801/1001792 >> 1.2.3.11:6817/1941 pipe(0x56414d370800 sd=227
> >> :42104 s=2 pgs=3238 cs=89 l=0 c=0x56413122d200).fault with nothing to
> send,
> >> going to standby
> >>
> >>
> >>
> >> On 10/18/2017 10:19 PM, Gregory Farnum wrote:
> >>> It would help if you can provide the exact output of "ceph -s", "pg
> >> query",
> >>> and any other relevant data. You shouldn't need to do manual repair of
> >>> erasure-coded pools, since it has checksums and can tell which bits are
> >>> bad. Following that article may not have done you any good (though I
> >>> wouldn't expect it to hurt, either...)...
> >>> -Greg
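For completeness, the requested diagnostics can be gathered as follows (the pg id comes from later in the thread, once "ceph health detail" has identified the inconsistent pg):

```shell
ceph -s                                     # overall cluster state
ceph health detail                          # names the inconsistent pg
ceph pg 5.5e3 query > pg-5.5e3-query.json   # gzip and attach for the list
```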
> >>>
> >>> On Wed, Oct 18, 2017 at 5:56 AM Stijn De Weirdt <stijn.deweirdt at ugent.be>
> >>> wrote:
> >>>
> >>>> hi all,
> >>>>
> >>>> we have a ceph 10.2.7 cluster with a 8+3 EC pool.
> >>>> in that pool, there is a pg in inconsistent state.
> >>>>
> >>>> we followed http://ceph.com/geen-categorie/ceph-manually-repair-object/,
> >>>> however, we are unable to solve our issue.
> >>>>
> >>>> from the primary osd logs, the reported pg had a missing object.
> >>>>
> >>>> we found a related object on the primary osd, and then looked for
> >>>> similar ones on the other osds in the same path (i guess it just has
> >>>> the index of the osd in the pg's list of osds as a suffix)
> >>>>
> >>>> one osd did not have such a file (the 10 others did).
> >>>>
> >>>> so we did the "stop osd/flush/start osd/pg repair" on both the primary
> >>>> osd and on the osd with the missing EC part.
> >>>>
> >>>> however, the scrub error still exists.
> >>>>
> >>>> does anyone have any hints what to do in this case?
> >>>>
> >>>> stijn
> >>>> _______________________________________________
> >>>> ceph-users mailing list
> >>>> ceph-users at lists.ceph.com
> >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>>
> >>>
> >>
> >
>