[ceph-users] ceph inconsistent pg missing ec object

Kenneth Waegeman kenneth.waegeman at ugent.be
Thu Nov 9 01:17:00 PST 2017


Hi Greg,

Thanks! This seems to have worked for at least 1 of the 2 inconsistent pgs:
the inconsistency disappeared after a new scrub. I'm still waiting for the
result of the second pg. I tried to force a deep-scrub with `ceph pg
deep-scrub <pg>` yesterday, but today the last deep scrub is still the one
from a week ago. Is there a way to actually trigger a deep-scrub immediately?
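
For reference, this is roughly what I have been running (pg id taken from
the thread below; the note about scheduling is just my understanding, not
something I verified):

  # check when the pg was last deep-scrubbed
  ceph pg 5.5e3 query | grep deep_scrub_stamp
  # request a deep scrub; as far as I can tell this only queues it, and the
  # osd still schedules it subject to osd_max_scrubs and the scrub time window
  ceph pg deep-scrub 5.5e3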

Thanks again!

Kenneth


On 02/11/17 19:27, Gregory Farnum wrote:
> Okay, after consulting with a colleague this appears to be an instance 
> of http://tracker.ceph.com/issues/21382. Assuming the object is one 
> that doesn't have snapshots, your easiest resolution is to use rados 
> get to retrieve the object (which, unlike recovery, should work) and 
> then "rados put" it back into place.
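>
> Something like this, roughly (the pool name is a placeholder for your
> cephfs data pool; the object name is the one from your scrub error):
>
>   rados -p <data pool> get 10014d3184b.00000000 /tmp/10014d3184b.00000000
>   rados -p <data pool> put 10014d3184b.00000000 /tmp/10014d3184b.00000000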
>
> This fix might be backported to Jewel for a later release, but it's 
> tricky so wasn't done proactively.
> -Greg
>
> On Fri, Oct 20, 2017 at 12:27 AM Stijn De Weirdt <stijn.deweirdt at ugent.be> wrote:
>
>     hi gregory,
>
>     we more or less followed the instructions on the site (famous last
>     words, i know ;)
>
>     grepping for the error in the logs of the osds of the pg, the primary's
>     log had "5.5e3s0 shard 59(5) missing
>     5:c7ae919b:::10014d3184b.00000000:head"
>
>     we looked for the object using the find command and got:
>
>     > [root at osd003 ~]# find /var/lib/ceph/osd/ceph-35/current/5.5e3s0_head/ -name "*10014d3184b.00000000*"
>     > /var/lib/ceph/osd/ceph-35/current/5.5e3s0_head/DIR_3/DIR_E/DIR_5/DIR_7/DIR_9/10014d3184b.00000000__head_D98975E3__5_ffffffffffffffff_0
>
>     then we ran this find on all 11 osds of the pg, and 10 out of 11 osds
>     gave a similar path (the suffix _[0-9a] matched the index of the osd in
>     the list of osds reported by the pg, so i assumed that was the ec
>     splitting the data up into 11 pieces)
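>
>     (for completeness: we took the list of osds to check from the acting
>     set of the pg, roughly like this; exact command from memory)
>
>     > ceph pg map 5.5e3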
>
>     on one osd in the list of osds, there was no such object (the 6th one,
>     index 5; again an assumption from our side that this is the 5 in "shard
>     59(5)" from the logfile). so we assumed this was the missing object the
>     error reported. we have absolutely no clue why it was missing or what
>     happened, nothing in any logs.
>
>     what we did then was stop the osd that was missing the object, flush
>     the journal, start the osd again and run repair. (the guide mentions
>     deleting an object; we did not delete anything, because we assumed the
>     issue was the object that was already missing from the 6th osd)
>
>     flushing the journal segfaulted, but the osd started fine again.
>
>     the scrub errors did not disappear, so we did the same again on the
>     primary (again without deleting anything; and again, the flush segfaulted).
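>
>     to be explicit, per osd this was roughly the sequence (osd id just as
>     an example; service names assume systemd):
>
>     > systemctl stop ceph-osd@35
>     > ceph-osd -i 35 --flush-journal    # this is the step that segfaulted
>     > systemctl start ceph-osd@35
>     > ceph pg repair 5.5e3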
>
>     wrt the segfault, i attached the output of a segfaulting flush, with
>     debug enabled, on another osd.
>
>
>     stijn
>
>
>     On 10/20/2017 02:56 AM, Gregory Farnum wrote:
>     > Okay, you're going to need to explain in very clear terms exactly
>     > what happened to your cluster, and *exactly* what operations you
>     > performed manually.
>     >
>     > The PG shards seem to have different views of the PG in question.
>     > The primary has a different log_tail, last_user_version, and
>     > last_epoch_clean from the others. Plus different log sizes? It's not
>     > making a ton of sense at first glance.
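>     >
>     > (for reference, this is roughly how I pulled those fields out of the
>     > query you attached; the filename is a placeholder and the jq paths
>     > are from memory, so they may be slightly off)
>     >
>     >   zcat query.json.gz | jq '{primary: .info.log_tail, peers: [.peer_info[].log_tail]}'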
>     > -Greg
>     >
>     > On Thu, Oct 19, 2017 at 1:08 AM Stijn De Weirdt <stijn.deweirdt at ugent.be> wrote:
>     >
>     >> hi greg,
>     >>
>     >> i attached the gzip output of the query and some more info below.
>     >> if you need more, let me know.
>     >>
>     >> stijn
>     >>
>     >>> [root at mds01 ~]# ceph -s
>     >>>     cluster 92beef0a-1239-4000-bacf-4453ab630e47
>     >>>      health HEALTH_ERR
>     >>>             1 pgs inconsistent
>     >>>             40 requests are blocked > 512 sec
>     >>>             1 scrub errors
>     >>>             mds0: Behind on trimming (2793/30)
>     >>>      monmap e1: 3 mons at {mds01=1.2.3.4:6789/0,mds02=1.2.3.5:6789/0,mds03=1.2.3.6:6789/0}
>     >>>             election epoch 326, quorum 0,1,2 mds01,mds02,mds03
>     >>>       fsmap e238677: 1/1/1 up {0=mds02=up:active}, 2 up:standby
>     >>>      osdmap e79554: 156 osds: 156 up, 156 in
>     >>>             flags sortbitwise,require_jewel_osds
>     >>>       pgmap v51003893: 4096 pgs, 3 pools, 387 TB data, 243 Mobjects
>     >>>             545 TB used, 329 TB / 874 TB avail
>     >>>                 4091 active+clean
>     >>>                    4 active+clean+scrubbing+deep
>     >>>                    1 active+clean+inconsistent
>     >>>   client io 284 kB/s rd, 146 MB/s wr, 145 op/s rd, 177 op/s wr
>     >>>   cache io 115 MB/s flush, 153 MB/s evict, 14 op/s promote, 3 PG(s) flushing
>     >>
>     >>> [root at mds01 ~]# ceph health detail
>     >>> HEALTH_ERR 1 pgs inconsistent; 52 requests are blocked > 512 sec; 5 osds have slow requests; 1 scrub errors; mds0: Behind on trimming (2782/30)
>     >>> pg 5.5e3 is active+clean+inconsistent, acting [35,50,91,18,139,59,124,40,104,12,71]
>     >>> 34 ops are blocked > 524.288 sec on osd.8
>     >>> 6 ops are blocked > 524.288 sec on osd.67
>     >>> 6 ops are blocked > 524.288 sec on osd.27
>     >>> 1 ops are blocked > 524.288 sec on osd.107
>     >>> 5 ops are blocked > 524.288 sec on osd.116
>     >>> 5 osds have slow requests
>     >>> 1 scrub errors
>     >>> mds0: Behind on trimming (2782/30)(max_segments: 30, num_segments: 2782)
>     >>
>     >>> # zgrep -C 1 ERR ceph-osd.35.log.*.gz
>     >>> ceph-osd.35.log.5.gz:2017-10-14 11:25:52.260668 7f34d6748700  0 -- 10.141.16.13:6801/1001792 >> 1.2.3.11:6803/1951 pipe(0x56412da80800 sd=273 :6801 s=2 pgs=3176 cs=31 l=0 c=0x564156e83b00).fault with nothing to send, going to standby
>     >>> ceph-osd.35.log.5.gz:2017-10-14 11:26:06.071011 7f3511be4700 -1 log_channel(cluster) log [ERR] : 5.5e3s0 shard 59(5) missing 5:c7ae919b:::10014d3184b.00000000:head
>     >>> ceph-osd.35.log.5.gz:2017-10-14 11:28:36.465684 7f34ffdf5700  0 -- 1.2.3.13:6801/1001792 >> 1.2.3.21:6829/1834 pipe(0x56414e2a2000 sd=37 :6801 s=0 pgs=0 cs=0 l=0 c=0x5641470d2a00).accept connect_seq 33 vs existing 33 state standby
>     >>> ceph-osd.35.log.5.gz:--
>     >>> ceph-osd.35.log.5.gz:2017-10-14 11:43:35.570711 7f3508efd700  0 -- 1.2.3.13:6801/1001792 >> 1.2.3.20:6825/1806 pipe(0x56413be34000 sd=138 :6801 s=2 pgs=2763 cs=45 l=0 c=0x564132999480).fault with nothing to send, going to standby
>     >>> ceph-osd.35.log.5.gz:2017-10-14 11:44:02.235548 7f3511be4700 -1 log_channel(cluster) log [ERR] : 5.5e3s0 deep-scrub 1 missing, 0 inconsistent objects
>     >>> ceph-osd.35.log.5.gz:2017-10-14 11:44:02.235554 7f3511be4700 -1 log_channel(cluster) log [ERR] : 5.5e3 deep-scrub 1 errors
>     >>> ceph-osd.35.log.5.gz:2017-10-14 11:59:02.331454 7f34d6d4e700  0 -- 1.2.3.13:6801/1001792 >> 1.2.3.11:6817/1941 pipe(0x56414d370800 sd=227 :42104 s=2 pgs=3238 cs=89 l=0 c=0x56413122d200).fault with nothing to send, going to standby
>     >>
>     >>
>     >>
>     >> On 10/18/2017 10:19 PM, Gregory Farnum wrote:
>     >>> It would help if you could provide the exact output of "ceph -s",
>     >>> "pg query", and any other relevant data. You shouldn't need to do
>     >>> manual repair of erasure-coded pools, since they have checksums and
>     >>> can tell which bits are bad. Following that article may not have
>     >>> done you any good (though I wouldn't expect it to hurt, either...)...
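>     >>>
>     >>> If you want to see exactly what the scrub flagged, something like
>     >>> this should work on jewel (I don't remember the exact output format
>     >>> offhand):
>     >>>
>     >>>   rados list-inconsistent-obj 5.5e3 --format=json-pretty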
>     >>> -Greg
>     >>>
>     >>> On Wed, Oct 18, 2017 at 5:56 AM Stijn De Weirdt <stijn.deweirdt at ugent.be> wrote:
>     >>>
>     >>>> hi all,
>     >>>>
>     >>>> we have a ceph 10.2.7 cluster with an 8+3 EC pool.
>     >>>> in that pool, there is a pg in inconsistent state.
>     >>>>
>     >>>> we followed http://ceph.com/geen-categorie/ceph-manually-repair-object/;
>     >>>> however, we are unable to solve our issue.
>     >>>>
>     >>>> from the primary osd logs, the reported pg had a missing object.
>     >>>>
>     >>>> we found a related object on the primary osd, and then looked for
>     >>>> similar ones on the other osds in the same path (i guess it just
>     >>>> has the index of the osd in the pg's list of osds suffixed)
>     >>>>
>     >>>> one osd did not have such a file (the 10 others did).
>     >>>>
>     >>>> so we did the "stop osd / flush journal / start osd / pg repair" on
>     >>>> both the primary osd and on the osd with the missing EC part.
>     >>>>
>     >>>> however, the scrub error still exists.
>     >>>>
>     >>>> does anyone have any hints what to do in this case?
>     >>>>
>     >>>> stijn
>     >>>>
>     >>>
>     >>
>     >
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
