[ceph-users] deep scrub error caused by missing object
ceph at elchaka.de
Fri Oct 5 10:46:33 PDT 2018
I am not sure if I can be of help, but perhaps these commands can help to find the objects in question...
ceph health detail
rados list-inconsistent-pg rbd
rados list-inconsistent-obj 2.10d
It would also be interesting to know whether you use BlueStore or FileStore...
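As a sketch, the three commands above can be run in sequence (the pool name `rbd` and PG ID `2.10d` are taken from this thread; `list-inconsistent-obj` only reports findings from the most recent scrub, so you may need to trigger a fresh deep scrub first):

```shell
# Overall cluster health, including which PGs are flagged inconsistent
ceph health detail

# List inconsistent PGs in the 'rbd' pool
rados list-inconsistent-pg rbd

# Show details of the inconsistent objects inside PG 2.10d
rados list-inconsistent-obj 2.10d --format=json-pretty

# If the output is empty, re-run the deep scrub first:
# ceph pg deep-scrub 2.10d
```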
On 4 October 2018 at 14:06:07 CEST, Roman Steinhart <roman at aternos.org> wrote:
>For some weeks now we have had a small problem with one of the PGs on
>our cluster. Every time PG 2.10d is deep scrubbed, it fails with this:
>2018-08-06 19:36:28.080707 osd.14 osd.14 *.*.*.110:6809/3935 133 :
>cluster [ERR] 2.10d scrub stat mismatch, got 397/398 objects, 0/0
>clones, 397/398 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0
>whiteouts, 2609281919/2609293215 bytes, 0/0 hit_set_archive bytes.
>2018-08-06 19:36:28.080905 osd.14 osd.14 *.*.*.110:6809/3935 134 :
>cluster [ERR] 2.10d scrub 1 errors
>As far as I understand, Ceph is missing an object on osd.14 which
>should be stored on that OSD. A simple "ceph pg repair 2.10d" fixes
>the problem, but as soon as a deep scrub job for that PG runs again
>(manually or automatically) the problem is back.
>I tried to find out which object is missing, but a quick search led me
>to the conclusion that there is no real way to find out which objects
>are stored in this PG, or which object exactly is missing.
>That's why I resorted to some "unconventional" methods.
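[For what it's worth, one way to list the objects a given OSD holds for a PG is `ceph-objectstore-tool`, which works offline against a stopped OSD. The paths and unit names below are the typical defaults, not taken from this thread:]

```shell
# Stop the OSD first; ceph-objectstore-tool needs exclusive access
# to the OSD's data store.
systemctl stop ceph-osd@14

# List every object this OSD stores for PG 2.10d
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-14 \
    --pgid 2.10d --op list

# Bring the OSD back up afterwards
systemctl start ceph-osd@14
```

Dumping this listing from each replica and diffing the sorted output should reveal the object the scrub stat mismatch is complaining about.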
>I completely removed osd.14 from the cluster, waited until everything
>had rebalanced, and then added the OSD again.
>Unfortunately the problem is still there.
>Some weeks later we added a large number of OSDs to our cluster, which
>had a big impact on the CRUSH map.
>Since then PG 2.10d has been running on two other OSDs -> [119,93]
>(we have a replica count of 2).
>Still the same error message, but another OSD:
>2018-10-03 03:39:22.776521 7f12d9979700 -1 log_channel(cluster) log
>[ERR] : 2.10d scrub stat mismatch, got 728/729 objects, 0/0 clones,
>728/729 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0
>whiteouts, 7281369687/7281381269 bytes, 0/0 hit_set_archive bytes.
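[Since the mismatch follows the PG across OSDs (now [119,93]), comparing object listings from the two replicas can narrow down the extra or missing object. A toy sketch with hypothetical object names; the real lists would come from an object listing per replica, e.g. via ceph-objectstore-tool:]

```shell
# Simulate the two replicas' object listings (hypothetical names).
printf 'obj_a\nobj_b\nobj_c\n' | sort > osd119_objects.sorted
printf 'obj_a\nobj_c\n'        | sort > osd93_objects.sorted

# comm -3 suppresses lines common to both files, leaving only the
# object(s) present on one replica but not the other.
comm -3 osd119_objects.sorted osd93_objects.sorted
```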
>As a first step it would be enough for me to find out which object is
>the problematic one. Then I can check whether the object is critical,
>whether any recovery is required, or whether I can simply drop the
>object (that would be 90% of the cases).
>I hope someone is able to help me get rid of this.
>It's not really a problem for us; Ceph runs without further issues
>despite this message.
>It's just a bit annoying that every time the error occurs, our
>monitoring triggers a big alarm because Ceph is in ERROR status. :)
>Thanks in advance,