[ceph-users] Inconsistent PG, repair doesn't work

Brett Chancellor bchancellor at salesforce.com
Thu Oct 11 10:27:07 PDT 2018


This seems like a bug. If I'm kicking off a repair manually, it should take
place immediately and ignore flags such as max scrubs or the minimum scrub
window.

-Brett

On Thu, Oct 11, 2018 at 1:11 PM David Turner <drakonstein at gmail.com> wrote:

> Part of a repair is queuing a deep scrub. As soon as the repair part
> is over, the deep scrub continues until it is done.
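One way to confirm that the queued deep scrub actually runs (an illustrative sketch, not a command from the thread) is to poll the PG's state for the scrubbing flags:

```shell
# Poll the PG state every 5 seconds; a running deep scrub shows up as
# "scrubbing+deep" in the "state" field of the JSON that "ceph pg query" emits.
watch -n 5 "sudo ceph pg 75.302 query | grep -E '\"state\"|deep_scrub'"
```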
>
> On Thu, Oct 11, 2018, 12:26 PM Brett Chancellor <
> bchancellor at salesforce.com> wrote:
>
>> Does the "repair" function use the same rules as a deep scrub? I couldn't
>> get one to kick off until I temporarily increased the max_scrubs and
>> lowered the scrub_min_interval on all 3 OSDs for that placement group. This
>> ended up fixing the issue, so I'll leave this here in case somebody else
>> runs into it.
>>
>> sudo ceph tell 'osd.208' injectargs '--osd_max_scrubs 3'
>> sudo ceph tell 'osd.120' injectargs '--osd_max_scrubs 3'
>> sudo ceph tell 'osd.235' injectargs '--osd_max_scrubs 3'
>> sudo ceph tell 'osd.208' injectargs '--osd_scrub_min_interval 1.0'
>> sudo ceph tell 'osd.120' injectargs '--osd_scrub_min_interval 1.0'
>> sudo ceph tell 'osd.235' injectargs '--osd_scrub_min_interval 1.0'
>> sudo ceph pg repair 75.302
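Once the repair goes through, it would be prudent to revert the injected values. A sketch, assuming the Luminous-era defaults of 1 scrub slot and a one-day minimum interval; verify your cluster's actual values first with `ceph daemon osd.<id> config get <option>`:

```shell
# Restore the scrub settings on the three OSDs after the repair completes.
# 1 and 86400 are the usual defaults for osd_max_scrubs and
# osd_scrub_min_interval respectively; check your own cluster before assuming.
for osd in 208 120 235; do
    sudo ceph tell "osd.${osd}" injectargs '--osd_max_scrubs 1'
    sudo ceph tell "osd.${osd}" injectargs '--osd_scrub_min_interval 86400'
done
```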
>>
>> -Brett
>>
>>
>> On Thu, Oct 11, 2018 at 8:42 AM Maks Kowalik <maks_kowalik at poczta.fm>
>> wrote:
>>
>>> Imho, moving was not the best idea (a copying attempt would have shown
>>> whether a read error was the cause here).
>>> Scrubs might not want to start if there are many other scrubs ongoing.
>>>
>>> czw., 11 paź 2018 o 14:27 Brett Chancellor <bchancellor at salesforce.com>
>>> napisał(a):
>>>
>>>> I moved the file. But the cluster won't actually start any scrub/repair
>>>> I manually initiate.
>>>>
>>>> On Thu, Oct 11, 2018, 7:51 AM Maks Kowalik <maks_kowalik at poczta.fm>
>>>> wrote:
>>>>
>>>>> Based on the log output, it looks like you have a damaged file on
>>>>> OSD 235, where the shard is stored.
>>>>> To verify that's the case, find the file (using
>>>>> 81d5654895863d as part of its name) and try to copy it to another
>>>>> directory.
>>>>> If you get an I/O error while copying, the next steps would be to
>>>>> delete the file, run a scrub on 75.302, and take a deep look at
>>>>> OSD 235 for any other errors.
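The copy test described above can be sketched in a couple of shell commands, assuming a Filestore OSD whose data lives under /var/lib/ceph/osd/ceph-235 (that path and the /tmp destination are assumptions, not from the thread):

```shell
# Locate the shard file by the fragment of its name, then attempt a plain
# copy; an I/O error here points at unreadable sectors under the file.
shard=$(sudo find /var/lib/ceph/osd/ceph-235/current -name '*81d5654895863d*' | head -n 1)
if sudo cp "$shard" /tmp/shard-copy-test; then
    echo "copy succeeded: the shard is readable"
else
    echo "I/O error copying $shard"
fi
```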
>>>>>
>>>>> Kind regards,
>>>>> Maks
>>>>>
>>>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>

