[ceph-users] FAILED assert(p.same_interval_since) and unusable cluster

Jon Light jon at jonlight.com
Wed Nov 1 11:39:07 PDT 2017


I'm currently running 12.2.0. How should I go about applying the patch?
Should I upgrade to 12.2.1, apply the changes, and then recompile?

I really appreciate the patch.
Thanks

On Wed, Nov 1, 2017 at 11:10 AM, David Zafman <dzafman at redhat.com> wrote:

>
> Jon,
>
>     If you are able, please test my tentative fix for this issue, which is
> in https://github.com/ceph/ceph/pull/18673
>
>
> Thanks
>
> David
>
>
>
> On 10/30/17 1:13 AM, Jon Light wrote:
>
>> Hello,
>>
>> I have three OSDs that are crashing on start with a FAILED
>> assert(p.same_interval_since) error. I ran across a thread from a few days
>> ago about the same issue and a ticket was created here:
>> http://tracker.ceph.com/issues/21833.
>>
>> A very overloaded node in my cluster OOM'd many times, which eventually led
>> to the problematic PGs and then the failed assert.
>>
>> I currently have 49 pgs inactive, 33 pgs down, and 15 pgs incomplete, as
>> well as 0.028% of objects unfound. Presumably due to this, I can't add any
>> data to the FS, and some existing data can't be read. Just about any IO
>> ends up with a good number of stuck requests.
>>
>> Hopefully a fix can come from the issue, but can anyone give me some
>> suggestions or guidance to get the cluster in a working state in the
>> meantime?
>>
>> Thanks
>>
>>
>>
>
>