[ceph-users] FAILED assert(p.same_interval_since) and unusable cluster

David Zafman dzafman at redhat.com
Wed Nov 1 11:10:30 PDT 2017


Jon,

     If you are able, please test my tentative fix for this issue, which 
is in https://github.com/ceph/ceph/pull/18673


Thanks

David


On 10/30/17 1:13 AM, Jon Light wrote:
> Hello,
>
> I have three OSDs that are crashing on start with a FAILED
> assert(p.same_interval_since) error. I came across a thread from a few days
> ago about the same issue, and a ticket was created for it here:
> http://tracker.ceph.com/issues/21833.
>
> A heavily overloaded node in my cluster OOM'd many times, which eventually
> led to the problematic PGs and then to the failed assert.
>
> I currently have 49 pgs inactive, 33 pgs down, and 15 pgs incomplete, as
> well as 0.028% of objects unfound. Presumably because of this, I can't add
> any data to the FS or read some of the existing data. Almost any IO ends up
> producing a large number of stuck requests.
>
> Hopefully a fix will come out of the issue, but in the meantime, can anyone
> give me some suggestions or guidance on getting the cluster back into a
> working state?
>
> Thanks
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


