[ceph-users] Mimic offline problem

by morphin morphinwithyou at gmail.com
Tue Oct 2 07:13:40 PDT 2018


One of the Ceph experts indicated that BlueStore is still somewhat of a
preview technology (as far as Red Hat is concerned).
So it may be best to check BlueStore and RocksDB. There are some
tools to check their health and also repair them, but the documentation
is limited.
Does anyone have experience with them?
Any pointers toward a proper check would be great.
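
For reference, a rough sketch of what such an offline check might look like. It only prints the commands rather than running them. It assumes the default OSD data path /var/lib/ceph/osd/ceph-<id> and systemd-managed OSDs, and the helper names (osd_path, bluestore_check_cmds) are made up for illustration. ceph-bluestore-tool needs the OSD to be stopped first, and repair should only be considered after reviewing the fsck output.

```shell
#!/bin/sh
# Sketch only: print (do not execute) offline BlueStore check commands for one OSD.
# Assumption: OSD data lives under the default path /var/lib/ceph/osd/ceph-<id>.

# Hypothetical helper: build the data path for a given OSD id.
osd_path() {
    printf '/var/lib/ceph/osd/ceph-%s\n' "$1"
}

# Hypothetical helper: list the commands in the order they would be run.
bluestore_check_cmds() {
    id="$1"
    path="$(osd_path "$id")"
    # The OSD must not be running while its store is checked.
    echo "systemctl stop ceph-osd@${id}"
    # Consistency check of BlueStore metadata.
    echo "ceph-bluestore-tool fsck --path ${path}"
    # Deeper check that also reads and verifies object data (slower).
    echo "ceph-bluestore-tool fsck --deep --path ${path}"
    # Attempt repair only after reviewing the fsck results.
    echo "ceph-bluestore-tool repair --path ${path}"
}

# Example: show the commands for osd.90 (the OSD whose logs are linked below).
bluestore_check_cmds 90
```

Printing the commands first makes it easier to review them before touching a store that may already be damaged.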
Goktug Yildirim <goktug.yildirim at gmail.com> wrote on Mon, 1 Oct 2018
at 22:55:
>
> Hi all,
>
> We have recently upgraded from Luminous to Mimic. The cluster has been offline for 6 days now. The long story short is here: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-September/030078.html
>
> I’ve also CC’ed the developers since I believe this is a bug. If this is not the correct way to do it, I apologize; please let me know.
>
> Over these 6 days a lot has happened, and we reached some conclusions about the problem. Some of them were misjudged and some were not investigated deeply enough.
> However, the most certain diagnosis is this: each OSD causes very high disk I/O on its BlueStore disk (WAL and DB are fine). After that, the OSDs become unresponsive or barely responsive. For example, "ceph tell osd.x version" hangs seemingly forever.
>
> So, because of the unresponsive OSDs, the cluster does not settle. This is our problem!
>
> This is the one thing we are very sure of. But we are not sure of the cause.
>
> Here is the latest ceph status:
> https://paste.ubuntu.com/p/2DyZ5YqPjh/.
>
> This is the status after we started all of the OSDs 24 hours ago.
> Some of the OSDs have not started. However, it didn't make any difference when all of them were online.
>
> Here is the debug=20 log of one OSD, which is the same for all the others:
> https://paste.ubuntu.com/p/8n2kTvwnG6/
> As far as we can tell, there is a loop pattern. I am sure it won't escape your notice.
>
> This is the full log of the same OSD:
> https://www.dropbox.com/s/pwzqeajlsdwaoi1/ceph-osd.90.log?dl=0
>
> Here is the strace of the same OSD process:
> https://paste.ubuntu.com/p/8n2kTvwnG6/
>
> Lately we hear more and more recommendations to upgrade to Mimic. I hope no one gets hurt the way we did. I am sure we made lots of mistakes that let this happen. This situation may serve as an example for other users, and it could point to a potential bug for the Ceph developers.
>
> Any help to figure out what is going on would be great.
>
> Best Regards,
> Goktug Yildirim

