[ceph-users] Mimic offline problem

Goktug Yildirim goktug.yildirim at gmail.com
Tue Oct 2 13:01:47 PDT 2018

Thanks for the reply! My answers are inline.

> On 2 Oct 2018, at 21:51, Paul Emmerich <paul.emmerich at croit.io> wrote:
> (Didn't follow the whole story, so you might have already answered that)
> Did you check what the OSDs are doing during the period of high disk
> utilization?
> As in:
> * running perf top
That did not cross my mind. Thanks for the heads-up! Will do.
> * sampling a few stack traces from procfs or gdb
I have an strace of the OSD: https://paste.ubuntu.com/p/8n2kTvwnG6/
> * or just high log settings
They have default debug settings, and the log disk is separate. I actually have a fairly fast system: OS disks are mirrored SSDs, WALs+DBs are on mirrored NVMe, and OSD disks are NL-SAS. All hardware came from Dell (R730). Each server has 28 cores and 256GB RAM, with 2x10GbE for the cluster network and 2x10GbE for the public network.
> * running "status" on the admin socket locally
I can run "ceph daemon" and see the status. I must have checked it before, but I will do so again.
> Paul
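As a rough illustration of the "sampling from procfs" suggestion above, here is a minimal Python sketch that counts which kernel wait channel each thread of a process is sitting in. It assumes a Linux host where /proc/<pid>/task/<tid>/comm and wchan are readable. The function name and the proc_root parameter are made up for this example and are not part of any Ceph tooling.

```python
import collections
import os


def sample_wait_channels(pid, proc_root="/proc"):
    """Count (thread name, kernel wait channel) pairs for one process.

    proc_root is a hypothetical parameter so the function can be pointed
    at a fake directory tree for testing.
    """
    counts = collections.Counter()
    task_dir = os.path.join(proc_root, str(pid), "task")
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "comm")) as f:
                comm = f.read().strip()
            with open(os.path.join(task_dir, tid, "wchan")) as f:
                # An empty or "0" value means the thread is not blocked
                # in the kernel (or the value is hidden from this user).
                wchan = f.read().strip() or "0"
        except OSError:
            # The thread exited between listdir() and open(); skip it.
            continue
        counts[(comm, wchan)] += 1
    return counts
```

Calling sample_wait_channels(pid) a few times, a second apart, and summing the returned counters gives a crude picture of where the OSD threads spend their time. It does not replace perf top or real stack traces from gdb.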
> On Tue, 2 Oct 2018 at 20:02, Goktug Yildirim
> <goktug.yildirim at gmail.com> wrote:
>> Hello Darius,
>> Thanks for reply!
>> The main problem is that we cannot query PGs. “ceph pg 67.54f query” gets stuck and waits forever because the OSD is unresponsive.
>> We are certain that an OSD becomes unresponsive as soon as it is up, and that it responds again once its high disk utilization ends.
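Since the query hangs instead of failing, one way to measure when an OSD becomes responsive again is to run the command under a timeout and retry. The following is a minimal sketch using only the Python standard library. The function names, timeout and retry interval are arbitrary assumptions; only the "ceph pg 67.54f query" command itself comes from this thread.

```python
import subprocess
import time


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return (answered, seconds_elapsed, stdout).

    answered is True only if the command exited with status 0 before
    the timeout. On timeout the child process is killed.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False, time.monotonic() - start, ""
    return proc.returncode == 0, time.monotonic() - start, proc.stdout


def first_responsive(cmd, timeout_s=30, retry_every_s=60, max_tries=20):
    """Retry cmd until it answers; return seconds waited, or None."""
    began = time.monotonic()
    for _ in range(max_tries):
        answered, _, _ = run_with_timeout(cmd, timeout_s)
        if answered:
            return time.monotonic() - began
        time.sleep(retry_every_s)
    return None
```

For example, first_responsive(["ceph", "pg", "67.54f", "query"]) would report roughly how long the PG query stayed blocked after an OSD was started.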
>> So we ran a small test like this:
>> * Stop all OSDs (168 of them).
>> * Start OSD1. 95% OSD disk utilization starts immediately. It takes 8 minutes to finish. Only after that does “ceph pg 67.54f query” work!
>> * While OSD1 is “up”, start OSD2. As soon as OSD2 starts, both OSD1 and OSD2 go to 95% disk utilization. This takes 17 minutes to finish.
>> * Now start OSD3 and it is the same. All OSDs start high I/O and it takes 25 minutes to settle.
>> * If you happen to start 5 of them at the same time, all of the OSDs start high I/O again, and it takes 1 hour to finish.
>> In light of these findings we set the noup flag and started all OSDs. At first there was no I/O. After 10 minutes we unset noup. All 168 OSDs started doing high I/O. We thought that if we waited long enough it would finish and the OSDs would become responsive again. After 24 hours they had not, because the I/O had neither finished nor even slowed down.
>> One might think there is a lot of data to scan. But it is just 33TB.
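To put "just 33TB" in perspective, here is a back-of-the-envelope calculation in Python. The per-disk throughput is purely an assumption (about 150 MB/s sequential for an NL-SAS disk), and the message does not say whether 33TB is before or after replication, so treat the result as an order of magnitude only.

```python
def full_scan_minutes(total_bytes, num_osds, bytes_per_second):
    """Minutes for every OSD to read its share once, all disks in parallel."""
    per_osd_bytes = total_bytes / num_osds
    return per_osd_bytes / bytes_per_second / 60


TOTAL_BYTES = 33e12        # "just 33TB", from the message
NUM_OSDS = 168             # from the message
ASSUMED_DISK_BPS = 150e6   # ASSUMPTION: ~150 MB/s sequential per disk

minutes = full_scan_minutes(TOTAL_BYTES, NUM_OSDS, ASSUMED_DISK_BPS)
print(f"about {minutes:.0f} minutes per OSD for one full sequential pass")
```

Under these assumptions a single sequential pass over all the data would take tens of minutes per disk, not a day. Random I/O would be much slower, but this supports the point that data volume alone does not obviously explain 24 hours of sustained high I/O.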
>> In short, we don't know which PG is stuck, so we cannot remove it.
>> However, we came across a weird thing half an hour ago. We exported the same PG from two different OSDs. One export was 4.2GB and the other was 500KB! So we decided to export all OSDs for backup. Then we will delete the strangely sized ones and start the cluster all over again. Maybe then we can resolve the stuck or unfound PGs as you advise.
>> Any thoughts would be greatly appreciated.
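Once all PGs are exported, the size comparison described above (4.2GB versus 500KB for the same PG) could be automated. This minimal sketch assumes a hypothetical naming convention of "<pgid>.<osd>.export" for the export files, for example "67.54f.osd90.export". Neither that convention nor the 10x threshold comes from the thread.

```python
import collections
import os


def find_size_outliers(export_dir, ratio=10.0):
    """Return {pgid: {osd: size}} for PGs whose copies differ in size
    by more than `ratio`.

    ASSUMPTION: files are named '<pgid>.<osd>.export'.
    """
    sizes = collections.defaultdict(dict)
    for name in os.listdir(export_dir):
        if not name.endswith(".export"):
            continue
        stem = name[: -len(".export")]
        # Split on the last dot: '67.54f.osd90' -> ('67.54f', 'osd90')
        pgid, _, osd = stem.rpartition(".")
        if not pgid:
            continue
        sizes[pgid][osd] = os.path.getsize(os.path.join(export_dir, name))

    suspicious = {}
    for pgid, by_osd in sizes.items():
        if len(by_osd) < 2:
            continue  # nothing to compare against
        largest = max(by_osd.values())
        smallest = max(min(by_osd.values()), 1)  # avoid dividing by zero
        if largest / smallest > ratio:
            suspicious[pgid] = dict(by_osd)
    return suspicious
```

A large size difference is only a hint. A small export may be a stale or nearly empty copy rather than a corrupt one, so the flagged PGs still need to be inspected by hand before anything is deleted.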
>>> On 2 Oct 2018, at 18:16, Darius Kasparavičius <daznis at gmail.com> wrote:
>>> Hello,
>>> Currently you have 15 objects missing. I would recommend finding them
>>> and making backups of them. Ditch all other osds that are failing to
>>> start and concentrate on bringing online those that have missing
>>> objects. Then slowly turn off nodown and noout on the cluster and see
>>> if it stabilises. If it stabilises, leave these settings; if not, turn
>>> them back on.
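The "turn off slowly, turn back on if it does not stabilise" step can be sketched as a small dry-run helper in Python. The "ceph osd set <flag>" and "ceph osd unset <flag>" commands exist in the Ceph CLI; the helper names and the dry-run default are assumptions of this example, and deciding whether the cluster has stabilised is left to the operator.

```python
import subprocess


def flag_commands(flags, action="unset"):
    """Build one 'ceph osd set|unset <flag>' command per flag."""
    if action not in ("set", "unset"):
        raise ValueError("action must be 'set' or 'unset'")
    return [["ceph", "osd", action, flag] for flag in flags]


def apply_flags(flags, action="unset", dry_run=True, runner=subprocess.run):
    """Apply the flags one at a time; return the commands issued.

    With dry_run=True (the default) the commands are only printed.
    """
    issued = []
    for cmd in flag_commands(flags, action):
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            runner(cmd, check=True)
        issued.append(cmd)
    return issued
```

For example, apply_flags(["nodown", "noout"]) prints the two unset commands, and apply_flags(["nodown", "noout"], action="set") prints the commands to turn the flags back on.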
>>> Now get some of the PGs that are blocked and query them to check
>>> why they are blocked. Try removing as many blocks as possible and then
>>> remove the norebalance/norecover flags and see if it starts to fix
>>> itself.
>>> On Tue, Oct 2, 2018 at 5:14 PM, by morphin
>>> <morphinwithyou at gmail.com> wrote:
>>>> One of the Ceph experts indicated that BlueStore is still somewhat of a
>>>> preview technology (as far as Red Hat is concerned).
>>>> So it might be best to check BlueStore and RocksDB. There are some
>>>> tools to check health and also to repair, but there is limited
>>>> documentation.
>>>> Does anyone have experience with them?
>>>> If anyone could point us to a proper check, that would be great.
>>>> On Mon, 1 Oct 2018 at 22:55, Goktug Yildirim <goktug.yildirim at gmail.com>
>>>> wrote:
>>>>> Hi all,
>>>>> We recently upgraded from Luminous to Mimic. This cluster has been offline for 6 days. The long story short is here: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-September/030078.html
>>>>> I’ve also CC’ed the developers since I believe this is a bug. If this is not the correct way, I apologize; please let me know.
>>>>> Over these 6 days a lot has happened and there were some findings about the problem. Some of them were misjudged and some were not examined deeply enough.
>>>>> However, the most certain diagnosis is this: each OSD causes very high disk I/O on its BlueStore disk (WAL and DB are fine). After that, the OSDs become unresponsive or very slow to respond. For example, “ceph tell osd.x version” hangs seemingly forever.
>>>>> Because the OSDs are unresponsive, the cluster does not settle. This is our problem!
>>>>> This is the one thing we are very sure of, but we are not sure of the cause.
>>>>> Here is the latest ceph status:
>>>>> https://paste.ubuntu.com/p/2DyZ5YqPjh/
>>>>> This is the status after we started all of the OSDs 24 hours ago.
>>>>> Some of the OSDs are not started. However, it didn't make any difference when all of them were online.
>>>>> Here is the debug=20 log of an OSD, which is the same for all the others:
>>>>> https://paste.ubuntu.com/p/8n2kTvwnG6/
>>>>> As far as we can tell there is a loop pattern. I am sure it won't escape the eye.
>>>>> This is the full log of the same OSD:
>>>>> https://www.dropbox.com/s/pwzqeajlsdwaoi1/ceph-osd.90.log?dl=0
>>>>> Here is the strace of the same OSD process:
>>>>> https://paste.ubuntu.com/p/8n2kTvwnG6/
>>>>> Recently we have been hearing more advice to upgrade to Mimic. I hope no one gets hurt as we did. I am sure we made lots of mistakes to let this happen. This situation may serve as an example for other users and could point to a potential bug for the Ceph developers.
>>>>> Any help to figure out what is going on would be great.
>>>>> Best Regards,
>>>>> Goktug Yildirim
> -- 
> Paul Emmerich
> Looking for help with your Ceph cluster? Contact us at https://croit.io
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
