[ceph-users] Libvirt hosts freeze after ceph osd+mon problem

Jason Dillaman jdillama at redhat.com
Tue Nov 7 05:16:41 PST 2017


If you are seeing this with both librbd and krbd, I would suggest trying a
different version of QEMU and/or a different host OS since loss of a disk
shouldn't hang it -- only potentially the guest OS.
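
For anyone reproducing the test quoted below, here is a minimal Python sketch of what kill -STOP does to a process. It is not taken from the thread: it assumes Linux with /proc mounted, the child is a stand-in sleeper rather than a real ceph-mon or ceph-osd, and the helper names (proc_state, wait_for_state, demo) are made up for illustration. It only shows that a stopped process stays in the process table instead of exiting, which is why peers have to detect it by timeout.

```python
import os
import signal
import subprocess
import sys
import time


def proc_state(pid):
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        # Layout is "pid (comm) state ..."; comm may contain spaces or ')',
        # so split on the last ')' before reading the state field.
        return f.read().rsplit(")", 1)[1].split()[0]


def wait_for_state(pid, wanted, timeout=5.0):
    """Poll until the process state is one of `wanted`, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc_state(pid) in wanted:
            return True
        time.sleep(0.05)
    return False


def demo():
    # Hypothetical stand-in for a daemon: a child that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    try:
        # Same effect as `kill -STOP <pid>`: the process is suspended, not killed.
        os.kill(child.pid, signal.SIGSTOP)
        stopped = wait_for_state(child.pid, {"T"})  # "T" = stopped by signal
        still_exists = child.poll() is None         # no exit status: it never exited

        # `kill -CONT <pid>` lets it run again.
        os.kill(child.pid, signal.SIGCONT)
        resumed = wait_for_state(child.pid, {"S", "R"})
    finally:
        child.kill()
        child.wait()
    return stopped, still_exists, resumed


if __name__ == "__main__":
    print(demo())
```

The same pattern (SIGSTOP, observe, SIGCONT) can be applied to real daemon PIDs on a test cluster, which is the experiment described in the quoted message.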

On Tue, Nov 7, 2017 at 5:17 AM, Jan Pekař - Imatic <jan.pekar at imatic.cz>
wrote:

> I'm using kill -STOP to simulate behavior that occurred when one ceph
> node ran out of memory. The processes were not killed, but were somehow
> suspended/unresponsive (they couldn't create new threads etc.), and that
> caused all virtuals (on other nodes) to hang.
> I decided to simulate it with kill -STOP MONPID OSDPID and I succeeded.
>
> After I stop the MON and OSD, it takes a few seconds to get osd
> unresponsive messages, and exactly when I get the final
> libceph: osd6 down
> all my virtuals stop responding (they stop pinging, VNC is unusable, etc.).
> I tried both a librbd disk definition and an rbd map device attached inside
> the QEMU/KVM virtuals.
>
> JP
>
>
> On 7.11.2017 10:57, Piotr Dałek wrote:
>
>> On 17-11-07 12:02 AM, Jan Pekař - Imatic wrote:
>>
>>> Hi,
>>>
>>> I'm using debian stretch with ceph 12.2.1-1~bpo80+1 and qemu
>>> 1:2.8+dfsg-6+deb9u3
>>> I'm running 3 nodes with 3 monitors and 8 osds on my nodes, all on IPv6.
>>>
>>> When I tested the cluster, I detected a strange and severe problem.
>>> On the first node I'm running qemu hosts with a librados disk connection
>>> to the cluster, with all 3 monitors listed in the connection.
>>> On the second node I stopped the mon and osd with the command
>>>
>>> kill -STOP MONPID OSDPID
>>>
>>> Within one minute all my qemu hosts on the first node freeze, so they
>>> don't even respond to ping. [..]
>>>
>>
>> Why would you want to *stop* (as in, freeze) a process instead of killing
>> it?
>> Anyway, with the processes still there, it may take a few minutes before
>> the cluster realizes that the daemons are stopped and kicks them out of
>> the cluster, restoring normal behavior (assuming correctly set crush rules).
>>
>>
> --
> ============
> Ing. Jan Pekař
> jan.pekar at imatic.cz | +420603811737
> ----
> Imatic | Jagellonská 14 | Praha 3 | 130 00
> http://www.imatic.cz
> ============
> --
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Jason