[ceph-users] Libvirt hosts freeze after ceph osd+mon problem

Jason Dillaman jdillama at redhat.com
Mon Nov 6 15:30:16 PST 2017


If you could install the debug packages and get a gdb backtrace from all
threads, it would be helpful. librbd doesn't use any QEMU threads, so
even if librbd were deadlocked, the worst case I would expect is your
guest OS complaining about hung kernel tasks related to disk IO (since
the disk wouldn't be responding).
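
A minimal sketch of how such a backtrace could be collected, assuming gdb is
installed and debug symbols for QEMU, librbd and librados are available (the
package names vary by distribution and are not given here). The function
names bt_command and dump_threads are hypothetical helpers for illustration;
only the gdb options -p, -batch and -ex and the "thread apply all bt"
command are standard gdb.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of any package.

# Print the gdb invocation for a given PID so it can be reviewed first.
bt_command() {
    printf 'gdb -p %s -batch -ex "thread apply all bt"\n' "$1"
}

# Attach to the PID, dump a backtrace of every thread into a file, detach.
# Usually needs root or the same user as the target process.
dump_threads() {
    gdb -p "$1" -batch -ex "thread apply all bt" > "$2" 2>&1
}
```

For a frozen guest this would be run on the hypervisor as something like
`dump_threads QEMUPID qemu-threads.txt`, where QEMUPID is the PID of the hung
qemu process, and the resulting file attached to the reply.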

On Mon, Nov 6, 2017 at 6:02 PM, Jan Pekař - Imatic <jan.pekar at imatic.cz>
wrote:

> Hi,
>
> I'm using Debian stretch with ceph 12.2.1-1~bpo80+1 and qemu
> 1:2.8+dfsg-6+deb9u3.
> I'm running 3 nodes with 3 monitors and 8 OSDs on my nodes, all on IPv6.
>
> When I tested the cluster, I found a strange and severe problem.
> On the first node I'm running qemu hosts with a librados disk connection to
> the cluster, with all 3 monitors listed in the connection.
> On the second node I stopped the mon and osd with the command
>
> kill -STOP MONPID OSDPID
>
> Within one minute all my qemu hosts on the first node freeze and no longer
> even respond to ping. The VNC screen shows no error (disk error or kernel
> panic); they just hang forever with no console response. Even starting the
> MON and OSD again on the stopped node doesn't bring them back. Destroying
> the qemu domain and starting it again is the only solution.
>
> This happens even if all of the virtual machine's primary OSDs are different
> from the one I stopped, so it is not writing to the stopped OSD as primary.
>
> If I stop only the OSD and keep the MON running, or stop only the MON and
> keep the OSD running, everything looks OK.
>
> When I stop both the MON and the OSD, I can see in the log "osd.0 1300
> heartbeat_check: no reply from ..." as usual when an OSD fails. During this
> the virtual machines are still running, but after that they all stop.
>
> What should I send you to debug this problem? Until this is fixed, ceph is
> not reliable for me.
>
> Thank you
> With regards
> Jan Pekar
> Imatic
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Jason