[ceph-users] Hangs with qemu/libvirt/rbd when one host disappears

Brad Hubbard bhubbard at redhat.com
Tue Dec 5 18:28:17 PST 2017

On Wed, Dec 6, 2017 at 4:09 AM, Marcus Priesch <marcus at priesch.co.at> wrote:
> Dear Ceph Users,
> first of all, big thanks to all the devs and people who made all this
> possible, ceph is amazing !!!
> ok, so let me get to the point where i need your help:
> i have a cluster of 6 hosts, mixed with ssd's and hdd's.
> on 4 of the 6 hosts are 21 vm's running in total with little to no
> workload (web, mail, elasticsearch) for a couple of users.
> 4 nodes are running ubuntu server and 2 of them are running proxmox
> (because we are now in the process of migrating towards proxmox).
> i am running ceph luminous (have upgraded two weeks ago)
> ceph communication is carried out on a separate 1Gbit network, which we
> plan to upgrade to bonded 2x10Gbit during the next couple of weeks.
> i have two pools defined where i only use disk images via libvirt/rbd.
> the hdd pool has two replicas and is for large (~4TB) backup images and
> the ssd pool has three replicas (two on ssd osd's and one on hdd osd's)
> for improved fail safety and faster access for "live data" and OS
> images.
> in the crush map i have two different rules for the two pools so that
> replicas always are stored on different hosts - i have verified this and
> it works. it is coded via the "host" attribute (host node1-hdd and host
> node1 are both actually on the same host)
> so, now comes the interesting part:
> when i turn off one of the hosts (let's say node7) that only run ceph,
> after some time the vm's stall and hang until the host comes up again.
> when i don't turn the host on again, after some time the cluster starts
> rebalancing ...
> yesterday i experienced that after a couple of hours of rebalancing the
> vm's continued working again - i think that's when the cluster had
> finished rebalancing ? haven't really dug into this.
> well, today we turned off the same host (node7) again and i got stuck
> pg's again.
> this time i did some investigation and to my surprise i found the
> following in the output of ceph health detail:
> REQUEST_SLOW 17 slow requests are blocked > 32 sec
>     3 ops are blocked > 2097.15 sec
>     14 ops are blocked > 1048.58 sec
>     osds 9,10 have blocked requests > 1048.58 sec
>     osd.5 has blocked requests > 2097.15 sec
> i think the blocked requests are my problem, aren't they ?
> but none of osd's 9, 10 or 5 is located on host7 - so can any of you
> tell me why the requests to these nodes got stuck ?
> i have one pg in state "stuck unclean" which has its replicas on osd's
> 2, 3 and 15. 3 is on node7, but the first in the active set is 2 - i
> thought the "write op" should have gone there ... so why unclean ? the
> manual states "For stuck unclean placement groups, there is usually
> something preventing recovery from completing, like unfound objects" but
> there aren't ...
> do i have a configuration issue here (number of replicas?) or is this
> behavior simply because my cluster network is too slow ?
> you can find detailed outputs here :
>         https://owncloud.priesch.co.at/index.php/s/toYdGekchqpbydY
> i hope one of you can help me shed some light on this ...
> after all, the whole point is that a single host should be allowed to
> fail while the vm's continue running ... ;)

You don't really have six MONs do you (although I know the answer to
this question)? I think you need to take another look at some of the
docs about monitors.
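
For reference, a minimal sketch of the majority rule the monitor docs
describe: monitors form a quorum only when a strict majority of them
is up, which is why an odd count is usually recommended. The function
names below are hypothetical and only illustrate the arithmetic; they
are not part of any Ceph tool or API.

```python
def quorum_size(num_mons: int) -> int:
    """Smallest strict majority out of num_mons monitors."""
    if num_mons < 1:
        raise ValueError("need at least one monitor")
    return num_mons // 2 + 1


def tolerated_failures(num_mons: int) -> int:
    """How many monitors may be down while a majority is still up."""
    return num_mons - quorum_size(num_mons)


if __name__ == "__main__":
    # Compare odd and even monitor counts side by side.
    for n in range(1, 8):
        print(f"{n} mons: quorum needs {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} down")
```

Six monitors need four for quorum and tolerate two failures, the same
as five would, so the sixth adds load without adding resilience. If
all six hosts really do run a monitor, losing one host still leaves a
quorum, so this on its own would not account for the blocked requests.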

> regards and thanks in advance,
> marcus.
> --
> Marcus Priesch
> open source consultant - solution provider
> www.priesch.co.at / office at priesch.co.at
> A-2122 Riedenthal, In Prandnern 31 / +43 650 62 72 870

