[ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change

Konstantin Shalygin k0ste at k0ste.ru
Fri Oct 19 00:50:27 PDT 2018


> For some time we have been experiencing service outages in our Ceph
> cluster whenever there is any change to the HEALTH status, e.g.
> swapping storage devices, adding storage devices, rebooting Ceph
> hosts, during backfills, etc.
>
> Just now I had a situation where several VMs hung after I rebooted
> one Ceph host. We have 3 replicas for each PG, 3 mon, 3 mgr, 3 mds
> and 71 osds spread over 9 hosts.
>
> We use Ceph as the storage backend for our Proxmox VE (PVE)
> environment. The outages manifest as blocked virtual file systems in
> the virtual machines running in our PVE cluster.
>
> It feels similar to stuck and inactive PGs to me. Honestly, though,
> I'm not sure how to debug this problem or which log files to examine.
>
> OS: Debian 9
> Kernel: 4.12 based upon SLE15-SP1
>
> # ceph version
> ceph version 12.2.8-133-gded2f6836f
> (ded2f6836f6331a58f5c817fca7bfcd6c58795aa) luminous (stable)
>
> Can someone guide me? I'm more than happy to provide more information
> as needed.
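
Regarding where to look: a rough first pass (just a sketch; the OSD id
below is a placeholder) is the cluster log plus the stuck-PG and
blocked-request views, e.g.:

  # overall state and which PGs/OSDs are implicated
  ceph health detail
  ceph -s
  ceph pg dump_stuck inactive
  ceph pg dump_stuck unclean
  ceph osd blocked-by

  # per-OSD view of slow/blocked requests, run on the OSD host
  # ("osd.12" is only an example id)
  ceph daemon osd.12 dump_ops_in_flight
  ceph daemon osd.12 dump_blocked_ops

  # logs worth tailing: /var/log/ceph/ceph.log on a mon host (the
  # cluster log) and /var/log/ceph/ceph-osd.*.log on the OSD hosts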


What is your network?
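
In setups like this the problem is often in the cluster/public network
(MTU mismatch, a flaky link or NIC, saturation during recovery). A
generic first check, with placeholder interface and host names, would
be something like:

  # NIC errors/drops and negotiated speed
  ip -s link show eth0
  ethtool eth0

  # if you run MTU 9000, verify jumbo frames end-to-end
  # (8972 = 9000 minus 28 bytes of IP/ICMP headers)
  ping -M do -s 8972 <other-ceph-host>

  # raw throughput between two Ceph hosts
  iperf3 -s               # on one host
  iperf3 -c <that-host>   # on the other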
