[ceph-users] Libvirt hosts freeze after ceph osd+mon problem
Jan Pekař - Imatic
jan.pekar at imatic.cz
Tue Nov 7 02:17:31 PST 2017
I'm calling kill -STOP to simulate the behavior that occurred when one
Ceph node ran out of memory. The processes were not killed, but were
somehow suspended/unresponsive (they couldn't create new threads, etc.),
and that caused all the virtual machines (on other nodes) to hang.
I decided to simulate it with kill -STOP MONPID OSDPID, and I succeeded in reproducing it.
After I stopped the MON and OSD, it took a few seconds to get the OSD unresponsive
messages, and exactly when I got the final
libceph: osd6 down
message, all my virtual machines stopped responding (no ping, VNC unusable, etc.).
I tried both a librbd disk definition and an rbd map device attached inside
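Why a SIGSTOP'ed daemon behaves differently from a killed one can be sketched without Ceph at all. In the minimal Python 3 example below (POSIX only), a hypothetical echo child stands in for a daemon: after SIGSTOP it still exists and keeps its pipe open, so a request is accepted but not answered, and it is only answered late, after SIGCONT. That a real MON or OSD likewise keeps its TCP connections open while stopped is an assumption for illustration; this is not a reproduction of the Ceph issue.

```python
import os
import select
import signal
import subprocess
import sys

# Hypothetical stand-in for a daemon: echoes every line it receives.
CHILD = r"""
import sys
while True:
    line = sys.stdin.readline()
    if not line:
        break
    sys.stdout.write(line)
    sys.stdout.flush()
"""


def ping(proc, timeout):
    """Send one request and wait up to `timeout` seconds for the echo."""
    proc.stdin.write(b"ping\n")
    proc.stdin.flush()
    ready, _, _ = select.select([proc.stdout], [], [], timeout)
    if not ready:
        return False  # no answer in time: looks like a hung peer
    return proc.stdout.readline() == b"ping\n"


def demo_sigstop():
    results = {}
    with subprocess.Popen([sys.executable, "-c", CHILD],
                          stdin=subprocess.PIPE, stdout=subprocess.PIPE) as proc:
        try:
            results["before"] = ping(proc, 5.0)
            os.kill(proc.pid, signal.SIGSTOP)  # same as `kill -STOP PID`
            results["stopped_reply"] = ping(proc, 1.0)
            # Signal 0 only checks that the process exists; it is not dead.
            os.kill(proc.pid, 0)
            results["still_exists"] = proc.poll() is None
            os.kill(proc.pid, signal.SIGCONT)  # same as `kill -CONT PID`
            # The request sent while stopped is answered only now.
            ready, _, _ = select.select([proc.stdout], [], [], 5.0)
            results["late_reply"] = bool(ready) and proc.stdout.readline() == b"ping\n"
        finally:
            proc.kill()  # SIGKILL also works on a stopped process
    return results


if __name__ == "__main__":
    print(demo_sigstop())
```

A stopped process therefore looks "alive but silent" to anything talking to it, which is exactly the case a timeout-based detector has to wait out.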
On 7.11.2017 10:57, Piotr Dałek wrote:
> On 17-11-07 12:02 AM, Jan Pekař - Imatic wrote:
>> I'm using Debian Stretch with Ceph 12.2.1-1~bpo80+1 and qemu
>> I'm running 3 nodes with 3 monitors and 8 OSDs across the nodes, all on IPv6.
>> When I tested the cluster, I found a strange and severe problem.
>> On the first node I'm running QEMU virtual machines with a librados disk
>> connection to the cluster, with all 3 monitors listed in the connection.
>> On the second node I stopped the mon and OSD with the command
>> kill -STOP MONPID OSDPID
>> Within one minute all my QEMU virtual machines on the first node froze, so
>> they didn't even respond to ping. [..]
> Why would you want to *stop* (as in, freeze) a process instead of
> killing it?
> Anyway, with the processes still there, it may take a few minutes before the
> cluster realizes that the daemons are stopped and kicks them out of the cluster,
> restoring normal behavior (assuming correctly set CRUSH rules).
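Piotr's point about the detection delay can be illustrated with a toy heartbeat-based failure detector. This is a hypothetical model, not Ceph's implementation: a peer is reported down only once no heartbeat has been seen for `grace` seconds. The interval and grace values used below are illustrative only; the real behavior depends on settings such as `osd_heartbeat_grace` and on the monitors accepting the failure reports.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeerTracker:
    """Hypothetical heartbeat-based failure detector (not Ceph code)."""
    grace: float            # seconds of silence before reporting "down"
    last_seen: float = 0.0  # time of the last heartbeat received

    def heartbeat(self, now: float) -> None:
        self.last_seen = now

    def is_down(self, now: float) -> bool:
        return now - self.last_seen > self.grace


def detection_time(freeze_at: float, interval: float, grace: float,
                   step: float = 0.5, horizon: float = 600.0) -> Optional[float]:
    """Return when a peer frozen at `freeze_at` is first reported down,
    or None if that does not happen within `horizon` seconds."""
    tracker = PeerTracker(grace=grace)
    t = 0.0
    next_beat = 0.0
    while t <= horizon:
        # A frozen peer (like a SIGSTOP'ed daemon) sends no more heartbeats.
        if t >= next_beat and t < freeze_at:
            tracker.heartbeat(t)
            next_beat += interval
        if tracker.is_down(t):
            return t
        t += step
    return None


if __name__ == "__main__":
    print(detection_time(freeze_at=30.0, interval=6.0, grace=20.0))
```

With heartbeats every 6 s, a 20 s grace and a freeze at t = 30 s, the last heartbeat is at 24 s, so `detection_time(30.0, 6.0, 20.0)` returns 44.5: about 14.5 s of silence before the peer is even reported. A killed daemon, by contrast, has its sockets closed by the kernel, which peers can notice much sooner.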
Ing. Jan Pekař
jan.pekar at imatic.cz | +420603811737
Imatic | Jagellonská 14 | Praha 3 | 130 00