[ceph-users] OSD failure test freezes the cluster
b.s.mikhael at gmail.com
Tue Nov 21 00:34:16 PST 2017
I was performing an OSD failure test on a 3-node Ceph cluster configured as follows:
Ceph version 12.2.1
3 x MDS (one per node)
3 x MON (one per node)
3 x MGR (one per node)
15 x OSDs (5 per node)
The Ceph filesystem was mounted on a different node using the Ceph kernel block device driver.
I failed one of the OSD drives by removing it from the SCSI bus while continuously writing 1MB files to the cluster.
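A minimal sketch of the kind of write workload described above, not the exact script used in the test. The function name write_files, the file naming scheme and the directory passed in are all hypothetical. Each file is fsynced so that a stalled cluster shows up as a blocked or slow write rather than being hidden by the page cache.

```python
import os
import time


def write_files(target_dir, count, size=1024 * 1024):
    """Write `count` files of `size` bytes each into `target_dir`.

    Returns the list of paths written. The default size is 1 MiB,
    matching the 1MB files mentioned in the test description.
    """
    os.makedirs(target_dir, exist_ok=True)
    payload = os.urandom(size)  # random data, reused for every file
    written = []
    for i in range(count):
        path = os.path.join(target_dir, f"file_{i:06d}.bin")
        start = time.monotonic()
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # force the data out to the filesystem
        elapsed = time.monotonic() - start
        # Per-file latency makes the moment the writes stall easy to spot.
        print(f"{path}: {elapsed:.3f}s")
        written.append(path)
    return written
```

Calling write_files() with a directory on the mounted filesystem (the path is an assumption and depends on where the filesystem is mounted) and a large count keeps writes in flight while the drive is removed.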
Once the drive had failed, the write process stopped, the ceph -s command timed out, and the status of the three MON, MDS and MGR daemons showed that they had received a SIGKILL.
Any clue as to what’s going on in the cluster?