[ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change

Burkhard Linke Burkhard.Linke at computational.bio.uni-giessen.de
Fri Oct 12 05:03:06 PDT 2018


On 10/12/2018 01:55 PM, Nils Fahldieck - Profihost AG wrote:
> I rebooted a Ceph host and logged `ceph status` & `ceph health detail`
> every 5 seconds. During this I encountered 'PG_AVAILABILITY Reduced data
> availability: pgs peering'. At the same time some VMs hung as described
> before.
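(That kind of periodic logging is easy to script. A minimal sketch; the log path and 5-second interval are illustrative, not anything from the original report:)

```shell
# Append a timestamped cluster snapshot every 5 seconds (illustrative sketch)
while true; do
    date '+%F %T'
    ceph status
    ceph health detail
    sleep 5
done >> /var/log/ceph-status-watch.log 2>&1
```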

Just a wild guess... you have 71 OSDs and about 4500 PGs with size=3, 
i.e. 13500 PG instances overall, resulting in ~190 PGs per OSD under 
normal operation.
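You can check the actual per-OSD PG counts directly; `ceph osd df` prints a PGS column for each OSD (a sketch; the exact JSON field names may vary slightly between releases):

```shell
# Show per-OSD utilization; the PGS column is the PG instance count on each OSD
ceph osd df

# Or extract the highest per-OSD count programmatically (assumes the JSON
# output exposes a "nodes" list with a "pgs" field, as on recent releases):
ceph osd df --format=json | \
    python3 -c 'import json,sys; d=json.load(sys.stdin); print(max(n["pgs"] for n in d["nodes"]))'
```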

If one host is down and the PGs have to re-peer, you might hit the 
limit of 200 PGs per OSD on some of the OSDs, leaving those PGs stuck 
in peering.

You can try to raise this limit. There are several threads on the 
mailing list about this.
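For reference, on Luminous and later the relevant option is `mon_max_pg_per_osd` (default 200 on Luminous). A sketch of raising it; the value 300 is illustrative, pick one that fits your cluster:

```shell
# Mimic and later: raise the limit via the centralized config store
ceph config set global mon_max_pg_per_osd 300

# Luminous: set it in ceph.conf on the monitors...
#   [global]
#   mon_max_pg_per_osd = 300
# ...and inject it at runtime so a restart is not needed:
ceph tell mon.* injectargs '--mon_max_pg_per_osd=300'
```

Note there is also a hard cutoff (`osd_max_pg_per_osd_hard_ratio`, a multiplier on the limit above) beyond which OSDs refuse new PGs entirely.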


Dr. rer. nat. Burkhard Linke
Bioinformatics and Systems Biology
Justus-Liebig-University Giessen
35392 Giessen, Germany
Phone: (+49) (0)641 9935810
