[ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change

Burkhard Linke Burkhard.Linke at computational.bio.uni-giessen.de
Fri Oct 12 05:03:06 PDT 2018


Hi,


On 10/12/2018 01:55 PM, Nils Fahldieck - Profihost AG wrote:
> I rebooted a Ceph host and logged `ceph status` & `ceph health detail`
> every 5 seconds. During this I encountered 'PG_AVAILABILITY Reduced data
> availability: pgs peering'. At the same time some VMs hung as described
> before.

Just a wild guess... you have 71 OSDs and about 4500 PGs with size=3, 
i.e. 13500 PG instances overall, which works out to roughly 190 PGs 
per OSD under normal circumstances.

If one host is down and its PGs have to re-peer on the remaining OSDs, 
you might hit the limit of 200 PGs per OSD on some of them, leaving 
those PGs stuck in the peering state.
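The arithmetic can be sketched quickly; the per-host OSD count below (6) is a made-up example, since the actual host layout isn't stated:

```shell
# Baseline: 4500 PGs x size 3 = 13500 PG instances across 71 OSDs
echo $((4500 * 3 / 71))   # ~190 PGs per OSD

# Hypothetical: if the rebooted host held 6 of the 71 OSDs, the same
# 13500 instances must peer on the remaining 65 OSDs:
echo $((4500 * 3 / 65))   # ~207 PGs per OSD -- over the 200 limit
```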

You can try raising this limit; there are several threads on the 
mailing list about it.
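For reference, one way to do this on Luminous or later is via the mon_max_pg_per_osd option (default 200 on Luminous); the value 300 below is just an illustrative choice, not a recommendation for your cluster:

```shell
# Inspect the current limit on a mon (run on the mon host itself;
# replace mon.$(hostname -s) with your mon's actual ID if it differs)
ceph daemon mon.$(hostname -s) config get mon_max_pg_per_osd

# Raise it at runtime on all mons (300 is a hypothetical value)
ceph tell mon.* injectargs '--mon_max_pg_per_osd 300'

# To make it persistent, also set it in ceph.conf under [global]:
#   mon_max_pg_per_osd = 300
```

Note that injected settings are lost on daemon restart unless they are also persisted in ceph.conf (or, on Mimic and later, in the monitor config database via `ceph config set`).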

Regards,
Burkhard

-- 
Dr. rer. nat. Burkhard Linke
Bioinformatics and Systems Biology
Justus-Liebig-University Giessen
35392 Giessen, Germany
Phone: (+49) (0)641 9935810
