[ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change
Nils Fahldieck - Profihost AG
n.fahldieck at profihost.ag
Fri Oct 12 05:35:03 PDT 2018
Hi, in our `ceph.conf` we have:
mon_max_pg_per_osd = 300
While the host is offline (9 OSDs down):
4352 PGs * 3 / 62 OSDs ~ 210 PGs per OSD
If all OSDs are online:
4352 PGs * 3 / 71 OSDs ~ 183 PGs per OSD
... so this doesn't seem to be the issue.
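The arithmetic above can be sketched as a quick check (the function name is
illustrative, not a Ceph API; pg_num, size, and OSD counts are taken from this
thread):

```python
# Average per-OSD PG count, mirroring the calculation above.
# 4352 PGs, size=3, 71 OSDs total, 9 of them down while one host is offline.

def pgs_per_osd(pg_num: int, size: int, num_osds: int) -> int:
    """Average number of PG instances each OSD carries."""
    return pg_num * size // num_osds

print(pgs_per_osd(4352, 3, 71 - 9))  # host offline: 210 PGs per OSD
print(pgs_per_osd(4352, 3, 71))      # all OSDs up:  183 PGs per OSD
```

Both values stay below the configured mon_max_pg_per_osd of 300, which is why
the limit doesn't look like the culprit here.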
If I understood you right, that's what you meant. If I got you wrong,
would you mind pointing me to one of those threads you mentioned?
Am 12.10.2018 um 14:03 schrieb Burkhard Linke:
> On 10/12/2018 01:55 PM, Nils Fahldieck - Profihost AG wrote:
>> I rebooted a Ceph host and logged `ceph status` & `ceph health detail`
>> every 5 seconds. During this I encountered 'PG_AVAILABILITY Reduced data
>> availability: pgs peering'. At the same time some VMs hung as described
> Just a wild guess... you have 71 OSDs and about 4500 PG with size=3.
> 13500 PG instances overall, resulting in ~190 PGs per OSD under normal
> conditions.
> If one host is down and the PGs have to re-peer, you might reach the
> limit of 200 PG/OSDs on some of the OSDs, resulting in stuck peering.
> You can try to raise this limit. There are several threads on the
> mailing list about this.
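For reference, raising that limit might look like the sketch below. The exact
mechanism depends on the Ceph release (the centralized config store with
`ceph config set` arrived after Luminous; older releases read the option from
`ceph.conf`), and the value 400 is only illustrative:

```shell
# Option 1: set at runtime via the centralized config store (newer releases).
ceph config set global mon_max_pg_per_osd 400

# Option 2: set in ceph.conf under [global] and restart the daemons:
#   mon_max_pg_per_osd = 400
```

Note that in this thread the limit was already raised to 300 and the computed
per-OSD counts stay below it, so raising it further may not help here.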