[ceph-users] Cluster Down from reweight-by-utilization

Sage Weil sage at newdream.net
Sat Nov 4 21:54:37 PDT 2017

On Sat, 4 Nov 2017, Kevin Hrpcek wrote:
> Hey Sage,
> Thanks for getting back to me this late on a weekend.
> > Do you know why the OSDs were going down?  Are there any crash dumps in the
> > osd logs, or is the OOM killer getting them?
> That's a part I can't nail down yet. The OSDs didn't crash. After the reweight-by-utilization, OSDs on some of our earlier gen
> servers started spinning at 100% cpu and were overwhelmed. Admittedly these early gen osd servers are undersized on cpu, which is
> probably why they got overwhelmed, but it hasn't escalated like this before. Heartbeats among the cluster's OSDs started
> failing on those OSDs first, and then the 100% cpu problem seemed to snowball to all hosts. I'm still trying to figure out
> why the relatively small reweighting caused this problem.
> > The usual strategy here is to set 'noup' and get all of the OSDs to catch
> > up on osdmaps (you can check progress via the above status command).  Once
> > they are all caught up, unset noup and let them all peer at once.
> I tried having noup set for a few hours earlier to see if stopping the moving osdmap target would help but I eventually unset
> it while doing more troubleshooting. I'll set it again and let it go overnight. Patience is probably needed with a cluster this
> size. I found this similar situation and was trying your previous solution:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-May/040030.html
> > The problem that has come up here in the past is when the cluster has been
> > unhealthy for a long time and the past intervals use too much memory.  I
> > don't see anything in your description about memory usage, though.  If
> > that does rear its head there's a patch we can apply to kraken to work
> > around it (this is fixed in luminous).
> Memory usage doesn't seem too bad, a little tight on some of those early gen servers, but I haven't seen OOM killing things off
> yet. I think I saw mention of that patch and luminous handling this type of situation better while googling the issue...larger
> osdmap increments or something similar if I recall correctly. My cluster is a few weeks away from a luminous upgrade.

That's good.  You might also try setting nobackfill and norecover just to
keep the load off the cluster while it's peering.
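
For reference, the flag sequence discussed in this thread could be sketched
roughly as follows. This is only an illustration, not a tested recovery
procedure: `ceph osd set <flag>` and `ceph osd unset <flag>` are the real
commands, but the `set_flags`/`unset_flags` helpers and the `CEPH` variable
are hypothetical names introduced here. `CEPH` defaults to `echo ceph`, so
the script only prints the commands unless you override it. Checking that
the OSDs have caught up on osdmaps is left as a manual step.

```shell
#!/bin/sh
# Dry run by default: prints the commands instead of running them.
# Override with CEPH=ceph to act on a real cluster (assumption: the
# ceph CLI is on PATH and has admin credentials).
CEPH="${CEPH:-echo ceph}"

# Hypothetical helpers: set or unset each cluster flag given as an argument.
set_flags() {
    for f in "$@"; do
        $CEPH osd set "$f"
    done
}

unset_flags() {
    for f in "$@"; do
        $CEPH osd unset "$f"
    done
}

# 1. Keep OSDs from being marked up and hold back recovery traffic.
set_flags noup nobackfill norecover

# 2. Wait (manually) until all OSDs have caught up on osdmaps,
#    then let them all peer at once.
unset_flags noup

# 3. Once peering has settled, allow backfill and recovery again.
unset_flags nobackfill norecover
```

Releasing noup before nobackfill and norecover follows the suggestion above
to keep load off the cluster while it is peering.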

