[ceph-users] Sudden omap growth on some OSDs

David Turner drakonstein at gmail.com
Wed Dec 6 14:34:50 PST 2017


I have no proof, nothing more than a hunch, but OSDs don't trim their omaps
unless all of their PGs are healthy.  If this PG is actually unhealthy, and
the 11 OSDs in its set realize that while the rest of the cluster doesn't,
you would see exactly this problem: those OSDs stop trimming their omaps
because they consider the PG unhealthy, while the cluster reports everything
as fine and every other OSD keeps trimming normally.
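For what it's worth, one rough way to check what the PG itself reports is to
parse the output of "ceph pg <pgid> query" and look at the recovery_state
section you mention below. This is only a sketch, not a fix: the pgid in it
is a placeholder, and it only shows the primary's view of the PG.

import json
import subprocess
import sys

def pg_query(pgid):
    # "ceph pg <pgid> query" prints JSON describing the PG as seen by its primary.
    out = subprocess.check_output(['ceph', 'pg', pgid, 'query'])
    return json.loads(out.decode('utf-8'))

def report(pgid):
    info = pg_query(pgid)
    print("state: %s" % info.get('state'))
    # recovery_state is a list of peering/recovery entries with enter timestamps.
    for entry in info.get('recovery_state', []):
        print("recovery_state: %s (entered %s)" % (entry.get('name'),
                                                   entry.get('enter_time')))
        # When present, might_have_unfound lists the peers the primary probed
        # for potentially missing objects and the outcome of each probe.
        for probe in entry.get('might_have_unfound', []):
            print("  probed %s -> %s" % (probe.get('osd'), probe.get('status')))

if __name__ == '__main__':
    # '1.2f3' is a made-up placeholder pgid; pass your real pgid as an argument.
    report(sys.argv[1] if len(sys.argv) > 1 else '1.2f3')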

I don't know what to do about it, but I hope it helps get you (or someone
else on the ML) towards a resolution.
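Also, if you want to put numbers on the size disparity you describe below,
something along these lines run on each OSD host will report the on-disk
omap size per OSD. It assumes FileStore with the default data path
(/var/lib/ceph/osd/ceph-<id>), where the LevelDB lives under current/omap.

import glob
import os

def dir_size(path):
    # Total size in bytes of all files under path.
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # a file may disappear mid-walk, e.g. during compaction
    return total

for osd_dir in sorted(glob.glob('/var/lib/ceph/osd/ceph-*')):
    omap = os.path.join(osd_dir, 'current', 'omap')
    if os.path.isdir(omap):
        print("%s: %.1f GiB" % (osd_dir, dir_size(omap) / float(1 << 30)))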

On Wed, Dec 6, 2017 at 1:59 PM <george.vasilakakos at stfc.ac.uk> wrote:

> Hi ceph-users,
>
> We have a Ceph cluster (running Kraken) that is exhibiting some odd
> behaviour.
> A couple of weeks ago, the LevelDBs on some of our OSDs started growing
> large (they are now around 20G in size).
>
> The one thing they have in common is that the 11 disks with inflating
> LevelDBs are all in the set for one PG in one of our pools (EC 8+3). This
> pool started to see use around the time the LevelDBs started inflating.
> Compactions are running and they do bring the size down a bit, but the
> overall trend is one of rapid growth. The other 2000+ OSDs in the cluster
> have LevelDBs between 650M and 1.2G.
> This PG has nothing to separate it from the others in its pool: it is
> within 5% of the average number of objects per PG, there is no
> hot-spotting in terms of load, and no weird states are reported by ceph
> status.
>
> The one odd thing about it is that the pg query output reports it as
> active+clean, but it has a recovery state, entered every morning between
> 9 and 10am, which mentions a "might_have_unfound" situation and shows
> that all other set members have been probed. A deep scrub of the PG
> didn't turn up anything.
>
> The cluster is now starting to show slow requests on the OSDs with the
> large LevelDBs, although not in that particular PG.
>
> What can I do to diagnose and resolve this?
>
> Thanks,
>
> George
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>