[ceph-users] OSD log being spammed with BlueStore stupidallocator dump

David Turner drakonstein at gmail.com
Wed Oct 10 18:12:43 PDT 2018


This isn't a resolution, just an idea you've probably already considered:
disabling logging on the affected OSDs (possibly on all of them) seems like
a necessary step to keep working with this cluster, finish the upgrade and
get it healthier.

On Wed, Oct 10, 2018 at 6:37 PM Wido den Hollander <wido at 42on.com> wrote:

>
>
> On 10/11/2018 12:08 AM, Wido den Hollander wrote:
> > Hi,
> >
> > On a Luminous cluster running a mix of 12.2.4, 12.2.5 and 12.2.8 I'm
> > seeing OSDs writing heavily to their logfiles spitting out these lines:
> >
> >
> > 2018-10-10 21:52:04.019037 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0 dump  0x15cd2078000~34000
> > 2018-10-10 21:52:04.019038 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0 dump  0x15cd22cc000~24000
> > 2018-10-10 21:52:04.019038 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0 dump  0x15cd2300000~20000
> > 2018-10-10 21:52:04.019039 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0 dump  0x15cd2324000~24000
> > 2018-10-10 21:52:04.019040 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0 dump  0x15cd26c0000~24000
> > 2018-10-10 21:52:04.019041 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0 dump  0x15cd2704000~30000
> >
> > The lines are written so fast that the OS disk can't keep up and
> > reaches 100% utilization.
> >
> > This slows the OSD down, which leads to slow requests, and the OSD
> > starts to flap.
> >
> > It seems that this is *only* happening on the fullest OSDs in this
> > cluster (~85% full), which have about 400 PGs each (yes, I know,
> > that's high).
> >
>
> After some searching I stumbled upon this Bugzilla report:
> https://bugzilla.redhat.com/show_bug.cgi?id=1600138
>
> That seems to be the same issue, although I'm not 100% sure.
>
> Wido
>
> > Looking at StupidAllocator.cc I see this piece of code:
> >
> > void StupidAllocator::dump()
> > {
> >   std::lock_guard<std::mutex> l(lock);
> >   for (unsigned bin = 0; bin < free.size(); ++bin) {
> >     ldout(cct, 0) << __func__ << " free bin " << bin << ": "
> >                   << free[bin].num_intervals() << " extents" << dendl;
> >     for (auto p = free[bin].begin();
> >          p != free[bin].end();
> >          ++p) {
> >       ldout(cct, 0) << __func__ << "  0x" << std::hex << p.get_start()
> >                     << "~" << p.get_len() << std::dec << dendl;
> >     }
> >   }
> > }
> >
> > I'm just wondering why it would spit out these lines and what's
> > causing it.
> >
> > Has anybody seen this before?
> >
> > Wido
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users at lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
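
For reading the dump lines quoted above: going by the quoted dump() code,
each extent is printed as "0x<start>~<len>", and because std::hex is still
in effect when the length is written, both fields are hexadecimal (only the
start carries the "0x" prefix).  The following is a minimal sketch, not
Ceph code, that parses such lines and totals the extent lengths.  The names
parse_extent and total_len are hypothetical helpers made up for this
illustration.

```cpp
#include <cstdint>
#include <exception>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Ceph): find the "0x<start>~<len>" token in
// a log line and return {start, len}. Both fields are parsed as hexadecimal.
std::optional<std::pair<uint64_t, uint64_t>> parse_extent(const std::string& line) {
  const auto tilde = line.rfind('~');
  if (tilde == std::string::npos) return std::nullopt;
  // Last "0x" at or before the tilde marks the beginning of the start offset.
  const auto start_pos = line.rfind("0x", tilde);
  if (start_pos == std::string::npos) return std::nullopt;
  try {
    const uint64_t start =
        std::stoull(line.substr(start_pos, tilde - start_pos), nullptr, 16);
    const uint64_t len = std::stoull(line.substr(tilde + 1), nullptr, 16);
    return std::make_pair(start, len);
  } catch (const std::exception&) {
    return std::nullopt;  // token was not valid hex
  }
}

// Hypothetical helper: sum the lengths of all extents found in the given
// lines; lines without a parsable extent are skipped.
uint64_t total_len(const std::vector<std::string>& lines) {
  uint64_t total = 0;
  for (const auto& line : lines) {
    if (const auto extent = parse_extent(line)) total += extent->second;
  }
  return total;
}
```

Under these assumptions, "0x15cd2078000~34000" is an extent of 0x34000
bytes (208 KiB) starting at offset 0x15cd2078000, and the six extents
quoted above add up to 0xf0000 bytes (960 KiB).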