[ceph-users] rbd-nbd timeout and crash

David Turner drakonstein at gmail.com
Wed Dec 6 14:58:54 PST 2017


Do you have the FS mounted with trimming (discard) enabled?  What are your
mount options?
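
For reference, the mount options and whether discard/trim is in effect can
be checked like this (device and mountpoint are only examples):

    grep nbd0 /proc/mounts    # shows the active mount options
    fstrim -v /mnt/rbd        # one-off trim, works without -o discard

Without the discard mount option, deletes are not passed down to the image
as trims until fstrim is run.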

On Wed, Dec 6, 2017 at 5:30 PM Jan Pekař - Imatic <jan.pekar at imatic.cz>
wrote:

> Hi,
>
> On 6.12.2017 15:24, Jason Dillaman wrote:
> > On Wed, Dec 6, 2017 at 3:46 AM, Jan Pekař - Imatic <jan.pekar at imatic.cz>
> > wrote:
> >> Hi,
> >> I ran into an overloaded cluster (deep-scrub running) for a few seconds,
> >> the rbd-nbd client timed out, and the device became unavailable.
> >>
> >> block nbd0: Connection timed out
> >> block nbd0: shutting down sockets
> >> block nbd0: Connection timed out
> >> print_req_error: I/O error, dev nbd0, sector 2131833856
> >> print_req_error: I/O error, dev nbd0, sector 2131834112
> >>
> >> Is there any way to extend the rbd-nbd timeout?
> >
> > Changing the default timeout of 30 seconds is supported by the
> > kernel [1], but it's not currently implemented in rbd-nbd.  I opened
> > a new feature ticket for adding this option [2], but it may be more
> > constructive to figure out how to address a >30 second IO stall on
> > your cluster during deep-scrub.
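> >
> > Purely for illustration (no such option exists in 12.2.2), a map
> > invocation with a timeout of the kind the ticket asks for might look
> > like:
> >
> >     rbd-nbd map --timeout 120 rbd/myimage   # hypothetical flag, seconds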
>
> The kernel client does not support the newer image features, so I decided
> to use rbd-nbd.
> Now I tried to rm a 300 GB folder, mounted with rbd-nbd from a COW
> snapshot, on my healthy and almost idle cluster with only 1 deep-scrub
> running, and I again hit the 30 s timeout and device disconnect. I'm
> mapping it from a virtual server, so there may be some performance
> issues, but I'm not after performance, just stability.
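>
> As a side note, the scrub impact itself can be throttled on the OSD side,
> e.g. (values only illustrative):
>
>     ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'            # sleep between scrub chunks
>     ceph tell osd.* injectargs '--osd_scrub_load_threshold 0.3'   # only start scrubs under this load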
>
> Thank you
> With regards
> Jan Pekar
>
> >
> >> Also, getting the mapped devices failed:
> >>
> >> rbd-nbd list-mapped
> >>
> >> /build/ceph-12.2.2/src/tools/rbd_nbd/rbd-nbd.cc: In function 'int
> >> get_mapped_info(int, Config*)' thread 7f069d41ec40 time 2017-12-06
> >> 09:40:33.541426
> >> /build/ceph-12.2.2/src/tools/rbd_nbd/rbd-nbd.cc: 841: FAILED
> >> assert(ifs.is_open())
> >>   ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba)
> >>   luminous (stable)
> >>   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >> const*)+0x102) [0x7f0693f567c2]
> >>   2: (()+0x14165) [0x559a8783d165]
> >>   3: (main()+0x9) [0x559a87838e59]
> >>   4: (__libc_start_main()+0xf1) [0x7f0691178561]
> >>   5: (()+0xff80) [0x559a87838f80]
> >>   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>   needed to interpret this.
> >> Aborted
> >
> > It's been fixed in the master branch and is awaiting backport to
> > Luminous [3]; I'd expect it to be available in v12.2.3.
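> >
> > Until then, a rough way to see what is mapped (assuming the usual nbd
> > sysfs layout) is to look at the running rbd-nbd processes and the
> > per-device pid attribute:
> >
> >     ps -ef | grep '[r]bd-nbd'
> >     cat /sys/block/nbd0/pid   # only present while the device is attached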
> >
> >>
> >> Thank you
> >> With regards
> >> Jan Pekar
> >
> > [1] https://github.com/torvalds/linux/blob/master/drivers/block/nbd.c#L1166
> > [2] http://tracker.ceph.com/issues/22333
> > [3] http://tracker.ceph.com/issues/22185
> >
> >
>
> --
> ============
> Ing. Jan Pekař
> jan.pekar at imatic.cz | +420603811737
> ----
> Imatic | Jagellonská 14 | Praha 3 | 130 00
> http://www.imatic.cz
> ============
>