[ceph-users] MDS segfaults on client connection -- brand new FS

Gregory Farnum gfarnum at redhat.com
Fri Mar 8 17:58:39 PST 2019


I don’t have any idea what’s going on here or why it’s not working, but you
are using v0.94.7. That release is:
1) out of date for the Hammer cycle, which reached at least v0.94.10
2) prior to the release where we declared CephFS stable (Jewel, v10.2.0)
3) way past its supported expiration date.

You will have a much better time deploying Luminous or Mimic, especially
since you want to use CephFS. :)
-Greg

On Fri, Mar 8, 2019 at 5:02 PM Kadiyska, Yana <ykadiysk at akamai.com> wrote:

> Hi,
>
>
>
> I’m very much hoping someone can unblock me on this – we recently ran into
> a very odd issue – I sent an earlier email to the list
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-March/033579.html
>
>
>
> After unsuccessfully trying to repair it, we decided to abandon the filesystem.
>
>
>
> I marked the cluster down, failed the MDSs, removed the FS and the
> metadata and data pools.
>
>
>
> Then created a new Filesystem from scratch.
>
>
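[For anyone following along: the teardown-and-recreate steps described above correspond roughly to the Hammer-era command sequence below. The FS name, pool names, and PG counts are placeholders, not necessarily what was actually used.]

```shell
# Take the MDS cluster down and fail the active daemon
ceph mds cluster_down
ceph mds fail 0

# Remove the filesystem, then its metadata and data pools
ceph fs rm cephfs --yes-i-really-mean-it
ceph osd pool delete cephfs_metadata cephfs_metadata --yes-i-really-really-mean-it
ceph osd pool delete cephfs_data cephfs_data --yes-i-really-really-mean-it

# Recreate the pools and a fresh filesystem
ceph osd pool create cephfs_metadata 64
ceph osd pool create cephfs_data 128
ceph fs new cephfs cephfs_metadata cephfs_data
```

Note that the MDS keeps no durable filesystem state on its local disk; all CephFS metadata lives in the RADOS pools, so deleting the pools removes it.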
>
> However, I am still observing the MDS segfaulting when a client tries to
> connect. This is quite urgent for me, as we don’t have a functioning
> filesystem – if someone can advise how I can remove any and all state,
> please do – I just want to start fresh. I am very puzzled that a brand-new
> FS doesn’t work.
>
>
>
> Here is the MDS log at level 20. One odd thing I notice is that the
> client starts showing ? as the id well before the segfault… In any case,
> I’m just asking what needs to be done to remove all state from the MDS
> nodes:
>
>
>
> 2019-03-08 19:30:12.024535 7f25ec184700 20 mds.0.server get_session have
> 0x5477e00 client.2160819875 <client_ip>:0/945029522 state open
>
> 2019-03-08 19:30:12.024537 7f25ec184700 15 mds.0.server
> oldest_client_tid=1
>
> 2019-03-08 19:30:12.024564 7f25ec184700  7 mds.0.cache request_start
> request(client.?:1 cr=0x54a8680)
>
> 2019-03-08 19:30:12.024566 7f25ec184700  7 mds.0.server
> dispatch_client_request client_request(client.?:1 getattr pAsLsXsFs #1
> 2019-03-08 19:29:15.425510 RETRY=2) v2
>
> 2019-03-08 19:30:12.024576 7f25ec184700 10 mds.0.server
> rdlock_path_pin_ref request(client.?:1 cr=0x54a8680) #1
>
> 2019-03-08 19:30:12.024577 7f25ec184700  7 mds.0.cache traverse: opening
> base ino 1 snap head
>
> 2019-03-08 19:30:12.024579 7f25ec184700 10 mds.0.cache path_traverse
> finish on snapid head
>
> 2019-03-08 19:30:12.024580 7f25ec184700 10 mds.0.server ref is [inode 1
> [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1) (iversion lock) |
> dirfrag=1 0x53ca968]
>
> 2019-03-08 19:30:12.024589 7f25ec184700 10 mds.0.locker acquire_locks
> request(client.?:1 cr=0x54a8680)
>
> 2019-03-08 19:30:12.024591 7f25ec184700 20 mds.0.locker  must rdlock
> (iauth sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0
> 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]
>
> 2019-03-08 19:30:12.024594 7f25ec184700 20 mds.0.locker  must rdlock
> (ilink sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0
> 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]
>
> 2019-03-08 19:30:12.024597 7f25ec184700 20 mds.0.locker  must rdlock
> (ifile sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0
> 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]
>
> 2019-03-08 19:30:12.024600 7f25ec184700 20 mds.0.locker  must rdlock
> (ixattr sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0
> 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]
>
> 2019-03-08 19:30:12.024602 7f25ec184700 20 mds.0.locker  must rdlock
> (isnap sync) [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0
> 1=0+1) (iversion lock) | request=1 dirfrag=1 0x53ca968]
>
> 2019-03-08 19:30:12.024605 7f25ec184700 10 mds.0.locker  must authpin
> [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1)
> (iversion lock) | request=1 dirfrag=1 0x53ca968]
>
> 2019-03-08 19:30:12.024607 7f25ec184700 10 mds.0.locker  auth_pinning
> [inode 1 [...2,head] / auth v1 snaprealm=0x53b8480 f() n(v0 1=0+1)
> (iversion lock) | request=1 dirfrag=1 0x53ca968]
>
> 2019-03-08 19:30:12.024610 7f25ec184700 10 mds.0.cache.ino(1) auth_pin by
> 0x51e5e00 on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480 f()
> n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 authpin=1 0x53ca968] now
> 1+0
>
> 2019-03-08 19:30:12.024614 7f25ec184700  7 mds.0.locker rdlock_start  on
> (isnap sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480
> f() n(v0 1=0+1) (iversion lock) | request=1 dirfrag=1 authpin=1 0x53ca968]
>
> 2019-03-08 19:30:12.024618 7f25ec184700 10 mds.0.locker  got rdlock on
> (isnap sync r=1) [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480
> f() n(v0 1=0+1) (isnap sync r=1) (iversion lock) | request=1 lock=1
> dirfrag=1 authpin=1 0x53ca968]
>
> 2019-03-08 19:30:12.024621 7f25ec184700  7 mds.0.locker rdlock_start  on
> (ifile sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480
> f() n(v0 1=0+1) (isnap sync r=1) (iversion lock) | request=1 lock=1
> dirfrag=1 authpin=1 0x53ca968]
>
> 2019-03-08 19:30:12.024625 7f25ec184700 10 mds.0.locker  got rdlock on
> (ifile sync r=1) [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480
> f() n(v0 1=0+1) (isnap sync r=1) (ifile sync r=1) (iversion lock) |
> request=1 lock=2 dirfrag=1 authpin=1 0x53ca968]
>
> 2019-03-08 19:30:12.024628 7f25ec184700  7 mds.0.locker rdlock_start  on
> (iauth sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480
> f() n(v0 1=0+1) (isnap sync r=1) (ifile sync r=1) (iversion lock) |
> request=1 lock=2 dirfrag=1 authpin=1 0x53ca968]
>
> 2019-03-08 19:30:12.024631 7f25ec184700 10 mds.0.locker  got rdlock on
> (iauth sync r=1) [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480
> f() n(v0 1=0+1) (iauth sync r=1) (isnap sync r=1) (ifile sync r=1)
> (iversion lock) | request=1 lock=3 dirfrag=1 authpin=1 0x53ca968]
>
> 2019-03-08 19:30:12.024635 7f25ec184700  7 mds.0.locker rdlock_start  on
> (ilink sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480
> f() n(v0 1=0+1) (iauth sync r=1) (isnap sync r=1) (ifile sync r=1)
> (iversion lock) | request=1 lock=3 dirfrag=1 authpin=1 0x53ca968]
>
> 2019-03-08 19:30:12.024638 7f25ec184700 10 mds.0.locker  got rdlock on
> (ilink sync r=1) [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480
> f() n(v0 1=0+1) (iauth sync r=1) (ilink sync r=1) (isnap sync r=1) (ifile
> sync r=1) (iversion lock) | request=1 lock=4 dirfrag=1 authpin=1 0x53ca968]
>
> 2019-03-08 19:30:12.024642 7f25ec184700  7 mds.0.locker rdlock_start  on
> (ixattr sync) on [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480
> f() n(v0 1=0+1) (iauth sync r=1) (ilink sync r=1) (isnap sync r=1) (ifile
> sync r=1) (iversion lock) | request=1 lock=4 dirfrag=1 authpin=1 0x53ca968]
>
> 2019-03-08 19:30:12.024646 7f25ec184700 10 mds.0.locker  got rdlock on
> (ixattr sync r=1) [inode 1 [...2,head] / auth v1 ap=1+0 snaprealm=0x53b8480
> f() n(v0 1=0+1) (iauth sync r=1) (ilink sync r=1) (isnap sync r=1) (ifile
> sync r=1) (ixattr sync r=1) (iversion lock) | request=1 lock=5 dirfrag=1
> authpin=1 0x53ca968]
>
> 2019-03-08 19:30:12.024658 7f25ec184700 10 mds.0.server reply to stat on
> client_request(client.?:1 getattr pAsLsXsFs #1 2019-03-08 19:29:15.425510
> RETRY=2) v2
>
> 2019-03-08 19:30:12.024661 7f25ec184700 10 mds.0.server
> reply_client_request 0 ((0) Success) client_request(client.?:1 getattr
> pAsLsXsFs #1 2019-03-08 19:29:15.425510 RETRY=2) v2
>
> 2019-03-08 19:30:12.024673 7f25ec184700 10 mds.0.server
> apply_allocated_inos 0 / [] / 0
>
> 2019-03-08 19:30:12.024674 7f25ec184700 20 mds.0.server lat 0.060895
>
> 2019-03-08 19:30:12.024677 7f25ec184700 20 mds.0.server set_trace_dist
> snapid head
>
> 2019-03-08 19:30:12.024679 7f25ec184700 10 mds.0.server set_trace_dist
> snaprealm snaprealm(1 seq 1 lc 0 cr 0 cps 1 snaps={} 0x53b8480) len=48
>
> 2019-03-08 19:30:12.024683 7f25ec184700 20 mds.0.cache.ino(1)  pfile 0
> pauth 0 plink 0 pxattr 0 plocal 0 ctime 2019-03-07 21:12:21.476328 valid=1
>
> 2019-03-08 19:30:12.024688 7f25ec184700 10 mds.0.cache.ino(1)
> add_client_cap first cap, joining realm snaprealm(1 seq 1 lc 0 cr 0 cps 1
> snaps={} 0x53b8480)
>
> 2019-03-08 19:30:12.026741 7f25ec184700 -1 *** Caught signal (Segmentation
> fault) **
>
>  in thread 7f25ec184700
>
>
>
>  ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
>
>  1: ceph_mds() [0x89982a]
>
>  2: (()+0x10350) [0x7f25f4647350]
>
>  3: (CInode::get_caps_allowed_for_client(client_t) const+0x130) [0x7a19f0]
>
>  4: (CInode::encode_inodestat(ceph::buffer::list&, Session*, SnapRealm*,
> snapid_t, unsigned int, int)+0x132d) [0x7b383d]
>
>  5: (Server::set_trace_dist(Session*, MClientReply*, CInode*, CDentry*,
> snapid_t, int, std::tr1::shared_ptr<MDRequestImpl>&)+0x471) [0x5f26e1]
>
>  6: (Server::reply_client_request(std::tr1::shared_ptr<MDRequestImpl>&,
> MClientReply*)+0x846) [0x611056]
>
>  7: (Server::respond_to_request(std::tr1::shared_ptr<MDRequestImpl>&,
> int)+0x4d9) [0x611759]
>
>  8: (Server::handle_client_getattr(std::tr1::shared_ptr<MDRequestImpl>&,
> bool)+0x47b) [0x613eab]
>
>  9:
> (Server::dispatch_client_request(std::tr1::shared_ptr<MDRequestImpl>&)+0xa38)
> [0x633da8]
>
>  10: (Server::handle_client_request(MClientRequest*)+0x3df) [0x63435f]
>
>  11: (Server::dispatch(Message*)+0x3f3) [0x63b8b3]
>
>  12: (MDS::handle_deferrable_message(Message*)+0x847) [0x5b6c27]
>
>  13: (MDS::_dispatch(Message*)+0x6d) [0x5d2bed]
>
>  14: (C_MDS_RetryMessage::finish(int)+0x1b) [0x63d24b]
>
>  15: (MDSInternalContextBase::complete(int)+0x163) [0x7e3363]
>
>  16: (MDS::_advance_queues()+0x48d) [0x5c9e4d]
>
>  17: (MDS::ProgressThread::entry()+0x4a) [0x5ca1aa]
>
>  18: (()+0x8192) [0x7f25f463f192]
>
>  19: (clone()+0x6d) [0x7f25f3b4c26d]
>
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

