[ceph-users] CephFs CDir fnode version far less then subdir inode version causes mds can't start correctly

Yan, Zheng ukernel at gmail.com
Sun Nov 25 18:08:05 PST 2018


Did you do any special operation (such as reset journal) before this happened

On Sat, Nov 24, 2018 at 3:58 PM 关云飞 <gyfelectric at gmail.com> wrote:
>
> hi,
>
>    According to my understanding, the parent directory CDir fnode
> version should be incremented if creating file or directory operation
> happened. But there was a strange problem in my environment parent
> fnode version is far less then subdirectories inode version which
> leading a assert fail and the mds can’t work now.
>
>    Does anyone know how shuod i do can recover my environment?
> Greatly appreciated if there has any help!
>
> ceph version: jewel 10.2.2
> mds mode: single mds
>
> mds.log
> 2018-11-07 17:44:49.892205 7f4015824700 15 mds.0.journal try_to_expire
> committing [dir 1 / [2,head] auth v=94977 cv=0/0 dir_auth=0
> state=1610612738|complete f(v0 m2018-10-17 16:39:14.350401 3=0+3)
> n(v7782 rc2018-11-07 17:44:20.552395 b3272234942693
> 171405=170108+1297)/n(v7782 rc2018-10-17 16:39:14.350401
> b3272234942693 171394=170108+1286) hs=200+2,ss=0+0 dirty=2 | child=1
> subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=0 0x7f402a7aa000]
> 2018-11-07 17:44:49.892249 7f4015824700 10 mds.0.cache.dir(1) commit
> want 0 on [dir 1 / [2,head] auth v=94977 cv=0/0 dir_auth=0
> state=1610612738|complete f(v0 m2018-10-17 16:39:14.350401 3=0+3)
> n(v7782 rc2018-11-07 17:44:20.552395 b3272234942693
> 171405=170108+1297)/n(v7782 rc2018-10-17 16:39:14.350401
> b3272234942693 171394=170108+1286) hs=200+2,ss=0+0 dirty=2 | child=1
> subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=0 0x7f402a7aa000]
> 2018-11-07 17:44:49.892280 7f4015824700 10 mds.0.cache.dir(1) auth_pin
> by 0x7f402a7aa000 on [dir 1 / [2,head] auth v=94977 cv=0/0 dir_auth=0
> ap=1+0+0 state=1610612738|complete f(v0 m2018-10-17 16:39:14.350401
> 3=0+3) n(v7782 rc2018-11-07 17:44:20.552395 b3272234942693
> 171405=170108+1297)/n(v7782 rc2018-10-17 16:39:14.350401
> b3272234942693 171394=170108+1286) hs=200+2,ss=0+0 dirty=2 | child=1
> subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=1 0x7f402a7aa000]
> count now 1 + 0
> 2018-11-07 17:44:49.892312 7f4015824700 10 mds.0.cache.dir(1) _commit
> want 94977 on [dir 1 / [2,head] auth v=94977 cv=0/0 dir_auth=0
> ap=1+0+0 state=1610612738|complete f(v0 m2018-10-17 16:39:14.350401
> 3=0+3) n(v7782 rc2018-11-07 17:44:20.552395 b3272234942693
> 171405=170108+1297)/n(v7782 rc2018-10-17 16:39:14.350401
> b3272234942693 171394=170108+1286) hs=200+2,ss=0+0 dirty=2 | child=1
> subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=1 0x7f402a7aa000]
> 2018-11-07 17:44:49.892349 7f4015824700 10 mds.0.cache.dir(1) marking committing
> 2018-11-07 17:44:49.892355 7f4015824700 10 mds.0.cache.dir(1) _omap_commit
> 2018-11-07 17:44:49.892610 7f4015824700 10 mds.0.cache.dir(1) set
> bioflow [dentry #1/bioflow [2,head] auth (dversion lock) v=94976
> inode=0x7f402a798c40 | lock=0 inodepin=1 dirty=1 authpin=0
> 0x7f402a7c4000]
> 2018-11-07 17:44:49.892632 7f4015824700 14 mds.0.cache.dir(1) pos 8 dn
> 'bioflow' inode [inode 10000000001 [...2,head] /bioflow/ auth v94976
> f(v0 m2018-10-17 14:48:45.785088 4=0+4) n(v6615 rc2018-11-07
> 17:44:20.552395 b3212972601995 169081=168070+1011) (inest lock)
> (iversion lock) caps={1716375=pAsLsXsFs/- at 1,1721730=pAsLsXsFs/- at 3} |
> dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=1 dirtyrstat=0
> dirty=1 authpin=0 0x7f402a798c40 mode: 16877]
> 2018-11-07 17:44:49.892924 7f4015824700 10 mds.0.cache.dir(1) rm test
> [dentry #1/test [2,head] auth NULL (dversion lock) v=94948 inode=0 |
> inodepin=0 dirty=1 0x7f402b156bc0]
> 2018-11-07 17:44:49.893070 7f4015824700 1 -- 192.168.1.7:6807/26144
> --> 192.168.1.6:6801/31980 -- osd_op(mds.0.63911:110775 8.6b2cdaff
> 1.00000000 [tmap2omap 1~0,omap-set-header 0~258,omap-set-vals
> 0~458,omap-rm-keys 0~17] snapc 0=[]
> ondisk+write+known_if_redirected+full_force e6778) v7 -- ?+0
> 0x7f40304ef8c0 con 0x7f402d350600
>
> 2018-11-07 17:44:49.893134 7f4015824700 15 mds.0.journal try_to_expire
> committing [dir 10000000001 /bioflow/ [2,head] auth v=94464 cv=0/0
> state=1610612736 f(v0 m2018-10-17 14:48:45.785088 4=0+4) n(v6615
> rc2018-11-07 17:44:20.552395 b3212972601995 169080=168070+1010)
> hs=4+0,ss=0+0 dirty=2 | child=1 dirty=1 authpin=0 0x7f402a7aa600]
> //expire next logsegment
>
> 2018-11-07 17:44:50.045914 7f4013820700 10 MDSIOContextBase::complete:
> 18C_IO_Dir_Committed
> 2018-11-07 17:44:50.045917 7f4013820700 10 mds.0.cache.dir(1)
> _committed v 94977 on [dir 1 / [2,head] auth v=94977 cv=94977/0
> dir_auth=0 ap=1+0+1 state=1610612802|complete f(v0 m2018-10-17
> 16:39:14.350401 3=0+3) n(v7782 rc2018-11-07 17:44:20.552395
> b3272234942693 171405=170108+1297)/n(v7782 rc2018-10-17
> 16:39:14.350401 b3272234942693 171394=170108+1286) hs=200+2,ss=0+0
> dirty=2 | child=1 subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=1
> 0x7f402a7aa000]
> 2018-11-07 17:44:50.045936 7f4013820700 10 mds.0.cache.dir(1)
> mark_clean [dir 1 / [2,head] auth v=94977 cv=94977/94977 dir_auth=0
> ap=1+0+1 state=1610612738|complete f(v0 m2018-10-17 16:39:14.350401
> 3=0+3) n(v7782 rc2018-11-07 17:44:20.552395 b3272234942693
> 171405=170108+1297)/n(v7782 rc2018-10-17 16:39:14.350401
> b3272234942693 171394=170108+1286) hs=200+2,ss=0+0 dirty=2 | child=1
> subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=1 0x7f402a7aa000]
> version 94977
> 2018-11-07 17:44:50.045955 7f4013820700 15 mds.0.cache.dir(1) dir
> 94977 < inode 1755186 still dirty [inode 10000003557 [...2,head]
> /tree112/ auth v1755186 f(v0 m2018-09-11 17:56:17.483220 1=1+0) n(v0
> rc2018-09-11 17:56:17.483220 2=1+1) (inest lock) (iversion lock)
> caps={1716375=pAsLsXsFs/- at 1},l=1716375 | caps=1 0x7f402d9aa820 mode:
> 16877]
> 2018-11-07 17:44:50.047480 7f4013820700 -1 mds/CDir.cc: In function
> 'void CDir::_committed(int, version_t)' thread 7f4013820700 time
> 2018-11-07 17:44:50.045990
> mds/CDir.cc: 2292: FAILED assert(in->is_dirty() || in->last < ((__u64)(-2)))
>
> ceph version 10.2.2-Summit3.0-beta2-43-g9c6f44d
> (9c6f44d72d16134cc2504f76d6ee48517d046efd)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x85) [0x7f401e56d0d5]
> 2: (CDir::_committed(int, unsigned long)+0xec2) [0x7f401e34baf2]
> 3: (MDSIOContextBase::complete(int)+0xa4) [0x7f401e3bf754]
> 4: (Finisher::finisher_thread_entry()+0x216) [0x7f401e49c9e6]
> 5: (()+0x7dc5) [0x7f401d340dc5]
> 6: (clone()+0x6d) [0x7f401be0d28d]


More information about the ceph-users mailing list