[ceph-users] MDS damaged after mimic 13.2.1 to 13.2.2 upgrade

Yan, Zheng ukernel at gmail.com
Tue Nov 20 16:58:42 PST 2018


You can run the 13.2.1 MDS on another machine. Kill all client sessions
and wait until the purge queue is empty; then it's safe to run the
13.2.2 MDS.

Run the command "cephfs-journal-tool --rank=cephfs_name:rank --journal=purge_queue header get".

The purge queue is empty when write_pos == expire_pos.
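A minimal sketch of that check in Python. It assumes the header is printed to stdout as a JSON object containing integer `write_pos` and `expire_pos` fields (the field names come from the advice above; the JSON shape is an assumption). The helper names `read_purge_queue_header` and `purge_queue_is_empty` are hypothetical.

```python
import json
import subprocess


def read_purge_queue_header(fs_name: str, rank: int) -> str:
    """Run cephfs-journal-tool and return the raw purge queue header text.

    Hypothetical helper: the command line is the one quoted above with
    fs_name and rank filled in; stdout is assumed to be the JSON header.
    """
    result = subprocess.run(
        ["cephfs-journal-tool", f"--rank={fs_name}:{rank}",
         "--journal=purge_queue", "header", "get"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def purge_queue_is_empty(header_json: str) -> bool:
    """True when every entry written to the purge queue has also been expired."""
    header = json.loads(header_json)
    return header["write_pos"] == header["expire_pos"]


# Example with a made-up header, for illustration only:
sample = '{"write_pos": 4194304, "expire_pos": 4194304}'
print(purge_queue_is_empty(sample))  # True
```

On a live cluster one would poll `purge_queue_is_empty(read_purge_queue_header(...))` while the 13.2.1 MDS drains the queue, and only start the 13.2.2 MDS once it returns True.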
On Wed, Nov 21, 2018 at 8:49 AM Chris Martin wrote:
>
> I am also having this problem. Zheng (or anyone else), any idea how to
> perform this downgrade on a node that is also a monitor and an OSD
> node?
>
> dpkg complains of a dependency conflict when I try to install
> ceph-mds_13.2.1-1xenial_amd64.deb:
>
> ```
> dpkg: dependency problems prevent configuration of ceph-mds:
>  ceph-mds depends on ceph-base (= 13.2.1-1xenial); however:
>   Version of ceph-base on system is 13.2.2-1xenial.
> ```
>
> I don't think I want to downgrade ceph-base to 13.2.1.
>
> Thank you,
> Chris Martin
>
> > Sorry, this is caused by a wrong backport. Downgrading the MDS to 13.2.1
> > and marking the MDS rank as repaired can resolve this.
> >
> > Yan, Zheng
> > On Sat, Oct 6, 2018 at 8:26 AM Sergey Malinin wrote:
> > >
> > > Update:
> > > I discovered http://tracker.ceph.com/issues/24236 and
> > > https://github.com/ceph/ceph/pull/22146
> > > Make sure that it is not relevant in your case before
> > > proceeding to operations that modify on-disk data.
> > >
> > >
> > > On 6.10.2018, at 03:17, Sergey Malinin wrote:
> > >
> > > I ended up rescanning the entire fs using the alternate metadata
> > > pool approach described in
> > > http://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/
> > > The process has not completed yet because during the recovery our
> > > cluster encountered another problem with OSDs, which I got fixed
> > > yesterday (thanks to Igor Fedotov @ SUSE).
> > > The first stage (scan_extents) completed in 84 hours (120M objects
> > > in the data pool on 8 HDD OSDs on 4 hosts). The second (scan_inodes)
> > > was interrupted by the OSD failure, so I have no timing stats, but it
> > > seems to be running 2-3 times faster than the extents scan.
> > > As to the root cause: in my case I recall that during the upgrade I
> > > had forgotten to restart 3 OSDs, one of which was holding metadata
> > > pool contents, before restarting the MDS daemons. That seems to have
> > > had an impact on the MDS journal corruption, because when I restarted
> > > those OSDs, the MDS was able to start up but soon failed, throwing
> > > lots of 'loaded dup inode' errors.
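Sergey's timings give a rough planning figure for anyone attempting the same rescan. A back-of-the-envelope sketch, assuming scan time scales linearly with object count and that scan_inodes keeps the 2-3x speedup he observed over the whole pool (both are assumptions, not measurements). Function names are hypothetical.

```python
def objects_per_second(objects: int, hours: float) -> float:
    """Average scan throughput over a completed run."""
    return objects / (hours * 3600)


def estimated_hours(objects: int, rate: float, speedup: float = 1.0) -> float:
    """Hours to scan `objects` at `rate` objects/s multiplied by an assumed speedup."""
    return objects / (rate * speedup) / 3600


# scan_extents: 120M objects in 84 hours (figures from the message above).
rate = objects_per_second(120_000_000, 84)
print(f"scan_extents: {rate:.0f} objects/s")  # scan_extents: 397 objects/s

# scan_inodes at the observed 2-3x speedup, assuming the same object count.
for speedup in (2, 3):
    hours = estimated_hours(120_000_000, rate, speedup)
    print(f"scan_inodes at {speedup}x: ~{hours:.0f} h")
```

Under these assumptions the two stages together would take roughly 112-126 hours (about 4.7 to 5.3 days) on comparable hardware.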
> > >
> > >
> > > On 6.10.2018, at 00:41, Alfredo Daniel Rezinovsky wrote:
> > >
> > > Same problem...
> > >
> > > # cephfs-journal-tool --journal=purge_queue journal inspect
> > > 2018-10-05 18:37:10.704 7f01f60a9bc0 -1 Missing object 500.0000016c
> > > Overall journal integrity: DAMAGED
> > > Objects missing:
> > >   0x16c
> > > Corrupt regions:
> > >   0x5b000000-ffffffffffffffff
> > >
> > > Just after the upgrade to 13.2.2.
> > >
> > > Did you fix it?
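The `Missing object` and `Corrupt regions` lines in the output above describe the same position. Journal objects are named `<inode>.<object index>` in hex (the purge queue here is inode `500`), so, assuming the default 4 MiB object size, an object starts at its index times 4 MiB. A sketch with hypothetical helper names showing that object `0x16c` begins exactly at `0x5b000000`, where the reported corrupt region starts:

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumed default journal object size (4 MiB)


def object_start(object_name: str, object_size: int = OBJECT_SIZE) -> int:
    """Byte offset in the journal where an object such as '500.0000016c' starts."""
    _inode_hex, index_hex = object_name.split(".")
    return int(index_hex, 16) * object_size


def object_name_for(offset: int, inode_hex: str = "500",
                    object_size: int = OBJECT_SIZE) -> str:
    """Name of the object that holds a given journal byte offset."""
    return f"{inode_hex}.{offset // object_size:08x}"


print(hex(object_start("500.0000016c")))  # 0x5b000000
print(object_name_for(0x5b000000))        # 500.0000016c
```

The same mapping places the MDS decode error below (`read_pos=0x322ec6636`) inside object `500.00000c8b`.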
> > >
> > >
> > > On 26/09/18 13:05, Sergey Malinin wrote:
> > >
> > > Hello,
> > > Followed the standard upgrade procedure to upgrade from 13.2.1 to
> > > 13.2.2.
> > > After the upgrade the MDS cluster is down; MDS rank 0 and the
> > > purge_queue journal are damaged. Resetting purge_queue does not seem
> > > to work well, as the journal still appears to be damaged.
> > > Can anybody help?
> > >
> > > mds log:
> > >
> > >   -789> 2018-09-26 18:42:32.527 7f70f78b1700  1 mds.mds2 Updating MDS map to version 586 from mon.2
> > >   -788> 2018-09-26 18:42:32.527 7f70f78b1700  1 mds.0.583 handle_mds_map i am now mds.0.583
> > >   -787> 2018-09-26 18:42:32.527 7f70f78b1700  1 mds.0.583 handle_mds_map state change up:rejoin --> up:active
> > >   -786> 2018-09-26 18:42:32.527 7f70f78b1700  1 mds.0.583 recovery_done -- successful recovery!
> > > [...]
> > >    -38> 2018-09-26 18:42:32.707 7f70f28a7700 -1 mds.0.purge_queue _consume: Decode error at read_pos=0x322ec6636
> > >    -37> 2018-09-26 18:42:32.707 7f70f28a7700  5 mds.beacon.mds2 set_want_state: up:active -> down:damaged
> > >    -36> 2018-09-26 18:42:32.707 7f70f28a7700  5 mds.beacon.mds2 _send down:damaged seq 137
> > >    -35> 2018-09-26 18:42:32.707 7f70f28a7700 10 monclient: _send_mon_message to mon.ceph3 at mon:6789/0
> > >    -34> 2018-09-26 18:42:32.707 7f70f28a7700  1 -- mds:6800/e4cc09cf --> mon:6789/0 -- mdsbeacon(14c72/mds2 down:damaged seq 137 v24a) v7 -- 0x563b321ad480 con 0
> > > [...]
> > >     -3> 2018-09-26 18:42:32.743 7f70f98b5700  5 -- mds:6800/3838577103 >> mon:6789/0 conn(0x563b3213e000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=8 cs=1 l=1). rx mon.2 seq 29 0x563b321ab880 mdsbeacon(85106/mds2 down:damaged seq 311 v587) v7
> > >     -2> 2018-09-26 18:42:32.743 7f70f98b5700  1 -- mds:6800/3838577103 <== mon.2 mon:6789/0 29 ==== mdsbeacon(85106/mds2 down:damaged seq 311 v587) v7 ==== 129+0+0 (3296573291 0 0) 0x563b321ab880 con 0x563b3213e000
> > >     -1> 2018-09-26 18:42:32.743 7f70f98b5700  5 mds.beacon.mds2 handle_mds_beacon down:damaged seq 311 rtt 0.038261
> > >      0> 2018-09-26 18:42:32.743 7f70f28a7700  1 mds.mds2 respawn!
> > >
> > > # cephfs-journal-tool --journal=purge_queue journal inspect
> > > Overall journal integrity: DAMAGED
> > > Corrupt regions:
> > >   0x322ec65d9-ffffffffffffffff
> > >
> > > # cephfs-journal-tool --journal=purge_queue journal reset
> > > old journal was 13470819801~8463
> > > new journal start will be 13472104448 (1276184 bytes past old end)
> > > writing journal head
> > > done
> > >
> > > # cephfs-journal-tool --journal=purge_queue journal inspect
> > > 2018-09-26 19:00:52.848 7f3f9fa50bc0 -1 Missing object 500.00000c8c
> > > Overall journal integrity: DAMAGED
> > > Objects missing:
> > >   0xc8c
> > > Corrupt regions:
> > >   0x323000000-ffffffffffffffff
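The numbers printed by `journal reset` are consistent with the new start being the old end (start + length) rounded up to the next object boundary. A sketch reproducing them, assuming 4 MiB objects; that the tool rounds exactly this way is an inference from these figures, not something stated in the thread. `next_boundary` is a hypothetical name.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumed journal object size (4 MiB)


def next_boundary(old_start: int, old_length: int,
                  object_size: int = OBJECT_SIZE) -> int:
    """Round the old journal end (start + length) up to an object boundary."""
    old_end = old_start + old_length
    return -(-old_end // object_size) * object_size  # ceiling division


# "old journal was 13470819801~8463" (start~length)
old_start, old_length = 13470819801, 8463
new_start = next_boundary(old_start, old_length)
print(new_start)                              # 13472104448
print(new_start - (old_start + old_length))   # 1276184
print(hex(old_start), hex(new_start))         # 0x322ec65d9 0x323000000
print(f"500.{new_start // OBJECT_SIZE:08x}")  # 500.00000c8c
```

Read this way, the old start `0x322ec65d9` is where the first `inspect` reported corruption, and the new start is the first byte of object `500.00000c8c`, the object the second `inspect` reports as missing.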
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users at lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com