[ceph-users] MDS damaged after mimic 13.2.1 to 13.2.2 upgrade

Alfredo Daniel Rezinovsky alfrenovsky at gmail.com
Mon Oct 8 12:22:56 PDT 2018



On 08/10/18 11:47, Yan, Zheng wrote:
> On Mon, Oct 8, 2018 at 9:46 PM Alfredo Daniel Rezinovsky
> <alfrenovsky at gmail.com> wrote:
>>
>>
>> On 08/10/18 10:20, Yan, Zheng wrote:
>>> On Mon, Oct 8, 2018 at 9:07 PM Alfredo Daniel Rezinovsky
>>> <alfrenovsky at gmail.com> wrote:
>>>>
>>>> On 08/10/18 09:45, Yan, Zheng wrote:
>>>>> On Mon, Oct 8, 2018 at 6:40 PM Alfredo Daniel Rezinovsky
>>>>> <alfrenovsky at gmail.com> wrote:
>>>>>> On 08/10/18 07:06, Yan, Zheng wrote:
>>>>>>> On Mon, Oct 8, 2018 at 5:43 PM Sergey Malinin <hell at newmail.com> wrote:
>>>>>>>>> On 8.10.2018, at 12:37, Yan, Zheng <ukernel at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> On Mon, Oct 8, 2018 at 4:37 PM Sergey Malinin <hell at newmail.com> wrote:
>>>>>>>>>> What additional steps need to be taken in order to (try to) regain access to the fs, provided that I backed up the metadata pool, created an alternate metadata pool, and ran scan_extents, scan_links, scan_inodes, and, to some extent, a recursive scrub?
>>>>>>>>>> After that I only mounted the fs read-only to back up the data.
>>>>>>>>>> Would anything even work given that I had the MDS journal and purge queue truncated?
>>>>>>>>>>
>>>>>>>>> Did you back up the whole metadata pool? Did you make any modifications
>>>>>>>>> to the original metadata pool? If you did, what modifications?
>>>>>>>> I backed up both the journal and the purge queue and used cephfs-journal-tool to recover dentries, then reset the journal and purge queue on the original metadata pool.
>>>>>>> You can try restoring the original journal and purge queue, then downgrading
>>>>>>> the MDS to 13.2.1.  Journal object names are 20x.xxxxxxxx, purge queue
>>>>>>> object names are 50x.xxxxxxxxx.
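
(For reference, a rough sketch of how those objects could be listed and copied
out with rados -- assuming the metadata pool is named "cephfs_metadata" and the
damaged rank is 0, so the objects are 200.* and 500.*:

rados -p cephfs_metadata ls | egrep '^(200|500)\.' | sort
rados -p cephfs_metadata get 200.00000000 ./200.00000000    # save an object to a local file
rados -p cephfs_metadata put 200.00000000 ./200.00000000    # write it back if needed

The pool name and object names above are only examples.)
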
>>>>>> I've already done a scan_extents and am doing a scan_inodes. Do I need to
>>>>>> finish with scan_links?
>>>>>>
>>>>>> I'm on 13.2.2. Do I finish the scan_links and then downgrade?
>>>>>>
>>>>>> I have a backup made with "cephfs-journal-tool journal export
>>>>>> backup.bin". I think I don't have the purge queue.
>>>>>>
>>>>>> Can I reset the purge-queue journal? Can I import an empty file?
>>>>>>
>>>>> It's better to restore the journal to the original metadata pool and reset
>>>>> the purge queue to empty, then try starting the MDS. Resetting the purge
>>>>> queue will leave some objects in an orphan state, but we can handle them
>>>>> later.
>>>>>
>>>>> Regards
>>>>> Yan, Zheng
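
For reference, those two steps as commands -- a minimal sketch assuming the
default rank-0 journal and the backup taken earlier with "cephfs-journal-tool
journal export backup.bin":

cephfs-journal-tool journal import backup.bin               # restore the exported MDS journal
cephfs-journal-tool --journal=purge_queue journal reset     # reset the purge queue to empty
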
>>>> Let's see...
>>>>
>>>> "cephfs-journal-tool journal import backup.bin" will restore the whole
>>>> metadata?
>>>> Is that what "journal" means?
>>>>
>>> It just restores the journal. If you only reset the original fs's journal
>>> and purge queue (and ran the scan_foo commands with an alternate metadata
>>> pool), it's highly likely that restoring the journal will bring your fs back.
>>>
>>>
>>>
>>>> So I can stop cephfs-data-scan, run the import, downgrade, and then
>>>> reset the purge queue?
>>>>
>>> You said you have already run scan_extents and scan_inodes. What
>>> cephfs-data-scan command is running?
>> Already ran (without an alternate metadata pool):
>>
>> time cephfs-data-scan scan_extents cephfs_data # 10 hours
>>
>> time cephfs-data-scan scan_inodes cephfs_data # running 3 hours
>> with a warning:
>> 7fddd8f64ec0 -1 datascan.inject_with_backtrace: Dentry
>> 0x0x10000db852b/dovecot.index already exists but points to 0x0x1000134f97f
>>
>> Still not run:
>>
>> time cephfs-data-scan scan_links
>>
> You have modified the metadata pool, so I suggest you run scan_links.
> After it finishes, reset the session table and try restarting the MDS. If
> the MDS starts successfully, run 'ceph daemon mds.x scrub_path / recursive
> repair'. (Don't let clients mount before it finishes.)
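
So, if I follow, the sequence would be roughly this ("mds.x" being a
placeholder for the real daemon id):

cephfs-data-scan scan_links
cephfs-table-tool all reset session
ceph daemon mds.x scrub_path / recursive repair    # after the MDS has been restarted and is active
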
>
> Good luck
>
>
The MDS is still not starting (with 13.2.1) after the full data scan, journal
import, and purge queue journal reset.

starting mds.storage-02 at -
/build/ceph-13.2.1/src/mds/journal.cc: In function 'void 
EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)' thread 
7f36f98be700 time 2018-10-08 16:21:18.804623
/build/ceph-13.2.1/src/mds/journal.cc: 1572: FAILED 
assert(g_conf->mds_wipe_sessions)
2018-10-08 16:21:18.802 7f36f98be700 -1 log_channel(cluster) log [ERR] : 
journal replay inotablev mismatch 1 -> 42160
2018-10-08 16:21:18.802 7f36f98be700 -1 log_channel(cluster) log [ERR] : 
journal replay sessionmap v 20302542 -(1|2) > table 0
  ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic 
(stable)
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x102) [0x7f3709cc4f32]
  2: (()+0x26c0f7) [0x7f3709cc50f7]
  3: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x5f4b) 
[0x5616f5fee06b]
  4: (EUpdate::replay(MDSRank*)+0x39) [0x5616f5fef5a9]
  5: (MDLog::_replay_thread()+0x864) [0x5616f5f97c24]
  6: (MDLog::ReplayThread::entry()+0xd) [0x5616f5d3bc0d]
  7: (()+0x76db) [0x7f37095d16db]
  8: (clone()+0x3f) [0x7f37087b788f]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is 
needed to interpret this.
2018-10-08 16:21:18.802 7f36f98be700 -1 
/build/ceph-13.2.1/src/mds/journal.cc: In function 'void 
EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)' thread 
7f36f98be700 time
2018-10-08 16:21:18.804623
/build/ceph-13.2.1/src/mds/journal.cc: 1572: FAILED 
assert(g_conf->mds_wipe_sessions)

core dumped.
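
For the sessionmap/inotable mismatches above, would resetting the tables be
the next step? A sketch of what I mean (assuming it is safe to do at all):

cephfs-table-tool all reset session
cephfs-table-tool all reset inode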

>
>>> After importing the original journal, run 'ceph mds repaired
>>> fs_name:damaged_rank', then try restarting the MDS. Check if the MDS can
>>> start.
>>>
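For example, assuming the filesystem is named "cephfs" and rank 0 is the
damaged rank, that would be:

ceph mds repaired cephfs:0
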
>>>> Please remind me of the commands:
>>>> I've been 3 days without sleep, and I don't want to break it any further.
>>>>
>>> Sorry about that.
>> I upgraded on Friday and broke a golden rule: "READ ONLY FRIDAY". My fault.
>>>> Thanks
>>>>
>>>>
>>>>
>>>>>> What do I do with the journals?
>>>>>>
>>>>>>>> Before proceeding to the alternate metadata pool recovery I was able to start the MDS, but it soon failed, throwing lots of 'loaded dup inode' errors; I'm not sure whether that involved changing anything in the pool.
>>>>>>>> I have left the original metadata pool untouched since then.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Yan, Zheng
>>>>>>>>>
>>>>>>>>>>> On 8.10.2018, at 05:15, Yan, Zheng <ukernel at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Sorry, this is caused by a wrong backport. Downgrading the MDS to 13.2.1 and
>>>>>>>>>>> marking the MDS repaired can resolve this.
>>>>>>>>>>>
>>>>>>>>>>> Yan, Zheng
>>>>>>>>>>> On Sat, Oct 6, 2018 at 8:26 AM Sergey Malinin <hell at newmail.com> wrote:
>>>>>>>>>>>> Update:
>>>>>>>>>>>> I discovered http://tracker.ceph.com/issues/24236 and https://github.com/ceph/ceph/pull/22146
>>>>>>>>>>>> Make sure that it is not relevant in your case before proceeding to operations that modify on-disk data.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 6.10.2018, at 03:17, Sergey Malinin <hell at newmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I ended up rescanning the entire fs using the alternate metadata pool approach described in http://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/
>>>>>>>>>>>> The process has not completed yet, because during the recovery our cluster hit another problem with OSDs, which I got fixed yesterday (thanks to Igor Fedotov @ SUSE).
>>>>>>>>>>>> The first stage (scan_extents) completed in 84 hours (120M objects in the data pool on 8 HDD OSDs across 4 hosts). The second (scan_inodes) was interrupted by the OSD failure, so I have no timing stats, but it seems to be running 2-3 times faster than the extents scan.
>>>>>>>>>>>> As to the root cause -- in my case I recall that during the upgrade I had forgotten to restart 3 OSDs, one of which held metadata pool contents, before restarting the MDS daemons, and that seems to have contributed to the MDS journal corruption: when I restarted those OSDs, the MDS was able to start up but soon failed, throwing lots of 'loaded dup inode' errors.
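
For anyone following along, the alternate-pool scans from that document look
roughly like this (the fs and pool names are examples; check the flags against
the doc before use):

cephfs-data-scan scan_extents --alternate-pool recovery --filesystem cephfs cephfs_data
cephfs-data-scan scan_inodes --alternate-pool recovery --filesystem cephfs --force-corrupt --force-init cephfs_data
cephfs-data-scan scan_links --filesystem recovery-fs
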
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 6.10.2018, at 00:41, Alfredo Daniel Rezinovsky <alfrenovsky at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Same problem...
>>>>>>>>>>>>
>>>>>>>>>>>> # cephfs-journal-tool --journal=purge_queue journal inspect
>>>>>>>>>>>> 2018-10-05 18:37:10.704 7f01f60a9bc0 -1 Missing object 500.0000016c
>>>>>>>>>>>> Overall journal integrity: DAMAGED
>>>>>>>>>>>> Objects missing:
>>>>>>>>>>>> 0x16c
>>>>>>>>>>>> Corrupt regions:
>>>>>>>>>>>> 0x5b000000-ffffffffffffffff
>>>>>>>>>>>>
>>>>>>>>>>>> Just after upgrade to 13.2.2
>>>>>>>>>>>>
>>>>>>>>>>>> Did you fix it?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 26/09/18 13:05, Sergey Malinin wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hello,
>>>>>>>>>>>> I followed the standard upgrade procedure to upgrade from 13.2.1 to 13.2.2.
>>>>>>>>>>>> After the upgrade the MDS cluster is down: mds rank 0 and the purge_queue journal are damaged. Resetting the purge_queue does not seem to work, as the journal still appears to be damaged.
>>>>>>>>>>>> Can anybody help?
>>>>>>>>>>>>
>>>>>>>>>>>> mds log:
>>>>>>>>>>>>
>>>>>>>>>>>> -789> 2018-09-26 18:42:32.527 7f70f78b1700  1 mds.mds2 Updating MDS map to version 586 from mon.2
>>>>>>>>>>>> -788> 2018-09-26 18:42:32.527 7f70f78b1700  1 mds.0.583 handle_mds_map i am now mds.0.583
>>>>>>>>>>>> -787> 2018-09-26 18:42:32.527 7f70f78b1700  1 mds.0.583 handle_mds_map state change up:rejoin --> up:active
>>>>>>>>>>>> -786> 2018-09-26 18:42:32.527 7f70f78b1700  1 mds.0.583 recovery_done -- successful recovery!
>>>>>>>>>>>> <skip>
>>>>>>>>>>>>      -38> 2018-09-26 18:42:32.707 7f70f28a7700 -1 mds.0.purge_queue _consume: Decode error at read_pos=0x322ec6636
>>>>>>>>>>>>      -37> 2018-09-26 18:42:32.707 7f70f28a7700  5 mds.beacon.mds2 set_want_state: up:active -> down:damaged
>>>>>>>>>>>>      -36> 2018-09-26 18:42:32.707 7f70f28a7700  5 mds.beacon.mds2 _send down:damaged seq 137
>>>>>>>>>>>>      -35> 2018-09-26 18:42:32.707 7f70f28a7700 10 monclient: _send_mon_message to mon.ceph3 at mon:6789/0
>>>>>>>>>>>>      -34> 2018-09-26 18:42:32.707 7f70f28a7700  1 -- mds:6800/e4cc09cf --> mon:6789/0 -- mdsbeacon(14c72/mds2 down:damaged seq 137 v24a) v7 -- 0x563b321ad480 con 0
>>>>>>>>>>>> <skip>
>>>>>>>>>>>>       -3> 2018-09-26 18:42:32.743 7f70f98b5700  5 -- mds:6800/3838577103 >> mon:6789/0 conn(0x563b3213e000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=8 cs=1 l=1). rx mon.2 seq 29 0x563b321ab880 mdsbeaco
>>>>>>>>>>>> n(85106/mds2 down:damaged seq 311 v587) v7
>>>>>>>>>>>>       -2> 2018-09-26 18:42:32.743 7f70f98b5700  1 -- mds:6800/3838577103 <== mon.2 mon:6789/0 29 ==== mdsbeacon(85106/mds2 down:damaged seq 311 v587) v7 ==== 129+0+0 (3296573291 0 0) 0x563b321ab880 con 0x563b3213e
>>>>>>>>>>>> 000
>>>>>>>>>>>>       -1> 2018-09-26 18:42:32.743 7f70f98b5700  5 mds.beacon.mds2 handle_mds_beacon down:damaged seq 311 rtt 0.038261
>>>>>>>>>>>>        0> 2018-09-26 18:42:32.743 7f70f28a7700  1 mds.mds2 respawn!
>>>>>>>>>>>>
>>>>>>>>>>>> # cephfs-journal-tool --journal=purge_queue journal inspect
>>>>>>>>>>>> Overall journal integrity: DAMAGED
>>>>>>>>>>>> Corrupt regions:
>>>>>>>>>>>> 0x322ec65d9-ffffffffffffffff
>>>>>>>>>>>>
>>>>>>>>>>>> # cephfs-journal-tool --journal=purge_queue journal reset
>>>>>>>>>>>> old journal was 13470819801~8463
>>>>>>>>>>>> new journal start will be 13472104448 (1276184 bytes past old end)
>>>>>>>>>>>> writing journal head
>>>>>>>>>>>> done
>>>>>>>>>>>>
>>>>>>>>>>>> # cephfs-journal-tool --journal=purge_queue journal inspect
>>>>>>>>>>>> 2018-09-26 19:00:52.848 7f3f9fa50bc0 -1 Missing object 500.00000c8c
>>>>>>>>>>>> Overall journal integrity: DAMAGED
>>>>>>>>>>>> Objects missing:
>>>>>>>>>>>> 0xc8c
>>>>>>>>>>>> Corrupt regions:
>>>>>>>>>>>> 0x323000000-ffffffffffffffff
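
For what it's worth, one quick way to see which purge queue objects actually
exist, assuming the metadata pool is named "cephfs_metadata":

rados -p cephfs_metadata ls | grep '^500\.' | sort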