[ceph-users] Recover files from cephfs data pool

Rhian Resnick xantho at sepiidae.com
Mon Nov 5 16:43:51 PST 2018


Workload is mixed.

We ran a rados cppool to back up the metadata pool.
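
For reference, the omap-preserving backup Sergey suggests below would look
roughly like this (the pool name here is just a placeholder for ours):

    # export the metadata pool, omap data included, to a local file
    rados -p <metadata-pool> export /backup/cephfs-metadata.export
    # and, if ever needed, restore it with:
    # rados -p <metadata-pool> import /backup/cephfs-metadata.export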

So you're thinking that truncating the journal and purge queue (we are on
Luminous), followed by a reset, could bring us back online missing only data
from that day (mostly from when the issue started)?

If so, we could continue our scan into our recovery partition and give it a
try tomorrow after discussing it with our recovery team.




On Mon, Nov 5, 2018 at 7:40 PM Sergey Malinin <hell at newmail.com> wrote:

> What was your recent workload? There is a chance you won't lose much if it
> was mostly read ops. If so, you *must back up your metadata pool via
> "rados export" in order to preserve omap data*, then try truncating the
> journals (along with the purge queue if supported by your Ceph version),
> wiping the session table, and resetting the fs.
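>
> Only as a rough sketch (the fs name is a placeholder, and the exact flag
> syntax should be checked against your release with each tool's --help):
>
>     # with the MDS daemons stopped and the metadata pool backed up:
>     cephfs-journal-tool journal export /backup/mds-journal.bin
>     cephfs-journal-tool --rank=0 journal reset    # repeat for each damaged rank
>     cephfs-journal-tool --journal=purge_queue journal reset   # if supported
>     cephfs-table-tool all reset session
>     ceph fs reset <fs-name> --yes-i-really-mean-it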
>
>
> On 6.11.2018, at 03:26, Rhian Resnick <xantho at sepiidae.com> wrote:
>
> That was our original plan, so we migrated to bigger disks and now have the
> space, but "recover dentries" uses up all our memory (128 GB) and crashes
> out.
>
> On Mon, Nov 5, 2018 at 7:23 PM Sergey Malinin <hell at newmail.com> wrote:
>
>> I had the same problem with multi-MDS. I solved it by freeing up a little
>> space on the OSDs, doing "recover dentries", truncating the journal, and
>> then an "fs reset". After that I was able to revert to a single active MDS
>> and it kept running for a year until it failed on the 13.2.2 upgrade :))
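>>
>> For reference, those steps look roughly like this (the fs name is a
>> placeholder; flag syntax differs a little between releases):
>>
>>     cephfs-journal-tool event recover_dentries summary
>>     cephfs-journal-tool journal reset
>>     ceph fs reset <fs-name> --yes-i-really-mean-it
>>     # afterwards, stay on a single active MDS:
>>     ceph fs set <fs-name> max_mds 1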
>>
>>
>> On 6.11.2018, at 03:18, Rhian Resnick <xantho at sepiidae.com> wrote:
>>
>> Our metadata pool went from 700 MB to 1 TB in size in a few hours. It used
>> up all the space on its OSDs, and now 2 ranks report damage. The journal
>> recovery tools fail because they run out of memory, leaving us with the
>> choice of truncating the journal and losing data, or recovering using the
>> scan tools.
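>>
>> (By the scan tools we mean the cephfs-data-scan rebuild, which in rough
>> outline, aside from the surrounding init/cleanup steps in the
>> disaster-recovery docs, is:
>>
>>     cephfs-data-scan scan_extents <data-pool>
>>     cephfs-data-scan scan_inodes <data-pool>
>>
>> Both passes can be split across multiple workers.)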
>>
>> Any ideas on solutions are welcome. I posted all the logs and the cluster
>> design previously but am happy to do so again. We are not desperate, but we
>> are hurting from this long downtime.
>>
>> On Mon, Nov 5, 2018 at 7:05 PM Sergey Malinin <hell at newmail.com> wrote:
>>
>>> What kind of damage have you had? Maybe it is worth trying to get the MDS
>>> to start and back up the valuable data, instead of doing a long-running
>>> recovery?
>>>
>>>
>>> On 6.11.2018, at 02:59, Rhian Resnick <xantho at sepiidae.com> wrote:
>>>
>>> Sounds like I get to have some fun tonight.
>>>
>>> On Mon, Nov 5, 2018, 6:39 PM Sergey Malinin <hell at newmail.com> wrote:
>>>
>>>> Inode linkage (i.e. the folder hierarchy) and file names are stored in
>>>> the omap data of objects in the metadata pool. You can write a script
>>>> that traverses the whole metadata pool to find out which file names
>>>> correspond to which objects in the data pool, and then fetch the required
>>>> files with the 'rados get' command.
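>>>>
>>>> As a very rough, untested sketch (pool names are examples): directory
>>>> objects in the metadata pool are named <inode-hex>.<frag>, and their omap
>>>> keys are the dentry names (they show up as "<name>_head"), so:
>>>>
>>>>     # dump the file/dir names held in each directory object
>>>>     rados -p cephfs_metadata ls | while read obj; do
>>>>         echo "== $obj =="
>>>>         rados -p cephfs_metadata listomapkeys "$obj" 2>/dev/null
>>>>     done
>>>>
>>>>     # a file's contents live in the data pool as <inode-hex>.00000000,
>>>>     # <inode-hex>.00000001, ...; once you know the inode number:
>>>>     rados -p cephfs_data get 10000000001.00000000 ./recovered.part0
>>>>
>>>> Mapping a name to its inode number still means decoding the omap value,
>>>> which the sketch above does not do.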
>>>>
>>>> > On 6.11.2018, at 02:26, Sergey Malinin <hell at newmail.com> wrote:
>>>> >
>>>> > Yes, 'rados -h'.
>>>> >
>>>> >
>>>> >> On 6.11.2018, at 02:25, Rhian Resnick <xantho at sepiidae.com> wrote:
>>>> >>
>>>> >> Does a tool exist to recover files from a CephFS data pool? We are
>>>> >> rebuilding metadata but have a user who needs data ASAP.
>>>> >> _______________________________________________
>>>> >> ceph-users mailing list
>>>> >> ceph-users at lists.ceph.com
>>>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>> >
>>>>
>>>>
>>>
>>
>

