[ceph-users] RBD corruption when removing tier cache

Jan Pekař - Imatic jan.pekar at imatic.cz
Thu Nov 30 18:43:03 PST 2017


Hi all,
today I tested adding an SSD cache tier to a pool.
Everything worked, but when I tried to remove it and ran

rados -p hot-pool cache-flush-evict-all

I got

         rbd_data.9c000238e1f29.0000000000000000
failed to flush /rbd_data.9c000238e1f29.0000000000000000: (2) No such file or directory
         rbd_data.9c000238e1f29.0000000000000621
failed to flush /rbd_data.9c000238e1f29.0000000000000621: (2) No such file or directory
         rbd_data.9c000238e1f29.0000000000000001
failed to flush /rbd_data.9c000238e1f29.0000000000000001: (2) No such file or directory
         rbd_data.9c000238e1f29.0000000000000a2c
failed to flush /rbd_data.9c000238e1f29.0000000000000a2c: (2) No such file or directory
         rbd_data.9c000238e1f29.0000000000000200
failed to flush /rbd_data.9c000238e1f29.0000000000000200: (2) No such file or directory
         rbd_data.9c000238e1f29.0000000000000622
failed to flush /rbd_data.9c000238e1f29.0000000000000622: (2) No such file or directory
         rbd_data.9c000238e1f29.0000000000000009
failed to flush /rbd_data.9c000238e1f29.0000000000000009: (2) No such file or directory
         rbd_data.9c000238e1f29.0000000000000208
failed to flush /rbd_data.9c000238e1f29.0000000000000208: (2) No such file or directory
         rbd_data.9c000238e1f29.00000000000000c1
failed to flush /rbd_data.9c000238e1f29.00000000000000c1: (2) No such file or directory
         rbd_data.9c000238e1f29.0000000000000625
failed to flush /rbd_data.9c000238e1f29.0000000000000625: (2) No such file or directory
         rbd_data.9c000238e1f29.00000000000000d8
failed to flush /rbd_data.9c000238e1f29.00000000000000d8: (2) No such file or directory
         rbd_data.9c000238e1f29.0000000000000623
failed to flush /rbd_data.9c000238e1f29.0000000000000623: (2) No such file or directory
         rbd_data.9c000238e1f29.0000000000000624
failed to flush /rbd_data.9c000238e1f29.0000000000000624: (2) No such file or directory
error from cache-flush-evict-all: (1) Operation not permitted

I also noticed that switching the cache tier to "forward" mode is apparently not safe:

Error EPERM: 'forward' is not a well-supported cache mode and may corrupt your data.  pass --yes-i-really-mean-it to force.

At the moment of flushing (or switching to forward mode), the RBD image got
corrupted, and even fsck was unable to repair it (it could not set the
superblock flags). I don't know whether this is because the cache is still
active and corrupted, or because ext4 got so damaged that it can no longer work.

Even with the VM that was using that pool stopped, I cannot flush it.

So what did I do wrong? Can I get my data back? Is it safe to remove the
cache tier, and if so, how?
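For comparison, the removal order that the Ceph cache tiering documentation gives for a writeback tier is: change the cache mode, flush and evict the cache pool, check that it is empty, remove the overlay, then remove the tier. The sketch below only prints those commands so they can be reviewed and run one at a time. The function name `tier_removal_plan` and the pool name `cold-pool` are hypothetical placeholders, and `proxy` is assumed for the first step because 12.2.1 rejects `forward` without --yes-i-really-mean-it (newer versions of that documentation also use `proxy` there).

```shell
# Hypothetical helper: print (do NOT execute) the removal sequence for a
# writeback cache tier, so each step can be checked before it is run.
# $1 = cache pool (e.g. hot-pool), $2 = backing pool (placeholder name).
tier_removal_plan() {
    hot=$1
    cold=$2
    # 1. Stop caching: proxy mode sends client I/O through to the backing pool.
    echo "ceph osd tier cache-mode $hot proxy"
    # 2. Flush dirty objects to the backing pool and evict everything.
    echo "rados -p $hot cache-flush-evict-all"
    # 3. Verify the cache pool is empty before detaching it.
    echo "rados -p $hot ls"
    # 4. Stop redirecting client traffic through the cache pool.
    echo "ceph osd tier remove-overlay $cold"
    # 5. Detach the cache pool from the backing pool.
    echo "ceph osd tier remove $cold $hot"
}
```

Running `tier_removal_plan hot-pool cold-pool` prints the five commands in order. Executing them one by one, and not continuing past step 3 while `rados -p hot-pool ls` still lists objects, avoids removing the overlay while the cache still holds data.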

Using rados get I can dump the objects to disk, so why can't I flush
(evict) them?
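A minimal sketch of that dump step, assuming the output of cache-flush-evict-all (stdout and stderr) was saved to a file. It collects the object names from the "failed to flush /..." lines, strips the leading "/", and copies each object out of the cache pool with `rados get`. The function names `failed_objects` and `dump_failed_objects`, the file `flush.log` and the directory `hot-pool-dump` are hypothetical.

```shell
# Hypothetical helper: read saved cache-flush-evict-all output on stdin and
# print each object that failed to flush, once, without the leading "/".
failed_objects() {
    sed -n 's|^failed to flush /\([^:]*\):.*|\1|p' | sort -u
}

# Hypothetical helper: read object names on stdin and copy each one from
# the given pool into a local directory with "rados get".
dump_failed_objects() {
    pool=$1
    outdir=$2
    mkdir -p "$outdir"
    while IFS= read -r obj; do
        rados -p "$pool" get "$obj" "$outdir/$obj" ||
            echo "could not get $obj" >&2
    done
}
```

Usage, assuming the names above:

    rados -p hot-pool cache-flush-evict-all > flush.log 2>&1
    failed_objects < flush.log | dump_failed_objects hot-pool ./hot-pool-dump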

It looks like the same issue as
http://tracker.ceph.com/issues/12659
but that one is unresolved.

I also have some snapshots of the RBD image in the cold pool, but that
should not cause problems in production.

I'm running version 12.2.1 on all 4 nodes.

With regards
Jan Pekar

