[ceph-users] RBD corruption when removing tier cache

Jan Pekař - Imatic jan.pekar at imatic.cz
Sat Dec 2 17:54:21 PST 2017


Hi all,

Today I continued with my investigation, and maybe somebody will be 
interested in my findings, so I'm sending them here.

I compared the objects in the hot pool with the objects in the cold pool 
and they were the same, so I removed the cache tier from the cold pool.
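
For reference, the comparison itself was nothing special, roughly along 
these lines (the object name is one of those from the listing further 
below; "cold" stands for the base pool and the /tmp paths are arbitrary):

rados -p cold get rbd_data.9c000238e1f29.0000000000000000 /tmp/obj.cold
rados -p hot get rbd_data.9c000238e1f29.0000000000000000 /tmp/obj.hot
cmp /tmp/obj.cold /tmp/obj.hot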

Then I tried to fsck my RBD image from a libvirt virtual machine booted 
from a rescue CD.

I was successful only with a read-only mount and without replaying the 
journal (mount -o ro,noload).
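
Concretely, what I mean by that is roughly this (device names are from my 
setup; the fsck line is just the read-only variant, it does not write 
anything):

# read-only mount without replaying the ext4 journal
mount -o ro,noload /dev/sda1 /mnt
# read-only check only, answers "no" to everything
fsck.ext4 -n /dev/sda1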

I noticed that I was getting I/O errors on the disk.

sd 2:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK 
driverbyte=DRIVER_SENSE
sd 2:0:0:0: [sda] tag#0 Sense Key : Aborted Command [current]
sd 2:0:0:0: [sda] tag#0 Add. Sense: I/O process terminated
sd 2:0:0:0: [sda] tag#0 CDB: Write(10) 2a 00 00 00 08 08 00 00 10 00
blk_update_request: 4 callbacks suppressed
blk_update_request: I/O error, dev sda, sector 2056
buffer_io_error: 61 callbacks suppressed
Buffer I/O error on dev sda1, logical block 1, lost async page write
Buffer I/O error on dev sda1, logical block 2, lost async page write
VFS: Dirty inode writeback failed for block device sda1 (err=-5).

I wanted to write to that block manually. To be sure, I created an RBD 
snapshot of that filesystem, and after I created it the problems disappeared.
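
The snapshot was an ordinary RBD snapshot, nothing special; pool, image 
and snapshot names here are just placeholders:

rbd snap create cold/my-image@debug-snap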

After creating the snapshot I was able to fsck that filesystem and replay 
the ext4 journal.

It looks like the objects in the cold pool were somehow locked so that 
they could not be modified? After the snapshot they changed name and 
modification became possible? Can I debug this somehow?
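
Per-object state can at least be inspected with something like this (using 
one of the object names from the listing below; both commands only read):

rados -p cold stat rbd_data.9c000238e1f29.0000000000000000
rados -p cold listsnaps rbd_data.9c000238e1f29.0000000000000000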

I continued with cleaning the hot pool and tried to delete the objects. 
The delete operation succeeded with rados rm, but some objects stayed 
there and I couldn't delete or get them anymore.
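
The removal was basically just a loop over the pool listing, along these 
lines:

for obj in $(rados -p hot ls); do
    rados -p hot rm "$obj"
done

What is still left in the pool afterwards: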

rados -p hot ls


rbd_data.9c000238e1f29.0000000000000000
rbd_data.9c000238e1f29.0000000000000621
rbd_data.9c000238e1f29.0000000000000001
rbd_data.9c000238e1f29.0000000000000a2c
rbd_data.9c000238e1f29.0000000000000200
rbd_data.9c000238e1f29.0000000000000622
rbd_data.9c000238e1f29.0000000000000009
rbd_data.9c000238e1f29.0000000000000208
rbd_data.9c000238e1f29.00000000000000c1
rbd_data.9c000238e1f29.0000000000000625
rbd_data.9c000238e1f29.00000000000000d8
rbd_data.9c000238e1f29.0000000000000623
rbd_data.9c000238e1f29.0000000000000624

rados -p hot rm rbd_data.9c000238e1f29.0000000000000000
error removing hot>rbd_data.9c000238e1f29.0000000000000000: (2) No such 
file or directory

How can I clean up that pool? What could have happened to it?


After some additional tests I think that my initial problem was caused by 
switching the cache mode to forward, so I recommend not only warning when 
that mode is used, as it does now, but also updating the official documentation

http://docs.ceph.com/docs/master/rados/operations/cache-tiering/

and finding some other way to flush all objects (like turning off the VMs, 
setting a short evict age or a small target size) and removing the overlay 
after that.
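
A rough sketch of what I have in mind (pool names are placeholders, cold 
being the base pool and hot the cache tier; if I read the docs right, 
proxy is the mode they point to instead of forward):

# stop caching new writes; unlike forward, proxy should not need
# --yes-i-really-mean-it
ceph osd tier cache-mode hot proxy

# let the tiering agent flush/evict everything as soon as possible
ceph osd pool set hot cache_min_flush_age 0
ceph osd pool set hot cache_min_evict_age 0
rados -p hot cache-flush-evict-all

# only once the hot pool is really empty
ceph osd tier remove-overlay cold
ceph osd tier remove cold hot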

With regards
Jan Pekar

On 1.12.2017 03:43, Jan Pekař - Imatic wrote:
> Hi all,
> today I tested adding an SSD cache tier to a pool.
> Everything worked, but when I tried to remove it and ran
> 
> rados -p hot-pool cache-flush-evict-all
> 
> I got
> 
>          rbd_data.9c000238e1f29.0000000000000000
> failed to flush /rbd_data.9c000238e1f29.0000000000000000: (2) No such 
> file or directory
>          rbd_data.9c000238e1f29.0000000000000621
> failed to flush /rbd_data.9c000238e1f29.0000000000000621: (2) No such 
> file or directory
>          rbd_data.9c000238e1f29.0000000000000001
> failed to flush /rbd_data.9c000238e1f29.0000000000000001: (2) No such 
> file or directory
>          rbd_data.9c000238e1f29.0000000000000a2c
> failed to flush /rbd_data.9c000238e1f29.0000000000000a2c: (2) No such 
> file or directory
>          rbd_data.9c000238e1f29.0000000000000200
> failed to flush /rbd_data.9c000238e1f29.0000000000000200: (2) No such 
> file or directory
>          rbd_data.9c000238e1f29.0000000000000622
> failed to flush /rbd_data.9c000238e1f29.0000000000000622: (2) No such 
> file or directory
>          rbd_data.9c000238e1f29.0000000000000009
> failed to flush /rbd_data.9c000238e1f29.0000000000000009: (2) No such 
> file or directory
>          rbd_data.9c000238e1f29.0000000000000208
> failed to flush /rbd_data.9c000238e1f29.0000000000000208: (2) No such 
> file or directory
>          rbd_data.9c000238e1f29.00000000000000c1
> failed to flush /rbd_data.9c000238e1f29.00000000000000c1: (2) No such 
> file or directory
>          rbd_data.9c000238e1f29.0000000000000625
> failed to flush /rbd_data.9c000238e1f29.0000000000000625: (2) No such 
> file or directory
>          rbd_data.9c000238e1f29.00000000000000d8
> failed to flush /rbd_data.9c000238e1f29.00000000000000d8: (2) No such 
> file or directory
>          rbd_data.9c000238e1f29.0000000000000623
> failed to flush /rbd_data.9c000238e1f29.0000000000000623: (2) No such 
> file or directory
>          rbd_data.9c000238e1f29.0000000000000624
> failed to flush /rbd_data.9c000238e1f29.0000000000000624: (2) No such 
> file or directory
> error from cache-flush-evict-all: (1) Operation not permitted
> 
> I also noticed that switching the cache tier to "forward" is not safe?
> 
> Error EPERM: 'forward' is not a well-supported cache mode and may 
> corrupt your data.  pass --yes-i-really-mean-it to force.
> 
> At the moment of flushing (or of switching to forward mode) the RBD got 
> corrupted and even fsck was unable to repair it (unable to set 
> superblock flags). I don't know whether it is because the cache is still 
> active and corrupted, or whether ext4 got so messed up that it cannot 
> work anymore.
> 
> Even when the VM that was using that pool is stopped, I cannot flush it.
> 
> So what did I do wrong? Can I get my data back? Is it safe to remove the 
> cache tier, and how?
> 
> Using rados get I can dump the objects to disk, but why can't I flush 
> (evict) them?
> 
> It looks like the same issue as the one at
> http://tracker.ceph.com/issues/12659
> but it is unresolved.
> 
> I also have a snapshot of the RBD image in the cold pool, but that should 
> not cause problems in production.
> 
> I'm using version 12.2.1 on all 4 nodes.
> 
> With regards
> Jan Pekar
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
============
Ing. Jan Pekař
jan.pekar at imatic.cz | +420603811737
----
Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz
============
--

