[ceph-users] After 13.2.2 upgrade: bluefs mount failed to replay log: (5) Input/output error

Igor Fedotov ifedotov at suse.de
Wed Oct 3 04:00:05 PDT 2018


I've seen somewhat similar behavior in a log from Sergey Malinin in 
another thread ("mimic: 3/4 OSDs crashed...")

He claimed it happened after an LVM volume expansion. Could this be 
the case for you as well?

Am I right that you use LVM volumes?
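
If you are not sure, a quick way to check is to see where the OSD's 
block symlink points (the OSD id and paths below are just examples 
taken from the logs you posted):

```shell
# An LVM-backed BlueStore OSD has its block symlink pointing at a
# device-mapper node (/dev/mapper/... or /dev/<vg>/<lv>).
readlink -f /var/lib/ceph/osd/ceph-29/block

# lsblk shows the whole device stack; LVM logical volumes have TYPE "lvm".
lsblk -o NAME,TYPE,SIZE

# ceph-volume reports the logical volumes it manages, if any.
ceph-volume lvm list
```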


On 10/3/2018 11:22 AM, Kevin Olbrich wrote:
> Small addition: the failing disks are all in the same host.
> This is a two-host cluster with the failure domain set to OSD.
>
>
> Am Mi., 3. Okt. 2018 um 10:13 Uhr schrieb Kevin Olbrich <ko at sv01.de 
> <mailto:ko at sv01.de>>:
>
>     Hi!
>
>     Yesterday one of our (non-priority) clusters failed when 3 OSDs
>     went down (EC 8+2) together.
>     *This is strange as we did an upgrade from 13.2.1 to 13.2.2 one or
>     two hours before.*
>     They failed exactly at the same moment, rendering the cluster
>     unusable (CephFS).
>     We are using CentOS 7 with latest updates and ceph repo. No cache
>     SSDs, no external journal / wal / db.
>
>     *OSD 29 (no disk failure in dmesg):*
>     2018-10-03 09:47:15.074 7fb8835ce1c0  0 set uid:gid to 167:167
>     (ceph:ceph)
>     2018-10-03 09:47:15.074 7fb8835ce1c0  0 ceph version 13.2.2
>     (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable), process
>     ceph-osd, pid 20899
>     2018-10-03 09:47:15.074 7fb8835ce1c0  0 pidfile_write: ignore
>     empty --pid-file
>     2018-10-03 09:47:15.100 7fb8835ce1c0  0 load: jerasure load: lrc
>     load: isa
>     2018-10-03 09:47:15.100 7fb8835ce1c0  1 bdev create path
>     /var/lib/ceph/osd/ceph-29/block type kernel
>     2018-10-03 09:47:15.100 7fb8835ce1c0  1 bdev(0x561250a20000
>     /var/lib/ceph/osd/ceph-29/block) open path
>     /var/lib/ceph/osd/ceph-29/block
>     2018-10-03 09:47:15.100 7fb8835ce1c0  1 bdev(0x561250a20000
>     /var/lib/ceph/osd/ceph-29/block) open size 1000198897664
>     (0xe8e0800000, 932 GiB) block_size 4096 (4 KiB) rotational
>     2018-10-03 09:47:15.101 7fb8835ce1c0  1
>     bluestore(/var/lib/ceph/osd/ceph-29) _set_cache_sizes kv_min_ratio
>     1 > kv_ratio 0.5
>     2018-10-03 09:47:15.101 7fb8835ce1c0  1
>     bluestore(/var/lib/ceph/osd/ceph-29) _set_cache_sizes cache_size
>     536870912 meta 0 kv 1 data 0
>     2018-10-03 09:47:15.101 7fb8835ce1c0  1 bdev(0x561250a20000
>     /var/lib/ceph/osd/ceph-29/block) close
>     2018-10-03 09:47:15.358 7fb8835ce1c0  1
>     bluestore(/var/lib/ceph/osd/ceph-29) _mount path
>     /var/lib/ceph/osd/ceph-29
>     2018-10-03 09:47:15.358 7fb8835ce1c0  1 bdev create path
>     /var/lib/ceph/osd/ceph-29/block type kernel
>     2018-10-03 09:47:15.358 7fb8835ce1c0  1 bdev(0x561250a20000
>     /var/lib/ceph/osd/ceph-29/block) open path
>     /var/lib/ceph/osd/ceph-29/block
>     2018-10-03 09:47:15.359 7fb8835ce1c0  1 bdev(0x561250a20000
>     /var/lib/ceph/osd/ceph-29/block) open size 1000198897664
>     (0xe8e0800000, 932 GiB) block_size 4096 (4 KiB) rotational
>     2018-10-03 09:47:15.360 7fb8835ce1c0  1
>     bluestore(/var/lib/ceph/osd/ceph-29) _set_cache_sizes kv_min_ratio
>     1 > kv_ratio 0.5
>     2018-10-03 09:47:15.360 7fb8835ce1c0  1
>     bluestore(/var/lib/ceph/osd/ceph-29) _set_cache_sizes cache_size
>     536870912 meta 0 kv 1 data 0
>     2018-10-03 09:47:15.360 7fb8835ce1c0  1 bdev create path
>     /var/lib/ceph/osd/ceph-29/block type kernel
>     2018-10-03 09:47:15.360 7fb8835ce1c0  1 bdev(0x561250a20a80
>     /var/lib/ceph/osd/ceph-29/block) open path
>     /var/lib/ceph/osd/ceph-29/block
>     2018-10-03 09:47:15.360 7fb8835ce1c0  1 bdev(0x561250a20a80
>     /var/lib/ceph/osd/ceph-29/block) open size 1000198897664
>     (0xe8e0800000, 932 GiB) block_size 4096 (4 KiB) rotational
>     2018-10-03 09:47:15.360 7fb8835ce1c0  1 bluefs add_block_device
>     bdev 1 path /var/lib/ceph/osd/ceph-29/block size 932 GiB
>     2018-10-03 09:47:15.360 7fb8835ce1c0  1 bluefs mount
>     2018-10-03 09:47:15.538 7fb8835ce1c0 -1 bluefs _replay file with
>     link count 0: file(ino 519 size 0x31e2f42 mtime 2018-10-02
>     12:24:22.632397 bdev 1 allocated 3200000 extents
>     [1:0x7008200000+100000,1:0x7009000000+100000,1:0x7009100000+100000,1:0x7009200000+100000,1:0x7009300000+100000,1:0x7009400000+100000,1:0x7009500000+100000,1:0x7009600000+100000,1:0x7009700000+100000,1:0x7009800000+100000,1:0x7009900000+100000,1:0x7009a00000+100000,1:0x7009b00000+100000,1:0x7009c00000+100000,1:0x7009d00000+100000,1:0x7009e00000+100000,1:0x7009f00000+100000,1:0x700a000000+100000,1:0x700a100000+100000,1:0x700a200000+100000,1:0x700a300000+100000,1:0x700a400000+100000,1:0x700a500000+100000,1:0x700a600000+100000,1:0x700a700000+100000,1:0x700a800000+100000,1:0x700a900000+100000,1:0x700aa00000+100000,1:0x700ab00000+100000,1:0x700ac00000+100000,1:0x700ad00000+100000,1:0x700ae00000+100000,1:0x700af00000+100000,1:0x700b000000+100000,1:0x700b100000+100000,1:0x700b200000+100000,1:0x700b300000+100000,1:0x700b400000+100000,1:0x700b500000+100000,1:0x700b600000+100000,1:0x700b700000+100000,1:0x700b800000+100000,1:0x700b900000+100000,1:0x700ba00000+100000,1:0x700bb00000+100000,1:0x700bc00000+100000,1:0x700bd00000+100000,1:0x700be00000+100000,1:0x700bf00000+100000,1:0x700c000000+100000])
>     2018-10-03 09:47:15.538 7fb8835ce1c0 -1 bluefs mount failed to
>     replay log: (5) Input/output error
>     2018-10-03 09:47:15.538 7fb8835ce1c0  1 stupidalloc
>     0x0x561250b8d030 shutdown
>     2018-10-03 09:47:15.538 7fb8835ce1c0 -1
>     bluestore(/var/lib/ceph/osd/ceph-29) _open_db failed bluefs mount:
>     (5) Input/output error
>     2018-10-03 09:47:15.538 7fb8835ce1c0  1 bdev(0x561250a20a80
>     /var/lib/ceph/osd/ceph-29/block) close
>     2018-10-03 09:47:15.616 7fb8835ce1c0  1 bdev(0x561250a20000
>     /var/lib/ceph/osd/ceph-29/block) close
>     2018-10-03 09:47:15.870 7fb8835ce1c0 -1 osd.29 0 OSD:init: unable
>     to mount object store
>     2018-10-03 09:47:15.870 7fb8835ce1c0 -1  ** ERROR: osd init
>     failed: (5) Input/output error
>
>     *OSD 42:*
>     The disk is found by LVM and the tmpfs is created, but the
>     service dies immediately on start without writing a log...
>     This disk might have actually failed.
>
>     *OSD 47 (same as above, does not seem to have died, no dmesg trace):*
>     2018-10-03 10:02:25.221 7f4d54b611c0  0 set uid:gid to 167:167
>     (ceph:ceph)
>     2018-10-03 10:02:25.221 7f4d54b611c0  0 ceph version 13.2.2
>     (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable), process
>     ceph-osd, pid 8993
>     2018-10-03 10:02:25.221 7f4d54b611c0  0 pidfile_write: ignore
>     empty --pid-file
>     2018-10-03 10:02:25.247 7f4d54b611c0  0 load: jerasure load: lrc
>     load: isa
>     2018-10-03 10:02:25.248 7f4d54b611c0  1 bdev create path
>     /var/lib/ceph/osd/ceph-46/block type kernel
>     2018-10-03 10:02:25.248 7f4d54b611c0  1 bdev(0x564072f96000
>     /var/lib/ceph/osd/ceph-46/block) open path
>     /var/lib/ceph/osd/ceph-46/block
>     2018-10-03 10:02:25.248 7f4d54b611c0  1 bdev(0x564072f96000
>     /var/lib/ceph/osd/ceph-46/block) open size 1000198897664
>     (0xe8e0800000, 932 GiB) block_size 4096 (4 KiB) rotational
>     2018-10-03 10:02:25.249 7f4d54b611c0  1
>     bluestore(/var/lib/ceph/osd/ceph-46) _set_cache_sizes kv_min_ratio
>     1 > kv_ratio 0.5
>     2018-10-03 10:02:25.249 7f4d54b611c0  1
>     bluestore(/var/lib/ceph/osd/ceph-46) _set_cache_sizes cache_size
>     536870912 meta 0 kv 1 data 0
>     2018-10-03 10:02:25.249 7f4d54b611c0  1 bdev(0x564072f96000
>     /var/lib/ceph/osd/ceph-46/block) close
>     2018-10-03 10:02:25.503 7f4d54b611c0  1
>     bluestore(/var/lib/ceph/osd/ceph-46) _mount path
>     /var/lib/ceph/osd/ceph-46
>     2018-10-03 10:02:25.504 7f4d54b611c0  1 bdev create path
>     /var/lib/ceph/osd/ceph-46/block type kernel
>     2018-10-03 10:02:25.504 7f4d54b611c0  1 bdev(0x564072f96000
>     /var/lib/ceph/osd/ceph-46/block) open path
>     /var/lib/ceph/osd/ceph-46/block
>     2018-10-03 10:02:25.504 7f4d54b611c0  1 bdev(0x564072f96000
>     /var/lib/ceph/osd/ceph-46/block) open size 1000198897664
>     (0xe8e0800000, 932 GiB) block_size 4096 (4 KiB) rotational
>     2018-10-03 10:02:25.505 7f4d54b611c0  1
>     bluestore(/var/lib/ceph/osd/ceph-46) _set_cache_sizes kv_min_ratio
>     1 > kv_ratio 0.5
>     2018-10-03 10:02:25.505 7f4d54b611c0  1
>     bluestore(/var/lib/ceph/osd/ceph-46) _set_cache_sizes cache_size
>     536870912 meta 0 kv 1 data 0
>     2018-10-03 10:02:25.505 7f4d54b611c0  1 bdev create path
>     /var/lib/ceph/osd/ceph-46/block type kernel
>     2018-10-03 10:02:25.505 7f4d54b611c0  1 bdev(0x564072f96a80
>     /var/lib/ceph/osd/ceph-46/block) open path
>     /var/lib/ceph/osd/ceph-46/block
>     2018-10-03 10:02:25.505 7f4d54b611c0  1 bdev(0x564072f96a80
>     /var/lib/ceph/osd/ceph-46/block) open size 1000198897664
>     (0xe8e0800000, 932 GiB) block_size 4096 (4 KiB) rotational
>     2018-10-03 10:02:25.505 7f4d54b611c0  1 bluefs add_block_device
>     bdev 1 path /var/lib/ceph/osd/ceph-46/block size 932 GiB
>     2018-10-03 10:02:25.505 7f4d54b611c0  1 bluefs mount
>     2018-10-03 10:02:25.620 7f4d54b611c0 -1 bluefs _replay file with
>     link count 0: file(ino 450 size 0x169964c mtime 2018-10-02
>     12:24:22.602432 bdev 1 allocated 1700000 extents
>     [1:0x6fd9500000+100000,1:0x6fd9600000+100000,1:0x6fd9700000+100000,1:0x6fd9800000+100000,1:0x6fd9900000+100000,1:0x6fd9a00000+100000,1:0x6fd9b00000+100000,1:0x6fd9c00000+100000,1:0x6fd9d00000+100000,1:0x6fd9e00000+100000,1:0x6fd9f00000+100000,1:0x6fda000000+100000,1:0x6fda100000+100000,1:0x6fda200000+100000,1:0x6fda300000+100000,1:0x6fda400000+100000,1:0x6fda500000+100000,1:0x6fda600000+100000,1:0x6fda700000+100000,1:0x6fda800000+100000,1:0x6fda900000+100000,1:0x6fdaa00000+100000,1:0x6fdab00000+100000])
>     2018-10-03 10:02:25.620 7f4d54b611c0 -1 bluefs mount failed to
>     replay log: (5) Input/output error
>     2018-10-03 10:02:25.620 7f4d54b611c0  1 stupidalloc
>     0x0x564073102fc0 shutdown
>     2018-10-03 10:02:25.620 7f4d54b611c0 -1
>     bluestore(/var/lib/ceph/osd/ceph-46) _open_db failed bluefs mount:
>     (5) Input/output error
>     2018-10-03 10:02:25.620 7f4d54b611c0  1 bdev(0x564072f96a80
>     /var/lib/ceph/osd/ceph-46/block) close
>     2018-10-03 10:02:25.763 7f4d54b611c0  1 bdev(0x564072f96000
>     /var/lib/ceph/osd/ceph-46/block) close
>     2018-10-03 10:02:26.010 7f4d54b611c0 -1 osd.46 0 OSD:init: unable
>     to mount object store
>     2018-10-03 10:02:26.010 7f4d54b611c0 -1  ** ERROR: osd init
>     failed: (5) Input/output error
>
>     We had failing disks in this cluster before, but that was easily
>     recovered by marking them out and rebalancing.
>     To me, it looks like one disk died (there was heavy I/O on the
>     cluster when this happened) and took two additional disks with it.
>     It is very strange that this happened about two hours after the
>     upgrade + reboot.
>
>     *Any recommendations?*
>     *I have 8 PGs down; the remaining are active and recovering /
>     rebalancing.*
>
>     Kind regards
>     Kevin
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
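
Independent of the LVM question: before trying anything destructive on 
the affected OSDs, it may be worth capturing more detail first. A 
possible sketch, with the OSD service stopped (osd.29 is taken from 
your log; substitute the other failing ids):

```shell
# Stop the OSD, then run ceph-bluestore-tool's consistency check
# against its data directory.
systemctl stop ceph-osd@29
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-29

# A manual foreground start with verbose BlueFS logging shows
# exactly where the log replay fails.
ceph-osd -i 29 --debug-bluefs 20 -d
```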
