[ceph-users] ceph-backed VM drive became corrupted after unexpected VM termination

Дробышевский, Владимир vlad at itgorod.ru
Tue Nov 7 05:55:03 PST 2017


Oh, sorry, I forgot to mention that all OSDs use bluestore, so the xfs
mount options have no effect.

The VMs use cache="none" by default; I then tried "writethrough" as well.
No difference.
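
For reference, here is a minimal Python sketch of how the QEMU drive cache
mode maps onto librbd cache settings, as I understand the Ceph "QEMU and
block devices" documentation for QEMU 1.2 and later. The dictionary and the
function name are illustrative only and are not part of any Ceph or QEMU API.

```python
# Illustrative only: QEMU drive cache mode -> librbd settings, as described
# in the Ceph QEMU documentation (an assumption, not verified on this cluster).
QEMU_TO_LIBRBD = {
    "none": {"rbd_cache": False},
    "writethrough": {"rbd_cache": True, "rbd_cache_max_dirty": 0},
    "writeback": {"rbd_cache": True},
}


def librbd_settings_for(cache_mode):
    """Return the librbd settings implied by a QEMU cache mode (hypothetical helper)."""
    try:
        # Return a copy so callers cannot modify the shared table.
        return dict(QEMU_TO_LIBRBD[cache_mode])
    except KeyError:
        raise ValueError("unknown QEMU cache mode: %r" % (cache_mode,))


if __name__ == "__main__":
    for mode in ("none", "writethrough", "writeback"):
        print(mode, librbd_settings_for(mode))
```

If that mapping holds, neither of the two modes tested here leaves dirty data
in the librbd cache, which would suggest the rbd cache options are not the
cause.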

And aren't these rbd cache options enabled by default?
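
To check what a given ceph.conf actually sets, something like the following
Python sketch could be used. It assumes that both options default to true in
Luminous (my reading of the librbd config reference, not verified here) and
that [client] overrides [global]. The helper names are hypothetical. It only
reads the file text and does not query a running client.

```python
import configparser

# Assumed Luminous-era librbd defaults; verify against your release.
ASSUMED_DEFAULTS = {
    "rbd_cache": "true",
    "rbd_cache_writethrough_until_flush": "true",
}


def normalize(key):
    """Treat 'rbd cache', 'rbd-cache' and 'rbd_cache' as the same option name."""
    return "_".join(key.replace("-", " ").replace("_", " ").split()).lower()


def effective_rbd_cache(conf_text, sections=("global", "client")):
    """Return the rbd cache options from conf_text, falling back to the assumed defaults."""
    parser = configparser.ConfigParser(
        strict=False,            # tolerate duplicate keys after normalization
        interpolation=None,      # values such as '0/0' or '%' are taken literally
        delimiters=("=",),
        comment_prefixes=("#", ";"),
    )
    parser.optionxform = normalize
    parser.read_string(conf_text)

    result = dict(ASSUMED_DEFAULTS)
    for section in sections:     # later sections override earlier ones
        if parser.has_section(section):
            for key, value in parser.items(section):
                if key in result:
                    result[key] = value.strip().lower()
    return result


if __name__ == "__main__":
    sample = "[global]\nosd_mkfs_type = xfs\n\n[client]\nrbd cache = false\n"
    print(effective_rbd_cache(sample))
```

For the ceph.conf quoted below, which sets neither option, this would simply
report the assumed defaults.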




2017-11-07 18:45 GMT+05:00 Peter Maloney <peter.maloney at brockmann-consult.de>:

> I see nobarrier in there... Try without that (unless that's just the
> bluestore xfs, in which case it probably won't change anything). And are
> the OSDs using bluestore?
>
> And what cache options did you set in the VM config? It's dangerous to set
> writeback without also having this in the client-side ceph.conf:
>
> rbd cache writethrough until flush = true
> rbd_cache = true
>
>
>
>
> On 11/07/17 14:36, Дробышевский, Владимир wrote:
>
> Hello!
>
>   I've got a weird situation with rbd drive image reliability. I found
> that after a hard reset of a VM, its ceph rbd drive from my new cluster
> becomes corrupted. I found it by accident during HA tests of my new cloud
> cluster: after a host reset, the VM was not able to boot again because of
> virtual drive errors. The result is the same if you just kill the qemu
> process (as would happen when a host crashes).
>
>   At first I thought it was a guest OS problem. But then I tried
> RouterOS (Linux-based), Linux, and FreeBSD: all of them show the same
> behavior.
>   Then I blamed the OpenNebula installation. For the sake of testing I
> installed the latest Proxmox (5.1-36) on another server. The first subtest:
> I created a VM in OpenNebula from a predefined image, shut it down, then
> created a Proxmox VM and pointed it to the image created by OpenNebula.
> The second subtest: I made a clean install from an ISO using the Proxmox
> console, having previously created the VM and drive image from Proxmox (on
> the same ceph pool, of course).
>   Both results: unbootable VMs.
>
>   Finally I made a clean install to a fresh VM with a local LVM-backed
> drive image. And - guess what? - it survived the qemu process kill.
>
>   This is the first situation of this kind in my experience, so I would
> like to ask for guidance. I believe it is a cache problem of some kind, but
> I haven't run into it with earlier releases.
>
>   Some cluster details:
>
>   It's a small test cluster with 4 nodes, each has:
>
>   2x CPU E5-2665,
>   128GB RAM
>   1 OSD with Samsung sm863 1.92TB drive
>   IB connection with IPoIB on QDR IB network
>
>   OS: Ubuntu 16.04 with 4.10 kernel
>   ceph: luminous 12.2.1
>
>   Client (kvm host) OSes:
>   1. Ubuntu 16.04 (the same hosts as ceph cluster)
>   2. Debian 9.1 in case of Proxmox
>
>
> *ceph.conf:*
>
> [global]
> fsid = 6a8ffc55-fa2e-48dc-a71c-647e1fff749b
>
> public_network = 10.103.0.0/16
> cluster_network = 10.104.0.0/16
>
> mon_initial_members = e001n01, e001n02, e001n03
> mon_host = 10.103.0.1,10.103.0.2,10.103.0.3
>
> rbd default format = 2
>
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
>
> osd mount options = rw,noexec,nodev,noatime,nodiratime,nobarrier
> osd mount options xfs = rw,noexec,nodev,noatime,nodiratime,nobarrier
> osd_mkfs_type = xfs
>
> bluestore fsck on mount = true
>
> debug_lockdep = 0/0
> debug_context = 0/0
> debug_crush = 0/0
> debug_buffer = 0/0
> debug_timer = 0/0
> debug_filer = 0/0
> debug_objecter = 0/0
> debug_rados = 0/0
> debug_rbd = 0/0
> debug_journaler = 0/0
> debug_objectcatcher = 0/0
> debug_client = 0/0
> debug_osd = 0/0
> debug_optracker = 0/0
> debug_objclass = 0/0
> debug_filestore = 0/0
> debug_journal = 0/0
> debug_ms = 0/0
> debug_monc = 0/0
> debug_tp = 0/0
> debug_auth = 0/0
> debug_finisher = 0/0
> debug_heartbeatmap = 0/0
> debug_perfcounter = 0/0
> debug_asok = 0/0
> debug_throttle = 0/0
> debug_mon = 0/0
> debug_paxos = 0/0
> debug_rgw = 0/0
>
> [osd]
> osd op threads = 4
> osd disk threads = 2
> osd max backfills = 1
> osd recovery threads = 1
> osd recovery max active = 1
>
> --
>
> Best regards,
> Vladimir
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> --
>
> --------------------------------------------
> Peter Maloney
> Brockmann Consult
> Max-Planck-Str. 2
> 21502 Geesthacht
> Germany
> Tel: +49 4152 889 300
> Fax: +49 4152 889 333
> E-mail: peter.maloney at brockmann-consult.de
> Internet: http://www.brockmann-consult.de
> --------------------------------------------
>
>


-- 

Best regards,
Vladimir