[ceph-users] ceph-backed VM drive became corrupted after unexpected VM termination

Peter Maloney peter.maloney at brockmann-consult.de
Tue Nov 7 05:45:18 PST 2017


I see nobarrier in your mount options... Try without it (unless it only
applies to the small xfs partition of a bluestore OSD, in which case it
probably won't change anything). And are the OSDs using bluestore?
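
For reference, a sketch of the two mount-option lines from your config
with only nobarrier dropped. Everything else is copied from your [global]
section; I'm assuming the OSD filesystems get remounted (or the OSDs
restarted) afterwards, otherwise the old mount flags stay in effect.

```ini
# Mount options from the quoted [global] section with nobarrier removed,
# so XFS keeps its default write barriers. All other flags are unchanged.
osd mount options = rw,noexec,nodev,noatime,nodiratime
osd mount options xfs = rw,noexec,nodev,noatime,nodiratime
```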

And what cache options did you set in the VM config? It's dangerous to
set writeback without also setting this in the client-side ceph.conf:

rbd cache writethrough until flush = true
rbd_cache = true
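
As a sketch of where those two lines would go: I'm assuming they sit in a
[client] section of the ceph.conf on the KVM host, which is the usual
place for librbd options. The section name is my assumption, not
something taken from your config.

```ini
# Client-side ceph.conf on the KVM host (assumed location: [client] section).
[client]
# Enable the librbd cache.
rbd cache = true
# Stay in writethrough mode until the guest sends its first flush, so a
# guest that never flushes is not exposed to writeback caching.
rbd cache writethrough until flush = true
```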



On 11/07/17 14:36, Дробышевский, Владимир wrote:
> Hello!
>
>   I've got a weird situation with rbd drive image reliability. I found
> that after a hard reset, a VM with a ceph rbd drive from my new cluster
> becomes corrupted. I found it by accident during HA tests of my new
> cloud cluster: after a host reset the VM was not able to boot again
> because of virtual drive errors. The same happens if you just kill the
> qemu process (as would happen when a host crashes).
>
>   At first I thought it was a guest OS problem. But then I tried
> RouterOS (Linux-based), Linux, and FreeBSD: all of them show the same
> behavior.
>   Then I blamed the OpenNebula installation. For the sake of testing I
> installed the latest Proxmox (5.1-36) on another server. The first
> subtest: I created a VM in OpenNebula from a predefined image, shut
> it down, then created a Proxmox VM and pointed it to the image that
> was created from OpenNebula.
> The second subtest: I made a clean install from ISO from the
> Proxmox console, having previously created the VM and drive
> image from Proxmox (of course, on the same ceph pool).
>   Both results: unbootable VMs.
>
>   Finally I made a clean install to a fresh VM with a local
> LVM-backed drive image. And - guess what? - it survived the qemu
> process kill.
>   
>   This is the first situation of this kind in my experience, so I would
> like to ask for guidance. I believe that it is a cache problem of some
> kind, but I haven't faced it with earlier releases.
>
>   Some cluster details:
>
>   It's a small test cluster with 4 nodes, each has:
>
>   2x CPU E5-2665,
>   128GB RAM
>   1 OSD with Samsung sm863 1.92TB drive
>   IB connection with IPoIB on QDR IB network
>
>   OS: Ubuntu 16.04 with 4.10 kernel
>   ceph: luminous 12.2.1
>
>   Client (kvm host) OSes: 
>   1. Ubuntu 16.04 (the same hosts as ceph cluster)
>   2. Debian 9.1 in case of Proxmox
>
>
> *ceph.conf:*
>
> [global]
> fsid = 6a8ffc55-fa2e-48dc-a71c-647e1fff749b
>
> public_network = 10.103.0.0/16
> cluster_network = 10.104.0.0/16
>
> mon_initial_members = e001n01, e001n02, e001n03
> mon_host = 10.103.0.1,10.103.0.2,10.103.0.3
>
> rbd default format = 2
>
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
>
> osd mount options = rw,noexec,nodev,noatime,nodiratime,nobarrier
> osd mount options xfs = rw,noexec,nodev,noatime,nodiratime,nobarrier
> osd_mkfs_type = xfs
>
> bluestore fsck on mount = true
>   
> debug_lockdep = 0/0
> debug_context = 0/0
> debug_crush = 0/0
> debug_buffer = 0/0
> debug_timer = 0/0
> debug_filer = 0/0
> debug_objecter = 0/0
> debug_rados = 0/0
> debug_rbd = 0/0
> debug_journaler = 0/0
> debug_objectcatcher = 0/0
> debug_client = 0/0
> debug_osd = 0/0
> debug_optracker = 0/0
> debug_objclass = 0/0
> debug_filestore = 0/0
> debug_journal = 0/0
> debug_ms = 0/0
> debug_monc = 0/0
> debug_tp = 0/0
> debug_auth = 0/0
> debug_finisher = 0/0
> debug_heartbeatmap = 0/0
> debug_perfcounter = 0/0
> debug_asok = 0/0
> debug_throttle = 0/0
> debug_mon = 0/0
> debug_paxos = 0/0
> debug_rgw = 0/0
>
> [osd]
> osd op threads = 4
> osd disk threads = 2
> osd max backfills = 1
> osd recovery threads = 1
> osd recovery max active = 1
>
> -- 
>
> Best regards,
> Vladimir
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 

--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney at brockmann-consult.de
Internet: http://www.brockmann-consult.de
--------------------------------------------
