[ceph-users] ceph-backed VM drive became corrupted after unexpected VM termination

Дробышевский, Владимир vlad at itgorod.ru
Tue Nov 7 06:50:58 PST 2017


2017-11-07 19:06 GMT+05:00 Jason Dillaman <jdillama at redhat.com>:

> On Tue, Nov 7, 2017 at 8:55 AM, Дробышевский, Владимир <vlad at itgorod.ru>
> wrote:
> >
> > Oh, sorry, I forgot to mention that all OSDs use bluestore, so the xfs
> > mount options have no effect.
> >
> > VMs have cache="none" by default; I then tried "writethrough". No
> > difference.
> >
> > And aren't these rbd cache options enabled by default?
>
> Yes, they are enabled by default. Note, however, that the QEMU cache
> options for the drive will override the Ceph configuration defaults.
>
> What specifically are you seeing in the guest OS when you state
> "corruption"?

The guest OS just can't mount its partitions and gets stuck at the
initramfs prompt. But I found the reason: the image stays locked forever
after a hypervisor crash. I hadn't connected the two things.
I had already seen that images stay locked while investigating the
problem; I don't know why I didn't try unlocking them before writing here :(

It was a permissions problem. I've changed them and now everything works
as it should.

Thanks a lot for your help!

And Nico, thank you for pointing out your thread: I found the correct
permissions there (in Jason's message). I'm going to open a PR with a fix
for the OpenNebula docs.
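
The thread does not quote the corrected permissions, so what follows is
only a sketch. It rests on an assumption taken from the Luminous upgrade
notes: a client can break a dead owner's exclusive lock only if its mon
caps allow the "osd blacklist" command, which 'profile rbd' or an explicit
'allow command "osd blacklist"' grants. The function name is hypothetical
and the check is a heuristic on the caps string, not a Ceph API.

```python
import re


def can_break_stale_locks(mon_caps: str) -> bool:
    """Heuristic: do these mon caps look like they permit blacklisting?

    Assumption (not stated in the thread): blacklisting the dead client is
    what lets a new client take over a stale exclusive lock.
    Only three patterns are recognized; anything else reports False.
    """
    caps = mon_caps.lower()
    # 'profile rbd' exactly, not 'profile rbd-read-only' or similar
    if re.search(r"profile\s+rbd(?![\w-])", caps):
        return True
    # 'allow *' grants every mon command
    if re.search(r"allow\s+\*", caps):
        return True
    # explicit grant: allow command "osd blacklist"
    return "osd blacklist" in caps


if __name__ == "__main__":
    # Old-style caps commonly used before Luminous
    print(can_break_stale_locks("allow r"))
    print(can_break_stale_locks("profile rbd"))
```

The mon caps string to feed in can be read from the output of
`ceph auth get` for the client in question.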


> Assuming you haven't disabled barriers in your guest OS
> mount options and are using a journaled filesystem like ext4 or XFS,
> it should be sending proper flush requests to QEMU / librbd to ensure
> that it remains crash consistent. However, if you disable barriers or
> set a QEMU "cache=unsafe" option, these flush requests will not be
> sent and your data will most likely be corrupt after a hard failure.
>
> > 2017-11-07 18:45 GMT+05:00 Peter Maloney <peter.maloney at brockmann-consult.de>:
> >>
> >> I see nobarrier in there... Try without that (unless that's just the
> >> bluestore xfs, in which case it probably won't change anything). And
> >> are the OSDs using bluestore?
> >>
> >> And what cache options did you set in the VM config? It's dangerous to
> >> set writeback without also having this in the client-side ceph.conf:
> >>
> >> rbd cache writethrough until flush = true
> >> rbd_cache = true
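
The two settings above can be checked in a client-side ceph.conf with a
short script. This is a sketch under stated assumptions: Ceph treats
spaces and underscores in option names as equivalent (both spellings
appear above), missing options are taken as enabled (as Jason notes
elsewhere in this thread), and only the literal value "true" is
recognized. The function names are hypothetical.

```python
import configparser


def normalize(key: str) -> str:
    """Map 'rbd cache' and 'rbd_cache' to the same key."""
    return key.strip().lower().replace(" ", "_")


def rbd_cache_settings(conf_text: str) -> dict:
    """Collect options from [global] and [client]; [client] wins."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    found = {}
    for section in ("global", "client"):
        if parser.has_section(section):
            for key, value in parser.items(section):
                found[normalize(key)] = value.strip().lower()
    return found


def writeback_is_guarded(conf_text: str) -> bool:
    """True unless either option is explicitly set to something else."""
    s = rbd_cache_settings(conf_text)
    return (s.get("rbd_cache", "true") == "true"
            and s.get("rbd_cache_writethrough_until_flush", "true") == "true")


if __name__ == "__main__":
    sample = (
        "[client]\n"
        "rbd cache writethrough until flush = true\n"
        "rbd_cache = true\n"
    )
    print(writeback_is_guarded(sample))
```

The sketch expects unindented option lines, because Python's configparser
reads indented lines as continuations of the previous value.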
> >>
> >>
> >>
> >>
> >> On 11/07/17 14:36, Дробышевский, Владимир wrote:
> >>
> >> Hello!
> >>
> >>   I've got a weird situation with rbd drive image reliability. I found
> >> that after a hard reset, a VM with a ceph rbd drive from my new cluster
> >> becomes corrupted. I found it by accident during HA tests of my new
> >> cloud cluster: after a host reset the VM was not able to boot again
> >> because of virtual drive errors. The result is the same if you just
> >> kill the qemu process (as would happen when a host crashes).
> >>
> >>   At first I thought it was a guest OS problem. But then I tried
> >> RouterOS (Linux-based), Linux, and FreeBSD: all of them show the same
> >> behavior.
> >>   Then I blamed the OpenNebula installation. For the sake of testing I
> >> installed the latest Proxmox (5.1-36) on another server. The first
> >> subtest: I created a VM in OpenNebula from a predefined image, shut it
> >> down, then created a Proxmox VM and pointed it to the image created
> >> from OpenNebula.
> >> The second subtest: I made a clean install from an ISO from the Proxmox
> >> console, having previously created the VM and drive image from Proxmox
> >> (on the same ceph pool, of course).
> >>   Both results: unbootable VMs.
> >>
> >>   Finally I made a clean install to a fresh VM with a local
> >> LVM-backed drive image. And, guess what, it survived the qemu process
> >> kill.
> >>
> >>   This is the first situation of this kind in my experience, so I would
> >> like to ask for guidance. I believe it is a cache problem of some
> >> kind, but I haven't faced it with earlier releases.
> >>
> >>   Some cluster details:
> >>
> >>   It's a small test cluster with 4 nodes, each has:
> >>
> >>   2x CPU E5-2665,
> >>   128GB RAM
> >>   1 OSD with Samsung sm863 1.92TB drive
> >>   IB connection with IPoIB on QDR IB network
> >>
> >>   OS: Ubuntu 16.04 with 4.10 kernel
> >>   ceph: luminous 12.2.1
> >>
> >>   Client (kvm host) OSes:
> >>   1. Ubuntu 16.04 (the same hosts as ceph cluster)
> >>   2. Debian 9.1 in case of Proxmox
> >>
> >>
> >> ceph.conf:
> >>
> >> [global]
> >> fsid = 6a8ffc55-fa2e-48dc-a71c-647e1fff749b
> >>
> >> public_network = 10.103.0.0/16
> >> cluster_network = 10.104.0.0/16
> >>
> >> mon_initial_members = e001n01, e001n02, e001n03
> >> mon_host = 10.103.0.1,10.103.0.2,10.103.0.3
> >>
> >> rbd default format = 2
> >>
> >> auth_cluster_required = cephx
> >> auth_service_required = cephx
> >> auth_client_required = cephx
> >>
> >> osd mount options = rw,noexec,nodev,noatime,nodiratime,nobarrier
> >> osd mount options xfs = rw,noexec,nodev,noatime,nodiratime,nobarrier
> >> osd_mkfs_type = xfs
> >>
> >> bluestore fsck on mount = true
> >>
> >> debug_lockdep = 0/0
> >> debug_context = 0/0
> >> debug_crush = 0/0
> >> debug_buffer = 0/0
> >> debug_timer = 0/0
> >> debug_filer = 0/0
> >> debug_objecter = 0/0
> >> debug_rados = 0/0
> >> debug_rbd = 0/0
> >> debug_journaler = 0/0
> >> debug_objectcatcher = 0/0
> >> debug_client = 0/0
> >> debug_osd = 0/0
> >> debug_optracker = 0/0
> >> debug_objclass = 0/0
> >> debug_filestore = 0/0
> >> debug_journal = 0/0
> >> debug_ms = 0/0
> >> debug_monc = 0/0
> >> debug_tp = 0/0
> >> debug_auth = 0/0
> >> debug_finisher = 0/0
> >> debug_heartbeatmap = 0/0
> >> debug_perfcounter = 0/0
> >> debug_asok = 0/0
> >> debug_throttle = 0/0
> >> debug_mon = 0/0
> >> debug_paxos = 0/0
> >> debug_rgw = 0/0
> >>
> >> [osd]
> >> osd op threads = 4
> >> osd disk threads = 2
> >> osd max backfills = 1
> >> osd recovery threads = 1
> >> osd recovery max active = 1
> >>
> >> --
> >>
> >> Best regards,
> >> Vladimir
> >>
> >>
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users at lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >>
> >> --
> >>
> >> --------------------------------------------
> >> Peter Maloney
> >> Brockmann Consult
> >> Max-Planck-Str. 2
> >> 21502 Geesthacht
> >> Germany
> >> Tel: +49 4152 889 300
> >> Fax: +49 4152 889 333
> >> E-mail: peter.maloney at brockmann-consult.de
> >> Internet: http://www.brockmann-consult.de
> >> --------------------------------------------
> >
> >
> >
> >
> > --
> >
> > Best regards,
> > Vladimir
> >
> >
>
>
>
> --
> Jason
>



-- 

Best regards,
Vladimir