[ceph-users] CephFS desync

Andrey Klimentyev andrey.klimentyev at flant.com
Mon Nov 27 03:03:37 PST 2017


No, it doesn't. But I'll keep this commit in mind, thanks!

On 24 November 2017 at 03:33, Yan, Zheng <ukernel at gmail.com> wrote:

> On Thu, Nov 23, 2017 at 5:49 PM, Andrey Klimentyev
> <andrey.klimentyev at flant.com> wrote:
> > The workload is... really common. It's just a bunch of PHP scripts
> > executed via php-fpm that sometimes write a couple of files (some
> > e-commerce reports). There were concerns about mmap(2) being used, but
> > that's not the case; I've checked with strace.
> > I am using the kernel client with a relatively fresh kernel - 4.10.0-28.
> >
> > I think the simplest thing to do would be to update to Luminous, to be
> > honest. The problem is elusive and a PITA to resolve after it occurs.
> > "touch" does not work; I have to change the contents of a file to
> > force it to synchronize on every CephFS client.
> >
>
> Does it use readahead(2), madvise(2) or fadvise(2)? The 4.10 kernel does
> not include the following commit:
>
> https://github.com/ceph/ceph-client/commit/2b1ac852eb67a6e95595e576371d23519105559f
>
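One rough way to answer the readahead/madvise/fadvise question is to scan a saved strace log for those calls. This is only a sketch: the syscall names assume x86_64 Linux, where strace prints fadvise(2) as `fadvise64`; splice is included because it is named elsewhere in this thread; the log lines and the `/mnt/cephfs/report.csv` path are made-up samples, not output from the affected hosts.

```python
import re
from collections import Counter

# Assumption: names as strace prints them on x86_64 Linux.
SUSPECT = ("readahead", "madvise", "fadvise64", "splice")

# Optional "[pid N]" or bare PID prefix (strace -f), optional timestamp
# (strace -t/-tt/-ttt), then the syscall name followed by "(".
CALL = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(?:[\d:.]+\s+)?(\w+)\(")


def count_suspect_calls(lines):
    """Count how often each suspect syscall appears in strace output lines."""
    counts = Counter()
    for line in lines:
        match = CALL.match(line)
        if match and match.group(1) in SUSPECT:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample = [
    'openat(AT_FDCWD, "/mnt/cephfs/report.csv", O_RDONLY) = 3',
    "fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0",
    "[pid  4242] readahead(3, 0, 131072) = 0",
    'read(3, "...", 4096) = 4096',
]
print(dict(count_suspect_calls(sample)))
```

madvise calls also come from the memory allocator, so a non-zero count there is only a hint to look at the arguments, not proof that file mappings are involved.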
>
>
> > On 20 Nov 2017 at 6:26, "Yan, Zheng" <ukernel at gmail.com> wrote:
> >
> >> ceph-fuse or kernel client? Which version of ceph-fuse/kernel? This
> >> issue can happen with ceph-fuse if the fuse_disable_pagecache config
> >> option is false. Old kernel versions have a bug that can cause this
> >> issue; the bug is in the splice_{read,write} and readahead code.
> >>
> >> On Sun, Nov 19, 2017 at 5:52 PM, Gregory Farnum <gfarnum at redhat.com>
> >> wrote:
> >> > Hmm, are you mounting the filesystem using ceph-fuse? Can you describe
> >> > your workload?
> >> > -Greg
> >> >
> >> > On Fri, Nov 3, 2017 at 6:42 PM Andrey Klimentyev
> >> > <andrey.klimentyev at flant.com> wrote:
> >> >>
> >> >> I am absolutely incorrect, my apologies.
> >> >>
> >> >> caps: [mds] allow rw
> >> >> caps: [mon] allow r
> >> >> caps: [osd] allow rwx pool=cephfs_metadata, allow rwx pool=cephfs_data
> >> >>
> >> >> On 3 November 2017 at 10:40, Henrik Korkuc <lists at kirneh.eu> wrote:
> >> >>>
> >> >>> On 17-11-03 09:29, Andrey Klimentyev wrote:
> >> >>>
> >> >>> Thanks for a swift response.
> >> >>>
> >> >>> We are using 10.2.10.
> >> >>>
> >> >>> They all share the same set of permissions (and one key, too). I
> >> >>> haven't found anything incriminating in the logs, either.
> >> >>>
> >> >>> caps: [mon] allow r
> >> >>> caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=rbd
> >> >>>
> >> >>> Are you sure you pasted the correct user permissions? It looks like you
> >> >>> are using RBD permissions for CephFS, and this seems to be the problem.
> >> >>>
> >> >>> On 3 November 2017 at 00:56, Gregory Farnum <gfarnum at redhat.com>
> >> >>> wrote:
> >> >>>>
> >> >>>> On Thu, Nov 2, 2017 at 9:05 AM Andrey Klimentyev
> >> >>>> <andrey.klimentyev at flant.com> wrote:
> >> >>>>>
> >> >>>>> Hi,
> >> >>>>>
> >> >>>>> we've recently hit a problem in a production cluster. The gist of it
> >> >>>>> is that sometimes a file will be changed on one machine, but only the
> >> >>>>> "change time" propagates to the others. The checksum is different,
> >> >>>>> and the contents, obviously, differ as well. How can I debug this?
> >> >>>>>
> >> >>>>> In other words, how would I approach such a problem with "stuck
> >> >>>>> files"? I haven't found anything on Google or in the troubleshooting
> >> >>>>> docs.
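The symptom described above (change time matches, checksum does not) can be captured per client with a small helper. This is a minimal sketch, not a tool from this thread: the function name `fingerprint` is made up here, and SHA-256 is an arbitrary choice of checksum. Running it against the same path on each CephFS client and comparing the tuples shows which clients see stale contents.

```python
import hashlib
import os


def fingerprint(path, chunk_size=1 << 20):
    """Return (sha256 hex digest, size in bytes, ctime in ns) as seen by this client."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large report files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    info = os.stat(path)
    return digest.hexdigest(), info.st_size, info.st_ctime_ns
```

The read goes through the local page cache on purpose: the point is to record what each client actually serves to applications, not what is stored on the OSDs.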
> >> >>>>
> >> >>>>
> >> >>>> What versions are you running?
> >> >>>> The only way I can think of this happening is if one of the clients
> >> >>>> had permission to access the CephFS namespace on the MDS, but not to
> >> >>>> write to the OSDs which store the file data. Have you checked that the
> >> >>>> clients all have the same caps? ("ceph auth list" or one of the related
> >> >>>> more-specific commands will let you compare.)
> >> >>>> -Greg
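Comparing caps by eye is error-prone, so here is a small sketch that parses the `caps: [daemon] ...` lines in the form pasted in this thread and reports where two clients differ. The helper names `parse_caps` and `diff_caps` are made up for illustration, and the two inputs are simply the two cap sets quoted in this thread; nothing here calls the ceph CLI.

```python
import re

# Matches lines such as "caps: [osd] allow rwx pool=cephfs_data".
CAPS_LINE = re.compile(r"caps:\s*\[(\w+)\]\s*(.+)")


def parse_caps(text):
    """Map each daemon type (mon, mds, osd) to its capability string."""
    caps = {}
    for line in text.splitlines():
        match = CAPS_LINE.search(line)
        if match:
            caps[match.group(1)] = match.group(2).strip()
    return caps


def diff_caps(left, right):
    """Return {daemon: (left_cap, right_cap)} for every entry that differs."""
    daemons = sorted(set(left) | set(right))
    return {
        d: (left.get(d), right.get(d))
        for d in daemons
        if left.get(d) != right.get(d)
    }


rbd_style = parse_caps("""
caps: [mon] allow r
caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=rbd
""")
cephfs_style = parse_caps("""
caps: [mds] allow rw
caps: [mon] allow r
caps: [osd] allow rwx pool=cephfs_metadata, allow rwx pool=cephfs_data
""")

for daemon, (a, b) in diff_caps(rbd_style, cephfs_style).items():
    print(f"{daemon}: {a!r} != {b!r}")
```

A missing entry shows up as `None` on that side, which makes an absent `mds` cap easy to spot.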
> >> >>>>
> >> >>>>>
> >> >>>>>
> >> >>>>> --
> >> >>>>> Andrey Klimentyev,
> >> >>>>> DevOps engineer @ JSC «Flant»
> >> >>>>> http://flant.com/
> >> >>>>> _______________________________________________
> >> >>>>> ceph-users mailing list
> >> >>>>> ceph-users at lists.ceph.com
> >> >>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Andrey Klimentyev,
> >> >>> DevOps engineer @ JSC «Flant»
> >> >>> http://flant.com/
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Andrey Klimentyev,
> >> >> DevOps engineer @ JSC «Flant»
> >> >> http://flant.com/
> >> >> +7 (495) 721-10-27, ext. 487
> >> >> +7 (960) 180-38-98
> >> >
> >> >
> >> >
>



-- 
Andrey Klimentyev,
DevOps engineer @ JSC «Flant»
http://flant.com/ <http://flant.ru/>
+7 (495) 721-10-27, ext. 487
+7 (960) 180-38-98