[ceph-users] CephFS desync

Yan, Zheng ukernel at gmail.com
Thu Nov 23 16:33:12 PST 2017


On Thu, Nov 23, 2017 at 5:49 PM, Andrey Klimentyev
<andrey.klimentyev at flant.com> wrote:
> The workload is... really common. It's just a bunch of PHP scripts being
> executed via php-fpm that sometimes write a couple of files (some
> e-commerce reports). There were concerns about mmap(2) being used, but that's
> not the case; I've checked with strace.
> I am using the kernel client with a relatively fresh kernel, 4.10.0-28.
>
> I think the simplest thing to do would be updating to Luminous, to be
> honest. The problem is elusive and a PITA to resolve after it occurs.
> "touch" does not work; I have to change the contents of a file to force
> it to synchronize on every CephFS client.
>

Does it use readahead(2), madvise(2) or fadvise(2)? The 4.10 kernel does
not include the following commit:

https://github.com/ceph/ceph-client/commit/2b1ac852eb67a6e95595e576371d23519105559f
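
One way to answer the question above is to scan an existing strace log for
the calls named here. The following is a minimal sketch, not a definitive
tool: the function name count_suspect_syscalls is hypothetical, the line
format is assumed to be ordinary strace output (optionally prefixed with a
PID and a timestamp), and fadvise is assumed to appear as fadvise64 (or
fadvise64_64 on some 32-bit architectures).

```python
import re
from collections import Counter

# Syscalls asked about in this thread. Assumption: strace prints
# posix_fadvise as "fadvise64" (or "fadvise64_64" on some 32-bit arches).
SUSPECT = ("readahead", "madvise", "fadvise64", "fadvise64_64")

# Optional "[pid N]" or bare PID prefix, optional timestamp, then "name(".
_CALL = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(?:[\d:.]+\s+)?(\w+)\(")


def count_suspect_syscalls(lines):
    """Count how often each suspect syscall starts a line of strace output."""
    counts = Counter()
    for line in lines:
        match = _CALL.match(line)
        if match and match.group(1) in SUSPECT:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py /path/to/strace.log
    with open(sys.argv[1], errors="replace") as log:
        for name, hits in sorted(count_suspect_syscalls(log).items()):
            print(f"{name}: {hits}")
```

A log suitable for this could be captured with something like
"strace -f -o <logfile> -p <php-fpm pid>". An empty result only means the
traced processes did not issue these calls during the capture window.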



> On 20 Nov 2017 at 6:26, "Yan, Zheng" <ukernel at gmail.com> wrote:
>
>> ceph-fuse or kernel client? Which version of ceph-fuse/kernel? This
>> issue can happen on ceph-fuse if the fuse_disable_pagecache config option
>> is false. Old kernel versions have a bug that can cause this issue; the
>> bug is in the splice_{read,write} and readahead code.
>>
>> On Sun, Nov 19, 2017 at 5:52 PM, Gregory Farnum <gfarnum at redhat.com>
>> wrote:
>> > Hmm, are you mounting the filesystem using ceph-fuse? Can you describe
>> > your workload?
>> > -Greg
>> >
>> > On Fri, Nov 3, 2017 at 6:42 PM Andrey Klimentyev
>> > <andrey.klimentyev at flant.com> wrote:
>> >>
>> >> I am absolutely incorrect, my apologies.
>> >>
>> >> caps: [mds] allow rw
>> >> caps: [mon] allow r
>> >> caps: [osd] allow rwx pool=cephfs_metadata, allow rwx pool=cephfs_data
>> >>
>> >> On 3 November 2017 at 10:40, Henrik Korkuc <lists at kirneh.eu> wrote:
>> >>>
>> >>> On 17-11-03 09:29, Andrey Klimentyev wrote:
>> >>>
>> >>> Thanks for a swift response.
>> >>>
>> >>> We are using 10.2.10.
>> >>>
>> >>> They all share the same set of permissions (and one key, too). Haven't
>> >>> found anything incriminating in the logs, either.
>> >>>
>> >>> caps: [mon] allow r
>> >>> caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=rbd
>> >>>
>> >>> Are you sure you pasted the correct user permissions? It looks like you
>> >>> are using RBD permissions for CephFS, and this seems to be the problem.
>> >>>
>> >>> On 3 November 2017 at 00:56, Gregory Farnum <gfarnum at redhat.com>
>> >>> wrote:
>> >>>>
>> >>>> On Thu, Nov 2, 2017 at 9:05 AM Andrey Klimentyev
>> >>>> <andrey.klimentyev at flant.com> wrote:
>> >>>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> we've recently hit a problem in a production cluster. The gist of it is
>> >>>>> that sometimes a file will be changed on one machine, but only the
>> >>>>> "change time" will propagate to the others. The checksum is different.
>> >>>>> The contents, obviously, differ as well. How can I debug this?
>> >>>>>
>> >>>>> In other words, how would I approach such a problem with "stuck files"?
>> >>>>> I haven't found anything on Google or in the troubleshooting docs.
>> >>>>
>> >>>>
>> >>>> What versions are you running?
>> >>>> The only way I can think of this happening is if one of the clients had
>> >>>> permission to access the CephFS namespace on the MDS, but not to write to
>> >>>> the OSDs which store the file data. Have you checked that the clients all
>> >>>> have the same caps? ("ceph auth list" or one of the related more-specific
>> >>>> commands will let you compare.)
>> >>>> -Greg
>> >>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Andrey Klimentyev,
>> >>>>> DevOps engineer @ JSC «Flant»
>> >>>>> http://flant.com/
>> >>>>> _______________________________________________
>> >>>>> ceph-users mailing list
>> >>>>> ceph-users at lists.ceph.com
>> >>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >>>
>> >>
>> >
>> >
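
For the caps comparison suggested earlier in the thread, here is a rough
sketch that parses saved "ceph auth list" output and reports which entities
have caps that differ from the first one. It assumes the plain-text layout
seen in this thread (an entity name on its own line, followed by "key:" and
"caps: [daemon] ..." lines); the function names and the client names used
with it are hypothetical.

```python
from collections import defaultdict


def parse_auth_caps(text):
    """Parse 'ceph auth list'-style text into {entity: {daemon: caps}}."""
    entities = defaultdict(dict)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("key:"):
            continue
        if line.startswith("caps:"):
            # e.g. "caps: [osd] allow rwx pool=cephfs_data"
            daemon, _, caps = line[len("caps:"):].strip().partition("]")
            if current is not None:
                entities[current][daemon.lstrip("[")] = caps.strip()
        else:
            # Any other non-empty line is treated as an entity name.
            current = line
    return dict(entities)


def differing_caps(entities):
    """Return entity names whose caps differ from the first entity's caps."""
    names = list(entities)
    if not names:
        return []
    reference = entities[names[0]]
    return [name for name in names[1:] if entities[name] != reference]
```

Running differing_caps(parse_auth_caps(saved_output)) on a dump where every
CephFS client is expected to match should return an empty list. A client
that carries only RBD-style caps, like the first paste in this thread, would
show up in the result.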
