[ceph-users] CephFS: clients hanging on write with ceph-fuse

Andras Pataki apataki at flatironinstitute.org
Thu Nov 2 11:40:23 PDT 2017

We've been running into a strange problem with CephFS when it is mounted
through ceph-fuse. All the back-end nodes are on 10.2.10; the fuse clients
are on 10.2.7.

After some hours of running, some processes get stuck in a write, waiting
on fuse, like this:

[root@worker1144 ~]# cat /proc/58193/stack
[<ffffffffa08cd241>] wait_answer_interruptible+0x91/0xe0 [fuse]
[<ffffffffa08cd653>] __fuse_request_send+0x253/0x2c0 [fuse]
[<ffffffffa08cd6d2>] fuse_request_send+0x12/0x20 [fuse]
[<ffffffffa08d69d6>] fuse_send_write+0xd6/0x110 [fuse]
[<ffffffffa08d84d5>] fuse_perform_write+0x2f5/0x5a0 [fuse]
[<ffffffffa08d8a21>] fuse_file_aio_write+0x2a1/0x340 [fuse]
[<ffffffff811fdfbd>] do_sync_write+0x8d/0xd0
[<ffffffff811fe82d>] vfs_write+0xbd/0x1e0
[<ffffffff811ff34f>] SyS_write+0x7f/0xe0
[<ffffffff816975c9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
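
For anyone who wants to look for the same symptom across many processes on
a node, here is a minimal Python sketch. It assumes the /proc/<pid>/stack
format shown above; the function names (parse_stack,
is_waiting_on_fuse_write, find_waiting_pids) are made up for illustration,
and reading /proc/<pid>/stack normally requires root. A single snapshot
only shows where a task is blocked at that moment, so a PID is only
interesting if it keeps showing up across repeated checks.

```python
import glob
import re

# One frame of /proc/<pid>/stack, for example:
# [<ffffffffa08cd241>] wait_answer_interruptible+0x91/0xe0 [fuse]
FRAME_RE = re.compile(
    r"^\[<[0-9a-f]+>\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def parse_stack(text):
    """Return (symbol, module) tuples; module is None for core kernel frames."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group(1), match.group(2)))
    return frames


def is_waiting_on_fuse_write(text):
    """True if the stack shows a fuse write waiting for a userspace reply."""
    fuse_symbols = {sym for sym, mod in parse_stack(text) if mod == "fuse"}
    waiting = any(sym.startswith("wait_answer") for sym in fuse_symbols)
    return waiting and "fuse_send_write" in fuse_symbols


def find_waiting_pids():
    """Scan /proc once and return PIDs whose kernel stack matches."""
    pids = []
    for path in glob.glob("/proc/[0-9]*/stack"):
        try:
            with open(path) as handle:
                text = handle.read()
        except OSError:
            # Process exited, or we lack permission to read its stack.
            continue
        if is_waiting_on_fuse_write(text):
            pids.append(int(path.split("/")[2]))
    return sorted(pids)
```

Calling find_waiting_pids() twice, a few minutes apart, and intersecting
the results gives a rough list of processes that are hung rather than just
slow.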

The cluster is healthy (all OSDs up, no slow requests, etc.).  More 
details of my investigation efforts are in the bug report I just submitted:

It looks like the fuse client is asking the MDS for some caps that, as far
as the client can tell, it never receives, so the thread waiting for those
caps on behalf of the writing process never wakes up.  Restarting the MDS
fixes the problem (since ceph-fuse then re-negotiates its caps).

Any ideas/suggestions?


