[ceph-users] cephfs kernel, hang with libceph: osdx X.X.X.X socket closed (con state OPEN)

Linh Vu vul at unimelb.edu.au
Thu Nov 8 17:16:07 PST 2018


If you're using the kernel client for CephFS, I strongly advise putting the client on the same subnet as the Ceph public network, i.e. all traffic should stay on the same subnet/VLAN. Even if your firewall setup is fine, crossing subnets or VLANs will lead to odd problems later. The FUSE client tolerates that scenario much better.
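
A quick way to check this for a given client, as a minimal sketch: the addresses below are made up for illustration, and `same_subnet` is a hypothetical helper, not part of Ceph. The second argument would be whatever the cluster uses as its public network.

```python
import ipaddress

def same_subnet(client_ip, public_network):
    """Return True if client_ip lies inside the given network (CIDR notation)."""
    network = ipaddress.ip_network(public_network, strict=False)
    return ipaddress.ip_address(client_ip) in network

# Hypothetical addresses, for illustration only.
print(same_subnet("10.0.1.25", "10.0.1.0/24"))  # True: same subnet
print(same_subnet("10.0.2.25", "10.0.1.0/24"))  # False: traffic crosses subnets
```

A `False` here means the client traffic is routed (and possibly firewalled) on its way to the OSDs, which is the situation described above.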

________________________________
From: ceph-users <ceph-users-bounces at lists.ceph.com> on behalf of Alexandre DERUMIER <aderumier at odiso.com>
Sent: Friday, 9 November 2018 12:06:43 PM
To: ceph-users
Subject: Re: [ceph-users] cephfs kernel, hang with libceph: osdx X.X.X.X socket closed (con state OPEN)

OK,
It seems to come from the firewall.
I'm seeing dropped sessions exactly 15 minutes before the log entries.

The dropped sessions are the sessions to the OSDs; the sessions to the mon and MDS are OK.


It seems that keepalive2 is used to monitor the mon session:
https://patchwork.kernel.org/patch/7105641/

but I'm not sure about the OSD sessions?

----- Original message -----
From: "aderumier" <aderumier at odiso.com>
To: "ceph-users" <ceph-users at lists.ceph.com>
Cc: "Alexandre Bruyelles" <abruyelles at odiso.com>
Sent: Friday, 9 November 2018 01:12:25
Subject: Re: [ceph-users] cephfs kernel, hang with libceph: osdx X.X.X.X socket closed (con state OPEN)

To be more precise,

the log entries appear when the hang ends.

I have looked at the stats for 10 different hangs, and the duration is always around 15 minutes.

Maybe related to:

ms tcp read timeout
Description: If a client or daemon makes a request to another Ceph daemon and does not drop an unused connection, the ms tcp read timeout defines the connection as idle after the specified number of seconds.
Type: Unsigned 64-bit Integer
Required: No
Default: 900 (15 minutes).

?
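
One way to test that hypothesis against firewall data, as a rough sketch: compare the time a session was dropped with the time the client log entry shows up, and see whether the gap is close to the 900-second default. The drop timestamp and the 30-second tolerance below are assumptions for illustration; only the 900-second default comes from the description above.

```python
from datetime import datetime

# Default of "ms tcp read timeout" quoted above, in seconds.
MS_TCP_READ_TIMEOUT = 900

def matches_timeout(session_dropped, log_seen,
                    timeout_s=MS_TCP_READ_TIMEOUT, tolerance_s=30):
    """Return True if the gap between the two events is close to timeout_s."""
    gap = (log_seen - session_dropped).total_seconds()
    return abs(gap - timeout_s) <= tolerance_s

# Hypothetical firewall drop time; the log time is only an example value.
dropped = datetime(2018, 11, 8, 12, 5, 10)
logged = datetime(2018, 11, 8, 12, 20, 18)
print(matches_timeout(dropped, logged))  # True: the gap is 908 seconds
```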

I found a similar bug report that also involves a firewall:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-October/013841.html


----- Original message -----
From: "aderumier" <aderumier at odiso.com>
To: "ceph-users" <ceph-users at lists.ceph.com>
Sent: Thursday, 8 November 2018 18:16:20
Subject: [ceph-users] cephfs kernel, hang with libceph: osdx X.X.X.X socket closed (con state OPEN)

Hi,

we are currently testing CephFS with the kernel module (4.17 and 4.18) instead of FUSE (which worked fine),

and we get hangs: iowait jumps like crazy for around 20 minutes.

The client is a QEMU 2.12 VM with a virtio-net interface.


In the client logs, we see entries like this:

[jeu. nov. 8 12:20:18 2018] libceph: osd14 x.x.x.x:6801 socket closed (con state OPEN)
[jeu. nov. 8 12:42:03 2018] libceph: osd9 x.x.x.x:6821 socket closed (con state OPEN)


and in osd logs:

osd14:
2018-11-08 12:20:25.247 7f31ffac8700 0 -- x.x.x.x:6801/1745 >> x.x.x.x:0/3678871522 conn(0x558c430ec300 :6801 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)

osd9:
2018-11-08 12:42:09.820 7f7ca970e700 0 -- x.x.x.x:6821/1739 >> x.x.x.x:0/3678871522 conn(0x564fcbec5100 :6821 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
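
To line up the two sides, here is a minimal sketch that pairs a client `socket closed` line with the OSD `accept replacing existing` line on the same port and reports the delay between them. The regular expressions are written only against the lines shown above and are an assumption about the format in general; `reconnect_delay` is a hypothetical helper. It also assumes both events happen on the same day and that the client and OSD clocks are roughly in sync.

```python
import re
from datetime import datetime

# Patterns fitted to the sample lines above; other log formats may differ.
CLIENT_RE = re.compile(
    r"\[.*?(?P<time>\d{2}:\d{2}:\d{2}) \d{4}\] libceph: (?P<osd>osd\d+) "
    r"\S+:(?P<port>\d+) socket closed"
)
OSD_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} (?P<time>\d{2}:\d{2}:\d{2}\.\d+) .*?-- "
    r"\S+:(?P<port>\d+)/\d+ >> .*accept replacing existing"
)

def reconnect_delay(client_line, osd_line):
    """Return (osd, seconds between close and re-accept), or None if no match."""
    c = CLIENT_RE.search(client_line)
    o = OSD_RE.search(osd_line)
    if not c or not o or c["port"] != o["port"]:
        return None
    closed = datetime.strptime(c["time"], "%H:%M:%S")
    accepted = datetime.strptime(o["time"], "%H:%M:%S.%f")
    return c["osd"], (accepted - closed).total_seconds()

client = ("[jeu. nov. 8 12:20:18 2018] libceph: osd14 x.x.x.x:6801 "
          "socket closed (con state OPEN)")
osd = ("2018-11-08 12:20:25.247 7f31ffac8700 0 -- x.x.x.x:6801/1745 >> "
       "x.x.x.x:0/3678871522 conn(0x558c430ec300 :6801 "
       "s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1)"
       ".handle_connect_msg accept replacing existing (lossy) channel "
       "(new one lossy=1)")
print(reconnect_delay(client, osd))  # ('osd14', 7.247)
```

For the osd14 pair above, the OSD accepts the replacement connection about 7 seconds after the client reports the socket as closed.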


The cluster is Ceph 13.2.1.

Note that we have a physical firewall between the client and the server; I'm not sure yet whether it could be dropping the sessions. (I haven't found any logs on the firewall.)

Any ideas? I would like to know whether it's a network bug or a Ceph bug (I'm not sure how to interpret the OSD logs).

Regards,

Alexandre



client ceph.conf
----------------
[client]
fuse_disable_pagecache = true
client_reconnect_stale = true


_______________________________________________
ceph-users mailing list
ceph-users at lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
