[ceph-users] Re: bad crc/signature errors

Olivier Bonvalet ceph.list at daevel.fr
Thu Oct 5 03:01:47 PDT 2017


On Thursday, 5 October 2017 at 11:47 +0200, Ilya Dryomov wrote:
> The stable pages bug manifests as multiple sporadic connection resets,
> because in that case CRCs computed by the kernel don't always match the
> data that gets sent out.  When the mismatch is detected on the OSD
> side, OSDs reset the connection and you'd see messages like
> 
>   libceph: osd1 1.2.3.4:6800 socket closed (con state OPEN)
>   libceph: osd2 1.2.3.4:6804 socket error on write
> 
> This is a different issue.  Josy, Adrian, Olivier, do you also see
> messages of the "libceph: read_partial_message ..." type or is it just
> "libceph: ... bad crc/signature" errors?
> 
> Thanks,
> 
>                 Ilya

I see "read_partial_message" messages too, for example:

Oct  5 09:00:47 lorunde kernel: [65575.969322] libceph: read_partial_message ffff88027c231500 data crc 181941039 != exp. 115232978
Oct  5 09:00:47 lorunde kernel: [65575.969953] libceph: osd122 10.0.0.31:6800 bad crc/signature
Oct  5 09:04:30 lorunde kernel: [65798.958344] libceph: read_partial_message ffff880254a25c00 data crc 443114996 != exp. 2014723213
Oct  5 09:04:30 lorunde kernel: [65798.959044] libceph: osd18 10.0.0.22:6802 bad crc/signature
Oct  5 09:14:28 lorunde kernel: [66396.788272] libceph: read_partial_message ffff880238636200 data crc 1797729588 != exp. 2550563968
Oct  5 09:14:28 lorunde kernel: [66396.788984] libceph: osd43 10.0.0.9:6804 bad crc/signature
Oct  5 10:09:36 lorunde kernel: [69704.211672] libceph: read_partial_message ffff8802712dff00 data crc 2241944833 != exp. 762990605
Oct  5 10:09:36 lorunde kernel: [69704.212422] libceph: osd103 10.0.0.28:6804 bad crc/signature
Oct  5 10:25:41 lorunde kernel: [70669.203596] libceph: read_partial_message ffff880257521400 data crc 3655331946 != exp. 2796991675
Oct  5 10:25:41 lorunde kernel: [70669.204462] libceph: osd16 10.0.0.21:6806 bad crc/signature
Oct  5 10:25:52 lorunde kernel: [70680.255943] libceph: read_partial_message ffff880245e3d600 data crc 3787567693 != exp. 725251636
Oct  5 10:25:52 lorunde kernel: [70680.257066] libceph: osd60 10.0.0.23:6800 bad crc/signature


On the OSD side, taking osd122 as an example, I don't see any "reset" in
the OSD log.
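
To check whether the errors cluster on particular OSDs or hosts, the
"bad crc/signature" lines can be tallied with a short script. This is only
a minimal sketch, assuming the syslog line format shown in the excerpt
above; the function name tally_bad_crc and the regular expression are my
own illustration, not part of any Ceph tool.

```python
import re
from collections import Counter

# Matches kernel log lines such as:
#   libceph: osd122 10.0.0.31:6800 bad crc/signature
# The pattern is an assumption based on the excerpt above.
BAD_CRC = re.compile(
    r"libceph: (osd\d+) (\d+\.\d+\.\d+\.\d+):(\d+) bad crc/signature"
)


def tally_bad_crc(lines):
    """Count "bad crc/signature" messages per OSD and per OSD host address."""
    per_osd = Counter()
    per_host = Counter()
    for line in lines:
        match = BAD_CRC.search(line)
        if match:
            osd, host, _port = match.groups()
            per_osd[osd] += 1
            per_host[host] += 1
    return per_osd, per_host
```

Passing an open log file (or any iterable of lines) to tally_bad_crc
returns two counters; calling most_common() on them shows whether a single
OSD or host stands out.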


Thanks,

Olivier

