[ceph-users] bad crc/signature errors

Ilya Dryomov
Thu Oct 5 02:47:17 PDT 2017

On Thu, Oct 5, 2017 at 7:53 AM, Adrian Saul
<Adrian.Saul at tpgtelecom.com.au> wrote:
> We see the same messages and are similarly on a 4.4 KRBD version that is affected by this.
> I have seen no impact from it so far that I know about.
>> Jason Dillaman
>> Perhaps this is related to a known issue on some 4.4 and later kernels [1]
>> where the stable write flag was not preserved by the kernel?
>> [1] http://tracker.ceph.com/issues/19275

The stable pages bug manifests as multiple sporadic connection resets,
because in that case CRCs computed by the kernel don't always match the
data that gets sent out.  When the mismatch is detected on the OSD
side, OSDs reset the connection and you'd see messages like

  libceph: osd1 socket closed (con state OPEN)
  libceph: osd2 socket error on write
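
To illustrate why a page that changes while the write is in flight
leads to a CRC mismatch on the receiving side, here is a minimal
sketch.  It is not the kernel or OSD code: the function names are
made up for the illustration, and zlib.crc32 (plain CRC-32) stands in
for the CRC that the messenger actually uses.

```python
import zlib


def send_with_crc(page: bytearray, mutate_after_crc: bool):
    """Hypothetical sender: checksum the page, then put it on the wire."""
    crc = zlib.crc32(bytes(page))  # CRC is computed over the page first
    if mutate_after_crc:
        # Without stable pages, the page can still be modified
        # between the CRC calculation and the actual send.
        page[0] ^= 0xFF
    payload = bytes(page)          # this is what really goes out
    return crc, payload


def receiver_accepts(crc: int, payload: bytes) -> bool:
    """Hypothetical receiver: recompute the CRC and compare."""
    return zlib.crc32(payload) == crc


stable = send_with_crc(bytearray(b"block data"), mutate_after_crc=False)
unstable = send_with_crc(bytearray(b"block data"), mutate_after_crc=True)

print(receiver_accepts(*stable))    # True
print(receiver_accepts(*unstable))  # False
```

In the second case the receiver has no way to tell a racing writer
from corruption on the wire, so dropping the connection is the only
safe reaction.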

What is being reported in this thread looks like a different issue.
Josy, Adrian, Olivier, do you also see messages of the
"libceph: read_partial_message ..." type, or is it just
"libceph: ... bad crc/signature" errors?
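
If it helps with answering, something along these lines could be used
to tally which kinds of libceph messages show up in a saved kernel
log.  This is only a rough sketch: the category names and the regular
expressions are my own, built from the message fragments quoted in
this mail, and may need adjusting to the exact text your kernel
prints.

```python
import re
from collections import Counter

# Hypothetical categories, matched on the fragments quoted above.
PATTERNS = [
    ("connection_reset",
     re.compile(r"libceph: osd\d+ socket (closed|error on write)")),
    ("read_partial_message",
     re.compile(r"libceph: read_partial_message")),
    ("bad_crc_signature",
     re.compile(r"libceph: .*bad crc/signature")),
]


def classify(line: str):
    """Return the first matching category name, or None."""
    for name, pattern in PATTERNS:
        if pattern.search(line):
            return name
    return None


def count_categories(lines):
    """Count how many lines fall into each category."""
    counts = Counter()
    for line in lines:
        category = classify(line)
        if category is not None:
            counts[category] += 1
    return counts


sample = [
    "libceph: osd1 socket closed (con state OPEN)",
    "libceph: osd2 socket error on write",
]
print(count_categories(sample))
```

Feeding it the lines of a saved dmesg or kern.log file, for example
count_categories(open(path)) with path pointing at that file, shows
at a glance whether the resets, the read_partial_message lines, or
only the bad crc/signature lines are present.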



