[ceph-users] 1/3 mon not working after upgrade to Nautilus

Clausen, Jörn jclausen at geomar.de
Mon Mar 25 03:51:25 PDT 2019


Hi!

I just upgraded my test cluster from Mimic (13.2.5) to Nautilus
(14.2.0), and everything looked fine until I activated msgr2. At that
moment, one of my three MONs (the one that was active at the time) fell
out of the quorum and now refuses to rejoin. The two other MONs seem to
work fine.

ceph-mon.log on that host is filling rapidly with messages (mainly from
rocksdb), but I can't find anything useful in them that hints at a
problem. On one of the remaining MONs I can see

2019-03-25 11:36:04.081 7fb4add7a700  0 --1- [v2:172.17.0.35:3300/0,v1:172.17.0.35:6789/0] >> v1:172.17.0.37:6789/0 conn(0x559095a5bc00 0x559095a2d800 :6789 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept peer reset, then tried to connect to us, replacing
2019-03-25 11:36:04.152 7fb4ad579700  0 --1- [v2:172.17.0.35:3300/0,v1:172.17.0.35:6789/0] >> v1:172.17.0.37:6789/0 conn(0x559093ce9400 0x559093cec800 :6789 s=OPENED pgs=4422675 cs=1 l=0).fault initiating reconnect

and similar messages, but these don't tell me much either.
172.17.0.37 is the broken MON, 172.17.0.35 is one of the remaining ones.
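
To get an overview of such messages instead of reading them one by one, something like the following Python sketch could count them per peer address and connection state. The regular expression is inferred only from the two sample lines above, not from any Ceph specification, and the function name summarize is made up for illustration. It expects each log entry on a single line, as in the log file itself.

```python
import re
from collections import Counter

# Peer address after ">>" and connection state after " s=".
# Pattern inferred from the two sample lines only (an assumption).
LINE_RE = re.compile(r">>\s+(?P<peer>v[12]:[0-9.]+:\d+/\d+).*?\bs=(?P<state>\w+)")

def summarize(lines):
    """Count (peer, state) pairs over single-line log entries."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            counts[(m.group("peer"), m.group("state"))] += 1
    return counts

# The two entries quoted above, each joined onto one line.
sample = [
    "2019-03-25 11:36:04.081 7fb4add7a700  0 --1- "
    "[v2:172.17.0.35:3300/0,v1:172.17.0.35:6789/0] >> v1:172.17.0.37:6789/0 "
    "conn(0x559095a5bc00 0x559095a2d800 :6789 "
    "s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 "
    "l=0).handle_connect_message_2 accept peer reset, then tried to connect "
    "to us, replacing",
    "2019-03-25 11:36:04.152 7fb4ad579700  0 --1- "
    "[v2:172.17.0.35:3300/0,v1:172.17.0.35:6789/0] >> v1:172.17.0.37:6789/0 "
    "conn(0x559093ce9400 0x559093cec800 :6789 s=OPENED pgs=4422675 cs=1 "
    "l=0).fault initiating reconnect",
]

for (peer, state), n in sorted(summarize(sample).items()):
    print(peer, state, n)
```

For a real log, the lines would come from the file, e.g. summarize(open(path)), where path is wherever ceph-mon.log lives on that host.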

ceph.conf on all MONs contains

mon host = 172.17.0.35,172.17.0.36,172.17.0.37

i.e. it should work for both v1 and v2.
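
For comparison, the same monitor list could also be written with both protocols spelled out per MON. The Python sketch below only builds that string. The ports 3300 (v2) and 6789 (v1) are taken from the log lines above, the bracketed address-vector form is assumed to follow the msgr2 documentation, and the helper name explicit_mon_host is made up for illustration.

```python
V2_PORT = 3300  # msgr2 port, as it appears in the log excerpt
V1_PORT = 6789  # legacy v1 port, as it appears in the log excerpt

def explicit_mon_host(ips):
    """Build a mon host value listing v2 and v1 addresses for each MON."""
    return ",".join(
        f"[v2:{ip}:{V2_PORT},v1:{ip}:{V1_PORT}]" for ip in ips
    )

mon_ips = ["172.17.0.35", "172.17.0.36", "172.17.0.37"]
print("mon host = " + explicit_mon_host(mon_ips))
```

Whether the explicit form behaves any differently from the bare addresses here is not something this sketch can show; it only makes the intended v1 and v2 endpoints visible.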

Can anybody tell me how to debug this situation further? Or even solve it?

-- 
Jörn Clausen
Daten- und Rechenzentrum
GEOMAR Helmholtz-Zentrum für Ozeanforschung Kiel
Düsternbrookerweg 20
24105 Kiel
