[ceph-users] Luminous cluster stuck when adding monitor

Nico Schottelius nico.schottelius at ungleich.ch
Wed Oct 4 13:05:16 PDT 2017


Some more detail:

When I restart the monitor on server1, it stays in the synchronizing
state forever.

However, the other two monitors switch to the electing state.
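
The state each monitor reports can be read from its admin socket on the
monitor host, which works even while that monitor is out of quorum. A
minimal sketch, assuming the default admin socket location and the
monitor id server1 used in this thread; the helper name mon_state is
made up for illustration and only extracts the first "state" field from
the JSON that `ceph daemon mon.<id> mon_status` prints:

```shell
# Hypothetical helper: read mon_status JSON on stdin and print the value
# of the first "state" field (for example synchronizing or electing).
mon_state() {
    sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage on the monitor host (requires a running ceph-mon, so it is left
# commented out here):
# ceph daemon mon.server1 mon_status | mon_state
```

Running this on all three monitor hosts in turn should show whether the
states described above stay the same over time.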

I have double-checked that no (host) firewalls are active and that the
hosts' clocks are within 1 second of each other (they all have ntpd
running).

We are running everything on IPv6, but this should not be a problem,
should it?
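
On the IPv6 question: Ceph has an `ms bind ipv6` option that controls
whether daemons bind to IPv6 addresses. Whether it plays any role in
this problem is only a guess, but checking it is cheap. A sketch, where
conf_has_ipv6_bind is a hypothetical helper name and /etc/ceph/ceph.conf
is the path used elsewhere in this thread:

```shell
# Hypothetical helper: succeed if the given ceph.conf sets
# "ms bind ipv6" (written with spaces or underscores) to the literal
# value true. Other truthy spellings such as 1 or yes are NOT matched.
conf_has_ipv6_bind() {
    grep -Eiq '^[[:space:]]*ms[ _]bind[ _]ipv6[[:space:]]*=[[:space:]]*true' "$1"
}

# Usage (left commented out because it depends on the local config file):
# conf_has_ipv6_bind /etc/ceph/ceph.conf && echo "ms bind ipv6 is enabled"
```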

Best,

Nico


Nico Schottelius <nico.schottelius at ungleich.ch> writes:

> Hello Gregory,
>
> the logfile I produced already has debug mon = 20 set:
>
> [21:03:51] server1:~# grep "debug mon" /etc/ceph/ceph.conf
> debug mon = 20
>
> It is clear that server1 is out of quorum; however, how do we make it
> part of the quorum again?
>
> I expected the quorum-finding process to be triggered automatically
> after restarting the monitor, or is that incorrect?
>
> Best,
>
> Nico
>
>
> Gregory Farnum <gfarnum at redhat.com> writes:
>
>> You'll need to change the config so that it's running "debug mon = 20" for
>> the log to be very useful here. It does say that it's dropping client
>> connections because it's been out of quorum for too long, which is the
>> correct behavior in general. I'd imagine that you've got clients trying to
>> connect to the new monitor instead of the ones already in the quorum and
>> not passing around correctly; this is all configurable.
>>
>> On Wed, Oct 4, 2017 at 4:09 AM Nico Schottelius <
>> nico.schottelius at ungleich.ch> wrote:
>>
>>>
>>> Good morning,
>>>
>>> we recently upgraded our kraken cluster to luminous and have since
>>> noticed odd behaviour: we can no longer add a monitor.
>>>
>>> As soon as we start a new monitor (server2), ceph -s and ceph -w start to
>>> hang.
>>>
>>> The situation became worse when one of our staff stopped an existing
>>> monitor (server1): restarting that monitor results in the same
>>> situation, and ceph -s hangs until we stop the monitor again.
>>>
>>> We kept the monitor running for some minutes, but the situation never
>>> cleared up.
>>>
>>> The network does not have any firewall in between the nodes and there
>>> are no host firewalls.
>>>
>>> I have attached the output of the monitor on server1, running in
>>> foreground using
>>>
>>> root@server1:~# ceph-mon -i server1 --pid-file \
>>>     /var/lib/ceph/run/mon.server1.pid -c /etc/ceph/ceph.conf --cluster ceph \
>>>     --setuser ceph --setgroup ceph -d 2>&1 | tee cephmonlog
>>>
>>> Does anyone see any obvious problem in the attached log?
>>>
>>> Any input or hint would be appreciated!
>>>
>>> Best,
>>>
>>> Nico
>>>
>>>
>>>
>>> --
>>> Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch
