[ceph-users] Luminous cluster stuck when adding monitor

Joao Eduardo Luis joao at suse.de
Wed Oct 4 18:44:57 PDT 2017


On 10/04/2017 09:19 PM, Gregory Farnum wrote:
> Oh, hmm, you're right. I see synchronization starts but it seems to 
> progress very slowly, and it certainly doesn't complete in that 2.5 
> minute logging window. I don't see any clear reason why it's so slow; it 
> might be more clear if you could provide logs of the other monitors at the 
> same time (especially since you now say they are getting stuck in the 
> electing state during that period). Perhaps Kefu or Joao will have some 
> clearer idea what the problem is.
> -Greg

I haven't gone through logs yet (maybe Friday, it's late today and it's 
a holiday tomorrow), but not so long ago I seem to recall someone having 
a similar issue with the monitors that was solely related to a switch's 
MTU being too small.

Maybe that could be the case? If not, I'll take a look at the logs as 
soon as possible.
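
A rough way to test the MTU theory, sketched under a few assumptions: the
hosts run Linux with a recent iputils ping (on older systems `ping6` would
take the place of `ping -6`), and the monitors talk over plain IPv6 with no
extension headers. The helper names below are made up for illustration. The
idea is to send packets that exactly fill the configured MTU with
fragmentation forbidden and see whether they arrive.

```shell
#!/bin/sh
# Largest ICMPv6 echo payload that fits in a single packet for a given MTU:
# the MTU minus the 40-byte IPv6 header and the 8-byte ICMPv6 header.
icmp6_payload() {
    echo $(( $1 - 40 - 8 ))
}

# Print (do not run) a ping command that sends full-MTU packets to a peer.
# "-M do" asks iputils ping not to fragment, so an undersized link or
# switch in the path shows up as errors or silent loss.
mtu_probe_cmd() {
    peer=$1
    mtu=$2
    echo "ping -6 -c 3 -M do -s $(icmp6_payload "$mtu") $peer"
}

# Example (hostname taken from this thread, MTU value is an assumption):
#   mtu_probe_cmd server2 9000
#   prints: ping -6 -c 3 -M do -s 8952 server2
```

Running the printed command from server1 against each of the other monitor
hosts, with whatever MTU the interfaces are configured for, should show
whether large packets get through. If small pings work but the full-size
ones are lost, the switch MTU would be a likely suspect.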

   -Joao

> 
> On Wed, Oct 4, 2017 at 1:04 PM Nico Schottelius
> <nico.schottelius at ungleich.ch> wrote:
> 
> 
>     Some more detail:
> 
>     when restarting the monitor on server1, it stays in synchronizing state
>     forever.
> 
>     However the other two monitors change into electing state.
> 
>     I have double checked that there are no (host) firewalls active and
>     that the hosts' clocks are within 1 second of each other (they all have
>     ntpd running).
> 
>     We are running everything on IPv6, but this should not be a problem,
>     should it?
> 
>     Best,
> 
>     Nico
> 
> 
>     Nico Schottelius <nico.schottelius at ungleich.ch> writes:
> 
>      > Hello Gregory,
>      >
>      > the logfile I produced already has debug mon = 20 set:
>      >
>      > [21:03:51] server1:~# grep "debug mon" /etc/ceph/ceph.conf
>      > debug mon = 20
>      >
>      > It is clear that server1 is out of quorum, but how do we make it
>      > part of the quorum again?
>      >
>      > I expected that the quorum finding process is triggered automatically
>      > after restarting the monitor, or is that incorrect?
>      >
>      > Best,
>      >
>      > Nico
>      >
>      >
>      > Gregory Farnum <gfarnum at redhat.com> writes:
>      >
>      >> You'll need to change the config so that it's running "debug mon = 20" for
>      >> the log to be very useful here. It does say that it's dropping client
>      >> connections because it's been out of quorum for too long, which is the
>      >> correct behavior in general. I'd imagine that you've got clients trying to
>      >> connect to the new monitor instead of the ones already in the quorum and
>      >> not passing around correctly; this is all configurable.
>      >>
>      >> On Wed, Oct 4, 2017 at 4:09 AM Nico Schottelius
>      >> <nico.schottelius at ungleich.ch> wrote:
>      >>
>      >>>
>      >>> Good morning,
>      >>>
>      >>> we have recently upgraded our kraken cluster to luminous and since then
>      >>> noticed an odd behaviour: we cannot add a monitor anymore.
>      >>>
>      >>> As soon as we start a new monitor (server2), ceph -s and ceph -w start to
>      >>> hang.
>      >>>
>      >>> The situation became worse when one of our staff stopped an existing
>      >>> monitor (server1): restarting that monitor results in the same
>      >>> situation, and ceph -s hangs until we stop the monitor again.
>      >>>
>      >>> We kept the monitor running for some minutes, but the situation never
>      >>> clears up.
>      >>>
>      >>> The network does not have any firewall in between the nodes and there
>      >>> are no host firewalls.
>      >>>
>      >>> I have attached the output of the monitor on server1, running in
>      >>> foreground using
>      >>>
>      >>> root@server1:~# ceph-mon -i server1 --pid-file
>      >>> /var/lib/ceph/run/mon.server1.pid -c /etc/ceph/ceph.conf --cluster ceph
>      >>> --setuser ceph --setgroup ceph -d 2>&1 | tee cephmonlog
>      >>>
>      >>> Does anyone see any obvious problem in the attached log?
>      >>>
>      >>> Any input or hint would be appreciated!
>      >>>
>      >>> Best,
>      >>>
>      >>> Nico
>      >>>
>      >>>
>      >>>
>      >>> --
>      >>> Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch
> 
> 
> 


