[ceph-users] Issue with "renamed" mon, crashing

Anders Olausson anders at spacedump.se
Wed Nov 8 11:06:45 PST 2017


Hi Kamila,

Thank you for your response.

I think we solved it yesterday.
I simply removed the mon again, and this time I also removed all references to it in ceph.conf (there were some remnants there).
After that I ran ceph-deploy, and it hasn’t crashed again so far.

So in this case it was most likely leftovers from the old mon in the config that fscked things up. I don’t understand why, but it works now that I first removed all traces of it and then recreated it. Before that I had removed and recreated it a bunch of times as well, but with some leftovers still in ceph.conf, and then it didn’t work.
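
A minimal sketch of the cleanup check described above. It assumes that the leftovers are lines that still contain the old mon name verbatim, for example a [mon.ceph03mon] section or the name inside "mon initial members" or "mon host". The function name find_mon_leftovers is hypothetical. The function only lists matching lines and does not edit the file.

```shell
# Hypothetical helper: list every line of a ceph.conf that still mentions
# an old monitor id, so the remnants can be removed before re-creating the mon.
# Assumption: each leftover contains the old id verbatim.
find_mon_leftovers() {
    conf="$1"      # path to the config, e.g. /etc/ceph/ceph.conf
    old_id="$2"    # old mon id, e.g. ceph03mon
    # -n prints line numbers; -w matches whole words only, so a search for
    # "ceph03" would not also report the "ceph03mon" lines.
    grep -nw -- "$old_id" "$conf"
}
```

For example, find_mon_leftovers /etc/ceph/ceph.conf ceph03mon prints nothing, and grep exits with status 1, once all references are gone. If the config is managed with ceph-deploy, check the ceph.conf in its working directory as well, because that is the copy it pushes out to the nodes.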

//Anders

From: Kamila Součková [mailto:kamila at ksp.sk]
Sent: 8 November 2017 13:43
To: Anders Olausson <anders at spacedump.se>
Cc: ceph-users at lists.ceph.com
Subject: Re: [ceph-users] Issue with "renamed" mon, crashing

Hi,

I am not sure if this is the same issue as we had recently, but it looks a bit like it -- we also had a Luminous mon crashing right after syncing was done.

Turns out that the current release has a bug which causes the mon to crash if it cannot find a mgr daemon. This should be fixed in the upcoming release.

In our case we "solved" it by moving the active mgr to the mon's host. (I am not sure how to activate a specific mgr, but it appears that the mgrs get activated in FIFO order -- so just keep killing and re-starting the active one until a mgr on the mon's host is active).
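
A rough sketch of that loop. It relies on several assumptions that the thread does not state:

- ceph mgr dump -f json reports the active daemon under "active_name".
- ceph mgr fail <name> is available in the running release and makes the active mgr step down, instead of killing it by hand.
- The mgr on the mon's host is named after that host, as ceph-deploy names them by default.

The function name activate_mgr and the retry limit are hypothetical.

```shell
# Hypothetical helper for the workaround above: keep failing over the active
# mgr until the one named $1 (assumed to be the mgr on the mon's host) is active.
# Assumes `ceph mgr dump -f json` includes "active_name" and that
# `ceph mgr fail <name>` exists in the running release.
activate_mgr() {
    target="$1"              # mgr name to make active, e.g. ceph03
    max_tries="${2:-5}"      # hypothetical cap, roughly the number of mgr daemons
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        # Extract the active mgr's name from the MgrMap JSON.
        active=$(ceph mgr dump -f json | sed -n 's/.*"active_name": *"\([^"]*\)".*/\1/p')
        if [ "$active" = "$target" ]; then
            echo "mgr $target is active"
            return 0
        fi
        if [ -n "$active" ]; then
            echo "active mgr is '$active', failing it over"
            ceph mgr fail "$active"
        fi
        tries=$((tries + 1))
        sleep 5                 # give a standby time to take over
    done
    echo "gave up: mgr $target did not become active" >&2
    return 1
}
```

For example, activate_mgr ceph03 fails over the active mgr at most five times, waiting five seconds after each attempt, until mgr.ceph03 is active. The cap matters because, without it, the loop would cycle forever if no mgr with that name exists.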

Hope this helps!

Kamila

On Mon, Nov 6, 2017 at 12:44 PM Anders Olausson <anders at spacedump.se> wrote:
Hi,

I recently (yesterday) upgraded to Luminous (12.2.1) running on Ubuntu 14.04.5 LTS.
Upgrade went fine, no issues at all.
However, when I was about to use ceph-deploy to configure some new disks, it failed.
After some investigation I figured out that it didn’t like that my mons were named like ceph03mon on the host ceph03: for example, ceph-deploy gatherkeys ceph03 failed.
So I decided to rename my mons. I started by removing one of them:

# stop ceph-mon id=ceph03mon
# ceph mon remove ceph03mon
# cd /var/lib/ceph/mon/
# mv ceph-ceph03mon disabled-ceph-ceph03mon

Created the new one:

# mkdir tmp
# mkdir ceph-ceph03
# ceph auth get mon. -o tmp/keyring
# ceph mon getmap -o tmp/monmap
# ceph-mon -i ceph03 --mkfs --monmap tmp/monmap --keyring tmp/keyring
# chown -R ceph:ceph ceph-ceph03
# ceph-mon -i ceph03 --public-addr 10.10.1.23:6789
# start ceph-mon id=ceph03
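
The point of the rename is that ceph-deploy generally assumes a monitor's id matches the host's short name (hostname -s). That fits with gatherkeys failing for ceph03mon on ceph03. Below is a minimal sketch of checking this on a node. It assumes the default cluster name "ceph" and the default data path /var/lib/ceph/mon/ceph-<id> used in the commands above. The function name check_mon_id is hypothetical.

```shell
# Hypothetical check: is there a mon data directory named after this
# host's short name, as ceph-deploy assumes? Assumes cluster name "ceph"
# and the default mon data root /var/lib/ceph/mon.
check_mon_id() {
    base="${1:-/var/lib/ceph/mon}"   # mon data root
    host="${2:-$(hostname -s)}"      # expected mon id (short host name)
    if [ -d "$base/ceph-$host" ]; then
        echo "ok: $base/ceph-$host exists"
        return 0
    fi
    echo "mismatch: no $base/ceph-$host; mon dirs present:" >&2
    ls "$base" >&2
    return 1
}
```

On ceph03 before the rename, this check reports a mismatch because only ceph-ceph03mon exists. After the steps above, ceph-ceph03 is present and the check passes.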

It starts OK and quorum is established, but when it gets a command such as “ceph osd pool stat” or “ceph auth list”, it crashes.

The complete log can be found at: http://files.spacedump.se/ceph03-monerror-20171106-01.txt
I used the settings below for logging in ceph.conf at the time:

[mon]
       debug mon = 20
       debug paxos = 20
       debug auth = 20

I have now rolled back to the old monitor, and it works as it should on the same box. But that is the one that was upgraded from Hammer -> Jewel -> Luminous.

Any idea what the issue could be?
Thanks.

Best regards
  Anders Olausson
_______________________________________________
ceph-users mailing list
ceph-users at lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com