[ceph-users] [EXTERNAL] Upgrading 0.94.6 -> 0.94.9 saturating mon node networking
Stillwell, Bryan J
Bryan.Stillwell at charter.com
Fri Sep 23 13:24:03 PDT 2016
This issue in the tracker has an explanation of what is going on:
So the encoding change caused the old OSDs to start requesting full OSDMap
updates instead of incremental ones.
I would still like to know the purpose of changing the encoding so late in
the stable release series...
On 9/22/16, 7:32 AM, "Will.Boege" <Will.Boege at target.com> wrote:
>Just went through this upgrading a ~400 OSD cluster. I was in the EXACT
>spot you were in. The faster you can get all OSDs to the same version as
>the MONs the better. We decided to power forward and the performance got
>better for every OSD node we patched.
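>
>For anyone following along, the per-node sequence is roughly the usual
>one (a sketch; package and init commands vary between sysvinit and
>systemd on hammer-era installs):
>
>    ceph osd set noout           # keep CRUSH from rebalancing during restarts
>    yum update ceph              # or apt-get install ceph, per distro
>    service ceph restart osd     # or: systemctl restart ceph-osd.target
>    ceph -s                      # wait for recovery before the next node
>    ceph osd unset noout         # only after the whole cluster is upgraded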
>I also discovered that your LevelDBs will start growing
>exponentially if you leave your cluster in that state for too long.
>Pretty sure the down-rev OSDs are aggressively fetching osdmaps from the
>MONs, causing some kind of spinlock condition.
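>
>You can keep an eye on that growth directly (default paths assumed here)
>and compact if it gets out of hand:
>
>    du -sh /var/lib/ceph/mon/ceph-*/store.db   # LevelDB backing each mon
>    ceph tell mon.<id> compact                 # online compaction, one mon at a time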
>> On Sep 21, 2016, at 4:21 PM, Stillwell, Bryan J
>><Bryan.Stillwell at charter.com> wrote:
>> While attempting to upgrade a 1200+ OSD cluster from 0.94.6 to 0.94.9
>> I've run into serious performance issues every time I restart an OSD.
>> At first I thought the problem I was running into was caused by the
>> encoding bug that Dan and Wido ran into when upgrading to 0.94.7,
>> since I was seeing a ton (millions) of these messages in the logs:
>> 2016-09-21 20:48:32.831040 osd.504 126.96.36.199:6810/96488 24 :
>> [WRN] failed to encode map e727985 with expected crc
>> Here are the links to their descriptions of the problem:
>> I tried the solution of using the following command to stop those errors
>> from occurring:
>> ceph tell osd.* injectargs '--clog_to_monitors false'
>> That did get the messages to stop spamming the log files; however, it
>> didn't fix the performance issue for me.
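>> Note that injectargs only affects the running daemons; to make the
>> setting survive OSD restarts it would also need to go into ceph.conf:
>>
>>     [osd]
>>     clog_to_monitors = false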
>> Using dstat on the mon nodes I was able to determine that every time the
>> osdmap is updated (by running 'ceph osd pool set data size 2' in this
>> example), the outgoing network on all mon nodes is saturated for
>> multiple seconds at a time:
>> ----system---- ----total-cpu-usage---- ------memory-usage----- -net/total- -dsk/total- --io/total-
>>      time     |usr sys idl wai hiq siq| used  buff  cach  free| recv  send| read  writ| read  writ
>> 21-09 21:06:53|  1   0  99   0   0   0|11.8G  273M 18.7G  221G|2326k 9015k|   0  1348k|   0  16.0
>> 21-09 21:06:54|  1   1  98   0   0   0|11.9G  273M 18.7G  221G|  15M   10M|   0  1312k|   0  16.0
>> 21-09 21:06:55|  2   2  94   0   0   1|12.3G  273M 18.7G  220G|  14M  311M|   0    48M|   0   309
>> 21-09 21:06:56|  2   3  93   0   0   3|12.2G  273M 18.7G  220G|7745k 1190M|   0    16M|   0  93.0
>> 21-09 21:06:57|  1   2  96   0   0   1|12.0G  273M 18.7G  220G|8269k 1189M|   0  1956k|   0  10.0
>> 21-09 21:06:58|  3   1  95   0   0   1|11.8G  273M 18.7G  221G|4854k  752M|   0  4960k|   0  21.0
>> 21-09 21:06:59|  3   0  97   0   0   0|11.8G  273M 18.7G  221G|3098k   25M|   0  5036k|   0  26.0
>> 21-09 21:07:00|  1   0  98   0   0   0|11.8G  273M 18.7G  221G|2247k   25M|   0  9980k|   0  45.0
>> 21-09 21:07:01|  2   1  97   0   0   0|11.8G  273M 18.7G  221G|4149k   17M|   0    76M|   0   427
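>> (For reference, those columns line up with a dstat invocation along the
>> lines of the following; the exact flags are my reconstruction:
>>
>>     dstat -tcmnd --io 1
>>
>> i.e. time, cpu, memory, network, disk, and io totals at one-second
>> intervals.)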
>> That would be 1190 MiB/s (or 9.982 Gbps).
>> Restarting every OSD on a node at once as part of the upgrade causes a
>> couple minutes worth of network saturation on all three mon nodes. This
>> causes thousands of slow requests and many unhappy OpenStack users.
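>> One knob that might blunt the bursts, offered as an untested guess:
>> osd_map_message_max caps how many map epochs get packed into a single
>> message from the mons, so lowering it trades a longer catch-up for
>> smaller individual bursts:
>>
>>     ceph tell mon.* injectargs '--osd_map_message_max 10'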
>> I'm now stuck about 15% into the upgrade and haven't been able to
>> determine how to move forward (or even backward) without causing another
>> outage.
>> I've attempted to run the same test on another cluster with 1300+ OSDs,
>> and the outgoing network on the mon nodes didn't exceed 15 MiB/s (0.126
>> Gbps).
>> Any suggestions on how I can proceed?