[ceph-users] 13.2.4 odd memory leak?

Mark Nelson mnelson at redhat.com
Fri Mar 8 07:10:13 PST 2019


On 3/8/19 8:12 AM, Steffen Winther Sørensen wrote:
>
>
>> On 8 Mar 2019, at 14.30, Mark Nelson <mnelson at redhat.com> wrote:
>>
>>
>> On 3/8/19 5:56 AM, Steffen Winther Sørensen wrote:
>>>
>>>> On 5 Mar 2019, at 10.02, Paul Emmerich <paul.emmerich at croit.io> wrote:
>>>>
>>>> Yeah, there's a bug in 13.2.4. You need to set it to at least ~1.2GB.
>>> Yep, thanks; setting it to 1G+256M worked :)
>>> Hope this won’t bloat memory during the coming weekend’s VM backups 
>>> through CephFS
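For reference, the value reported working above (1G+256M) can be written in ceph.conf as a byte count; this is a sketch of just that one key, not a full config:

```ini
[osd]
; 1 GiB + 256 MiB = 1342177280 bytes; per the thread, 13.2.4 aborts with
; targets much below ~1.2 GiB, so this sits just above the broken range
osd memory target = 1342177280
```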
>>>
>>
>>
>> FWIW, setting it to 1.2G will almost certainly result in the 
>> bluestore caches being stuck at cache_min, i.e. 128MB, and the 
>> autotuner may not be able to keep the OSD memory that low.  I 
>> typically recommend a bare minimum of 2GB per OSD, and on SSD/NVMe 
>> backed OSDs 3-4GB+ can improve performance significantly.
> This is a smaller dev cluster without much IO: 4 nodes with 16GB RAM 
> and 6x HDD OSDs each
>
> Just want to avoid consuming swap, which bloated after patching from 
> 13.2.2 to 13.2.4 and performing VM snapshots to CephFS. Otherwise the 
> cluster has been fine for ages…
> /Steffen
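Steffen's sizing constraint is easy to make concrete. A quick back-of-envelope sketch (the 2 GB OS reserve below is an assumed figure, not from the thread):

```python
# Per-OSD memory budget for nodes like Steffen's: 16 GB RAM, 6 HDD OSDs.
# The reserve for kernel, page cache, and other daemons is an assumption.
node_ram_gb = 16
osds_per_node = 6
os_reserve_gb = 2  # assumed headroom, not a figure from the thread

per_osd_gb = (node_ram_gb - os_reserve_gb) / osds_per_node
print(round(per_osd_gb, 2))  # 2.33 -- why a 4 GB/OSD default swaps here
```

With roughly 2.3 GB available per OSD, the mimic default of 4 GB per OSD would push such a node into swap, which matches what Steffen observed.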


Understood.  We struggled with whether we should have separate HDD and 
SSD defaults for osd_memory_target, but we were seeing other users run 
into problems setting the global default vs. the ssd/hdd defaults and 
not getting the expected behavior.  We decided on a single 
osd_memory_target to keep the whole thing simpler, with only one 
parameter to set.  The 4GB/OSD default is aggressive but can dramatically 
improve performance on NVMe, and we figured it communicates to users 
where we think the sweet spot is (and as devices and data sets get 
larger, this is going to be even more important).
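The guidance above maps onto a single knob. A minimal ceph.conf sketch, with values in bytes chosen from the 2GB-minimum / 4GB-default comments in this thread, not prescribed anywhere as-is:

```ini
[osd]
; 4 GiB: the mimic 13.2.3+ default; replaces the bluestore_cache_* options
osd memory target = 4294967296
; on a RAM-constrained HDD box, the suggested floor is ~2 GiB:
; osd memory target = 2147483648
```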


Mark

>
>
>>
>>
>> Mark
>>
>>
>>
>>>> On Tue, Mar 5, 2019 at 9:00 AM Steffen Winther Sørensen
>>>> <stefws at gmail.com> wrote:
>>>>>
>>>>>
>>>>> On 4 Mar 2019, at 16.09, Paul Emmerich <paul.emmerich at croit.io> wrote:
>>>>>
>>>>> Bloated to ~4 GB per OSD and you are on HDDs?
>>>>>
>>>>> Something like that yes.
>>>>>
>>>>>
>>>>> 13.2.3 backported the cache auto-tuning which targets 4 GB memory
>>>>> usage by default.
>>>>>
>>>>>
>>>>> See https://ceph.com/releases/13-2-4-mimic-released/
>>>>>
>>>>> Right, thanks…
>>>>>
>>>>>
>>>>> The bluestore_cache_* options are no longer needed. They are replaced
>>>>> by osd_memory_target, defaulting to 4GB. BlueStore will expand
>>>>> and contract its cache to attempt to stay within this
>>>>> limit. Users upgrading should note this is a higher default
>>>>> than the previous bluestore_cache_size of 1GB, so OSDs using
>>>>> BlueStore will use more memory by default.
>>>>> For more details, see the BlueStore docs.
>>>>>
>>>>> Adding an ‘osd memory target’ value to our ceph.conf and 
>>>>> restarting an OSD just makes the OSD dump like this:
>>>>>
>>>>> [osd]
>>>>>   ; this key makes 13.2.4 OSDs abort???
>>>>>   osd memory target = 1073741824
>>>>>
>>>>>   ; other OSD key settings
>>>>>   osd pool default size = 2  # Write an object 2 times.
>>>>>   osd pool default min size = 1 # Allow writing one copy in a 
>>>>> degraded state.
>>>>>
>>>>>   osd pool default pg num = 256
>>>>>   osd pool default pgp num = 256
>>>>>
>>>>>   client cache size = 131072
>>>>>   osd client op priority = 40
>>>>>   osd op threads = 8
>>>>>   osd client message size cap = 512
>>>>>   filestore min sync interval = 10
>>>>>   filestore max sync interval = 60
>>>>>
>>>>>   recovery max active = 2
>>>>>   recovery op priority = 30
>>>>>   osd max backfills = 2
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> osd log snippet:
>>>>>  -472> 2019-03-05 08:36:02.233 7f2743a8c1c0  1 -- - start start
>>>>>  -471> 2019-03-05 08:36:02.234 7f2743a8c1c0  2 osd.12 0 init 
>>>>> /var/lib/ceph/osd/ceph-12 (looks like hdd)
>>>>>  -470> 2019-03-05 08:36:02.234 7f2743a8c1c0  2 osd.12 0 journal 
>>>>> /var/lib/ceph/osd/ceph-12/journal
>>>>>  -469> 2019-03-05 08:36:02.234 7f2743a8c1c0  1 
>>>>> bluestore(/var/lib/ceph/osd/ceph-12) _mount path 
>>>>> /var/lib/ceph/osd/ceph-12
>>>>>  -468> 2019-03-05 08:36:02.235 7f2743a8c1c0  1 bdev create path 
>>>>> /var/lib/ceph/osd/ceph-12/block type kernel
>>>>>  -467> 2019-03-05 08:36:02.235 7f2743a8c1c0  1 bdev(0x55b31af4a000 
>>>>> /var/lib/ceph/osd/ceph-12/block) open path 
>>>>> /var/lib/ceph/osd/ceph-12/block
>>>>>  -466> 2019-03-05 08:36:02.236 7f2743a8c1c0  1 bdev(0x55b31af4a000 
>>>>> /var/lib/ceph/osd/ceph-12/block) open size 146775474176 
>>>>> (0x222c800000, 137 GiB) block_size 4096 (4 KiB) rotational
>>>>>  -465> 2019-03-05 08:36:02.236 7f2743a8c1c0  1 
>>>>> bluestore(/var/lib/ceph/osd/ceph-12) _set_cache_sizes cache_size 
>>>>> 1073741824 meta 0.4 kv 0.4 data 0.2
>>>>>  -464> 2019-03-05 08:36:02.237 7f2743a8c1c0  1 bdev create path 
>>>>> /var/lib/ceph/osd/ceph-12/block type kernel
>>>>>  -463> 2019-03-05 08:36:02.237 7f2743a8c1c0  1 bdev(0x55b31af4aa80 
>>>>> /var/lib/ceph/osd/ceph-12/block) open path 
>>>>> /var/lib/ceph/osd/ceph-12/block
>>>>>  -462> 2019-03-05 08:36:02.238 7f2743a8c1c0  1 bdev(0x55b31af4aa80 
>>>>> /var/lib/ceph/osd/ceph-12/block) open size 146775474176 
>>>>> (0x222c800000, 137 GiB) block_size 4096 (4 KiB) rotational
>>>>>  -461> 2019-03-05 08:36:02.238 7f2743a8c1c0  1 bluefs 
>>>>> add_block_device bdev 1 path /var/lib/ceph/osd/ceph-12/block size 
>>>>> 137 GiB
>>>>>  -460> 2019-03-05 08:36:02.238 7f2743a8c1c0  1 bluefs mount
>>>>>  -459> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
>>>>> compaction_readahead_size = 2097152
>>>>>  -458> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
>>>>> compression = kNoCompression
>>>>>  -457> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
>>>>> max_write_buffer_number = 4
>>>>>  -456> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
>>>>> min_write_buffer_number_to_merge = 1
>>>>>  -455> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
>>>>> recycle_log_file_num = 4
>>>>>  -454> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
>>>>> writable_file_max_buffer_size = 0
>>>>>  -453> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option 
>>>>> write_buffer_size = 268435456
>>>>>  -452> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option 
>>>>> compaction_readahead_size = 2097152
>>>>>  -451> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option 
>>>>> compression = kNoCompression
>>>>>  -450> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option 
>>>>> max_write_buffer_number = 4
>>>>>  -449> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option 
>>>>> min_write_buffer_number_to_merge = 1
>>>>>  -448> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option 
>>>>> recycle_log_file_num = 4
>>>>>  -447> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option 
>>>>> writable_file_max_buffer_size = 0
>>>>>  -446> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option 
>>>>> write_buffer_size = 268435456
>>>>>  -445> 2019-03-05 08:36:02.340 7f2743a8c1c0  1 rocksdb: do_open 
>>>>> column families: [default]
>>>>>  -444> 2019-03-05 08:36:02.341 7f2743a8c1c0  4 rocksdb: RocksDB 
>>>>> version: 5.13.0
>>>>>  -443> 2019-03-05 08:36:02.342 7f2743a8c1c0  4 rocksdb: Git sha 
>>>>> rocksdb_build_git_sha:@0@
>>>>>  -442> 2019-03-05 08:36:02.342 7f2743a8c1c0  4 rocksdb: Compile 
>>>>> date Jan  4 2019
>>>>> ...
>>>>>  -271> 2019-03-05 08:36:02.431 7f2743a8c1c0  1 freelist init
>>>>>  -270> 2019-03-05 08:36:02.535 7f2743a8c1c0  1 
>>>>> bluestore(/var/lib/ceph/osd/ceph-12) _open_alloc opening 
>>>>> allocation metadata
>>>>>  -269> 2019-03-05 08:36:02.714 7f2743a8c1c0  1 
>>>>> bluestore(/var/lib/ceph/osd/ceph-12) _open_alloc loaded 93 GiB in 
>>>>> 31828 extents
>>>>>  -268> 2019-03-05 08:36:02.722 7f2743a8c1c0  2 osd.12 0 journal 
>>>>> looks like hdd
>>>>>  -267> 2019-03-05 08:36:02.722 7f2743a8c1c0  2 osd.12 0 boot
>>>>>  -266> 2019-03-05 08:36:02.723 7f272a0f3700  5 
>>>>> bluestore.MempoolThread(0x55b31af46a30) _tune_cache_size target: 
>>>>> 1073741824 heap: 64675840 unmapped: 786432 mapped: 63889408 old 
>>>>> cache_size: 134217728 new cache size: 17349132402135320576
>>>>>  -265> 2019-03-05 08:36:02.723 7f272a0f3700  5 
>>>>> bluestore.MempoolThread(0x55b31af46a30) _trim_shards cache_size: 
>>>>> 17349132402135320576 kv_alloc: 134217728 kv_used: 5099462 
>>>>> meta_alloc: 0 meta_used: 21301 data_alloc: 0 data_used: 0
>>>>> ...
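The absurd "new cache size: 17349132402135320576" in the log above is the signature of unsigned-integer underflow. A sketch of the failure mode (the exact formula inside BlueStore's MempoolThread is an assumption here, not the actual code):

```python
# Sketch: if the autotuner subtracts the OSD's non-cache memory from
# osd_memory_target and the result is negative, storing it in a uint64
# wraps it to an enormous value, like the one printed in the log.
U64 = 2 ** 64

def tune_cache_size(target: int, noncache_bytes: int) -> int:
    """Emulate C++ unsigned subtraction: negative results wrap mod 2^64."""
    return (target - noncache_bytes) % U64

# With a 1 GiB target and slightly more memory than that used outside the
# cache, the computed cache size wraps into the 1.8e19 range:
bogus = tune_cache_size(1073741824, 1073741824 + 134217728)
print(bogus)  # 18446744073575333888
```

This is consistent with the advice earlier in the thread: targets below roughly 1.2 GiB leave no room for non-cache memory, the subtraction goes negative, and the OSD aborts.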
>>>>> 2019-03-05 08:36:40.166 7f03fc57f700  1 osd.12 pg_epoch: 7063 
>>>>> pg[2.93( v 6687'5 (0'0,6687'5] local-lis/les=7015/7016 n=1 
>>>>> ec=103/103 lis/c 7015/7015 les/c/f 7016/7016/0 7063/7063/7063) 
>>>>> [12,19] r=0 lpr=7063 pi=[7015,7063)/1 crt=6687'5 lcod 0'0 mlcod 
>>>>> 0'0 unknown NOTIFY mbc={}] start_peering_interval up [19] -> 
>>>>> [12,19], acting [19] -> [12,19], acting_primary 19 -> 12, 
>>>>> up_primary 19 -> 12, role -1 -> 0, features acting 
>>>>> 4611087854031142907 upacting 4611087854031142907
>>>>> 2019-03-05 08:36:40.167 7f03fc57f700  1 osd.12 pg_epoch: 7063 
>>>>> pg[2.93( v 6687'5 (0'0,6687'5] local-lis/les=7015/7016 n=1 
>>>>> ec=103/103 lis/c 7015/7015 les/c/f 7016/7016/0 7063/7063/7063) 
>>>>> [12,19] r=0 lpr=7063 pi=[7015,7063)/1 crt=6687'5 lcod 0'0 mlcod 
>>>>> 0'0 unknown mbc={}] state<Start>: transitioning to Primary
>>>>> 2019-03-05 08:36:40.167 7f03fb57d700  1 osd.12 pg_epoch: 7061 
>>>>> pg[2.40( v 6964'703 (0'0,6964'703] local-lis/les=6999/7000 n=1 
>>>>> ec=103/103 lis/c 6999/6999 les/c/f 7000/7000/0 7061/7061/6999) [8] 
>>>>> r=-1 lpr=7061 pi=[6999,7061)/1 crt=6964'703 lcod 0'0 unknown 
>>>>> mbc={}] start_peering_interval up [8,12] -> [8], acting [8,12] -> 
>>>>> [8], acting_primary 8 -> 8, up_primary 8 -> 8, role 1 -> -1, 
>>>>> features acting 4611087854031142907 upacting 4611087854031142907
>>>>>   1/ 5 heartbeatmap
>>>>>   1/ 5 perfcounter
>>>>>   1/ 5 rgw
>>>>>   1/ 5 rgw_sync
>>>>>   1/10 civetweb
>>>>>   1/ 5 javaclient
>>>>>   1/ 5 asok
>>>>>   1/ 1 throttle
>>>>>   0/ 0 refs
>>>>>   1/ 5 xio
>>>>>   1/ 5 compressor
>>>>>   1/ 5 bluestore
>>>>>   1/ 5 bluefs
>>>>>   1/ 3 bdev
>>>>>   1/ 5 kstore
>>>>>   4/ 5 rocksdb
>>>>>   4/ 5 leveldb
>>>>>   4/ 5 memdb
>>>>>   1/ 5 kinetic
>>>>>   1/ 5 fuse
>>>>>   1/ 5 mgr
>>>>>   1/ 5 mgrc
>>>>>   1/ 5 dpdk
>>>>>   1/ 5 eventtrace
>>>>>  -2/-2 (syslog threshold)
>>>>>  -1/-1 (stderr threshold)
>>>>>  max_recent     10000
>>>>>  max_new         1000
>>>>>  log_file /var/log/ceph/ceph-osd.12.log
>>>>> --- end dump of recent events ---
>>>>>
>>>>> 2019-03-05 08:36:07.750 7f272a0f3700 -1 *** Caught signal (Aborted) **
>>>>> in thread 7f272a0f3700 thread_name:bstore_mempool
>>>>>
>>>>> ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) 
>>>>> mimic (stable)
>>>>> 1: (()+0x911e70) [0x55b318337e70]
>>>>> 2: (()+0xf5d0) [0x7f2737a4e5d0]
>>>>> 3: (gsignal()+0x37) [0x7f2736a6f207]
>>>>> 4: (abort()+0x148) [0x7f2736a708f8]
>>>>> 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
>>>>> const*)+0x242) [0x7f273aec62b2]
>>>>> 6: (()+0x25a337) [0x7f273aec6337]
>>>>> 7: (()+0x7a886e) [0x55b3181ce86e]
>>>>> 8: (BlueStore::MempoolThread::entry()+0x3b0) [0x55b3181d0060]
>>>>> 9: (()+0x7dd5) [0x7f2737a46dd5]
>>>>> 10: (clone()+0x6d) [0x7f2736b36ead]
>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is 
>>>>> needed to interpret this.
>>>>>
>>>>>
>>>>> Even without the ‘osd memory target’ conf key, the OSD still 
>>>>> reports on start:
>>>>>
>>>>> bluestore(/var/lib/ceph/osd/ceph-12) _set_cache_sizes cache_size 
>>>>> 1073741824
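The earlier _set_cache_sizes log entry also prints the default ratios by which that 1 GiB cache is split (meta 0.4, kv 0.4, data 0.2). Reproducing just that arithmetic from the log line (this is not the actual BlueStore code):

```python
# Split the reported cache_size by the ratios printed in the log:
# "_set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2"
cache_size = 1073741824  # bytes, from the log line
meta_bytes = int(cache_size * 0.4)
kv_bytes = int(cache_size * 0.4)
data_bytes = int(cache_size * 0.2)
print(meta_bytes, kv_bytes, data_bytes)  # 429496729 429496729 214748364
```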
>>>>>
>>>>> Any hints appreciated!
>>>>>
>>>>> /Steffen
>>>>>
>>>>>
>>>>> Paul
>>>>>
>>>>> --
>>>>> Paul Emmerich
>>>>>
>>>>> Looking for help with your Ceph cluster? Contact us at 
>>>>> https://croit.io
>>>>>
>>>>> croit GmbH
>>>>> Freseniusstr. 31h
>>>>> 81247 München
>>>>> www.croit.io
>>>>> Tel: +49 89 1896585 90
>>>>>
>>>>> On Mon, Mar 4, 2019 at 3:55 PM Steffen Winther Sørensen
>>>>> <stefws at gmail.com> wrote:
>>>>>
>>>>>
>>>>> List Members,
>>>>>
>>>>> Patched a CentOS 7 based cluster from 13.2.2 to 13.2.4 last 
>>>>> Monday; everything appeared to be working fine.
>>>>>
>>>>> Only this morning I found all OSDs in the cluster bloated in 
>>>>> memory footprint, possibly after the weekend backup through MDS.
>>>>>
>>>>> Is anyone else seeing a possible memory leak in 13.2.4 OSDs, 
>>>>> perhaps primarily when using MDS?
>>>>>
>>>>> TIA
>>>>>
>>>>> /Steffen
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users at lists.ceph.com
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>
>>>>>
>

