[ceph-users] Unexplainable high memory usage OSD with BlueStore

Hector Martin hector at marcansoft.com
Thu Nov 8 03:28:26 PST 2018

On 11/8/18 5:52 PM, Wido den Hollander wrote:
> [osd]
> bluestore_cache_size_ssd = 1G
> The BlueStore Cache size for SSD has been set to 1GB, so the OSDs
> shouldn't use more than that.
> When dumping the mem pools each OSD claims to be using between 1.8GB and
> 2.2GB of memory.
> $ ceph daemon osd.X dump_mempools|jq '.total.bytes'
> Summing up all the values I get to a total of 15.8GB and the system is
> using 22GB.
> Looking at 'ps aux --sort rss' I see OSDs using almost 10% of the
> memory, which would be ~3GB for a single daemon.

This is similar to what I see on a memory-starved host with the OSDs
configured with very little cache:

  bluestore cache size = 180000000

$ ceph daemon osd.13 dump_mempools | jq '.mempool.total.bytes'

That adds up, but ps says:

ceph     234576  2.6  6.2 1236200 509620 ?      Ssl  20:10   0:16 /usr/bin/ceph-osd -i 13 --pid-file /run/ceph/osd.13.pid -c /etc/ceph/ceph.conf --foreground

So that is ~500MB RSS for this one OSD, well above the 180MB cache
setting. Due to an emergency situation that made me
lose half of the RAM on this host, I'm actually resorting to killing the
oldest OSD every 5 minutes right now to keep the server from OOMing
(this will be fixed soon).
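
As a rough sanity check of the figures above (a sketch only, not part of
the original measurements): ps reports RSS in KiB, so the amount of RSS
beyond the configured cache size can be worked out as below. The
variable names are made up for illustration, and the ps line is
shortened; the two input values are the ones quoted in this message.

```shell
# Shortened copy of the ps line above; field 6 of `ps aux` is RSS in KiB.
ps_line='ceph     234576  2.6  6.2 1236200 509620 ?      Ssl  20:10   0:16 /usr/bin/ceph-osd -i 13'
rss_kib=$(printf '%s\n' "$ps_line" | awk '{print $6}')

# "bluestore cache size" from ceph.conf, in bytes.
cache_bytes=180000000

# Convert RSS to bytes and subtract the configured cache size.
rss_bytes=$((rss_kib * 1024))
overhead_bytes=$((rss_bytes - cache_bytes))

# Integer MiB values (rounded down).
echo "rss_mib=$((rss_bytes / 1048576))"
echo "cache_mib=$((cache_bytes / 1048576))"
echo "overhead_mib=$((overhead_bytes / 1048576))"
```

With these inputs the OSD holds about 497 MiB resident against a cache
setting of about 171 MiB, i.e. roughly 326 MiB of RSS beyond the
configured cache.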

I would very much like to know if this OSD memory usage outside of the
bluestore cache size can be bounded or reduced somehow. I don't
particularly care about performance, so it would be useful to be able to
tune it lower. This would help single-host and smaller Ceph use cases; I
think Ceph's properties make it a very interesting alternative to things
like btrfs and zfs, but dedicating several GB of RAM per disk/OSD is not
always viable. Right now it seems that, besides the cache, OSDs creep
up in memory usage until they reach some threshold, and I'm not sure
what determines that baseline or whether it can be controlled.

Hector Martin (hector at marcansoft.com)
Public Key: https://mrcn.st/pub

