[ceph-users] Slow rbd reads (fast writes) with luminous + bluestore

Florian Haas florian at citynetwork.eu
Wed Nov 28 06:36:59 PST 2018

On 14/08/2018 15:57, Emmanuel Lacour wrote:
> On 13/08/2018 at 16:58, Jason Dillaman wrote:
>> See [1] for ways to tweak the bluestore cache sizes. I believe that by
>> default, bluestore will not cache any data but instead will only
>> attempt to cache its key/value store and metadata.
> I suppose so too, because the default ratio is to cache as much k/v as
> possible, up to 512M, and the hdd cache is 1G by default.
> I tried increasing the hdd cache to 4G and it seems to be used: 4 osd
> processes use 20GB now.
>> In general, however, I would think that attempting to have bluestore
>> cache data is just an attempt to optimize to the test instead of
>> actual workloads. Personally, I think it would be more worthwhile to
>> just run 'fio --ioengine=rbd' directly against a pre-initialized image
>> after you have dropped the cache on the OSD nodes.
> So with bluestore, I assume that we need to think more about the client
> page cache (at least when using a VM), whereas with the old filestore
> both the osd and client caches were used.
> As for benchmarks, I ran a real benchmark here for the expected app
> workload of this new cluster and it's OK for us :)
> Thanks for your help Jason.

Shifting over a discussion from IRC and taking the liberty to resurrect
an old thread, as I just ran into the same (?) issue. I see
*significantly* reduced performance on RBD reads, compared to writes
with the same parameters. "rbd bench --io-type read" gives me 8K IOPS
(with the default 4K I/O size), whereas "rbd bench --io-type write"
produces more than twice that.
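For anyone wanting to repeat the comparison, here is a minimal sketch that only builds and prints the two command lines; it does not run them. The image spec "rbd/bench-image" and the helper name are placeholders of mine, not something from my setup, and the I/O size is spelled out explicitly even though 4K is the default.

```python
import shlex


def rbd_bench_cmd(image_spec, io_type, io_size="4K"):
    """Build the argv list for one 'rbd bench' run (nothing is executed)."""
    return ["rbd", "bench", "--io-type", io_type, "--io-size", io_size, image_spec]


# Hypothetical pool/image name; substitute a pre-initialized test image.
IMAGE = "rbd/bench-image"

# Same image and I/O size for both runs, so only the I/O type differs.
for io_type in ("read", "write"):
    print(shlex.join(rbd_bench_cmd(IMAGE, io_type)))
```

Keeping every parameter except --io-type identical is the point: any difference in the reported IOPS then comes from the read or write path rather than from the benchmark settings.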

I should probably add that while my end result of doing an "rbd bench
--io-type read" is about half of what I get from a write benchmark, the
intermediate ops/sec output fluctuates from > 30K IOPS (about twice the
write IOPS) to about 3K IOPS (about 1/6 of what I get for writes). So
really, my read IOPS are all over the map (and terrible on average),
whereas my write IOPS are not stellar, but consistent.

This is an all-bluestore cluster on spinning disks with Luminous, and
I've tried the following things:

- run rbd bench with --rbd_readahead_disable_after_bytes=0 and
--rbd_readahead_max_bytes=4194304 (per

- configure OSDs with a larger bluestore_cache_size_hdd (4G; default is 1G)

- configure OSDs with bluestore_cache_kv_ratio = .49, so that rather
than using 1%/99%/0% for metadata/KV data/objects, the OSDs use 1%/49%/50%
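To make the intended cache split concrete, here is a rough sketch of the arithmetic. It assumes the data (object) share is simply whatever is left after the metadata and KV ratios, which matches the 1%/49%/50% figures above, and it ignores any separate cap on the KV cache (such as the 512M limit mentioned in the quoted mail). The function name is mine, not a Ceph API.

```python
GIB = 1024 ** 3


def cache_split(cache_size_bytes, meta_ratio, kv_ratio):
    """Return (metadata, kv, data) byte budgets for one OSD cache.

    Assumption: data gets the remainder after metadata and KV.
    """
    meta = round(cache_size_bytes * meta_ratio)
    kv = round(cache_size_bytes * kv_ratio)
    data = cache_size_bytes - meta - kv
    if data < 0:
        raise ValueError("meta_ratio + kv_ratio must not exceed 1")
    return meta, kv, data


# Defaults described above: 1G cache, 1% metadata, 99% KV.
default_split = cache_split(1 * GIB, 0.01, 0.99)

# Settings tried here: 4G cache, 1% metadata, 49% KV.
tuned_split = cache_split(4 * GIB, 0.01, 0.49)

for label, split in (("default", default_split), ("tuned", tuned_split)):
    meta, kv, data = split
    print(f"{label}: meta={meta} kv={kv} data={data} bytes")
```

Under these assumptions the default settings leave nothing for object data, while the tuned settings set aside half of the 4G cache for it.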

None of the above produced any tangible improvement. Benchmark results
are at http://paste.openstack.org/show/736314/ if anyone wants to take a
look.

I'd be curious to see if anyone has a suggestion on what else to try.
Thanks in advance!

