[ceph-users] Slow rbd reads (fast writes) with luminous + bluestore

Florian Haas florian at citynetwork.eu
Wed Nov 28 07:53:20 PST 2018

On 28/11/2018 15:52, Mark Nelson wrote:
>> Shifting over a discussion from IRC and taking the liberty to resurrect
>> an old thread, as I just ran into the same (?) issue. I see
>> *significantly* reduced performance on RBD reads, compared to writes
>> with the same parameters. "rbd bench --io-type read" gives me 8K IOPS
>> (with the default 4K I/O size), whereas "rbd bench --io-type write"
>> produces more than twice that.
>> I should probably add that while my end result of doing an "rbd bench
>> --io-type read" is about half of what I get from a write benchmark, the
>> intermediate ops/sec output fluctuates from > 30K IOPS (about twice the
>> write IOPS) to about 3K IOPS (about 1/6 of what I get for writes). So
>> really, my read IOPS are all over the map (and terrible on average),
>> whereas my write IOPS are not stellar, but consistent.
>> This is an all-bluestore cluster on spinning disks with Luminous, and
>> I've tried the following things:
>> - run rbd bench with --rbd_readahead_disable_after_bytes=0 and
>> --rbd_readahead_max_bytes=4194304 (per
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008271.html)
>> - configure OSDs with a larger bluestore_cache_size_hdd (4G; default
>> is 1G)
>> - configure OSDs with bluestore_cache_kv_ratio = .49, so that rather
>> than using 1%/99%/0% for metadata/KV data/objects, the OSDs use
>> 1%/49%/50%
>> None of the above produced any tangible improvement. Benchmark results
>> are at http://paste.openstack.org/show/736314/ if anyone wants to take a
>> look.
>> I'd be curious to see if anyone has a suggestion on what else to try.
>> Thanks in advance!
> Hi Florian,

Hi Mark, thanks for the speedy reply!
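
As a quick sanity check on the cache ratios quoted above, here is a
minimal Python sketch of how a 4G cache would be divided under the
default 1%/99%/0% split and under the 1%/49%/50% split. It only restates
the percentages from the quoted message, assuming object data gets
whatever the metadata and KV shares leave over. The function name
cache_split is hypothetical, and the sketch ignores any additional caps
BlueStore may apply internally.

```python
GIB = 1024 ** 3


def cache_split(cache_bytes, meta_pct, kv_pct):
    """Return (metadata, kv, data) byte budgets for a cache of cache_bytes.

    Assumption: object data receives the remainder after the metadata
    and KV shares are taken, as described in the quoted message.
    """
    if meta_pct < 0 or kv_pct < 0 or meta_pct + kv_pct > 100:
        raise ValueError("percentages must be >= 0 and sum to at most 100")
    meta = cache_bytes * meta_pct // 100
    kv = cache_bytes * kv_pct // 100
    data = cache_bytes - meta - kv
    return meta, kv, data


if __name__ == "__main__":
    for label, kv_pct in (("default", 99), ("tuned", 49)):
        meta, kv, data = cache_split(4 * GIB, 1, kv_pct)
        print(f"{label}: meta={meta / GIB:.2f}G kv={kv / GIB:.2f}G "
              f"data={data / GIB:.2f}G")
```

With the default split essentially nothing is left for object data,
while the 49% KV ratio leaves roughly half of the 4G for it.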

> By default bluestore will cache buffers on reads but not on writes
> (unless there are hints):
> Option("bluestore_default_buffered_read", Option::TYPE_BOOL,
>     .set_default(true)
>     .set_flag(Option::FLAG_RUNTIME)
>     .set_description("Cache read results by default (unless hinted
> Option("bluestore_default_buffered_write", Option::TYPE_BOOL,
>     .set_default(false)
>     .set_flag(Option::FLAG_RUNTIME)
>     .set_description("Cache writes by default (unless hinted NOCACHE or
> This is one area where bluestore is a lot more confusing for users than
> filestore was.  There was a lot of concern about enabling buffer cache
> on writes by default because there's some associated overhead
> (potentially both during writes and in the mempool thread when trimming
> the cache).  It might be worth enabling bluestore_default_buffered_write
> to see whether it helps reads.

So yes, this is rather counterintuitive, but I happily gave it a shot and
the results are... more head-scratching than before. :)

The output is here: http://paste.openstack.org/show/736324/

In summary:

1. Write benchmark is in the same ballpark as before (good).

2. Read benchmark *without* readahead is *way* better than before
(splendid!) but has a weird dip down to 9K IOPS that I find
inexplicable. Any ideas on that?

3. Read benchmark *with* readahead is still abysmal, which I also find
rather odd. What do you think about that one?

4. Rerunning the benchmark without readahead is slow at first and then
speeds up to where it was before, but is not nearly as consistent,
even towards the end of the benchmark run.

I do much appreciate your continued insight, thanks a lot!

