[ceph-users] Cache tier operation clarifications

Shinobu Kinjo shinobu.kj at gmail.com
Fri Mar 4 03:14:42 PST 2016

Great feedback (at least for me).
I would like to know whether the behaviours you are seeing are expected or not.

BTW I will do some tests on cache tiering with my new toy.


On Fri, Mar 4, 2016 at 5:17 PM, Christian Balzer <chibi at gol.com> wrote:
> Hello,
> Contrary to what the subject may suggest, I'm mostly going to try and explain how
> things work with cache tiers, as far as I understand them.
> Something of a reference to point to.
> Of course if you spot something that's wrong or have additional
> information, by all means please do comment.
> While the documentation in master now correctly warns that you HAVE to set
> target_max_bytes (the size of your cache pool) for any of the relative
> sizing bits to work, let's repeat that here since it wasn't mentioned there
> previously.
> And without that value being set, none of the flushing or eviction will
> happen, resulting in blocked IOs when it gets full.
> The other thing to remember about target_max_bytes (documented nowhere)
> is that this space calculation is done per PG.
> So if you have a 1024GB cache pool and target_max_bytes set accordingly
> (one of the most annoying things about Ceph is having to specify full bytes
> in most places instead of human friendly shortcuts like "1TB"), Ceph
> (the cache tiering agent to be precise) will think that the cache is 50%
> full when just one PG has reached 512MB (a figure that implies 1024 PGs,
> each with a 1GB share).
> In short, expect things to happen quite a bit before you reach the usage
> that you think you specified in cache_target_dirty_ratio and
> cache_target_full_ratio.
> Annoying, but at least failing safe.
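The per-PG arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation as described in this thread, not code taken from Ceph. The helper names `to_bytes` and `per_pg_fill` are made up for illustration, and `pg_num=1024` is an assumption (the value that makes the 512MB figure come out at 50%).

```python
# Sketch only: mirrors the per-PG split described in the thread.
# to_bytes() and per_pg_fill() are hypothetical helpers, not Ceph APIs.

UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def to_bytes(size: str) -> int:
    """Convert a string such as '1TB' or '512MB' to bytes (binary units)."""
    size = size.strip().upper()
    for suffix, factor in UNITS.items():
        if size.endswith(suffix):
            return int(float(size[: -len(suffix)]) * factor)
    return int(size)  # no suffix: already a plain byte count


def per_pg_fill(pg_bytes_used: int, target_max_bytes: int, pg_num: int) -> float:
    """Fraction of its share of target_max_bytes that a single PG has used."""
    per_pg_target = target_max_bytes / pg_num
    return pg_bytes_used / per_pg_target


target = to_bytes("1024GB")
print(target)  # 1099511627776, the full-byte value target_max_bytes expects
print(per_pg_fill(to_bytes("512MB"), target, pg_num=1024))  # 0.5
```

With fewer PGs each share is larger, so the same 512MB in one PG would register as a smaller fraction.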
> I'm ignoring target_max_objects for this, as it's the same for object
> count instead of space.
> min_read_recency_for_promote and min_write_recency_for_promote I shall
> ignore for now as well, since I have no cluster to test them with.
> Flush
> Either way once Ceph thinks you've reached the cache_target_dirty_ratio
> specified, it copies dirty objects to the backing storage.
> If they never existed there before, they will be created (so keep that in
> mind if you see an increase in objects).
> This (additional object) is similar to tier promotion, when an existing
> object is copied from the base pool to the cache pool the first time it's
> accessed.
> In versions after Hammer there is also cache_target_dirty_high_ratio,
> which specifies at which point more aggressive flushing starts.
> Note that flushing keeps objects in the cache.
> So that object you wrote to some days ago and kept reading frequently
> ever since isn't just going away to the slower base pool.
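A toy model of the flush behaviour described above, for illustration only. The dictionaries standing in for the cache and base pools, and the function names `flush_mode` and `flush`, are assumptions made for this sketch. The real tiering agent is considerably more involved.

```python
# Toy model of flushing as described in the thread, not Ceph's agent code.

def flush_mode(dirty_fraction, dirty_ratio, dirty_high_ratio=None):
    """Return 'idle', 'flush' or 'flush_high' for a given dirty fraction."""
    if dirty_high_ratio is not None and dirty_fraction >= dirty_high_ratio:
        return "flush_high"  # more aggressive flushing (post-Hammer setting)
    if dirty_fraction >= dirty_ratio:
        return "flush"
    return "idle"


def flush(cache, base, name):
    """Copy a dirty object to the base pool; it stays in the cache, now clean."""
    obj = cache[name]
    if obj["dirty"]:
        base[name] = obj["data"]  # created in the base pool if it was missing
        obj["dirty"] = False


cache = {"obj1": {"data": b"hot data", "dirty": True}}
base = {}
flush(cache, base, "obj1")
print("obj1" in cache, "obj1" in base, cache["obj1"]["dirty"])  # True True False
print(flush_mode(0.45, dirty_ratio=0.4, dirty_high_ratio=0.6))  # flush
```

The point of the model is the last state: after a flush the object exists in both pools and is still served from the cache.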
> Evict
> Next is eviction. This is where things became a bit more muddled for me and
> I had to do some testing and staring at objects in PGs.
> So your cache pool is now hitting the cache_target_full_ratio (or so the
> wonky space per PG algorithm thinks).
> Remember that all IO will stop once the cache pool gets 100% full, so you
> want this to happen at some safe, sane point before this.
> What that point is depends of course on the maximum write speed to your
> pool, how fast your cache can flush to the base pool, etc.
> Now here is the fun part: clean objects (ones that have not been modified
> since they were promoted from the base pool or last flushed) are eligible
> for eviction.
> When reading about this the first time I thought this involved more moving
> of data from the cache pool to the base pool.
> However what happens is that since the object is "clean" (copy exists on
> the base pool), it is simply zero'd (after demotion), leaving an empty
> rados object in the cache pool and consequently releasing space.
> So as far as IO and network traffic are concerned, your enemy is flushing,
> not eviction.
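Eviction as described above can be modelled in the same toy fashion. Again the pool dictionaries and the `evict` function are assumptions for illustration, not Ceph internals. The sketch shows that only clean objects are eligible and that nothing is copied to the base pool.

```python
# Toy model of eviction as described in the thread, not Ceph's agent code.

def evict(cache, base, name):
    """Evict a clean object; return bytes freed (0 if the object is dirty)."""
    obj = cache[name]
    if obj["dirty"] or name not in base:
        return 0  # dirty objects must be flushed first
    freed = len(obj["data"])
    obj["data"] = b""  # an empty object remains in the cache pool
    return freed


base = {"clean_obj": b"12345678"}
cache = {
    "clean_obj": {"data": b"12345678", "dirty": False},
    "dirty_obj": {"data": b"abcd", "dirty": True},
}
print(evict(cache, base, "clean_obj"))  # 8
print(evict(cache, base, "dirty_obj"))  # 0
print(base)  # unchanged: eviction moved no data to the base pool
```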
> In clusters that have a clear usage pattern and idle times, a command
> to trigger flushes for a specified ratio and with settable IO limits would
> be most welcome. (hint-hint)
> Lacking this for now, I've been pondering a cron job that lowers
> cache_target_dirty_ratio from .7 (my current value) to .6 (or more
> likely a smaller step, like .65) for a few hours during the night and then
> sets it back up again.
> This is based on our cache typically not growing more than 2% per day.
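A minimal sketch of what such a cron job could run. The pool name `cache`, the 01:00 to 05:00 window and the helper names are assumptions. The `ceph osd pool set <pool> cache_target_dirty_ratio <value>` form is the usual way to change this setting. The script only prints the command; passing the list to `subprocess.run` would apply it.

```python
# Sketch of a nightly dirty-ratio toggle; prints the command instead of running it.
import datetime

POOL = "cache"        # hypothetical cache pool name
DAY_RATIO = 0.7       # current value from the thread
NIGHT_RATIO = 0.65    # the smaller step suggested in the thread


def ratio_for_hour(hour, start=1, end=5):
    """Lower ratio inside the assumed quiet window [start, end), else the day value."""
    return NIGHT_RATIO if start <= hour < end else DAY_RATIO


def dirty_ratio_cmd(pool, ratio):
    """Build the ceph CLI invocation as an argument list."""
    return ["ceph", "osd", "pool", "set", pool,
            "cache_target_dirty_ratio", str(ratio)]


now = datetime.datetime.now()
print(" ".join(dirty_ratio_cmd(POOL, ratio_for_hour(now.hour))))
```

Run hourly from cron, this flips the ratio at the window edges. With the cache growing by about 2% per day, a 5% drop should give the nightly flush a couple of days of headroom.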
> Lastly we come to cache_min_flush_age and cache_min_evict_age.
> It is my understanding that in Hammer and later a truly full cache pool
> will cause these to be ignored to prevent IO deadlocks, correct?
> The largest source of cache pollution for us is VM reboots (all those
> objects holding the kernel and other things only read at startup, never to
> be needed again for months) while on the other hand we have about 10k
> truly hot objects that are constantly being read/written.
> Lacking min_write_recency_for_promote for now, I've been thinking of setting
> cache_min_evict_age to several hours.
> That way truly cold objects will be subject to eviction, while even lukewarm
> ones get to stay.
> Note that for objects that more or less belong in the cache we're using
> less than 15% of its capacity.
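For reference, cache_min_evict_age is given in seconds, so "several hours" needs converting. The sketch below builds the command and models the eligibility check. The pool name, the 6 hour figure and the `evictable` helper are assumptions, and what exactly the age is measured from is not stated in this thread.

```python
# Sketch only: converts hours to seconds and models the age check.

def min_evict_age_cmd(pool, hours):
    """Build the ceph CLI invocation for cache_min_evict_age (seconds)."""
    seconds = int(hours * 3600)
    return ["ceph", "osd", "pool", "set", pool,
            "cache_min_evict_age", str(seconds)]


def evictable(age_s, min_evict_age_s, dirty=False):
    """A clean object is eligible once it is at least min_evict_age_s old."""
    return (not dirty) and age_s >= min_evict_age_s


print(" ".join(min_evict_age_cmd("cache", 6)))
# ceph osd pool set cache cache_min_evict_age 21600
print(evictable(age_s=30 * 60, min_evict_age_s=21600))        # False, still warm
print(evictable(age_s=3 * 24 * 3600, min_evict_age_s=21600))  # True, cold
```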
> Christian
> --
> Christian Balzer        Network/Systems Engineer
> chibi at gol.com           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/

shinobu at linux.com
Life with Distributed Computational System based on OpenSource
