[ceph-users] erasure-coded with overwrites versus erasure-coded with cache tiering

Chad William Seys cwseys at physics.wisc.edu
Thu Oct 5 09:44:21 PDT 2017

Thanks David,
   When I convert to bluestore and the dust settles, I hope to do a 
same-cluster comparison and post here!


On 09/30/2017 07:29 PM, David Turner wrote:
>  > In my case, the replica-3 and k2m2 are stored on the same spinning disks.
> That is exactly what I meant by same pool.  The only way for a cache to 
> make sense would be if the data being written or read will be modified 
> or heavily read for X amount of time and then ignored.
> If things are rarely read, and randomly so, then promoting them into a 
> cache tier just makes you wait for the object to be promoted to cache 
> before you read it once or twice, after which it sits there until it's 
> demoted again.  If you have random io and anything can really be read 
> next, then a cache tier on the same disks as the EC pool will only cause 
> things to be promoted and demoted for no apparent reason.
> You can always test this for your use case and see if it helps enough to 
> create a pool and tier that you need to manage or not. I'm planning to 
> remove my cephfs cache tier once I upgrade to Luminous as I only have it 
> because it was a requirement. It slows down my writes heavily because 
> eviction io is useless and wasteful of cluster io for me.  I haven't 
> checked on the process for that yet, but I'm assuming it's a set command 
> on the pool that will then allow me to disable and remove the cache 
> tier.  I mention that because if it is that easy to enable/disable, then 
> testing it should be simple and easy to compare.
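(For reference, the disable-and-remove sequence being guessed at above is roughly the following; "hotpool" and "ecpool" are hypothetical pool names and this is an untested sketch, not a verified procedure:)

```shell
# Untested sketch; "hotpool" and "ecpool" are hypothetical pool names.
# 1. Stop promoting new objects into the cache (reads/writes proxy through):
ceph osd tier cache-mode hotpool proxy
# 2. Flush and evict everything still sitting in the cache tier:
rados -p hotpool cache-flush-evict-all
# 3. Detach the overlay and remove the tier relationship:
ceph osd tier remove-overlay ecpool
ceph osd tier remove ecpool hotpool
```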
> On Sat, Sep 30, 2017, 8:10 PM Chad William Seys <cwseys at physics.wisc.edu> wrote:
>     Hi David,
>         Thanks for the clarification.  Reminded me of some details I forgot
>     to mention.
>         In my case, the replica-3 and k2m2 are stored on the same spinning
>     disks. (Mainly using EC for "compression": with k2m2, a PG only takes
>     up the same amount of space as replica-2 while allowing 2 disks to
>     fail, like replica-3, without data loss.)
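(The space math above corresponds to an erasure-code profile along these lines; pool and profile names are illustrative, a sketch rather than the actual configuration:)

```shell
# Illustrative only: k=2 data chunks + m=2 coding chunks = 2x raw overhead,
# the same as size-2 replication, while surviving any 2 failures.
ceph osd erasure-code-profile set k2m2 k=2 m=2 crush-failure-domain=host
ceph osd pool create ecpool 64 64 erasure k2m2
```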
>         I'm using this setup as RBDs and cephfs to store things like local
>     mirrors of linux packages and drive images to be broadcast over network.
>        Seems to be about as fast as a normal hard drive. :)
>     So is this the situation where the "cache tier [is] on the same
>     root of osds as the EC pool"?
>     Thanks for the advice!
>     Chad.
>     On 09/30/2017 12:32 PM, David Turner wrote:
>      > I can only think of 1 type of cache tier usage that is faster if
>     you are
>      > using the cache tier on the same root of osds as the EC pool. 
>     That is
>      > cold storage where the file is written initially, modified and
>      > read during the first X hours, and then remains in cold storage
>      > for the remainder of its life with rare reads.
>      >
>      > Other than that there are a few use cases using a faster root of osds
>      > that might make sense, but generally it's still better to utilize that
>      > faster storage in the rest of the osd stack, either as journals for
>      > filestore or WAL/DB partitions for bluestore.
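(As a sketch of that "rest of the osd stack" option on Luminous with ceph-volume; the device paths here are hypothetical:)

```shell
# Hypothetical devices: spinning-disk data with the BlueStore DB on flash.
# The WAL follows the DB device unless placed separately with --block.wal.
ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1
```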
>      >
>      >
>      > On Sat, Sep 30, 2017, 12:56 PM Chad William Seys
>      > <cwseys at physics.wisc.edu> wrote:
>      >
>      >     Hi all,
>      >         Now that Luminous supports direct writing to EC pools I was
>      >     wondering
>      >     if one can get more performance out of an erasure-coded pool with
>      >     overwrites or an erasure-coded pool with a cache tier?
>      >         I currently have a 3 replica pool in front of a k2m2
>      >     erasure coded pool.  Luminous documentation on cache tiering
>      >     http://docs.ceph.com/docs/luminous/rados/operations/cache-tiering/#a-word-of-caution
>      >     makes it sound like cache tiering is usually not recommended.
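(For the direct-overwrite route, Luminous enables it per pool; pool and image names below are hypothetical and the commands are an untested sketch:)

```shell
# Untested sketch. Overwrites must be enabled explicitly on the EC pool:
ceph osd pool set ecpool allow_ec_overwrites true
# RBD then keeps metadata in a replicated pool and data in the EC pool:
rbd create --size 1G --data-pool ecpool rbdpool/myimage
```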
>      >
>      >     Thanks!
>      >     Chad.
>      >     _______________________________________________
>      >     ceph-users mailing list
>      > ceph-users at lists.ceph.com
>      > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>      >

More information about the ceph-users mailing list