[ceph-users] Ceph cache pool full

David Turner drakonstein at gmail.com
Fri Oct 6 06:46:22 PDT 2017


On Fri, Oct 6, 2017, 1:05 AM Christian Balzer <chibi at gol.com> wrote:

>
> Hello,
>
> On Fri, 06 Oct 2017 03:30:41 +0000 David Turner wrote:
>
> > You're missing most of the important bits: what the OSDs in your
> > cluster look like, your CRUSH tree, and your cache pool settings.
> >
> > ceph df
> > ceph osd df
> > ceph osd tree
> > ceph osd pool get cephfs_cache all
> >
> Especially the last one.
>
> My money is on not having set target_max_objects and target_max_bytes to
> sensible values along with the ratios.
> In short, not having read the (albeit spotty) documentation.
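>
> For illustration only (the numbers here are placeholders, not
> recommendations), the usual minimum would be something like:
>
> # ceph osd pool set cephfs_cache target_max_bytes 1000000000000
> # ceph osd pool set cephfs_cache target_max_objects 1000000
> # ceph osd pool set cephfs_cache cache_target_dirty_ratio 0.4
> # ceph osd pool set cephfs_cache cache_target_full_ratio 0.8
>
> Without target_max_bytes or target_max_objects set, the tiering agent
> has nothing to act on and will never flush or evict anything.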
>
> > You have your writeback cache on 3 NVMe drives. It looks like you have
> > 1.6TB available between them for the cache. I don't know the behavior
> > of a writeback cache tier on CephFS for large files, but I would guess
> > that it can only hold full files and not flush partial files.
>
> I VERY much doubt that; if so, it would be a massive flaw.
> One assumes that cache operations work on the RADOS object level, no
> matter what.
>
I hope that it is at the RADOS level, but not a single object had been
flushed to the backing pool, so I hazarded a guess. Seeing his settings
will shed more light.
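
If it does work at the RADOS level, a manual flush should confirm it.
Assuming the pool name from the output below, something like:

# rados -p cephfs_cache cache-flush-evict-all

should start pushing objects down to the backing pool (objects still in
use by clients may refuse to flush or evict).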

>
> > That would mean your cache needs to have enough space for any file
> > being written to the cluster. In this case a 1.3TB file with 3x
> > replication would require 3.9TB (more than double what you have
> > available) of available space in your writeback cache.
> >
> > There are very few use cases that benefit from a cache tier. The docs for
> > Luminous warn as much.
> You keep repeating that like a broken record.
>
> And while certainly not false, I for one wouldn't be able to use (or
> justify using) Ceph without cache tiers in our main use case.


> In this case I assume they were following an old cheat sheet or such,
> suggesting the cache tier that was previously required with EC pools.
>

http://docs.ceph.com/docs/luminous/rados/operations/cache-tiering/

I know I keep repeating it, especially recently, as there have been a lot
of people asking about it. The Luminous docs added a large section about
how it is probably not what you want. Like me, the docs are not saying
that there are no use cases for it. No information was provided about the
use case here, so I made some suggestions/guesses. I'm also guessing that
they are following a guide written before Luminous, when a writeback cache
was necessary for CephFS to use EC. I also usually add that people should
test it out and find what works best for them. I will always defer to your
practical use of cache tiers as well, especially with RBDs.

I manage a cluster on which I intend to keep running a writeback cache in
front of CephFS, on the same drives as the EC pool. The use case benefits
enough from the cache tier that flash media isn't even required to see the
gain. The cluster is used for video editing: files are usually modified
and read within the first 24 hours, then left in cold storage until
deleted. I have the cache tuned to keep everything for 24 hours and then
evict it, by setting the minimum time to flush and evict to 24 hours and
the target max bytes to 0. All files stay in the cache for that window,
and it never has to decide what to keep because it keeps nothing longer
than that. Luckily, read performance from cold storage is not a
requirement of this cluster, since any read has to first read the object
from EC storage, write it into the replicated cache, and then read it back
from the cache... Yuck.
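
For reference, that arrangement is roughly the following (the pool name
is assumed here; 86400 seconds = 24 hours):

# ceph osd pool set cephfs_cache cache_min_flush_age 86400
# ceph osd pool set cephfs_cache cache_min_evict_age 86400
# ceph osd pool set cephfs_cache target_max_bytes 0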

>
> Christian
>
> > What is your goal in implementing this cache? If the answer is to
> > utilize extra space on the NVMes, then just remove it and say thank
> > you. The better use of the NVMes in that case is as part of the
> > BlueStore stack, giving your OSDs larger DB partitions. Keeping your
> > metadata pool on NVMes is still a good idea.
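> >
> > Something along these lines, for example (device paths are
> > placeholders for your hardware):
> > # ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1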
> >
> > On Thu, Oct 5, 2017, 7:45 PM Shawfeng Dong <shaw at ucsc.edu> wrote:
> >
> > > Dear all,
> > >
> > > We just set up a Ceph cluster, running the latest stable release Ceph
> > > v12.2.0 (Luminous):
> > > # ceph --version
> > > ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
> > >
> > > The goal is to serve Ceph filesystem, for which we created 3 pools:
> > > # ceph osd lspools
> > > 1 cephfs_data,2 cephfs_metadata,3 cephfs_cache,
> > > where
> > > * cephfs_data is the data pool (36 OSDs on HDDs), which is
> > > erasure-coded;
> > > * cephfs_metadata is the metadata pool
> > > * cephfs_cache is the cache tier (3 OSDs on NVMes) for cephfs_data.
> > > The cache-mode is writeback; the setup is sketched below.
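> > >
> > > For reference, the tier was created with (roughly) the standard
> > > commands:
> > > # ceph osd tier add cephfs_data cephfs_cache
> > > # ceph osd tier cache-mode cephfs_cache writeback
> > > # ceph osd tier set-overlay cephfs_data cephfs_cache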
> > >
> > > Everything had worked fine until today, when we tried to copy a
> > > 1.3TB file to the CephFS and got a "No space left on device" error!
> > >
> > > 'ceph -s' says some OSDs are full:
> > > # ceph -s
> > >   cluster:
> > >     id:     e18516bf-39cb-4670-9f13-88ccb7d19769
> > >     health: HEALTH_ERR
> > >             full flag(s) set
> > >             1 full osd(s)
> > >             1 pools have many more objects per pg than average
> > >
> > >   services:
> > >     mon: 3 daemons, quorum pulpo-admin,pulpo-mon01,pulpo-mds01
> > >     mgr: pulpo-mds01(active), standbys: pulpo-admin, pulpo-mon01
> > >     mds: pulpos-1/1/1 up  {0=pulpo-mds01=up:active}
> > >     osd: 39 osds: 39 up, 39 in
> > >          flags full
> > >
> > >   data:
> > >     pools:   3 pools, 2176 pgs
> > >     objects: 347k objects, 1381 GB
> > >     usage:   2847 GB used, 262 TB / 265 TB avail
> > >     pgs:     2176 active+clean
> > >
> > >   io:
> > >     client:   19301 kB/s rd, 2935 op/s rd, 0 op/s wr
> > >
> > > And indeed the cache pool is full:
> > > # rados df
> > > POOL_NAME       USED  OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS   RD    WR_OPS  WR
> > > cephfs_cache    1381G  355385      0 710770                  0       0        0 10004954 1522G 1398063 1611G
> > > cephfs_data         0       0      0      0                  0       0        0        0     0       0      0
> > > cephfs_metadata 8515k      24      0     72                  0       0        0        3  3072    3953 10541k
> > >
> > > total_objects    355409
> > > total_used       2847G
> > > total_avail      262T
> > > total_space      265T
> > >
> > > However, the data pool is completely empty! So it seems that data
> > > has only been written to the cache pool, but never flushed back to
> > > the data pool.
> > >
> > > I am really at a loss as to whether this is due to a setup error on
> > > my part or a Luminous bug. Could anyone shed some light on this?
> > > Please let me know if you need any further info.
> > >
> > > Best,
> > > Shaw
> > >
>
>
> --
> Christian Balzer        Network/Systems Engineer
> chibi at gol.com           Rakuten Communications
>

