[ceph-users] RGW: ERROR: failed to distribute cache
yehuda at redhat.com
Mon Nov 6 11:17:54 PST 2017
On Mon, Nov 6, 2017 at 7:29 AM, Wido den Hollander <wido at 42on.com> wrote:
> On a Ceph Luminous (12.2.1) environment I'm seeing RGWs stall, and at about the same time I see these errors in the RGW logs:
> 2017-11-06 15:50:24.859919 7f8f5fa1a700 0 ERROR: failed to distribute cache for gn1-pf.rgw.data.root:.bucket.meta.XXXXX:eb32b1ca-807a-4867-aea5-ff43ef7647c6.20755572.20
> 2017-11-06 15:50:41.768881 7f8f7824b700 0 ERROR: failed to distribute cache for gn1-pf.rgw.data.root:XXXXX
> 2017-11-06 15:55:15.781739 7f8f7824b700 0 ERROR: failed to distribute cache for gn1-pf.rgw.meta:.meta:bucket.instance:XXXXX:eb32b1ca-807a-4867-aea5-ff43ef7647c6.20755572.32:_XK5LExyXXXXX6EEIXxCD5Cws:1
> 2017-11-06 15:55:25.784404 7f8f7824b700 0 ERROR: failed to distribute cache for gn1-pf.rgw.data.root:.bucket.meta.XXXXX:eb32b1ca-807a-4867-aea5-ff43ef7647c6.20755572.32
> I see one message from a year ago: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-June/010531.html
> The setup has two RGWs running:
> - ceph-rgw1
> - ceph-rgw2
> While trying to figure this out I see that a "radosgw-admin period pull" hangs forever.
> I don't know if that is related, but it's something I've noticed.
> Mainly I see that at random times the RGW stalls for about 30 seconds and while that happens these messages show up in the RGW's log.
Do you happen to know whether dynamic resharding is happening? Dynamic
resharding should only affect writes to the specific bucket, and
should not affect cache distribution, though. Originally I thought it
could be a HUP-signal-related issue, but that seems to have been
fixed in 12.2.1.
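
One way to check whether the stalls line up with the cache errors is to pull the timestamps out of the RGW log and group them into bursts. Below is a minimal sketch in Python, assuming the log line layout shown in the report above (timestamp, thread id, log level, message). The function names and the 60-second gap are illustrative choices, not anything provided by RGW itself.

```python
import re
from datetime import datetime, timedelta

# Assumed layout, taken from the lines quoted in the report:
# "<date> <time> <thread-id> <level> ERROR: failed to distribute cache for <object>"
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<thread>[0-9a-f]+)\s+\d+\s+"
    r"ERROR: failed to distribute cache for (?P<obj>\S+)"
)

def parse_errors(lines):
    """Return (timestamp, thread id, object name) for each matching log line."""
    events = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            events.append((ts, m.group("thread"), m.group("obj")))
    return events

def group_bursts(events, gap):
    """Group events whose timestamps are no more than `gap` apart."""
    bursts = []
    for ev in sorted(events, key=lambda e: e[0]):
        if bursts and ev[0] - bursts[-1][-1][0] <= gap:
            bursts[-1].append(ev)
        else:
            bursts.append([ev])
    return bursts

# The four log lines quoted in the report.
SAMPLE = [
    "2017-11-06 15:50:24.859919 7f8f5fa1a700 0 ERROR: failed to distribute cache for gn1-pf.rgw.data.root:.bucket.meta.XXXXX:eb32b1ca-807a-4867-aea5-ff43ef7647c6.20755572.20",
    "2017-11-06 15:50:41.768881 7f8f7824b700 0 ERROR: failed to distribute cache for gn1-pf.rgw.data.root:XXXXX",
    "2017-11-06 15:55:15.781739 7f8f7824b700 0 ERROR: failed to distribute cache for gn1-pf.rgw.meta:.meta:bucket.instance:XXXXX:eb32b1ca-807a-4867-aea5-ff43ef7647c6.20755572.32:_XK5LExyXXXXX6EEIXxCD5Cws:1",
    "2017-11-06 15:55:25.784404 7f8f7824b700 0 ERROR: failed to distribute cache for gn1-pf.rgw.data.root:.bucket.meta.XXXXX:eb32b1ca-807a-4867-aea5-ff43ef7647c6.20755572.32",
]

if __name__ == "__main__":
    for burst in group_bursts(parse_errors(SAMPLE), timedelta(seconds=60)):
        start, end = burst[0][0], burst[-1][0]
        print(f"{start} .. {end}: {len(burst)} error(s)")
```

Comparing the burst windows against the times at which client requests stalled would show whether the two really coincide, and the object names in each burst show which bucket metadata was being updated at the time.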
> Is anybody else running into this issue?