[ceph-users] radosgw sync falling behind regularly

Trey Palmer nerdmagicatl at gmail.com
Mon Mar 11 13:38:42 PDT 2019


Hi Casey,

We're still trying to figure this sync problem out. If you could possibly
tell us anything further, we would be deeply grateful!

Our errors are coming from 'data sync'. In `sync status` we pretty
consistently show one shard behind, but a different one each time we run it.

Here's a paste -- these commands were run in rapid succession.

root@sv3-ceph-rgw1:~# radosgw-admin sync status
          realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)
      zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)
           zone 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
                source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
root@sv3-ceph-rgw1:~# radosgw-admin sync status
          realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)
      zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)
           zone 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 1 shards
                        behind shards: [30]
                        oldest incremental change not applied: 2019-01-19 22:53:23.0.16109s
                source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
root@sv3-ceph-rgw1:~#
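
(In case it helps anyone else following this thread: the check we used to
convince ourselves the errors are on the data sync side is roughly the
following -- just a sketch, run on the secondary zone's gateway host.
`sync error list` dumps the persistent sync error log, and the grep over
the live gateway log is a variant of the one pasted further down.)

# persistent sync error entries recorded by the gateways
radosgw-admin sync error list

# live gateway log, filtered to the sync subsystems
tail -f /var/log/ceph/ceph-rgw-sv3-ceph-rgw1.log | grep -iE '(data|meta) sync.*ERROR'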


Below I'm pasting a small section of the log. Thanks so much for looking!

Trey Palmer


root@sv3-ceph-rgw1:/var/log/ceph# tail -f ceph-rgw-sv3-ceph-rgw1.log | grep -i error
2019-03-08 11:43:07.208572 7fa080cc7700  0 data sync: ERROR: failed to read
remote data log info: ret=-2
2019-03-08 11:43:07.211348 7fa080cc7700  0 meta sync: ERROR:
RGWBackoffControlCR called coroutine returned -2
2019-03-08 11:43:07.267117 7fa080cc7700  0 data sync: ERROR: failed to read
remote data log info: ret=-2
2019-03-08 11:43:07.269631 7fa080cc7700  0 meta sync: ERROR:
RGWBackoffControlCR called coroutine returned -2
2019-03-08 11:43:07.895192 7fa080cc7700  0 data sync: ERROR: init sync on
dmv/dmv:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.134 failed, retcode=-2
2019-03-08 11:43:08.046685 7fa080cc7700  0 data sync: ERROR: init sync on
dmv/dmv:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.134 failed, retcode=-2
2019-03-08 11:43:08.171277 7fa0870eb700  0 ERROR: failed to get bucket
instance info for
.bucket.meta.phowe_superset:phowe_superset:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.233
2019-03-08 11:43:08.171748 7fa0850e7700  0 ERROR: failed to get bucket
instance info for
.bucket.meta.gdfp_dev:gdfp_dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.158
2019-03-08 11:43:08.175867 7fa08a0f1700  0 meta sync: ERROR: can't remove
key:
bucket.instance:phowe_superset/phowe_superset:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.233
ret=-2
2019-03-08 11:43:08.176755 7fa0820e1700  0 data sync: ERROR: init sync on
whoiswho/whoiswho:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.293 failed,
retcode=-2
2019-03-08 11:43:08.176872 7fa0820e1700  0 data sync: ERROR: init sync on
dmv/dmv:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.134 failed, retcode=-2
2019-03-08 11:43:08.176885 7fa093103700  0 ERROR: failed to get bucket
instance info for
.bucket.meta.phowe_superset:phowe_superset:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.233
2019-03-08 11:43:08.176925 7fa0820e1700  0 data sync: ERROR: failed to
retrieve bucket info for
bucket=phowe_superset/phowe_superset:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.233
2019-03-08 11:43:08.177916 7fa0910ff700  0 meta sync: ERROR: can't remove
key:
bucket.instance:gdfp_dev/gdfp_dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.158
ret=-2
2019-03-08 11:43:08.178815 7fa08b0f3700  0 ERROR: failed to get bucket
instance info for
.bucket.meta.gdfp_dev:gdfp_dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.158
2019-03-08 11:43:08.178847 7fa0820e1700  0 data sync: ERROR: failed to
retrieve bucket info for
bucket=gdfp_dev/gdfp_dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.158
2019-03-08 11:43:08.179492 7fa0820e1700  0 data sync: ERROR: init sync on
adcreative/adcreative:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.21 failed,
retcode=-2
2019-03-08 11:43:08.179529 7fa0820e1700  0 data sync: ERROR: init sync on
vulnerability_report/vulnerability-report:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.421
failed, retcode=-2
2019-03-08 11:43:08.179770 7fa0820e1700  0 data sync: ERROR: init sync on
early_osquery/early-osquery:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.339
failed, retcode=-2
2019-03-08 11:43:08.217393 7fa0820e1700  0 data sync: ERROR: init sync on
bugsnag_integration/bugsnag-integration:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.328
failed, retcode=-2
2019-03-08 11:43:08.233847 7fa0820e1700  0 data sync: ERROR: init sync on
vulnerability_report/vulnerability-report:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.421
failed, retcode=-2
2019-03-08 11:43:08.233917 7fa0820e1700  0 data sync: ERROR: init sync on
dmv/dmv:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.134 failed, retcode=-2
2019-03-08 11:43:08.233998 7fa0820e1700  0 data sync: ERROR: init sync on
early_osquery/early-osquery:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.339
failed, retcode=-2
2019-03-08 11:43:08.273391 7fa0820e1700  0 data sync: ERROR: init sync on
bugsnag_integration/bugsnag-integration:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.328
failed, retcode=-2
2019-03-08 11:43:08.745150 7fa0840e5700  0 ERROR: failed to get bucket
instance info for
.bucket.meta.event_dashboard:event_dashboard:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.148
2019-03-08 11:43:08.745408 7fa08c0f5700  0 ERROR: failed to get bucket
instance info for
.bucket.meta.produktizr_doc:produktizr_doc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.241
2019-03-08 11:43:08.749571 7fa0820e1700  0 data sync: ERROR: init sync on
ceph/ceph:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.427 failed, retcode=-2
2019-03-08 11:43:08.750472 7fa0820e1700  0 data sync: ERROR: init sync on
terraform_dev/terraform-dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.418
failed, retcode=-2
2019-03-08 11:43:08.750508 7fa08e0f9700  0 meta sync: ERROR: can't remove
key:
bucket.instance:event_dashboard/event_dashboard:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.148
ret=-2
2019-03-08 11:43:08.751094 7fa0868ea700  0 meta sync: ERROR: can't remove
key:
bucket.instance:produktizr_doc/produktizr_doc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.241
ret=-2
2019-03-08 11:43:08.751331 7fa08a8f2700  0 ERROR: failed to get bucket
instance info for
.bucket.meta.event_dashboard:event_dashboard:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.148
2019-03-08 11:43:08.751387 7fa0820e1700  0 data sync: ERROR: failed to
retrieve bucket info for
bucket=event_dashboard/event_dashboard:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.148
2019-03-08 11:43:08.751497 7fa0820e1700  0 data sync: ERROR: init sync on
pithos_doc/pithos-doc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.393
failed, retcode=-2
2019-03-08 11:43:08.751619 7fa0820e1700  0 data sync: ERROR: init sync on
jmeter_sc/jmeter-sc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.360 failed,
retcode=-2
2019-03-08 11:43:08.752037 7fa0900fd700  0 ERROR: failed to get bucket
instance info for
.bucket.meta.produktizr_doc:produktizr_doc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.241
2019-03-08 11:43:08.752063 7fa0820e1700  0 data sync: ERROR: failed to
retrieve bucket info for
bucket=produktizr_doc/produktizr_doc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.241
2019-03-08 11:43:08.752462 7fa0820e1700  0 data sync: ERROR: init sync on
goinfosb/goinfosb:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.160 failed,
retcode=-2
2019-03-08 11:43:08.793707 7fa0820e1700  0 data sync: ERROR: init sync on
kafkadrm/kafkadrm:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.183 failed,
retcode=-2
2019-03-08 11:43:08.809748 7fa0820e1700  0 data sync: ERROR: init sync on
terraform_dev/terraform-dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.418
failed, retcode=-2
2019-03-08 11:43:08.809804 7fa0820e1700  0 data sync: ERROR: init sync on
pithos_doc/pithos-doc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.393
failed, retcode=-2
2019-03-08 11:43:08.809917 7fa0820e1700  0 data sync: ERROR: init sync on
jmeter_sc/jmeter-sc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.360 failed,
retcode=-2
2019-03-08 11:43:09.345180 7fa0840e5700  0 ERROR: failed to get bucket
instance info for
.bucket.meta.spins_on_the_ledger:spins_on_the_ledger:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.274
2019-03-08 11:43:09.349186 7fa0820e1700  0 data sync: ERROR: init sync on
steno/steno:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.279 failed,
retcode=-2
2019-03-08 11:43:09.349235 7fa0820e1700  0 data sync: ERROR: init sync on
adjuster_kafka/adjuster-kafka:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.308
failed, retcode=-2
2019-03-08 11:43:09.349809 7fa0820e1700  0 data sync: ERROR: init sync on
oauth/oauth:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.223 failed,
retcode=-2
2019-03-08 11:43:09.351909 7fa08d0f7700  0 meta sync: ERROR: can't remove
key:
bucket.instance:spins_on_the_ledger/spins_on_the_ledger:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.274
ret=-2
2019-03-08 11:43:09.352412 7fa0820e1700  0 data sync: ERROR: init sync on
sre_jmeter/sre-jmeter:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.635
failed, retcode=-2
2019-03-08 11:43:09.352609 7fa08f0fb700  0 ERROR: failed to get bucket
instance info for
.bucket.meta.spins_on_the_ledger:spins_on_the_ledger:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.274
2019-03-08 11:43:09.352635 7fa0820e1700  0 data sync: ERROR: failed to
retrieve bucket info for
bucket=spins_on_the_ledger/spins_on_the_ledger:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.274
2019-03-08 11:43:09.352831 7fa0820e1700  0 data sync: ERROR: init sync on
charon_analytics/charon-analytics:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.331
failed, retcode=-2
2019-03-08 11:43:09.352903 7fa0820e1700  0 data sync: ERROR: init sync on
kafka_doc/kafka-doc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.362 failed,
retcode=-2
2019-03-08 11:43:09.353337 7fa0820e1700  0 data sync: ERROR: init sync on
serversidesequencing/serversidesequencing:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.263
failed, retcode=-2
2019-03-08 11:43:09.389559 7fa0820e1700  0 data sync: ERROR: init sync on
radio_publicapi/radio-publicapi:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.401
failed, retcode=-2
2019-03-08 11:43:09.402324 7fa0820e1700  0 data sync: ERROR: init sync on
adjuster_kafka/adjuster-kafka:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.308
failed, retcode=-2
2019-03-08 11:43:09.405314 7fa0820e1700  0 data sync: ERROR: init sync on
charon_analytics/charon-analytics:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.331
failed, retcode=-2
2019-03-08 11:43:09.406046 7fa0820e1700  0 data sync: ERROR: init sync on
kafka_doc/kafka-doc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.362 failed,
retcode=-2
2019-03-08 11:43:09.441428 7fa0820e1700  0 data sync: ERROR: init sync on
radio_publicapi/radio-publicapi:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.401
failed, retcode=-2
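
(For the record, what we're planning to run based on your suggestion quoted
below is roughly this -- a sketch only; the zone names are from our setup,
and the systemd unit name is an assumption for our hosts.)

# on the zone that is behind, reinitialize data sync from each source zone
radosgw-admin data sync init --source-zone=dc11-prod
radosgw-admin data sync init --source-zone=sv5-corp

# then restart the gateways so they start the data full sync
systemctl restart ceph-radosgw@rgw.sv3-ceph-rgw1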

On Fri, Mar 8, 2019 at 10:29 AM Casey Bodley <cbodley at redhat.com> wrote:

> (cc ceph-users)
>
> Can you tell whether these sync errors are coming from metadata sync or
> data sync? Are they blocking sync from making progress according to your
> 'sync status'?
>
> On 3/8/19 10:23 AM, Trey Palmer wrote:
> > Casey,
> >
> > Having done the 'reshard stale-instances delete' earlier on the advice
> > of another list member, we have tons of sync errors on deleted
> > buckets, as you mention.
> >
> > After 'data sync init' we're still seeing all of these errors on
> > deleted buckets.
> >
> > It occurred to me this morning that since buckets are metadata, a
> > 'data sync init' wouldn't refresh that info.  But a 'metadata sync
> > init' might get rid of the stale bucket sync info and stop the sync
> > errors.  Would that be the way to go?
> >
> > Thanks,
> >
> > Trey
> >
> >
> >
> > On Wed, Mar 6, 2019 at 11:47 AM Casey Bodley <cbodley at redhat.com> wrote:
> >
> >     Hi Trey,
> >
> >     I think it's more likely that these stale metadata entries are
> >     from deleted buckets, rather than accidental bucket reshards.
> >     When a bucket is deleted in a multisite configuration, we don't
> >     delete its bucket instance, because other zones may still need to
> >     sync the object deletes - and they can't make progress on sync if
> >     the bucket metadata disappears. These leftover bucket instances
> >     look the same to the 'reshard stale-instances' commands, but I'd
> >     be cautious about using those to remove them in multisite, as it
> >     may cause more sync errors and potentially leak storage if they
> >     still contain objects.
> >
> >     Regarding 'datalog trim', that alone isn't safe because it could
> >     trim entries that hadn't been applied on other zones yet, causing
> >     them to miss some updates. What you can do is run 'data sync init'
> >     on each zone, and restart gateways. This will restart with a data
> >     full sync (which will scan all buckets for changes), and skip past
> >     any datalog entries from before the full sync. I was concerned
> >     that the bug in error handling (i.e. "ERROR: init sync on...")
> >     would also affect full sync, but that doesn't appear to be the
> >     case - so I do think that's worth trying.
> >
>
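
(And if the 'metadata sync init' route asked about above turns out to be
the right call, the rough shape would presumably be the same -- again only
a sketch, assuming it's run on the secondary zone with the gateways
restarted afterwards:)

radosgw-admin metadata sync init
systemctl restart ceph-radosgw@rgw.sv3-ceph-rgw1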