[ceph-users] RGW how to delete orphans

Florian Engelmann florian.engelmann at everyware.ch
Fri Oct 26 03:28:19 PDT 2018


Hi,

we've got the same problem here. Our 12.2.5 RadosGWs crashed about 
30,000 times (unnoticed by us) while multipart uploads were in progress. 
After a couple of days we ended up with:

xx-1.rgw.buckets.data  6  N/A  N/A  116TiB  87.22  17.1TiB  36264870  36.26M  3.63GiB  148MiB  194TiB

That is 116TiB of data (194TiB raw), while the bucket stats only add up to:

for i in $(radosgw-admin bucket list | jq -r '.[]'); do
    radosgw-admin bucket stats --bucket="$i" | jq '.usage | ."rgw.main" | .size_kb'
done | awk '{ SUM += $1 } END { print SUM/1024/1024/1024 }'

46.0962

116TiB - 46TiB = 70TiB

So 70TiB of objects are orphans, right?
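
The arithmetic above can be sketched as two small helpers. This is only a sketch: the function names kib_to_tib and unaccounted_tib are made up for illustration, and the result is just the space that bucket stats do not account for. Whether all of it is really orphaned is what the orphan search has to establish.

```shell
# Sum size_kb values (one per line on stdin) and print the total in TiB.
# size_kb is in KiB, so three divisions by 1024 give TiB.
kib_to_tib() {
    awk '{ sum += $1 } END { printf "%.4f\n", sum / 1024 / 1024 / 1024 }'
}

# Pool usage minus bucket-accounted usage, both given in TiB.
unaccounted_tib() {
    awk -v pool="$1" -v buckets="$2" 'BEGIN { printf "%.2f\n", pool - buckets }'
}

# Figures from this message: 116TiB in the pool, 46.0962TiB in bucket stats.
unaccounted_tib 116 46.0962   # prints 69.90
```

The per-bucket jq output from the loop above could be piped into kib_to_tib in place of the inline awk.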

And there are 36,264,870 objects in our rgw.buckets.data pool.

So we started an "orphans find" job, which currently shows up as:

radosgw-admin orphans list-jobs --extra-info
[
     {
         "orphan_search_state": {
             "info": {
                 "orphan_search_info": {
                     "job_name": "check-orph",
                     "pool": "zh-1.rgw.buckets.data",
                     "num_shards": 64,
                     "start_time": "2018-10-10 09:01:14.746436Z"
                 }
             },
             "stage": {
                 "orphan_search_stage": {
                     "search_stage": "iterate_bucket_index",
                     "shard": 0,
                     "marker": ""
                 }
             }
         }
     }
]

We are writing its stdout to orphans.txt.

I am not sure how to interpret the output, but:

awk '/^storing / { SUM += $2 } END { print SUM }' orphans.txt
2145042765

So how should these output lines be interpreted?
...
storing 16 entries at orphan.scan.check-orph.linked.62
storing 19 entries at orphan.scan.check-orph.linked.63
storing 13 entries at orphan.scan.check-orph.linked.0
storing 13 entries at orphan.scan.check-orph.linked.1
...

Does it mean

"I am storing 16 'healthy' object 'names' in the shard 
orphan.scan.check-orph.linked.62"?

Are these objects? What is meant by "entries"? Where are those "shards" 
stored? Are they files, or objects in a pool? How can we check the 
progress of "orphans find"? Is there an estimate of how long it takes to 
run on SATA disks with 194TiB raw?

The "orphans find" command has already stored 2,145,042,765 (more than 2 
billion) "entries", while there are "only" 36 million objects.
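
To see where those 2 billion entries go, the "storing" lines can be broken down per shard object family instead of being summed globally. The following is a rough sketch; summarize_storing is a made-up name, and it assumes every line has exactly the "storing N entries at NAME" layout shown above. It does not tell whether the same entry is stored more than once.

```shell
# Group "storing N entries at <name>" lines by name family (the object
# name with its trailing .<shard> number removed) and total the entries.
summarize_storing() {
    awk '/^storing / {
             name = $5
             sub(/\.[0-9]+$/, "", name)   # drop the shard suffix
             total[name] += $2
             lines[name]++
         }
         END {
             for (n in total)
                 printf "%s %.0f entries in %d lines\n", n, total[n], lines[n]
         }' "$@"
}
```

Usage would be "summarize_storing orphans.txt". For the four sample lines above it reports 61 entries in 4 lines for orphan.scan.check-orph.linked.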

Is the process still healthy and doing the right thing?

All the best,
Florian





On 10/3/17 at 10:48 AM, Andreas Calminder wrote:
> The output, to stdout, is something like leaked: $objname. Am I supposed 
> to pipe it to a log, grep for leaked: and pipe it to rados delete? Or am 
> I supposed to dig around in the log pool to try and find the objects 
> there? The information available is quite vague. Maybe Yehuda can shed 
> some light on this issue?
> 
> Best regards,
> /Andreas
> 
> On 3 Oct 2017 06:25, "Christian Wuerdig" <christian.wuerdig at gmail.com> wrote:
> 
>     yes, at least that's how I'd interpret the information given in this
>     thread:
>     http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-February/016521.html
> 
>     On Tue, Oct 3, 2017 at 1:11 AM, Webert de Souza Lima
>     <webert.boss at gmail.com> wrote:
>      > Hey Christian,
>      >
>      >> On 29 Sep 2017 12:32 a.m., "Christian Wuerdig"
>      >> <christian.wuerdig at gmail.com> wrote:
>      >>>
>      >>> I'm pretty sure the orphan find command does exactly that -
>      >>> finding orphans. I remember some emails on the dev list where Yehuda
>      >>> said he wasn't 100% comfortable with automating the delete just yet.
>      >>> So the purpose is to run the orphan find tool and then delete the
>      >>> orphaned objects once you're happy that they all are actually
>      >>> orphaned.
>      >>>
>      >
>      > so what you mean is that one should manually remove the objects
>      > listed in the output?
>      >
>      >
>      > Regards,
>      >
>      > Webert Lima
>      > DevOps Engineer at MAV Tecnologia
>      > Belo Horizonte - Brasil
>      >
>      >
>      > _______________________________________________
>      > ceph-users mailing list
>      > ceph-users at lists.ceph.com
>      > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>      >
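
Regarding the "leaked:" lines mentioned in the quoted thread: a cautious first step could be to collect them into a list for review without deleting anything. This is a sketch only. It assumes the lines really look like "leaked: OBJNAME" as described above (the quoted message says "something like"), and extract_leaked is a made-up name.

```shell
# Collect object names from "leaked: <objname>" lines into a sorted,
# de-duplicated list for manual review. Nothing is deleted here.
extract_leaked() {
    sed -n 's/^leaked: //p' "$@" | sort -u
}
```

Usage would be "extract_leaked orphans.txt > leaked-candidates.txt". Any removal from the pool would be a separate, manual step once the list has been checked.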
