[ceph-users] Resolving Large omap objects in RGW index pool

Tomasz Płaza tomasz.plaza at grupawp.pl
Wed Oct 17 04:11:32 PDT 2018


Hi,

I have a similar issue, and created a simple bash script to delete old 
indexes (it is a PoC and has not been tested in production):

for bucket in `radosgw-admin metadata list bucket | jq -r '.[]' | sort`
do
   # The instance id the bucket currently points at
   actual_id=`radosgw-admin bucket stats --bucket=${bucket} | jq -r '.id'`
   # All instances recorded for this bucket (anchor the grep so buckets
   # whose names contain another bucket's name do not match each other)
   for instance in `radosgw-admin metadata list bucket.instance | jq -r '.[]' | grep "^${bucket}:" | cut -d ':' -f 2`
   do
     if [ "$actual_id" != "$instance" ]
     then
       # Purge the stale index shards and drop the stale instance metadata
       radosgw-admin bi purge --bucket=${bucket} --bucket-id=${instance}
       radosgw-admin metadata rm bucket.instance:${bucket}:${instance}
     fi
   done
done
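
For a dry run, the two radosgw-admin calls inside the if block can be 
swapped for an echo, so the script only reports what it would purge:

     if [ "$actual_id" != "$instance" ]
     then
       # dry run: list stale instances without purging anything
       echo "stale index instance: ${bucket}:${instance}"
     fi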

I find it more readable than the one-liner mentioned below.  Any suggestions 
on this topic are greatly appreciated.
Tom

> Hi,
>
> Having spent some time on the below issue, here are the steps I took 
> to resolve the "Large omap objects" warning. Hopefully this will help 
> others who find themselves in this situation.
>
> I got the object ID and OSD ID implicated from the ceph cluster 
> logfile on the mon.  I then proceeded to the implicated host 
> containing the OSD, and extracted the implicated PG by running the 
> following, and looking at which PG had started and completed a 
> deep-scrub around the warning being logged:
>
> grep -C 200 Large /var/log/ceph/ceph-osd.*.log | egrep '(Large 
> omap|deep-scrub)'
>
> If the bucket had not been sharded sufficiently (i.e. the cluster log 
> showed a "Key Count" or "Size" over the thresholds), I ran through the 
> manual sharding procedure (shown here: 
> https://tracker.ceph.com/issues/24457#note-5)
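>
> The manual reshard itself boils down to something like the following, 
> where the shard count is only an example and should be sized to the 
> bucket's object count:
>
> radosgw-admin bucket reshard --bucket=${bucketname} --num-shards=128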
>
> Once this was successfully sharded, or if the bucket had previously been 
> sufficiently sharded by Ceph prior to disabling the functionality, I 
> was able to use the following command (seemingly undocumented for 
> Luminous: http://docs.ceph.com/docs/mimic/man/8/radosgw-admin/#commands):
>
> radosgw-admin bi purge --bucket ${bucketname} --bucket-id ${old_bucket_id}
>
> I then issued a ceph pg deep-scrub against the PG that had contained 
> the Large omap object.
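>
> For completeness, that was simply:
>
> ceph pg deep-scrub ${pg_id}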
>
> Once I had completed this procedure, my Large omap object warnings 
> went away and the cluster returned to HEALTH_OK.
>
> However, our radosgw bucket index pool now seems to be using 
> substantially more space than previously.  Having looked initially at 
> this bug, and in particular the first comment:
>
> http://tracker.ceph.com/issues/34307#note-1
>
> I was able to extract a number of bucket indexes that had apparently 
> been resharded, and removed the legacy index using radosgw-admin 
> bi purge --bucket ${bucket} --bucket-id ${marker}.  I am still able to 
> perform a radosgw-admin metadata get bucket.instance:${bucket}:${marker} 
> successfully, however now when I run rados -p .rgw.buckets.index ls | 
> grep ${marker} nothing is returned.  Even after this, we were still 
> seeing extremely high disk usage on the OSDs containing the bucket 
> indexes (we have a dedicated pool for this).  I then modified the 
> one-liner referenced in the previous link as follows:
>
> grep -E '"bucket"|"id"|"marker"' bucket-stats.out | awk -F ":" '{print $2}' | tr -d '",' |
> while read -r bucket; do
>   read -r id; read -r marker
>   [ "$id" == "$marker" ] && true || NEWID=`radosgw-admin --id rgw.ceph-rgw-1 metadata get bucket.instance:${bucket}:${marker} | python -c 'import sys, json; print json.load(sys.stdin)["data"]["bucket_info"]["new_bucket_instance_id"]'`
>   while [ ${NEWID} ]; do
>     if [ "${NEWID}" != "${marker}" ] && [ ${NEWID} != ${bucket} ]; then
>       echo "$bucket $NEWID"
>     fi
>     NEWID=`radosgw-admin --id rgw.ceph-rgw-1 metadata get bucket.instance:${bucket}:${NEWID} | python -c 'import sys, json; print json.load(sys.stdin)["data"]["bucket_info"]["new_bucket_instance_id"]'`
>   done
> done > buckets_with_multiple_reindexes2.txt
>
> This loops through the buckets that have a different marker/bucket_id, 
> checks whether a new_bucket_instance_id is present, and if so follows 
> the chain until there is no longer a "new_bucket_instance_id".  After 
> letting this complete, it suggests that I have over 5000 indexes for 
> 74 buckets; some of these buckets apparently have > 100 indexes.
>
> ~# awk '{print $1}' buckets_with_multiple_reindexes2.txt | uniq | wc -l
> 74
> ~# wc -l buckets_with_multiple_reindexes2.txt
> 5813 buckets_with_multiple_reindexes2.txt
>
> This is running a single-realm, multiple-zone configuration with no 
> multisite sync; the closest I can find to this issue is this bug: 
> https://tracker.ceph.com/issues/24603
>
> Should I be OK to loop through these indexes and remove any with a 
> reshard_status of 2 and a new_bucket_instance_id that does not match 
> the bucket_instance_id returned by the command:
>
> radosgw-admin bucket stats --bucket ${bucket}
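>
> As an untested sketch of the kind of check described above (assuming 
> reshard_status sits alongside new_bucket_instance_id under 
> data.bucket_info, with ${bucket} and ${instance} being the bucket name 
> and a stale instance id):
>
> current_id=`radosgw-admin bucket stats --bucket=${bucket} | jq -r '.id'`
> meta=`radosgw-admin metadata get bucket.instance:${bucket}:${instance}`
> status=`echo "$meta" | jq -r '.data.bucket_info.reshard_status'`
> new_id=`echo "$meta" | jq -r '.data.bucket_info.new_bucket_instance_id'`
> # flag instances that have finished resharding (status 2) and whose
> # successor id is not the bucket's current id from bucket stats
> if [ "$status" == "2" ] && [ "$new_id" != "$current_id" ]; then
>   echo "candidate for removal: ${bucket}:${instance} (points at ${new_id})"
> fi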
>
> I'd ideally like to get to a point where I can turn dynamic sharding 
> back on safely for this cluster.
>
> Thanks for any assistance, let me know if there's any more information 
> I should provide
> Chris
>
> On Thu, 4 Oct 2018 at 18:22 Chris Sarginson <csargiso at gmail.com> wrote:
>
>     Hi,
>
>     Thanks for the response - I am still unsure as to what will happen
>     to the "marker" reference in the bucket metadata, as this is the
>     object that is being detected as Large.  Will the bucket generate
>     a new "marker" reference in the bucket metadata?
>
>     I've been reading this page to try and get a better understanding
>     of this
>     http://docs.ceph.com/docs/luminous/radosgw/layout/
>
>     However I'm no clearer on this (and what the "marker" is used
>     for), or why there are multiple separate "bucket_id" values (with
>     different mtime stamps) that all show as having the same number of
>     shards.
>
>     If I were to remove the old bucket index object, would I just be 
>     looking to execute
>
>     rados -p .rgw.buckets.index rm .dir.default.5689810.107
>
>     Is the differing marker/bucket_id that was found in the other 
>     buckets also an indicator?  As I say, there's a good number of 
>     these; here are some additional examples, though these aren't 
>     necessarily reporting as large omap objects:
>
>     "BUCKET1", "default.281853840.479", "default.105206134.5",
>     "BUCKET2", "default.364663174.1", "default.349712129.3674",
>
>     Checking these other buckets, they are exhibiting the same sort of
>     symptoms as the first (multiple instances of radosgw-admin
>     metadata get showing what seem to be multiple resharding processes
>     being run, with different mtimes recorded).
>
>     Thanks
>     Chris
>
>     On Thu, 4 Oct 2018 at 16:21 Konstantin Shalygin <k0ste at k0ste.ru> wrote:
>
>>         Hi,
>>
>>         Ceph version: Luminous 12.2.7
>>
>>         Following upgrading to Luminous from Jewel we have been stuck with a
>>         cluster in HEALTH_WARN state that is complaining about large omap objects.
>>         These all seem to be located in our .rgw.buckets.index pool.  We've
>>         disabled auto resharding on bucket indexes due to apparent looping issues
>>         after our upgrade.  We've reduced the number of reported large
>>         omap objects by initially increasing the following value:
>>
>>         ~# ceph daemon mon.ceph-mon-1 config get
>>         osd_deep_scrub_large_omap_object_value_sum_threshold
>>         {
>>              "osd_deep_scrub_large_omap_object_value_sum_threshold": "2147483648"
>>         }
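>>
>>         (That value can be changed at runtime with something along the lines of
>>
>>         ceph tell osd.* injectargs '--osd_deep_scrub_large_omap_object_value_sum_threshold=2147483648'
>>
>>         although the warning itself is only re-evaluated on the next deep-scrub.)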
>>
>>         We're still getting a warning about a single large omap object;
>>         however, I don't believe this is related to an unsharded index - here's the
>>         log entry:
>>
>>         2018-10-01 13:46:24.427213 osd.477 osd.477 172.26.216.6:6804/2311858 8482 :
>>         cluster [WRN] Large omap object found. Object:
>>         15:333d5ad7:::.dir.default.5689810.107:head Key count: 17467251 Size
>>         (bytes): 4458647149
>>
>>         The object in the logs is the "marker" object, rather than the bucket_id -
>>         I've put some details regarding the bucket here:
>>
>>         https://pastebin.com/hW53kTxL
>>
>>         The bucket limit check shows that the index is sharded, so I think this
>>         might be related to versioning, although I was unable to get confirmation
>>         that the bucket in question has versioning enabled through the aws
>>         cli (snipped debug output below):
>>
>>         2018-10-02 15:11:17,530 - MainThread - botocore.parsers - DEBUG - Response
>>         headers: {'date': 'Tue, 02 Oct 2018 14:11:17 GMT', 'content-length': '137',
>>         'x-amz-request-id': 'tx0000000000000020e3b15-005bb37c85-15870fe0-default',
>>         'content-type': 'application/xml'}
>>         2018-10-02 15:11:17,530 - MainThread - botocore.parsers - DEBUG - Response
>>         body:
>>         <?xml version="1.0" encoding="UTF-8"?><VersioningConfiguration xmlns="
>>         http://s3.amazonaws.com/doc/2006-03-01/"></VersioningConfiguration>
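>>
>>         (The versioning check was the standard s3api call, roughly
>>
>>         aws --endpoint-url http://${rgw_endpoint} s3api get-bucket-versioning --bucket CLIENTBUCKET --debug
>>
>>         with ${rgw_endpoint} standing in for our RGW endpoint.)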
>>
>>         After dumping the contents of the large omap object mentioned above into a
>>         file, it does seem to be a simple listing of the bucket contents, potentially
>>         an old index:
>>
>>         ~# wc -l omap_keys
>>         17467251 omap_keys
>>
>>         This is approximately 5 million below the currently reported number of
>>         objects in the bucket.
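>>
>>         In case it is useful, the keys were pulled straight from the index
>>         object, roughly:
>>
>>         rados -p .rgw.buckets.index listomapkeys .dir.default.5689810.107 > omap_keys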
>>
>>         When running the commands listed here:
>>         http://tracker.ceph.com/issues/34307#note-1
>>
>>         The problematic bucket is listed in the output (along with 72 other
>>         buckets):
>>         "CLIENTBUCKET", "default.294495648.690", "default.5689810.107"
>>
>>         As this tests for bucket_id and marker fields not matching to print out the
>>         information, is the implication here that both of these should match in
>>         order to fully migrate to the new sharded index?
>>
>>         I was able to do a "metadata get" using what appears to be the old index
>>         object ID, which seems to support this (there's a "new_bucket_instance_id"
>>         field, containing a newer "bucket_id" and reshard_status is 2, which seems
>>         to suggest it has completed).
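>>
>>         That lookup is just, for example:
>>
>>         radosgw-admin metadata get bucket.instance:CLIENTBUCKET:default.5689810.107
>>
>>         with "new_bucket_instance_id" and "reshard_status" appearing under
>>         data.bucket_info in the output.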
>>
>>         I am able to take the "new_bucket_instance_id" and get additional metadata
>>         about the bucket, each time I do this I get a slightly newer
>>         "new_bucket_instance_id", until it stops suggesting updated indexes.
>>
>>         It's probably worth pointing out that when going through this process the
>>         final "bucket_id" doesn't match the one that I currently get when running
>>         'radosgw-admin bucket stats --bucket "CLIENTBUCKET"', even though it also
>>         suggests that no further resharding has been done as "reshard_status" = 0
>>         and "new_bucket_instance_id" is blank.  The output is available to view
>>         here:
>>
>>         https://pastebin.com/g1TJfKLU
>>
>>         It would be useful if anyone can offer some clarification on how to proceed
>>         from this situation, identifying and removing any old/stale indexes from
>>         the index pool (if that is the case), as I've not been able to spot
>>         anything in the archives.
>>
>>         If there's any further information that is needed for additional context
>>         please let me know.
>
>
>         Usually, when your bucket is automatically resharded, in some
>         cases the old big index is not deleted - this is your large
>         omap object.
>
>         This index is safe to delete. Also look at [1].
>
>
>         [1] https://tracker.ceph.com/issues/24457
>
>
>
>         k
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


