[ceph-users] Deleting large pools

David Turner drakonstein at gmail.com
Tue Nov 14 11:49:51 PST 2017

Two weeks later, the deletion is still running but is close to finishing.
I tried using ceph-objectstore-tool to remove one of the PGs, testing on a
single PG on a single OSD, and the result is strange.  While the tool was
running, my connection to the DC reset and the command died.  Now the tool
aborts with an uncaught buffer::end_of_buffer exception every time I run it
(output below), and when I simply start the OSD, it makes no attempt to
delete the PG's data.  The data in this PG does not matter, and I figure the
worst case is that it sits there taking up 200GB until I redeploy the OSD.

However, I like to learn things about Ceph.  Does anyone have any insight
into what is happening with this PG?

[root@osd1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
    --journal-path /var/lib/ceph/osd/ceph-0/journal --pgid 97.314s0 --op remove
SG_IO: questionable sense data, results may be incorrect
SG_IO: questionable sense data, results may be incorrect
 marking collection for removal
mark_pg_for_removal warning: peek_map_epoch reported error
terminate called after throwing an instance of 'ceph::buffer::end_of_buffer'
  what():  buffer::end_of_buffer
*** Caught signal (Aborted) **
 in thread 7f98ab2dc980 thread_name:ceph-objectstor
 ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
 1: (()+0x95209a) [0x7f98abc4b09a]
 2: (()+0xf100) [0x7f98a91d7100]
 3: (gsignal()+0x37) [0x7f98a7d825f7]
 4: (abort()+0x148) [0x7f98a7d83ce8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f98a86879d5]
 6: (()+0x5e946) [0x7f98a8685946]
 7: (()+0x5e973) [0x7f98a8685973]
 8: (()+0x5eb93) [0x7f98a8685b93]
 9: (ceph::buffer::list::iterator_impl<false>::copy(unsigned int, char*)+0xa5) [0x7f98abd498a5]
 10: (PG::read_info(ObjectStore*, spg_t, coll_t const&, ceph::buffer::list&, pg_info_t&, std::map<unsigned int, pg_interval_t, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, pg_interval_t> > >&, unsigned char&)+0x324) [0x7f98ab6d3094]
 11: (mark_pg_for_removal(ObjectStore*, spg_t, ObjectStore::Transaction*)+0x87c) [0x7f98ab66615c]
 12: (initiate_new_remove_pg(ObjectStore*, spg_t, ObjectStore::Sequencer&)+0x131) [0x7f98ab666a51]
 13: (main()+0x39b7) [0x7f98ab610437]
 14: (__libc_start_main()+0xf5) [0x7f98a7d6eb15]
 15: (()+0x363a57) [0x7f98ab65ca57]
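
Given Greg's warning (quoted below) about people removing the wrong PG, one
possible guard is to build the command from a checked PG id instead of typing
it by hand.  This is only a minimal Python sketch.  It assumes the PG id has
the form <pool>.<seed>[s<shard>] as in 97.314s0, and that the OSD paths follow
the layout in the command above.  The helper name build_remove_command and the
expected_pool check are hypothetical and not part of any Ceph tooling.  It
prints the command and does not run it.

```python
import re
import shlex

# Assumed PG id shape: <pool id, decimal>.<pg seed, hex>[s<shard>], e.g. 97.314s0
PGID_RE = re.compile(r"^(?P<pool>\d+)\.(?P<seed>[0-9a-f]+)(?:s(?P<shard>\d+))?$")


def build_remove_command(osd_id, pgid, expected_pool):
    """Return the argv for removing one PG, or raise if the PG id looks wrong.

    Hypothetical helper: it only uses the flags that appear in the command above.
    """
    match = PGID_RE.match(pgid)
    if match is None:
        raise ValueError(f"unrecognised PG id: {pgid!r}")
    if int(match.group("pool")) != expected_pool:
        raise ValueError(f"PG {pgid} is not in pool {expected_pool}; refusing")
    data_path = f"/var/lib/ceph/osd/ceph-{osd_id}"
    return [
        "ceph-objectstore-tool",
        "--data-path", data_path,
        "--journal-path", f"{data_path}/journal",
        "--pgid", pgid,
        "--op", "remove",
    ]


if __name__ == "__main__":
    # Print the command for review; nothing is executed here.
    argv = build_remove_command(osd_id=0, pgid="97.314s0", expected_pool=97)
    print(shlex.join(argv))
```

The pool check is the point of the sketch: a PG id from any pool other than
the deleted one (pool 97 here) is rejected before a command is ever produced.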

On Thu, Nov 2, 2017 at 12:45 PM Gregory Farnum <gfarnum at redhat.com> wrote:

> Deletion is throttled. I don’t know which configs change it, but you
> could poke around if you want things to go faster.
> Don’t just remove the directory in the filesystem; you need to clean up
> the leveldb metadata as well. ;)
> Removing the PG via ceph-objectstore-tool would work fine, but I’ve seen
> too many people kill the wrong thing to recommend it.
> -Greg
> On Thu, Nov 2, 2017 at 9:40 AM David Turner <drakonstein at gmail.com> wrote:
>> Jewel 10.2.7; XFS-formatted OSDs; no dmcrypt or LVM.  I have a pool that
>> I deleted 16 hours ago that accounted for about 70% of the available space
>> on each OSD (averaging 84% full), 370M objects in 8k PGs, EC 4+2 profile.
>> Based on the rate at which the OSDs are freeing up space after deleting
>> the pool, it will take about a week to finish deleting the PGs from the
>> OSDs.  Is there anything I can do to speed this process up?  I suspect
>> there may be a way for me to go through the OSDs and delete the PG folders,
>> either with the objectstore tool or while the OSD is offline.  I'm not sure
>> what Ceph is doing to delete the pool, but I don't think that an `rm -Rf`
>> of the PG folder would take nearly this long.
>> Thank you all for your help.
> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
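
The "about a week" figure in the original message comes from extrapolating the
rate at which the OSDs free space.  A minimal Python sketch of that kind of
linear extrapolation follows.  The function name and all of the numbers are
hypothetical, not measurements from this cluster, and it assumes the deletion
rate stays constant, which throttling may or may not guarantee.

```python
def estimate_days_remaining(used_before_gb, used_now_gb, hours_elapsed, target_used_gb):
    """Linear estimate of the days left until usage drops to target_used_gb."""
    if hours_elapsed <= 0:
        raise ValueError("hours_elapsed must be positive")
    freed_per_hour = (used_before_gb - used_now_gb) / hours_elapsed
    if freed_per_hour <= 0:
        raise ValueError("no space has been freed in the sampled window")
    # Clamp at zero in case usage is already at or below the target.
    remaining_gb = max(used_now_gb - target_used_gb, 0.0)
    return remaining_gb / freed_per_hour / 24.0


if __name__ == "__main__":
    # Hypothetical numbers for a single OSD, sampled 16 hours apart.
    days = estimate_days_remaining(
        used_before_gb=840.0,
        used_now_gb=784.0,
        hours_elapsed=16.0,
        target_used_gb=252.0,
    )
    print(f"about {days:.1f} days remaining")
```

Sampling the same OSD again later and re-running the estimate shows whether
the rate is actually holding steady.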