[ceph-users] Any backfill in our cluster makes the cluster unusable and takes forever

David Turner drakonstein at gmail.com
Fri Nov 2 07:17:39 PDT 2018


Pavan, which version of Ceph were you using when you changed your backend
to rocksdb?

On Mon, Oct 1, 2018 at 4:24 PM Pavan Rallabhandi <
PRallabhandi at walmartlabs.com> wrote:

> Yeah, I think this is something to do with the CentOS binaries, sorry that
> I couldn’t be of much help here.
>
> Thanks,
> -Pavan.
>
> From: David Turner <drakonstein at gmail.com>
> Date: Monday, October 1, 2018 at 1:37 PM
> To: Pavan Rallabhandi <PRallabhandi at walmartlabs.com>
> Cc: ceph-users <ceph-users at lists.ceph.com>
> Subject: EXT: Re: [ceph-users] Any backfill in our cluster makes the
> cluster unusable and takes forever
>
> I tried modifying filestore_rocksdb_options
> by removing compression=kNoCompression as well as setting it
> to compression=kSnappyCompression.  Leaving it as kNoCompression or
> removing it results in the same segfault as in the previous log.  Setting it
> to kSnappyCompression resulted in [1] this being logged and the OSD just
> failing to start instead of segfaulting.  Is there anything else you would
> suggest trying before I purge this OSD from the cluster?  I'm afraid it
> might be something with the CentOS binaries.
>
> [1] 2018-10-01 17:10:37.134930 7f1415dfcd80  0  set rocksdb option
> compression = kSnappyCompression
> 2018-10-01 17:10:37.134986 7f1415dfcd80 -1 rocksdb: Invalid argument:
> Compression type Snappy is not linked with the binary.
> 2018-10-01 17:10:37.135004 7f1415dfcd80 -1
> filestore(/var/lib/ceph/osd/ceph-1) mount(1723): Error initializing rocksdb
> :
> 2018-10-01 17:10:37.135020 7f1415dfcd80 -1 osd.1 0 OSD:init: unable to
> mount object store
> 2018-10-01 17:10:37.135029 7f1415dfcd80 -1 ** ERROR: osd init
> failed: (1) Operation not permitted
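One quick way to sanity-check the "not linked with the binary" message is to look for dynamic snappy linkage on the ceph-osd binary. This is only a sketch: the path below is the usual RPM install location and may differ on your node, and if rocksdb/snappy were linked statically this check will show nothing either way.

```shell
# Hypothetical check: does the ceph-osd binary dynamically link libsnappy?
# If rocksdb was compiled without snappy support, nothing will match here.
BIN="${CEPH_OSD_BIN:-/usr/bin/ceph-osd}"
if [ -x "$BIN" ]; then
    ldd "$BIN" | grep -i snappy || echo "no dynamic snappy linkage in $BIN"
else
    echo "ceph-osd not found at $BIN"
fi
```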
>
> On Sat, Sep 29, 2018 at 1:57 PM Pavan Rallabhandi <
> PRallabhandi at walmartlabs.com> wrote:
> I looked at one of my test clusters running Jewel on Ubuntu 16.04, and
> interestingly I found this (below) in one of the OSD logs, which is
> different from your OSD boot log, where none of the compression algorithms
> seem to be supported. This hints more at how rocksdb was built on CentOS
> for Ceph.
>
> 2018-09-29 17:38:38.629112 7fbd318d4b00  4 rocksdb: Compression algorithms
> supported:
> 2018-09-29 17:38:38.629112 7fbd318d4b00  4 rocksdb:     Snappy supported: 1
> 2018-09-29 17:38:38.629113 7fbd318d4b00  4 rocksdb:     Zlib supported: 1
> 2018-09-29 17:38:38.629113 7fbd318d4b00  4 rocksdb:     Bzip supported: 0
> 2018-09-29 17:38:38.629114 7fbd318d4b00  4 rocksdb:     LZ4 supported: 0
> 2018-09-29 17:38:38.629114 7fbd318d4b00  4 rocksdb:     ZSTD supported: 0
> 2018-09-29 17:38:38.629115 7fbd318d4b00  4 rocksdb: Fast CRC32 supported: 0
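For comparison on the CentOS side, the same lines can be pulled out of an OSD's own boot log. A sketch only: the log path and OSD id below are examples, adjust them for your cluster.

```shell
# Hypothetical log location; substitute your own OSD id.
LOG="${OSD_LOG:-/var/log/ceph/ceph-osd.1.log}"
if [ -r "$LOG" ]; then
    # rocksdb prints one "supported" line per algorithm right after this header
    grep -A 6 'Compression algorithms supported' "$LOG"
else
    echo "OSD log not found at $LOG"
fi
```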
>
> On 9/27/18, 2:56 PM, "Pavan Rallabhandi" <
> PRallabhandi at walmartlabs.com> wrote:
>
>     I see Filestore symbols on the stack, so the bluestore config doesn’t
> apply here. And the top frame of the stack hints at a RocksDB issue, and there
> are a whole lot of these too:
>
>     “2018-09-17 19:23:06.480258 7f1f3d2a7700  2 rocksdb:
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.4/rpm/el7/BUILD/ceph-12.2.4/src/rocksdb/table/block_based_table_reader.cc:636]
> Cannot find Properties block from file.”
>
>     It really seems to be something with RocksDB on CentOS. I still think
> you can try removing “compression=kNoCompression” from the
> filestore_rocksdb_options, and/or check if rocksdb is expecting snappy to be
> enabled.
>
>     Thanks,
>     -Pavan.
>
>     From: David Turner <drakonstein at gmail.com>
>     Date: Thursday, September 27, 2018 at 1:18 PM
>     To: Pavan Rallabhandi <PRallabhandi at walmartlabs.com>
>     Cc: ceph-users <ceph-users at lists.ceph.com>
>     Subject: EXT: Re: [ceph-users] Any backfill in our cluster makes the
> cluster unusable and takes forever
>
>     I got pulled away from this for a while.  The error in the log is
> "abort: Corruption: Snappy not supported or corrupted Snappy compressed
> block contents" and the OSD has 2 settings set to snappy by default,
> async_compressor_type and bluestore_compression_algorithm.  Do either of
> these settings affect the omap store?
>
>     On Wed, Sep 19, 2018 at 2:33 PM Pavan Rallabhandi <
> PRallabhandi at walmartlabs.com> wrote:
>     Looks like you are running on CentOS, fwiw. We’ve successfully run the
> conversion commands on Jewel, Ubuntu 16.04.
>
>     I have a feeling it’s expecting compression to be enabled; can you try
> removing “compression=kNoCompression” from the filestore_rocksdb_options?
> And/or you might want to check if rocksdb is expecting snappy to be enabled.
>
>     From: David Turner <drakonstein at gmail.com>
>     Date: Tuesday, September 18, 2018 at 6:01 PM
>     To: Pavan Rallabhandi <PRallabhandi at walmartlabs.com>
>     Cc: ceph-users <ceph-users at lists.ceph.com>
>     Subject: EXT: Re: [ceph-users] Any backfill in our cluster makes the
> cluster unusable and takes forever
>
>     Here's the [1] full log from the time the OSD was started to the end
> of the crash dump.  These logs are so hard to parse.  Is there anything
> useful in them?
>
>     I did confirm that all perms were set correctly and that the
> superblock was changed to rocksdb before the first time I attempted to
> start the OSD with its new DB.  This is on a fully Luminous cluster with
> [2] the defaults you mentioned.
>
>     [1]
> https://gist.github.com/drakonstein/fa3ac0ad9b2ec1389c957f95e05b79ed
>     [2] "filestore_omap_backend": "rocksdb",
>     "filestore_rocksdb_options":
> "max_background_compactions=8,compaction_readahead_size=2097152,compression=kNoCompression",
>
>     On Tue, Sep 18, 2018 at 5:29 PM Pavan Rallabhandi <
> PRallabhandi at walmartlabs.com> wrote:
>     I meant the stack trace hints that the superblock still has leveldb in
> it, have you verified that already?
>
>     On 9/18/18, 5:27 PM, "Pavan Rallabhandi" <
> PRallabhandi at walmartlabs.com> wrote:
>
>         You should be able to set them under the global section and that
> reminds me, since you are on Luminous already, I guess those values are
> already the default, you can verify from the admin socket of any OSD.
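The admin-socket check described here can be run on the node hosting the OSD. A sketch, with osd.0 as an example id:

```shell
# Query a running OSD's effective config via its admin socket.
# osd.0 is an example; substitute any OSD hosted on this node.
if command -v ceph >/dev/null 2>&1; then
    ceph daemon osd.0 config get filestore_omap_backend
    ceph daemon osd.0 config get filestore_rocksdb_options
else
    echo "ceph CLI not available on this host"
fi
```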
>
>         But the stack trace didn’t suggest that the superblock on the OSD
> is still considering the omap backend to be leveldb; it looks more to do
> with the compression.
>
>         Thanks,
>         -Pavan.
>
>         From: David Turner <drakonstein at gmail.com>
>         Date: Tuesday, September 18, 2018 at 5:07 PM
>         To: Pavan Rallabhandi <PRallabhandi at walmartlabs.com>
>         Cc: ceph-users <ceph-users at lists.ceph.com>
>         Subject: EXT: Re: [ceph-users] Any backfill in our cluster makes
> the cluster unusable and takes forever
>
>         Are those settings fine to have be global even if not all OSDs on
> a node have rocksdb as the backend?  Or will I need to convert all OSDs on
> a node at the same time?
>
>         On Tue, Sep 18, 2018 at 5:02 PM Pavan Rallabhandi <
> PRallabhandi at walmartlabs.com> wrote:
>         The steps that were outlined for conversion are correct; have you
> tried setting some of the relevant ceph conf values too:
>
>         filestore_rocksdb_options =
> "max_background_compactions=8;compaction_readahead_size=2097152;compression=kNoCompression"
>
>         filestore_omap_backend = rocksdb
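In ceph.conf form the two settings above would look like the sketch below. Section placement ([osd] vs [global]) is an assumption; note also that David's running cluster reports the rocksdb options comma-separated rather than semicolon-separated.

```ini
[osd]
filestore_omap_backend = rocksdb
filestore_rocksdb_options = max_background_compactions=8,compaction_readahead_size=2097152,compression=kNoCompression
```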
>
>         Thanks,
>         -Pavan.
>
>         From: ceph-users <ceph-users-bounces at lists.ceph.com> on behalf of
> David Turner <drakonstein at gmail.com>
>         Date: Tuesday, September 18, 2018 at 4:09 PM
>         To: ceph-users <ceph-users at lists.ceph.com>
>         Subject: EXT: [ceph-users] Any backfill in our cluster makes the
> cluster unusable and takes forever
>
>         I've finally learned enough about the OSD backend to track this
> issue down to what I believe is the root cause.  LevelDB compaction is the
> common thread every time we move data around our cluster.  I've ruled out
> PG subfolder splitting, EC doesn't seem to be the root cause of this, and
> it is cluster wide as opposed to specific hardware.
>
>         One of the first things I found after digging into leveldb omap
> compaction was [1] this article with a heading "RocksDB instead of LevelDB"
> which mentions that leveldb was replaced with rocksdb as the default db
> backend for filestore OSDs and was even backported to Jewel because of the
> performance improvements.
>
>         I figured there must be a way to upgrade an OSD from leveldb to
> rocksdb without needing to fully backfill the entire OSD.
> There is [2] this article, but you need to have an active service account
> with RedHat to access it.  I eventually came across [3] this article about
> optimizing Ceph Object Storage which mentions a resolution to OSDs flapping
> due to omap compaction to migrate to using rocksdb.  It links to the RedHat
> article, but also has [4] these steps outlined in it.  I tried to follow
> the steps, but the OSD I tested this on was unable to start with [5] this
> segfault.  And then trying to move the OSD back to the original LevelDB
> omap folder resulted in [6] this in the log.  I apologize that all of my
> logging is with log level 1.  If needed I can get some higher log levels.
>
>         My Ceph version is 12.2.4.  Does anyone have any suggestions for
> how I can update my filestore backend from leveldb to rocksdb?  Or if
> that's the wrong direction and I should be looking elsewhere?  Thank you.
>
>
>         [1] https://ceph.com/community/new-luminous-rados-improvements/
>         [2] https://access.redhat.com/solutions/3210951
>         [3]
> https://hubb.blob.core.windows.net/c2511cea-81c5-4386-8731-cc444ff806df-public/resources/Optimize
> Ceph object storage for production in multisite clouds.pdf
>
>         [4] ■ Stop the OSD
>         ■ mv /var/lib/ceph/osd/ceph-/current/omap
> /var/lib/ceph/osd/ceph-/omap.orig
>         ■ ulimit -n 65535
>         ■ ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-/omap.orig
> store-copy /var/lib/ceph/osd/ceph-/current/omap 10000 rocksdb
>         ■ ceph-osdomap-tool --omap-path
> /var/lib/ceph/osd/ceph-/current/omap --command check
>         ■ sed -i s/leveldb/rocksdb/g /var/lib/ceph/osd/ceph-/superblock
>         ■ chown ceph.ceph /var/lib/ceph/osd/ceph-/current/omap -R
>         ■ cd /var/lib/ceph/osd/ceph-; rm -rf omap.orig
>         ■ Start the OSD
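The sed line in these steps is worth rehearsing on a scratch copy before touching the real superblock, since it edits the file in place. A minimal stand-in illustration; the file contents and /tmp path below are hypothetical (the real superblock at /var/lib/ceph/osd/ceph-<id>/superblock is a binary file that happens to contain the backend name as a plain string):

```shell
# Stand-in demo of the superblock edit: flip "leveldb" to "rocksdb" in place.
# The demo file is NOT the real superblock format, just a string to edit.
DEMO=$(mktemp -d)
printf 'omap_backend=leveldb\n' > "$DEMO/superblock"
sed -i 's/leveldb/rocksdb/g' "$DEMO/superblock"
grep -q rocksdb "$DEMO/superblock" && echo "superblock now references rocksdb"
rm -rf "$DEMO"
```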
>
>         [5] 2018-09-17 19:23:10.826227 7f1f3f2ab700 -1 abort: Corruption:
> Snappy not supported or corrupted Snappy compressed block contents
>         2018-09-17 19:23:10.830525 7f1f3f2ab700 -1 *** Caught signal
> (Aborted) **
>
>         [6] 2018-09-17 19:27:34.010125 7fcdee97cd80 -1 osd.0 0 OSD:init:
> unable to mount object store
>         2018-09-17 19:27:34.010131 7fcdee97cd80 -1 ** ERROR: osd
> init failed: (1) Operation not permitted
>         2018-09-17 19:27:54.225941 7f7f03308d80  0 set uid:gid to 167:167
> (ceph:ceph)
>         2018-09-17 19:27:54.225975 7f7f03308d80  0 ceph version 12.2.4
> (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process
> (unknown), pid 361535
>         2018-09-17 19:27:54.231275 7f7f03308d80  0 pidfile_write: ignore
> empty --pid-file
>         2018-09-17 19:27:54.260207 7f7f03308d80  0 load: jerasure load:
> lrc load: isa
>         2018-09-17 19:27:54.260520 7f7f03308d80  0
> filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
>         2018-09-17 19:27:54.261135 7f7f03308d80  0
> filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
>         2018-09-17 19:27:54.261750 7f7f03308d80  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP
> ioctl is disabled via 'filestore fiemap' config option
>         2018-09-17 19:27:54.261757 7f7f03308d80  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features:
> SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
>         2018-09-17 19:27:54.261758 7f7f03308d80  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice()
> is disabled via 'filestore splice' config option
>         2018-09-17 19:27:54.286454 7f7f03308d80  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features:
> syncfs(2) syscall fully supported (by glibc and kernel)
>         2018-09-17 19:27:54.286572 7f7f03308d80  0
> xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature: extsize is
> disabled by conf
>         2018-09-17 19:27:54.287119 7f7f03308d80  0
> filestore(/var/lib/ceph/osd/ceph-0) start omap initiation
>         2018-09-17 19:27:54.287527 7f7f03308d80 -1
> filestore(/var/lib/ceph/osd/ceph-0) mount(1723): Error initializing leveldb
> : Corruption: VersionEdit: unknown tag
>
>
>
>
>
>