[ceph-users] Broken upgrade from Hammer to Luminous

Gregory Farnum gfarnum at redhat.com
Tue Nov 28 15:28:22 PST 2017


I thought somebody else was going to contact you about this, but in case it
didn't happen off-list:

This appears to be an embarrassing issue on our end where we alter the disk
state despite not being able to start up all the way, and rely on our users
to read release notes carefully. ;) :/

At this point, you're going to need to manually manipulate the OSDs. It
will involve identifying exactly what the Luminous daemons did; *hopefully*
they have only set new features on disk. If that's so, you can probably use
ceph-dencoder on whatever the feature flags file is and pull out everything
added after hammer.
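
As a read-only illustration of what "everything added after hammer" means
here, the sketch below compares the two incompat feature sets reported in the
OSD log quoted further down. The feature strings are copied from that log;
the helper name parse_incompat is hypothetical. The script only computes the
difference and does not touch the OSD superblock or any other on-disk state.

```python
import re

# Feature sets as printed by the hammer (0.94.10) ceph-osd in the quoted log.
ONDISK = (
    "incompat={1=initial feature set(~v.18),2=pginfo object,3=object locator,"
    "4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,"
    "9=leveldblog,10=snapmapper,12=transaction hints,13=pg meta object,"
    "14=explicit missing set,15=fastinfo pg attr,16=deletes in missing set}"
)
DAEMON = (
    "incompat={1=initial feature set(~v.18),2=pginfo object,3=object locator,"
    "4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,"
    "9=leveldblog,10=snapmapper,11=sharded objects,12=transaction hints,"
    "13=pg meta object}"
)


def parse_incompat(text):
    """Return {feature_id: feature_name} from an 'incompat={...}' string."""
    match = re.search(r"incompat=\{([^}]*)\}", text)
    if match is None:
        return {}
    pairs = re.findall(r"(\d+)=([^,]+)", match.group(1))
    return {int(fid): name.strip() for fid, name in pairs}


ondisk = parse_incompat(ONDISK)
daemon = parse_incompat(DAEMON)

# Features present on disk that the hammer daemon does not know about.
missing = {fid: name for fid, name in ondisk.items() if fid not in daemon}

for fid in sorted(missing):
    print(f"{fid}={missing[fid]}")
```

The result should match the "Missing features" line that the hammer daemon
prints itself (features 14, 15 and 16), which are the flags that would have
to be cleared, assuming nothing else on disk was changed.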

But I'm not sure if that's the only thing that happened. You may need to
get some consulting from somebody who has experience doing Ceph cluster
recovery.
-Greg

On Thu, Nov 16, 2017 at 7:58 PM Gianfilippo <gianfi at gmail.com> wrote:

> Hi all,
> I made a pretty big mistake during our upgrade from hammer to luminous:
> I skipped the jewel release.
> When I realized this and tried to switch back to jewel, it was too late.
> The cluster now won't start, complaining that "The disk uses features
> unsupported by the executable.":
>
> 2017-11-17 01:27:26.190971 7fb446ab58c0  0 ceph version 0.94.10
> (b1e0532418e4631af01acbc0cedd426f1905f4af), process ceph-osd, pid 19638
> 2017-11-17 01:27:26.209600 7fb446ab58c0  0
> filestore(/var/lib/ceph/osd/ceph-2) backend xfs (magic 0x58465342)
> 2017-11-17 01:27:26.277323 7fb446ab58c0  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> FIEMAP ioctl is supported and appears to work
> 2017-11-17 01:27:26.277353 7fb446ab58c0  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> FIEMAP ioctl is disabled via 'filestore fiemap' config option
> 2017-11-17 01:27:26.302508 7fb446ab58c0  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> syncfs(2) syscall fully supported (by glibc and kernel)
> 2017-11-17 01:27:26.302668 7fb446ab58c0  0
> xfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: extsize is
> disabled by conf
> 2017-11-17 01:27:26.325121 7fb446ab58c0  0
> filestore(/var/lib/ceph/osd/ceph-2) mount: enabling WRITEAHEAD journal
> mode: checkpoint is not enabled
> 2017-11-17 01:27:26.343360 7fb446ab58c0  1 journal _open
> /var/lib/ceph/osd/ceph-2/journal fd 20: 5368709120 bytes, block size
> 4096 bytes, directio = 1, aio = 1
> 2017-11-17 01:27:26.393876 7fb446ab58c0  1 journal _open
> /var/lib/ceph/osd/ceph-2/journal fd 20: 5368709120 bytes, block size
> 4096 bytes, directio = 1, aio = 1
> 2017-11-17 01:27:26.394746 7fb446ab58c0 -1 osd.2 0 The disk uses
> features unsupported by the executable.
> 2017-11-17 01:27:26.394758 7fb446ab58c0 -1 osd.2 0  ondisk features
> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
> object,3=object locator,4=last_epoch_clean,5=categories,6=hobjectpool,
> 7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction
> hints,13=pg meta object,14=explicit missing set,15=fastinfo pg
> attr,16=deletes in missing set}
> 2017-11-17 01:27:26.394780 7fb446ab58c0 -1 osd.2 0  daemon features
> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
> object,3=object locator,4=last_epoch_clean,5=categories,6=hobjectpool,
> 7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
> objects,12=transaction hints,13=pg meta object}
> 2017-11-17 01:27:26.394794 7fb446ab58c0 -1 osd.2 0 Cannot write to disk!
> Missing features: compat={},rocompat={},incompat={14=explicit missing
> set,15=fastinfo pg attr,16=deletes in missing set}
> 2017-11-17 01:27:26.419854 7fb446ab58c0  1 journal close
> /var/lib/ceph/osd/ceph-2/journal
> 2017-11-17 01:27:26.422687 7fb446ab58c0 -1  ** ERROR: osd init
> failed: (95) Operation not supported
> 2017-11-17 01:27:26.863514 7fcc5f1428c0  0 ceph version 0.94.10
> (b1e0532418e4631af01acbc0cedd426f1905f4af), process ceph-osd, pid 19731
> 2017-11-17 01:27:26.878617 7fcc5f1428c0  0
> filestore(/var/lib/ceph/osd/ceph-2) backend xfs (magic 0x58465342)
> 2017-11-17 01:27:26.880689 7fcc5f1428c0  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> FIEMAP ioctl is supported and appears to work
> 2017-11-17 01:27:26.880703 7fcc5f1428c0  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> FIEMAP ioctl is disabled via 'filestore fiemap' config option
> 2017-11-17 01:27:26.898681 7fcc5f1428c0  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> syncfs(2) syscall fully supported (by glibc and kernel)
> 2017-11-17 01:27:26.898829 7fcc5f1428c0  0
> xfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: extsize is
> disabled by conf
> 2017-11-17 01:27:26.906300 7fcc5f1428c0  0
> filestore(/var/lib/ceph/osd/ceph-2) mount: enabling WRITEAHEAD journal
> mode: checkpoint is not enabled
> 2017-11-17 01:27:26.917013 7fcc5f1428c0  1 journal _open
> /var/lib/ceph/osd/ceph-2/journal fd 20: 5368709120 bytes, block size
> 4096 bytes, directio = 1, aio = 1
> 2017-11-17 01:27:26.925628 7fcc5f1428c0  1 journal _open
> /var/lib/ceph/osd/ceph-2/journal fd 20: 5368709120 bytes, block size
> 4096 bytes, directio = 1, aio = 1
> 2017-11-17 01:27:26.926496 7fcc5f1428c0 -1 osd.2 0 The disk uses
> features unsupported by the executable.
> 2017-11-17 01:27:26.926509 7fcc5f1428c0 -1 osd.2 0  ondisk features
> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
> object,3=object locator,4=last_epoch_clean,5=categories,6=hobjectpool,
> 7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction
> hints,13=pg meta object,14=explicit missing set,15=fastinfo pg
> attr,16=deletes in missing set}
> 2017-11-17 01:27:26.926533 7fcc5f1428c0 -1 osd.2 0  daemon features
> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
> object,3=object locator,4=last_epoch_clean,5=categories,6=hobjectpool,
> 7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
> objects,12=transaction hints,13=pg meta object}
> 2017-11-17 01:27:26.926553 7fcc5f1428c0 -1 osd.2 0 Cannot write to disk!
> Missing features: compat={},rocompat={},incompat={14=explicit missing
> set,15=fastinfo pg attr,16=deletes in missing set}
> 2017-11-17 01:27:26.927159 7fcc5f1428c0  1 journal close
> /var/lib/ceph/osd/ceph-2/journal
> 2017-11-17 01:27:26.929073 7fcc5f1428c0 -1  ** ERROR: osd init
> failed: (95) Operation not supported
> 2017-11-17 01:27:27.364931 7f16ccdc78c0  0 ceph version 0.94.10
> (b1e0532418e4631af01acbc0cedd426f1905f4af), process ceph-osd, pid 19821
> 2017-11-17 01:27:27.379962 7f16ccdc78c0  0
> filestore(/var/lib/ceph/osd/ceph-2) backend xfs (magic 0x58465342)
> 2017-11-17 01:27:27.381509 7f16ccdc78c0  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> FIEMAP ioctl is supported and appears to work
> 2017-11-17 01:27:27.381524 7f16ccdc78c0  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> FIEMAP ioctl is disabled via 'filestore fiemap' config option
> 2017-11-17 01:27:27.397192 7f16ccdc78c0  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> syncfs(2) syscall fully supported (by glibc and kernel)
> 2017-11-17 01:27:27.397324 7f16ccdc78c0  0
> xfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: extsize is
> disabled by conf
> 2017-11-17 01:27:27.402018 7f16ccdc78c0  0
> filestore(/var/lib/ceph/osd/ceph-2) mount: enabling WRITEAHEAD journal
> mode: checkpoint is not enabled
> 2017-11-17 01:27:27.412815 7f16ccdc78c0  1 journal _open
> /var/lib/ceph/osd/ceph-2/journal fd 19: 5368709120 bytes, block size
> 4096 bytes, directio = 1, aio = 1
> 2017-11-17 01:27:27.421621 7f16ccdc78c0  1 journal _open
> /var/lib/ceph/osd/ceph-2/journal fd 19: 5368709120 bytes, block size
> 4096 bytes, directio = 1, aio = 1
> 2017-11-17 01:27:27.422471 7f16ccdc78c0 -1 osd.2 0 The disk uses
> features unsupported by the executable.
> 2017-11-17 01:27:27.422482 7f16ccdc78c0 -1 osd.2 0  ondisk features
> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
> object,3=object locator,4=last_epoch_clean,5=categories,6=hobjectpool,
> 7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction
> hints,13=pg meta object,14=explicit missing set,15=fastinfo pg
> attr,16=deletes in missing set}
> 2017-11-17 01:27:27.422495 7f16ccdc78c0 -1 osd.2 0  daemon features
> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
> object,3=object locator,4=last_epoch_clean,5=categories,6=hobjectpool,
> 7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
> objects,12=transaction hints,13=pg meta object}
> 2017-11-17 01:27:27.422515 7f16ccdc78c0 -1 osd.2 0 Cannot write to disk!
> Missing features: compat={},rocompat={},incompat={14=explicit missing
> set,15=fastinfo pg attr,16=deletes in missing set}
> 2017-11-17 01:27:27.424247 7f16ccdc78c0  1 journal close
> /var/lib/ceph/osd/ceph-2/journal
> 2017-11-17 01:27:27.426533 7f16ccdc78c0 -1  ** ERROR: osd init
> failed: (95) Operation not supported
>
>
>
>
> As the cluster won't start on Jewel, I can't run "ceph osd set
> sortbitwise" to complete a successful upgrade to Luminous. When I try to
> start on Luminous instead, I get the following errors from the OSD:
>
> 2017-11-17 04:57:37.779635 7f04c28ead00  0 set uid:gid to 64045:64045
> (ceph:ceph)
> 2017-11-17 04:57:37.779682 7f04c28ead00  0 ceph version 12.2.1
> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable), process
> (unknown), pid 8531
> 2017-11-17 04:57:37.783538 7f04c28ead00 -1 Public network was set, but
> cluster network was not set
> 2017-11-17 04:57:37.783562 7f04c28ead00 -1     Using public network also
> for cluster network
> 2017-11-17 04:57:37.790066 7f04c28ead00  0 pidfile_write: ignore empty
> --pid-file
> 2017-11-17 04:57:37.800558 7f04c28ead00  0 load: jerasure load: lrc
> load: isa
> 2017-11-17 04:57:37.801160 7f04c28ead00  0
> filestore(/var/lib/ceph/osd/ceph-2) backend xfs (magic 0x58465342)
> 2017-11-17 04:57:37.802397 7f04c28ead00  0
> filestore(/var/lib/ceph/osd/ceph-2) backend xfs (magic 0x58465342)
> 2017-11-17 04:57:37.802951 7f04c28ead00  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> FIEMAP ioctl is disabled via 'filestore fiemap' config option
> 2017-11-17 04:57:37.802965 7f04c28ead00  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config
> option
> 2017-11-17 04:57:37.802970 7f04c28ead00  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> splice() is disabled via 'filestore splice' config option
> 2017-11-17 04:57:37.817358 7f04c28ead00  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> syncfs(2) syscall fully supported (by glibc and kernel)
> 2017-11-17 04:57:37.817510 7f04c28ead00  0
> xfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: extsize is
> disabled by conf
> 2017-11-17 04:57:37.818348 7f04c28ead00  0
> filestore(/var/lib/ceph/osd/ceph-2) start omap initiation
> 2017-11-17 04:57:37.818791 7f04c28ead00  1 leveldb: Recovering log #275311
> 2017-11-17 04:57:37.820333 7f04c28ead00  1 leveldb: Delete type=0 #275311
>
> 2017-11-17 04:57:37.820397 7f04c28ead00  1 leveldb: Delete type=3 #275310
>
> 2017-11-17 04:57:38.555491 7f04c28ead00  0
> filestore(/var/lib/ceph/osd/ceph-2) mount(1758): enabling WRITEAHEAD
> journal mode: checkpoint is not enabled
> 2017-11-17 04:57:38.559310 7f04c28ead00  1 journal _open
> /var/lib/ceph/osd/ceph-2/journal fd 28: 5368709120 bytes, block size
> 4096 bytes, directio = 1, aio = 1
> 2017-11-17 04:57:38.561591 7f04c28ead00  1 journal _open
> /var/lib/ceph/osd/ceph-2/journal fd 28: 5368709120 bytes, block size
> 4096 bytes, directio = 1, aio = 1
> 2017-11-17 04:57:38.563244 7f04c28ead00  1
> filestore(/var/lib/ceph/osd/ceph-2) upgrade(1365)
> 2017-11-17 04:57:38.564625 7f04c28ead00  0 <cls>
> /build/ceph-12.2.1/src/cls/hello/cls_hello.cc:296: loading cls_hello
> 2017-11-17 04:57:38.564658 7f04c28ead00  0 _get_class not permitted to
> load lua
> 2017-11-17 04:57:38.567611 7f04c28ead00  0 <cls>
> /build/ceph-12.2.1/src/cls/cephfs/cls_cephfs.cc:197: loading cephfs
> 2017-11-17 04:57:38.568554 7f04c28ead00  0 _get_class not permitted to
> load sdk
> 2017-11-17 04:57:38.568774 7f04c28ead00  0 _get_class not permitted to
> load kvs
> 2017-11-17 04:57:38.568793 7f04c28ead00  1 osd.2 0 warning: got an error
> loading one or more classes: (1) Operation not permitted
> 2017-11-17 04:57:38.569061 7f04c28ead00  0 osd.2 1531 crush map has
> features 1107558400, adjusting msgr requires for clients
> 2017-11-17 04:57:38.569077 7f04c28ead00  0 osd.2 1531 crush map has
> features 1107558400 was 8705, adjusting msgr requires for mons
> 2017-11-17 04:57:38.569085 7f04c28ead00  0 osd.2 1531 crush map has
> features 1107558400, adjusting msgr requires for osds
> 2017-11-17 04:57:38.735392 7f04c28ead00  0 osd.2 1531 load_pgs
> 2017-11-17 04:57:38.740951 7f04c28ead00 -1 *** Caught signal (Aborted) **
>   in thread 7f04c28ead00 thread_name:ceph-osd
>
>   ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e)
> luminous (stable)
>   1: (()+0xa088b9) [0x5618d594b8b9]
>   2: (()+0x10330) [0x7f04c0e0d330]
>   3: (gsignal()+0x37) [0x7f04bfe2dc37]
>   4: (abort()+0x148) [0x7f04bfe31028]
>   5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f04c073c535]
>   6: (()+0x5e6d6) [0x7f04c073a6d6]
>   7: (()+0x5e703) [0x7f04c073a703]
>   8: (()+0x5e922) [0x7f04c073a922]
>   9: (object_stat_sum_t::decode(ceph::buffer::list::iterator&)+0x5cc)
> [0x5618d563ec1c]
>   10:
> (object_stat_collection_t::decode(ceph::buffer::list::iterator&)+0x54)
> [0x5618d5659e14]
>   11: (pg_stat_t::decode(ceph::buffer::list::iterator&)+0x1d5)
> [0x5618d565a455]
>   12: (pg_info_t::decode(ceph::buffer::list::iterator&)+0x12a)
> [0x5618d566059a]
>   13: (PG::read_info(ObjectStore*, spg_t, coll_t const&,
> ceph::buffer::list&, pg_info_t&, PastIntervals&, unsigned char&)+0x231)
> [0x5618d54d1231]
>   14: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x7b)
> [0x5618d54d8c6b]
>   15: (OSD::load_pgs()+0x994) [0x5618d5435e74]
>   16: (OSD::init()+0x2127) [0x5618d544ddc7]
>   17: (main()+0x2ba8) [0x5618d5355798]
>   18: (__libc_start_main()+0xf5) [0x7f04bfe18f45]
>   19: (()+0x4b0826) [0x5618d53f3826]
>   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
>
>
> Thank you for any help, the storage cluster is hosting crucial data.
>
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>