[ceph-users] Another OSD broken today. How can I recover it?

Gonzalo Aguilar Delgado gaguilar at aguilardelgado.com
Mon Dec 4 01:22:58 PST 2017


Hello,

Things are getting worse every day.


ceph -w
    cluster 9028f4da-0d77-462b-be9b-dbdf7fa57771
     health HEALTH_ERR
            1 pgs are stuck inactive for more than 300 seconds
            8 pgs inconsistent
            1 pgs repair
            1 pgs stale
            1 pgs stuck stale
            recovery 20266198323167232/288980 objects degraded (7013010700798.405%)
            37154696925806624 scrub errors
            no legacy OSD present but 'sortbitwise' flag is not set
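
A quick sanity check of the counters above, as a minimal Python sketch. The three numbers are copied from the status output; the helper names are made up for illustration. It reproduces the reported percentage with exact integer arithmetic and shows that both huge counters have only two or three bits set. Real counts rarely look like that, so these statistics are probably garbage rather than a measure of actual damage (an assumption, not something the output proves).

```python
# Values copied from the `ceph -w` output above.
degraded = 20266198323167232
total_objects = 288980
scrub_errors = 37154696925806624


def percent_milli(part, whole):
    """Return part/whole as a percentage in thousandths (exact integer math)."""
    return part * 100_000 // whole


def set_bits(n):
    """Return the positions of the bits set in n, highest first."""
    return [i for i in range(n.bit_length() - 1, -1, -1) if (n >> i) & 1]


m = percent_milli(degraded, total_objects)
print(f"degraded: {m // 1000}.{m % 1000:03d}%")
print(f"degraded     = {degraded:#x}, bits set: {set_bits(degraded)}")
print(f"scrub errors = {scrub_errors:#x}, bits set: {set_bits(scrub_errors)}")

# A degraded count can never legitimately exceed what the cluster holds.
print("degraded count plausible:", degraded <= total_objects)
```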


But I'm finally finding time to work on the recovery. The disk seems to be
fine: there are no SMART errors and everything looks good, except that the
ceph OSD does not start. Today I started looking at ceph-objectstore-tool,
which I don't really know much about.

It just works fine. It does not crash the way the OSD does.

So I'm lost. Since both the OSD and ceph-objectstore-tool use the same
backend, how is this possible?

Can someone help me fix this, please?



----------------------------------------------------------------------------------

ceph-objectstore-tool --debug --op list-pgs \
    --data-path /var/lib/ceph/osd/ceph-4 --journal-path /dev/sdf3
2017-12-03 13:27:58.206069 7f02c203aa40  0
filestore(/var/lib/ceph/osd/ceph-4) backend xfs (magic 0x58465342)
2017-12-03 13:27:58.206528 7f02c203aa40  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features:
FIEMAP ioctl is disabled via 'filestore fiemap' config option
2017-12-03 13:27:58.206546 7f02c203aa40  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features:
SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2017-12-03 13:27:58.206569 7f02c203aa40  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features:
splice is supported
2017-12-03 13:27:58.251393 7f02c203aa40  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2017-12-03 13:27:58.251459 7f02c203aa40  0
xfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature: extsize is
disabled by conf
2017-12-03 13:27:58.978809 7f02c203aa40  0
filestore(/var/lib/ceph/osd/ceph-4) mount: enabling WRITEAHEAD journal
mode: checkpoint is not enabled
2017-12-03 13:27:58.990051 7f02c203aa40  1 journal _open /dev/sdf3 fd
11: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 1
2017-12-03 13:27:59.002345 7f02c203aa40  1 journal _open /dev/sdf3 fd
11: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 1
2017-12-03 13:27:59.004846 7f02c203aa40  1
filestore(/var/lib/ceph/osd/ceph-4) upgrade
Cluster fsid=9028f4da-0d77-462b-be9b-dbdf7fa57771
Supported features: compat={},rocompat={},incompat={1=initial feature
set(~v.18),2=pginfo object,3=object
locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
objects,12=transaction hints,13=pg meta object}
On-disk features: compat={},rocompat={},incompat={1=initial feature
set(~v.18),2=pginfo object,3=object
locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
objects,12=transaction hints,13=pg meta object}
Performing list-pgs operation
11.7f
10.4b
....
10.8d
2017-12-03 13:27:59.009327 7f02c203aa40  1 journal close /dev/sdf3
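
Listing PGs mostly enumerates what is on disk, so it may never read the per-PG metadata that the OSD loads at startup. One way to narrow things down is to run a per-PG operation, such as `--op info --pgid <pgid>`, against each listed PG and see which one fails. The Python sketch below only builds those command lines and does not run them. The helper names are hypothetical and the paths are taken from the command above. Whether `info` trips the same assertion as the OSD is an assumption to verify.

```python
import re
import shlex

# Paths taken from the list-pgs command above.
DATA_PATH = "/var/lib/ceph/osd/ceph-4"
JOURNAL_PATH = "/dev/sdf3"

# Matches plain replicated-pool PG ids such as "11.7f" (pool.hex);
# erasure-coded shard suffixes are deliberately not handled here.
PGID_RE = re.compile(r"^[0-9]+\.[0-9a-f]+$")


def parse_pgids(output):
    """Pick the PG ids out of the text printed by `--op list-pgs`."""
    lines = (line.strip() for line in output.splitlines())
    return [line for line in lines if PGID_RE.match(line)]


def info_command(pgid):
    """Build the argv for a per-PG `info` query (built only, not executed)."""
    return [
        "ceph-objectstore-tool",
        "--data-path", DATA_PATH,
        "--journal-path", JOURNAL_PATH,
        "--op", "info",
        "--pgid", pgid,
    ]


sample = """Performing list-pgs operation
11.7f
10.4b
....
10.8d
2017-12-03 13:27:59.009327 7f02c203aa40  1 journal close /dev/sdf3
"""

for pgid in parse_pgids(sample):
    print(shlex.join(info_command(pgid)))
```

Run the printed commands only while the OSD is stopped. A PG whose query fails would be the first candidate to look at.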




It looks like the problem has something to do with the map, because
there's an assertion on a size that is failing.

Can this have something to do with what I get from the pgmap?

      pgmap v71223952: 764 pgs, 6 pools, 561 GB data, 141 kobjects
            1124 GB used, 1514 GB / 2639 GB avail
            *20266198323167232*/288980 objects degraded (7013010700798.405%)

This is the current crash from the command line.

starting osd.4 at :/0 osd_data /var/lib/ceph/osd/ceph-4
/var/lib/ceph/osd/ceph-4/journal
osd/PG.cc: In function 'static int PG::peek_map_epoch(ObjectStore*,
spg_t, epoch_t*, ceph::bufferlist*)' thread 7f467ba0b8c0 time 2017-12-03
13:39:29.495311
osd/PG.cc: 3025: FAILED assert(values.size() == 2)
 ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x80) [0x5556eab28790]
 2: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*,
ceph::buffer::list*)+0x661) [0x5556ea4e6601]
 3: (OSD::load_pgs()+0x75a) [0x5556ea43a8aa]
 4: (OSD::init()+0x2026) [0x5556ea445ca6]
 5: (main()+0x2ef1) [0x5556ea3b7301]
 6: (__libc_start_main()+0xf0) [0x7f467886b830]
 7: (_start()+0x29) [0x5556ea3f8b09]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
2017-12-03 13:39:29.497091 7f467ba0b8c0 -1 osd/PG.cc: In function
'static int PG::peek_map_epoch(ObjectStore*, spg_t, epoch_t*,
ceph::bufferlist*)' thread 7f467ba0b8c0 time 2017-12-03 13:39:29.495311
osd/PG.cc: 3025: FAILED assert(values.size() == 2)


So it looks like the offending code is this one:

  int r = store->omap_get_values(coll, pgmeta_oid, keys, &values);
  if (r == 0) {
    assert(values.size() == 2);     // <------ the assertion fails here

    // sanity check version

How can the size of values be different from 2? Can this have something to do
with the map values that ceph is showing?

      pgmap v71223952: 764 pgs, 6 pools, 561 GB data, 141 kobjects
            1124 GB used, 1514 GB / 2639 GB avail
            20266198323167232/288980 objects degraded (7013010700798.405%)
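
A toy Python model of what the assertion quoted above appears to check. It assumes the lookup asks for two keys on the PG's metadata object and returns only the keys that exist. Under that assumption, any PG whose metadata object has lost one or both keys yields fewer than 2 values and the OSD aborts. The key names and sample values are illustrative assumptions, and this is not the Ceph implementation.

```python
def omap_get_values(omap, keys):
    """Toy model: return only the requested keys that are present."""
    return {k: omap[k] for k in keys if k in omap}


# Assumed names of the two keys requested for each PG.
REQUESTED = {"_infover", "_epoch"}

# Hypothetical omap contents of three PG metadata objects.
healthy = {
    "_infover": b"\x08",
    "_epoch": (9686).to_bytes(4, "little"),
    "_info": b"...",
}
partial = {"_infover": b"\x08", "_info": b"..."}  # "_epoch" is missing
empty = {}                                        # nothing survived

for name, omap in [("healthy", healthy), ("partial", partial), ("empty", empty)]:
    values = omap_get_values(omap, REQUESTED)
    verdict = "ok" if len(values) == 2 else "assert(values.size() == 2) would fail"
    print(f"{name}: {len(values)} values -> {verdict}")
```

In this model the check depends only on what is stored locally for each PG on the OSD, not on the cluster-wide figures in the pgmap, so the two symptoms are not necessarily connected.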

Best regards



On 03/12/17 13:31, Gonzalo Aguilar Delgado wrote:
>
> Hi,
>
> Yes. Nice. Until all your OSDs fail and you don't know what else to
> try. Looking at the failure rates, it will happen very soon.
>
> I want to recover them. I'm writing in another mail what I tried. Let's
> see if someone can help me.
>
> I'm not doing anything. I just look at my cluster from time to time
> and find that something else has failed. I will work hard to recover
> from this situation.
>
> Thank you.
>
>
> On 26/11/17 16:13, Marc Roos wrote:
>>  
>> If I am not mistaken, the whole idea with the 3 replicas is that you 
>> have enough copies to recover from a failed OSD. In my tests this seems 
>> to go fine automatically. Are you doing something that is not advised?
>>
>>
>>
>>
>> -----Original Message-----
>> From: Gonzalo Aguilar Delgado [mailto:gaguilar at aguilardelgado.com] 
>> Sent: Saturday 25 November 2017 20:44
>> To: 'ceph-users'
>> Subject: [ceph-users] Another OSD broken today. How can I recover it?
>>
>> Hello, 
>>
>>
>> I had another blackout with ceph today. It seems that ceph OSDs go down 
>> from time to time and they are unable to recover. I have 3 OSDs down 
>> now: 1 removed from the cluster and 2 down because I'm unable to recover 
>> them. 
>>
>>
>> We really need a recovery tool. It's not normal that an OSD breaks and 
>> there's no way to recover it. Is there any way to do it?
>>
>>
>> Last one shows this:
>>
>>
>>
>>
>> ] enter Reset
>>    -12> 2017-11-25 20:34:19.548891 7f6e5dc158c0  5 osd.4 pg_epoch: 9686 
>> pg[0.34(unlocked)] enter Initial
>>    -11> 2017-11-25 20:34:19.548983 7f6e5dc158c0  5 osd.4 pg_epoch: 9686 
>> pg[0.34( empty local-les=9685 n=0 ec=404 les/c/f 9685/9685/0 
>> 9684/9684/9684) [4,0] r=0 lpr=0 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] 
>> exit Initial 0.000091 0 0.000000
>>    -10> 2017-11-25 20:34:19.548994 7f6e5dc158c0  5 osd.4 pg_epoch: 9686 
>> pg[0.34( empty local-les=9685 n=0 ec=404 les/c/f 9685/9685/0 
>> 9684/9684/9684) [4,0] r=0 lpr=0 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] 
>> enter Reset
>>     -9> 2017-11-25 20:34:19.549166 7f6e5dc158c0  5 osd.4 pg_epoch: 9686 
>> pg[10.36(unlocked)] enter Initial
>>     -8> 2017-11-25 20:34:19.566781 7f6e5dc158c0  5 osd.4 pg_epoch: 9686 
>> pg[10.36( v 9686'7301894 (9686'7298879,9686'7301894] local-les=9685 
>> n=534 ec=419 les/c/f 9685/9686/0 9684/9684/9684) [4,0] r=0 lpr=0 
>> crt=9686'7301894 lcod 0'0 mlcod 0'0 inactive NIBBLEWISE] exit Initial 
>> 0.017614 0 0.000000
>>     -7> 2017-11-25 20:34:19.566811 7f6e5dc158c0  5 osd.4 pg_epoch: 9686 
>> pg[10.36( v 9686'7301894 (9686'7298879,9686'7301894] local-les=9685 
>> n=534 ec=419 les/c/f 9685/9686/0 9684/9684/9684) [4,0] r=0 lpr=0 
>> crt=9686'7301894 lcod 0'0 mlcod 0'0 inactive NIBBLEWISE] enter Reset
>>     -6> 2017-11-25 20:34:19.585411 7f6e5dc158c0  5 osd.4 pg_epoch: 9686 
>> pg[8.5c(unlocked)] enter Initial
>>     -5> 2017-11-25 20:34:19.602888 7f6e5dc158c0  5 osd.4 pg_epoch: 9686 
>> pg[8.5c( empty local-les=9685 n=0 ec=348 les/c/f 9685/9685/0 
>> 9684/9684/9684) [4,0] r=0 lpr=0 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] 
>> exit Initial 0.017478 0 0.000000
>>     -4> 2017-11-25 20:34:19.602912 7f6e5dc158c0  5 osd.4 pg_epoch: 9686 
>> pg[8.5c( empty local-les=9685 n=0 ec=348 les/c/f 9685/9685/0 
>> 9684/9684/9684) [4,0] r=0 lpr=0 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] 
>> enter Reset
>>     -3> 2017-11-25 20:34:19.603082 7f6e5dc158c0  5 osd.4 pg_epoch: 9686 
>> pg[9.10(unlocked)] enter Initial
>>     -2> 2017-11-25 20:34:19.615456 7f6e5dc158c0  5 osd.4 pg_epoch: 9686 
>> pg[9.10( v 9686'2322547 (9031'2319518,9686'2322547] local-les=9685 n=261 
>> ec=417 les/c/f 9685/9685/0 9684/9684/9684) [4,0] r=0 lpr=0 
>> crt=9686'2322547 lcod 0'0 mlcod 0'0 inactive NIBBLEWISE] exit Initial 
>> 0.012373 0 0.000000
>>     -1> 2017-11-25 20:34:19.615481 7f6e5dc158c0  5 osd.4 pg_epoch: 9686 
>> pg[9.10( v 9686'2322547 (9031'2319518,9686'2322547] local-les=9685 n=261 
>> ec=417 les/c/f 9685/9685/0 9684/9684/9684) [4,0] r=0 lpr=0 
>> crt=9686'2322547 lcod 0'0 mlcod 0'0 inactive NIBBLEWISE] enter Reset
>>      0> 2017-11-25 20:34:19.617400 7f6e5dc158c0 -1 osd/PG.cc: In 
>> function 'static int PG::peek_map_epoch(ObjectStore*, spg_t, epoch_t*, 
>> ceph::bufferlist*)' thread 7f6e5dc158c0 time 2017-11-25 20:34:19.615633
>> osd/PG.cc: 3025: FAILED assert(values.size() == 2)
>>
>>  ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
>> const*)+0x80) [0x5562d318d790]
>>  2: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*, 
>> ceph::buffer::list*)+0x661) [0x5562d2b4b601]
>>  3: (OSD::load_pgs()+0x75a) [0x5562d2a9f8aa]
>>  4: (OSD::init()+0x2026) [0x5562d2aaaca6]
>>  5: (main()+0x2ef1) [0x5562d2a1c301]
>>  6: (__libc_start_main()+0xf0) [0x7f6e5aa75830]
>>  7: (_start()+0x29) [0x5562d2a5db09]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is 
>> needed to interpret this.
>>
>> --- logging levels ---
>>    0/ 5 none
>>    0/ 1 lockdep
>>    0/ 1 context
>>    1/ 1 crush
>>    1/ 5 mds
>>    1/ 5 mds_balancer
>>    1/ 5 mds_locker
>>    1/ 5 mds_log
>>    1/ 5 mds_log_expire
>>    1/ 5 mds_migrator
>>    0/ 1 buffer
>>    0/ 1 timer
>>    0/ 1 filer
>>    0/ 1 striper
>>    0/ 1 objecter
>>    0/ 5 rados
>>    0/ 5 rbd
>>    0/ 5 rbd_mirror
>>    0/ 5 rbd_replay
>>    0/ 5 journaler
>>    0/ 5 objectcacher
>>    0/ 5 client
>>    0/ 5 osd
>>    0/ 5 optracker
>>    0/ 5 objclass
>>    1/ 3 filestore
>>    1/ 3 journal
>>    0/ 5 ms
>>    1/ 5 mon
>>    0/10 monc
>>    1/ 5 paxos
>>    0/ 5 tp
>>    1/ 5 auth
>>    1/ 5 crypto
>>    1/ 1 finisher
>>    1/ 5 heartbeatmap
>>    1/ 5 perfcounter
>>    1/ 5 rgw
>>    1/10 civetweb
>>    1/ 5 javaclient
>>    1/ 5 asok
>>    1/ 1 throttle
>>    0/ 0 refs
>>    1/ 5 xio
>>    1/ 5 compressor
>>    1/ 5 newstore
>>    1/ 5 bluestore
>>    1/ 5 bluefs
>>    1/ 3 bdev
>>    1/ 5 kstore
>>    4/ 5 rocksdb
>>    4/ 5 leveldb
>>    1/ 5 kinetic
>>    1/ 5 fuse
>>   -2/-2 (syslog threshold)
>>   -1/-1 (stderr threshold)
>>   max_recent     10000
>>   max_new         1000
>>   log_file /var/log/ceph/ceph-osd.4.log
>> --- end dump of recent events ---
>> 2017-11-25 20:34:19.622559 7f6e5dc158c0 -1 *** Caught signal (Aborted) 
>> **  in thread 7f6e5dc158c0 thread_name:ceph-osd
>>
>>  ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
>>  1: (()+0x98653e) [0x5562d308d53e]
>>  2: (()+0x11390) [0x7f6e5caee390]
>>  3: (gsignal()+0x38) [0x7f6e5aa8a428]
>>  4: (abort()+0x16a) [0x7f6e5aa8c02a]
>>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
>> const*)+0x26b) [0x5562d318d97b]
>>  6: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*, 
>> ceph::buffer::list*)+0x661) [0x5562d2b4b601]
>>  7: (OSD::load_pgs()+0x75a) [0x5562d2a9f8aa]
>>  8: (OSD::init()+0x2026) [0x5562d2aaaca6]
>>  9: (main()+0x2ef1) [0x5562d2a1c301]
>>  10: (__libc_start_main()+0xf0) [0x7f6e5aa75830]
>>  11: (_start()+0x29) [0x5562d2a5db09]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is 
>> needed to interpret this.
>>