[ceph-users] OSD Random Failures - Latest Luminous

Ashley Merrick ashley at amerrick.co.uk
Mon Nov 20 02:26:39 PST 2017


One thing I have been trying on the newly downed OSD is exporting a PG and importing it into another OSD using ceph-objectstore-tool.
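
For reference, the export/import was roughly along these lines (the OSD IDs, PG ID and file path below are just placeholders, both OSDs were stopped while running the tool, and filestore OSDs would also need --journal-path):

    # export the PG shard from the stopped, down OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-37 \
        --pgid 6.2f2s10 --op export --file /tmp/6.2f2s10.export

    # import it into another stopped OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-40 \
        --op import --file /tmp/6.2f2s10.export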


Export & Import goes fine, however when the OSD is then started back up the PG query still show's its looking for the old down OSD, should the OSD starting with a copy of the PG not communicate it now hold's the data the PG want's?


Or do I need to force it to see this somehow?


I can't mark the old OSD down or lost, as doing that causes further OSDs to go down, so I just have to leave them stopped but still listed as OSDs.


,Ashley

________________________________
From: Ashley Merrick
Sent: 20 November 2017 08:56:15
To: Gregory Farnum
Cc: David Turner; ceph-users at ceph.com
Subject: Re: [ceph-users] OSD Random Failures - Latest Luminous


Hello,


So, as suggested, I tried marking one OSD that continuously failed as lost and adding a new OSD to take its place.
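
For the record, that was roughly the usual lost-and-replace sequence (the OSD ID is a placeholder and the exact steps may have differed slightly):

    ceph osd lost 37 --yes-i-really-mean-it
    ceph osd crush remove osd.37
    ceph auth del osd.37
    ceph osd rm 37
    # then created the replacement OSD on the new disk as normal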


However, all this does is make another 2-3 OSDs fail with the exact same error.


This seems to be a pretty huge and nasty bug/issue!


Greg, you'll have to give me some more detail about what you need if you want me to try and gather some information.


However, right now the cluster itself is pretty much toast due to the number of OSDs now hitting this assert.


,Ashley

________________________________
From: Gregory Farnum <gfarnum at redhat.com>
Sent: 19 November 2017 09:25:39
To: Ashley Merrick
Cc: David Turner; ceph-users at ceph.com
Subject: Re: [ceph-users] OSD Random Failures - Latest Luminous

I only see two asserts (in my local checkout) in that function; one is metadata
    assert(info.history.same_interval_since != 0);
and the other is a sanity check
    assert(!deleting);

Can you open a core dump with gdb and look at what line it's on in the start_peering_interval frame? (May need to install the debug packages.)
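
Something along these lines should show it (paths are illustrative; the debug symbols package is ceph-osd-dbg / ceph-debuginfo or similar depending on your distro):

    gdb /usr/bin/ceph-osd /path/to/core
    (gdb) bt            # locate the PG::start_peering_interval frame number
    (gdb) frame <N>     # select that frame
    (gdb) info line     # or 'list' to see exactly which assert it stopped on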

I think we've run across that first assert as an issue before, but both of them ought to be dumping out more cleanly about what line they're on.
-Greg


On Sun, Nov 19, 2017 at 1:32 AM Ashley Merrick <ashley at amerrick.co.uk> wrote:

Hello,



So it seems noup does not help.



I still have the same error:



2017-11-18 14:26:40.982827 7fb4446cd700 -1 *** Caught signal (Aborted) **in thread 7fb4446cd700 thread_name:tp_peering



ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)

1: (()+0xa0c554) [0x56547f500554]

2: (()+0x110c0) [0x7fb45cabe0c0]

3: (gsignal()+0xcf) [0x7fb45ba85fcf]

4: (abort()+0x16a) [0x7fb45ba873fa]

5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x56547f547f0e]

6: (PG::start_peering_interval(std::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> > const&, int, std::vector<int, std::allocator<int> > const&, int, ObjectStore::Transaction*)+0x1569) [0x56547f029ad9]

7: (PG::RecoveryState::Reset::react(PG::AdvMap const&)+0x479) [0x56547f02a099]

8: (boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x188) [0x56547f06c6d8]

9: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x69) [0x56547f045549]

10: (PG::handle_advance_map(std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> >&, int, std::vector<int, std::allocator<int> >&, int, PG::RecoveryCtx*)+0x4a7) [0x56547f00e837]

11: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >*)+0x2e7) [0x56547ef56e67]

12: (OSD::process_peering_events(std::__cxx11::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x1e4) [0x56547ef57cb4]

13: (ThreadPool::BatchWorkQueue<PG>::_void_process(void*, ThreadPool::TPHandle&)+0x2c) [0x56547efc2a0c]

14: (ThreadPool::worker(ThreadPool::WorkThread*)+0xeb8) [0x56547f54ef28]

15: (ThreadPool::WorkThread::entry()+0x10) [0x56547f5500c0]

16: (()+0x7494) [0x7fb45cab4494]

17: (clone()+0x3f) [0x7fb45bb3baff]

NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.



I guess even with noup the OSD/PG still has to peer with the other PGs, which is the stage that causes the failure. Most OSDs seem to stay up for about 30 seconds, and every time it’s a different PG listed in the failure.



,Ashley



From: David Turner [mailto:drakonstein at gmail.com]
Sent: 18 November 2017 22:19
To: Ashley Merrick <ashley at amerrick.co.uk>
Cc: Eric Nelson <ericnelson at gmail.com>; ceph-users at ceph.com

Subject: Re: [ceph-users] OSD Random Failures - Latest Luminous



Does letting the cluster run with noup for a while, until all the down disks are idle, and then letting them come in help at all? I don't know your specific issue and haven't touched BlueStore yet, but that is generally sound advice when OSDs won't start.
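
I.e., something like:

    ceph osd set noup      # restarted OSDs won't be marked up in the map
    # ...wait until the down/restarted OSDs have gone idle, then:
    ceph osd unset noup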

Also, is there any pattern to the OSDs that are down? Common PGs, common hosts, common SSDs, etc.?



On Sat, Nov 18, 2017, 7:08 AM Ashley Merrick <ashley at amerrick.co.uk> wrote:

Hello,



Any further suggestions or workarounds from anyone?



The cluster is hard down now with around 2% of PGs offline. On occasion I am able to get an OSD to start for a bit, but it then seems to do some peering and crashes again with “*** Caught signal (Aborted) ** in thread 7f3471c55700 thread_name:tp_peering”.



,Ashley



From: Ashley Merrick
Sent: 16 November 2017 17:27
To: Eric Nelson <ericnelson at gmail.com>
Cc: ceph-users at ceph.com
Subject: Re: [ceph-users] OSD Random Failures - Latest Luminous



Hello,



Good to hear it's not just me; however, I have a cluster basically offline due to too many OSDs dropping because of this issue.



Anybody have any suggestions?



,Ashley

________________________________

From: Eric Nelson <ericnelson at gmail.com>
Sent: 16 November 2017 00:06:14
To: Ashley Merrick
Cc: ceph-users at ceph.com
Subject: Re: [ceph-users] OSD Random Failures - Latest Luminous



I've been seeing these as well on our SSD cache tier, which has been ravaged by disk failures as of late... Same tp_peering assert as above, even running the luminous branch from git.



Let me know if you have a bug filed that I can +1, or if you have found a workaround.



E



On Wed, Nov 15, 2017 at 10:25 AM, Ashley Merrick <ashley at amerrick.co.uk> wrote:

Hello,



After replacing a single OSD due to a failed disk, I am now seeing 2-3 OSDs randomly stop and fail to start: they boot-loop, get to load_pgs, and then fail with the following. (I tried setting the OSD logs to 5/5 but didn’t get any extra lines around the error, just more information pre-boot.)
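
(By 5/5 I mean debug_osd raised with something along the lines of:

    ceph tell osd.37 injectargs '--debug-osd 5/5'

or the equivalent "debug osd = 5/5" in the [osd] section of ceph.conf.)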



Could this be a certain PG causing these OSDs to crash (6.2f2s10, for example)?
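
For anyone wanting to cross-check, the up/acting set for that PG can be pulled with something like:

    ceph pg map 6.2f2          # the PG proper, without the sNN shard suffix
    ceph pg 6.2f2 query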



    -9> 2017-11-15 17:37:14.696229 7fa4ec50f700  1 osd.37 pg_epoch: 161571 pg[6.2f9s1( v 161563'158209 lc 161175'158153 (150659'148187,161563'158209] local-lis/les=161519/161521 n=47572 ec=31534/31534 lis/c 161519/152474 les/c/f 161521/152523/159786 161517/161519/161519) [34,37,13,12,66,69,118,120,28,20,88,0,2]/[34,37,13,12,66,69,118,120,28,20,53,54,2147483647] r=1 lpr=161563 pi=[152474,161519)/1 crt=161562'158208 lcod 0'0 unknown NOTIFY m=21] state<Start>: transitioning to Stray

    -8> 2017-11-15 17:37:14.696239 7fa4ec50f700  5 osd.37 pg_epoch: 161571 pg[6.2f9s1( v 161563'158209 lc 161175'158153 (150659'148187,161563'158209] local-lis/les=161519/161521 n=47572 ec=31534/31534 lis/c 161519/152474 les/c/f 161521/152523/159786 161517/161519/161519) [34,37,13,12,66,69,118,120,28,20,88,0,2]/[34,37,13,12,66,69,118,120,28,20,53,54,2147483647] r=1 lpr=161563 pi=[152474,161519)/1 crt=161562'158208 lcod 0'0 unknown NOTIFY m=21] exit Start 0.000019 0 0.000000

    -7> 2017-11-15 17:37:14.696250 7fa4ec50f700  5 osd.37 pg_epoch: 161571 pg[6.2f9s1( v 161563'158209 lc 161175'158153 (150659'148187,161563'158209] local-lis/les=161519/161521 n=47572 ec=31534/31534 lis/c 161519/152474 les/c/f 161521/152523/159786 161517/161519/161519) [34,37,13,12,66,69,118,120,28,20,88,0,2]/[34,37,13,12,66,69,118,120,28,20,53,54,2147483647] r=1 lpr=161563 pi=[152474,161519)/1 crt=161562'158208 lcod 0'0 unknown NOTIFY m=21] enter Started/Stray

    -6> 2017-11-15 17:37:14.696324 7fa4ec50f700  5 osd.37 pg_epoch: 161571 pg[6.2f2s10( v 161570'157712 lc 161175'157648 (160455'154564,161570'157712] local-lis/les=161517/161519 n=47328 ec=31534/31534 lis/c 161517/160962 les/c/f 161519/160963/159786 161517/161517/108939) [96,100,79,4,69,65,57,59,135,134,37,35,18] r=10 lpr=161570 pi=[160962,161517)/2 crt=161560'157711 lcod 0'0 unknown NOTIFY m=5] exit Reset 3.363755 2 0.000076

    -5> 2017-11-15 17:37:14.696337 7fa4ec50f700  5 osd.37 pg_epoch: 161571 pg[6.2f2s10( v 161570'157712 lc 161175'157648 (160455'154564,161570'157712] local-lis/les=161517/161519 n=47328 ec=31534/31534 lis/c 161517/160962 les/c/f 161519/160963/159786 161517/161517/108939) [96,100,79,4,69,65,57,59,135,134,37,35,18] r=10 lpr=161570 pi=[160962,161517)/2 crt=161560'157711 lcod 0'0 unknown NOTIFY m=5] enter Started

    -4> 2017-11-15 17:37:14.696346 7fa4ec50f700  5 osd.37 pg_epoch: 161571 pg[6.2f2s10( v 161570'157712 lc 161175'157648 (160455'154564,161570'157712] local-lis/les=161517/161519 n=47328 ec=31534/31534 lis/c 161517/160962 les/c/f 161519/160963/159786 161517/161517/108939) [96,100,79,4,69,65,57,59,135,134,37,35,18] r=10 lpr=161570 pi=[160962,161517)/2 crt=161560'157711 lcod 0'0 unknown NOTIFY m=5] enter Start

    -3> 2017-11-15 17:37:14.696353 7fa4ec50f700  1 osd.37 pg_epoch: 161571 pg[6.2f2s10( v 161570'157712 lc 161175'157648 (160455'154564,161570'157712] local-lis/les=161517/161519 n=47328 ec=31534/31534 lis/c 161517/160962 les/c/f 161519/160963/159786 161517/161517/108939) [96,100,79,4,69,65,57,59,135,134,37,35,18] r=10 lpr=161570 pi=[160962,161517)/2 crt=161560'157711 lcod 0'0 unknown NOTIFY m=5] state<Start>: transitioning to Stray

    -2> 2017-11-15 17:37:14.696364 7fa4ec50f700  5 osd.37 pg_epoch: 161571 pg[6.2f2s10( v 161570'157712 lc 161175'157648 (160455'154564,161570'157712] local-lis/les=161517/161519 n=47328 ec=31534/31534 lis/c 161517/160962 les/c/f 161519/160963/159786 161517/161517/108939) [96,100,79,4,69,65,57,59,135,134,37,35,18] r=10 lpr=161570 pi=[160962,161517)/2 crt=161560'157711 lcod 0'0 unknown NOTIFY m=5] exit Start 0.000018 0 0.000000

    -1> 2017-11-15 17:37:14.696372 7fa4ec50f700  5 osd.37 pg_epoch: 161571 pg[6.2f2s10( v 161570'157712 lc 161175'157648 (160455'154564,161570'157712] local-lis/les=161517/161519 n=47328 ec=31534/31534 lis/c 161517/160962 les/c/f 161519/160963/159786 161517/161517/108939) [96,100,79,4,69,65,57,59,135,134,37,35,18] r=10 lpr=161570 pi=[160962,161517)/2 crt=161560'157711 lcod 0'0 unknown NOTIFY m=5] enter Started/Stray

     0> 2017-11-15 17:37:14.697245 7fa4ebd0e700 -1 *** Caught signal (Aborted) **

in thread 7fa4ebd0e700 thread_name:tp_peering



ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)

1: (()+0xa3acdc) [0x55dfb6ba3cdc]

2: (()+0xf890) [0x7fa510e2c890]

3: (gsignal()+0x37) [0x7fa50fe66067]

4: (abort()+0x148) [0x7fa50fe67448]

5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27f) [0x55dfb6be6f5f]

6: (PG::start_peering_interval(std::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> > const&, int, std::vector<int, std::allocator<int> > const&, int, ObjectStore::Transaction*)+0x14e3) [0x55dfb670f8a3]

7: (PG::RecoveryState::Reset::react(PG::AdvMap const&)+0x539) [0x55dfb670ff39]

8: (boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x244) [0x55dfb67552a4]

9: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x6b) [0x55dfb6732c1b]

10: (PG::handle_advance_map(std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> >&, int, std::vector<int, std::allocator<int> >&, int, PG::RecoveryCtx*)+0x3e3) [0x55dfb6702ef3]

11: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >*)+0x20a) [0x55dfb664db2a]

12: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x175) [0x55dfb664e6b5]

13: (ThreadPool::BatchWorkQueue<PG>::_void_process(void*, ThreadPool::TPHandle&)+0x27) [0x55dfb66ae5a7]

14: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa8f) [0x55dfb6bedb1f]

15: (ThreadPool::WorkThread::entry()+0x10) [0x55dfb6beea50]

16: (()+0x8064) [0x7fa510e25064]

17: (clone()+0x6d) [0x7fa50ff1962d]

NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.



--- logging levels ---

   0/ 5 none

   0/ 1 lockdep

   0/ 1 context

   1/ 1 crush

   1/ 5 mds

   1/ 5 mds_balancer

   1/ 5 mds_locker

   1/ 5 mds_log

   1/ 5 mds_log_expire

   1/ 5 mds_migrator

   0/ 1 buffer

   0/ 1 timer

   0/ 1 filer

   0/ 1 striper

   0/ 1 objecter

   0/ 5 rados

   0/ 5 rbd

   0/ 5 rbd_mirror

   0/ 5 rbd_replay

   0/ 5 journaler

   0/ 5 objectcacher

   0/ 5 client

   1/ 5 osd

   0/ 5 optracker

   0/ 5 objclass

   1/ 3 filestore

   1/ 3 journal

   0/ 5 ms

   1/ 5 mon

   0/10 monc

   1/ 5 paxos

   0/ 5 tp

   1/ 5 auth

   1/ 5 crypto

   1/ 1 finisher

   1/ 5 heartbeatmap

   1/ 5 perfcounter

   1/ 5 rgw

   1/10 civetweb

   1/ 5 javaclient

   1/ 5 asok

   1/ 1 throttle

   0/ 0 refs

   1/ 5 xio

   1/ 5 compressor

   1/ 5 bluestore

   1/ 5 bluefs

   1/ 3 bdev

   1/ 5 kstore

   4/ 5 rocksdb

   4/ 5 leveldb

   4/ 5 memdb

   1/ 5 kinetic

   1/ 5 fuse

   1/ 5 mgr

   1/ 5 mgrc

   1/ 5 dpdk

   1/ 5 eventtrace

  -2/-2 (syslog threshold)

  -1/-1 (stderr threshold)

  max_recent     10000

  max_new         1000

  log_file /var/log/ceph/ceph-osd.37.log

_______________________________________________
ceph-users mailing list
ceph-users at lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com