[ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change

David Turner drakonstein at gmail.com
Thu Oct 18 06:42:44 PDT 2018


What are your OSD node stats?  CPU, RAM, quantity and size of OSD disks.
You might need to modify some bluestore settings to speed up the time it
takes to peer, or perhaps you are simply underpowering the number of OSD
disks you're trying to run and your servers and OSD daemons are going as
fast as they can.
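
A quick sketch of how to collect those numbers with standard Linux and
ceph CLI tools (nothing below is specific to this cluster):

  lscpu                          # CPU count/model per OSD node
  free -h                        # RAM per OSD node
  lsblk -o NAME,SIZE,ROTA,TYPE   # quantity, size and type of the disks
  ceph osd df tree               # per-OSD size, utilization and PG count
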
On Sat, Oct 13, 2018 at 4:08 PM Stefan Priebe - Profihost AG <
s.priebe at profihost.ag> wrote:

> and a 3rd one:
>
>     health: HEALTH_WARN
>             1 MDSs report slow metadata IOs
>             1 MDSs report slow requests
>
> 2018-10-13 21:44:08.150722 mds.cloud1-1473 [WRN] 7 slow requests, 1
> included below; oldest blocked for > 199.922552 secs
> 2018-10-13 21:44:08.150725 mds.cloud1-1473 [WRN] slow request 34.829662
> seconds old, received at 2018-10-13 21:43:33.321031:
> client_request(client.216121228:929114 lookup #0x1/.active.lock
> 2018-10-13 21:43:33.321594 caller_uid=0, caller_gid=0{}) currently
> failed to rdlock, waiting
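>
> For anyone chasing the same symptom: the blocked metadata ops and the
> OSD requests they are waiting on can usually be dumped via the MDS admin
> socket on the host running mds.cloud1-1473, for example:
>
>   ceph daemon mds.cloud1-1473 dump_ops_in_flight   # the slow client requests
>   ceph daemon mds.cloud1-1473 objecter_requests    # OSD ops the MDS waits on
>
> which ties the slow MDS requests back to specific OSDs.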
>
> The relevant OSDs are bluestore again, running at 100% I/O.
>
> iostat shows (extended columns: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
> avgrq-sz avgqu-sz await r_await w_await svctm %util):
>
> sdi              77,00     0,00  580,00   97,00 511032,00   972,00
> 1512,57    14,88   22,05   24,57    6,97   1,48 100,00
>
> so it is reading at ~500 MB/s, which completely saturates the OSD, and
> it keeps doing so for more than 10 minutes.
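>
> Assuming sdi is the data device of one of those OSDs, the device-to-OSD
> mapping and the live load can be cross-checked with, for example:
>
>   iostat -x 1 /dev/sdi                                      # live extended stats
>   ceph osd metadata <osd-id> | grep -E 'devices|dev_node'   # OSD -> device
>
> where <osd-id> is a placeholder for the OSD suspected to sit on sdi.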
>
> Greets,
> Stefan
>
> Am 13.10.2018 um 21:29 schrieb Stefan Priebe - Profihost AG:
> >
> > osd.19 is a bluestore OSD on a healthy 2TB SSD.
> >
> > Log of osd.19 is here:
> > https://pastebin.com/raw/6DWwhS0A
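> >
> > The "healthy" part can be double-checked with smartmontools and ceph's
> > own per-OSD latency counters; /dev/sdX below is a placeholder for the
> > SSD backing osd.19:
> >
> >   smartctl -a /dev/sdX   # wear level, reallocated sectors, error log
> >   ceph osd perf          # per-OSD commit/apply latency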
> >
> > Am 13.10.2018 um 21:20 schrieb Stefan Priebe - Profihost AG:
> >> Hi David,
> >>
> >> I think this is probably the problem - from a new log from today:
> >>
> >> 2018-10-13 20:57:20.367326 mon.a [WRN] Health check update: 4 osds down
> >> (OSD_DOWN)
> >> ...
> >> 2018-10-13 20:57:41.268674 mon.a [WRN] Health check update: Reduced data
> >> availability: 3 pgs peering (PG_AVAILABILITY)
> >> ...
> >> 2018-10-13 20:58:08.684451 mon.a [WRN] Health check failed: 1 osds down
> >> (OSD_DOWN)
> >> ...
> >> 2018-10-13 20:58:22.841210 mon.a [WRN] Health check failed: Reduced data
> >> availability: 8 pgs inactive (PG_AVAILABILITY)
> >> ....
> >> 2018-10-13 20:58:47.570017 mon.a [WRN] Health check update: Reduced data
> >> availability: 5 pgs inactive (PG_AVAILABILITY)
> >> ...
> >> 2018-10-13 20:58:49.142108 osd.19 [WRN] Monitor daemon marked osd.19
> >> down, but it is still running
> >> 2018-10-13 20:58:53.750164 mon.a [WRN] Health check update: Reduced data
> >> availability: 3 pgs inactive (PG_AVAILABILITY)
> >> ...
> >>
> >> so there is a timeframe of > 90 s where PGs are inactive and
> >> unavailable - this would at least explain the stalled I/O to me?
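> >>
> >> If it happens again, the affected PGs and what they are blocked on can
> >> be captured with the standard commands, for example:
> >>
> >>   ceph health detail            # lists the inactive/peering PG IDs
> >>   ceph pg dump_stuck inactive   # stuck PGs and their acting OSD sets
> >>   ceph osd blocked-by           # which OSDs others are waiting on to peer
> >>
> >> so the stall can be tied to specific OSDs rather than just a health flag.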
> >>
> >> Greets,
> >> Stefan
> >>
> >>
> >> Am 12.10.2018 um 15:59 schrieb David Turner:
> >>> The PG count per OSD does not change unless the OSDs are marked out.
> >>> You have noout set, so that doesn't change at all during this test.
> >>> All of your PGs peered quickly at the beginning and then were
> >>> active+undersized for the rest of the time, you never had any blocked
> >>> requests, and you always had 100MB/s+ client IO.  I didn't see anything
> >>> wrong with your cluster to indicate that your clients had any problems
> >>> whatsoever accessing data.
> >>>
> >>> Can you confirm that you saw the same problems while you were running
> >>> those commands?  The next thing that comes to mind is that possibly a
> >>> client isn't getting an updated OSD map to indicate that the host and
> >>> its OSDs are down, and it's stuck trying to communicate with host7.
> >>> That would indicate a potential problem with the client being unable
> >>> to communicate with the Mons, maybe?  Have you completely ruled out any
> >>> network problems between all nodes and all of the IPs in the cluster?
> >>> What does your client log show during these times?
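> >>>
> >>> If the librbd/librados clients have an admin socket configured (an
> >>> assumption, it is not on by default), the requests a client still has
> >>> in flight, and which OSDs they target, can be dumped with e.g.
> >>>
> >>>   ceph --admin-daemon /var/run/ceph/ceph-client.<id>.asok objecter_requests
> >>>
> >>> where <id> is a placeholder for the client instance; requests stuck on
> >>> the OSDs of the rebooted host would support the stale-osdmap theory.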
> >>>
> >>> On Fri, Oct 12, 2018 at 8:35 AM Nils Fahldieck - Profihost AG
> >>> <n.fahldieck at profihost.ag> wrote:
> >>>
> >>>     Hi, in our `ceph.conf` we have:
> >>>
> >>>       mon_max_pg_per_osd = 300
> >>>
> >>>     While the host is offline (9 OSDs down):
> >>>
> >>>       4352 PGs * 3 / 62 OSDs ~ 210 PGs per OSD
> >>>
> >>>     If all OSDs are online:
> >>>
> >>>       4352 PGs * 3 / 71 OSDs ~ 183 PGs per OSD
> >>>
> >>>     ... so this doesn't seem to be the issue.
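> >>>
> >>>     For completeness, both numbers can also be read off the cluster
> >>>     instead of computed by hand (run on a mon host; <id> is a
> >>>     placeholder for the local mon's name):
> >>>
> >>>       ceph osd df tree   # the PGS column shows the live PG count per OSD
> >>>       ceph daemon mon.<id> config get mon_max_pg_per_osd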
> >>>
> >>>     If I understood you right, that's what you meant. If I got you
> >>>     wrong, would you mind pointing to one of those threads you
> >>>     mentioned?
> >>>
> >>>     Thanks :)
> >>>
> >>>     Am 12.10.2018 um 14:03 schrieb Burkhard Linke:
> >>>     > Hi,
> >>>     >
> >>>     >
> >>>     > On 10/12/2018 01:55 PM, Nils Fahldieck - Profihost AG wrote:
> >>>     >> I rebooted a Ceph host and logged `ceph status` & `ceph health
> >>>     >> detail` every 5 seconds. During this I encountered
> >>>     >> 'PG_AVAILABILITY Reduced data availability: pgs peering'. At the
> >>>     >> same time some VMs hung as described before.
> >>>     >
> >>>     > Just a wild guess... you have 71 OSDs and about 4500 PGs with
> >>>     > size=3.  That is 13500 PG instances overall, resulting in ~190
> >>>     > PGs per OSD under normal circumstances.
> >>>     >
> >>>     > If one host is down and the PGs have to re-peer, you might reach
> >>>     > the limit of 200 PGs/OSD on some of the OSDs, resulting in stuck
> >>>     > peering.
> >>>     >
> >>>     > You can try to raise this limit. There are several threads on the
> >>>     > mailing list about this.
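> >>>     >
> >>>     > As a sketch, the limit could be raised at runtime via the mon
> >>>     > admin sockets and made persistent in ceph.conf (400 is only an
> >>>     > example value, <id> a placeholder for the local mon name):
> >>>     >
> >>>     >   ceph daemon mon.<id> config set mon_max_pg_per_osd 400
> >>>     >
> >>>     >   # ceph.conf on the mon hosts, applied after a mon restart
> >>>     >   [global]
> >>>     >   mon_max_pg_per_osd = 400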
> >>>     >
> >>>     > Regards,
> >>>     > Burkhard
> >>>     >
> >>>
>

