[ceph-users] ceph osd pg-upmap-items not working

Kári Bertilsson karibertils at gmail.com
Mon Mar 18 08:37:21 PDT 2019


Because I have tested failing the mgr and rebooting all the servers in random
order multiple times. The upmap optimizer never found any further optimizations
after the initial ones. I tried leaving the balancer ON for days, and also
turning it OFF and running it manually several times.
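
For reference, the manual cycle I have been running is roughly the following
(the plan name is just an example):

# ceph balancer off
# ceph balancer mode upmap
# ceph balancer optimize myplan
# ceph balancer show myplan
# ceph balancer execute myplan

plus the occasional "ceph mgr fail <active mgr>" in between to force a fresh mgr.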

I manually moved just a few PGs from the fullest disks to the emptiest ones,
and free space increased by 7%, so the distribution was clearly not perfect.
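
The manual moves were plain pg-upmap-items commands, always moving a PG from a
full OSD to a less-full OSD in the same host so the crush rule is still
satisfied (the ids here are only an example):

# ceph osd pg-upmap-items 41.7 17 19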

I have now replaced 10 disks with larger ones, and after the backfill finished
and I ran the upmap balancer again, I am seeing similar results. The upmap
optimizer made a few optimizations, but now says "Error EALREADY: Unable to
find further optimization, or distribution is already perfect".
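
Even when optimize returns EALREADY, the score the balancer is optimizing can
still be checked:

# ceph balancer eval
# ceph balancer eval-verbose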

Looking at a snippet from "ceph osd df tree"... you can see it's not quite
perfect. I am wondering if this could be because of the size difference
between OSDs, as I am running disks ranging from 1 to 10 TB in the same host.

17   hdd   1.09200  1.00000 1.09TiB  741GiB  377GiB 66.27 1.09  13     osd.17
18   hdd   1.09200  1.00000 1.09TiB  747GiB  370GiB 66.86 1.10  13     osd.18
19   hdd   1.09200  1.00000 1.09TiB  572GiB  546GiB 51.20 0.84  10     osd.19
23   hdd   2.72899  1.00000 2.73TiB 1.70TiB 1.03TiB 62.21 1.02  31     osd.23
29   hdd   1.09200  1.00000 1.09TiB  627GiB  491GiB 56.11 0.92  11     osd.29
30   hdd   1.09200  1.00000 1.09TiB  574GiB  544GiB 51.34 0.84  10     osd.30
32   hdd   2.72899  1.00000 2.73TiB 1.73TiB 1023GiB 63.41 1.04  31     osd.32
43   hdd   2.72899  1.00000 2.73TiB 1.57TiB 1.16TiB 57.37 0.94  28     osd.43
45   hdd   2.72899  1.00000 2.73TiB 1.68TiB 1.05TiB 61.51 1.01  30     osd.45
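
One thing that stands out from these numbers: the small OSDs only hold 10-13
PGs each, and each PG on them holds roughly 741 GiB / 13, about 57 GiB, which
is about 5% of a 1.09 TiB OSD. So the 3-PG difference between osd.17 (13 PGs,
66.27%) and osd.19 (10 PGs, 51.20%) already accounts for the ~15 point spread
in %USE. With PGs this large relative to the small disks, the balancer may
simply have no whole-PG move left that improves the score.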


Config keys are as follows:
"mgr/balancer/max_misplaced" = 1
"mgr/balancer/upmap_max_deviation" = 0.0001
"mgr/balancer/upmap_max_iterations" = 1000

Any ideas what could cause this? Any info I can give to help diagnose?

On Fri, Mar 15, 2019 at 3:48 PM David Turner <drakonstein at gmail.com> wrote:

> Why do you think that it can't resolve this by itself?  You just said that
> the balancer was able to provide an optimization, but then that the
> distribution isn't perfect.  When there are no further optimizations,
> running `ceph balancer optimize plan` won't create a plan with any
> changes.  Possibly the active mgr needs a kick.  When my cluster isn't
> balancing when it's supposed to, I just run `ceph mgr fail {active mgr}`
> and within a minute or so the cluster is moving PGs around.
>
> On Sat, Mar 9, 2019 at 8:05 PM Kári Bertilsson <karibertils at gmail.com>
> wrote:
>
>> Thanks
>>
>> I did apply https://github.com/ceph/ceph/pull/26179.
>>
>> Running manual upmap commands works now. I ran "ceph balancer
>> optimize new" and it did add a few upmaps.
>>
>> But now another issue: distribution is far from perfect, but the balancer
>> can't find any further optimization.
>> Specifically, OSD 23 is getting way more PGs than the other 3 TB OSDs.
>>
>> See https://pastebin.com/f5g5Deak
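>>
>> (A quick way to cross-check the PG count on a single OSD, e.g. osd.23, is
>> something like:
>>
>> # ceph pg ls-by-osd 23 | wc -l
>>
>> which prints the number of PGs mapped to that OSD plus one header line.)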
>>
>> On Fri, Mar 1, 2019 at 10:25 AM <xie.xingguo at zte.com.cn> wrote:
>>
>>> > Backports should be available in v12.2.11.
>>>
>>> s/v12.2.11/v12.2.12/
>>>
>>> Sorry for the typo.
>>>
>>>
>>>
>>>
>>> Original Message
>>> *From:* Xie Xingguo 10072465
>>> *To:* dan at vanderster.com <dan at vanderster.com>;
>>> *Cc:* ceph-users at lists.ceph.com <ceph-users at lists.ceph.com>;
>>> *Date:* 2019-03-01 17:09
>>> *Subject:* *Re: [ceph-users] ceph osd pg-upmap-items not working*
>>>
>>> See https://github.com/ceph/ceph/pull/26179
>>>
>>> Backports should be available in v12.2.11.
>>>
>>> Or you can manually do it by simply adopting
>>> https://github.com/ceph/ceph/pull/26127 if you are eager to get out
>>> of the trap right now.
>>>
>>> *From:* Dan van der Ster <dan at vanderster.com>
>>> *To:* Kári Bertilsson <karibertils at gmail.com>;
>>> *Cc:* ceph-users <ceph-users at lists.ceph.com>; Xie Xingguo 10072465;
>>> *Date:* 2019-03-01 14:48
>>> *Subject:* *Re: [ceph-users] ceph osd pg-upmap-items not working*
>>> It looks like that somewhat unusual crush rule is confusing the new
>>> upmap cleaning.
>>> (debug_mon 10 on the active mon should show those cleanups).
>>>
>>>
>>> I'm copying Xie Xingguo, and probably you should create a tracker for this.
>>>
>>> -- dan
>>>
>>>
>>>
>>>
>>> On Fri, Mar 1, 2019 at 3:12 AM Kári Bertilsson <karibertils at gmail.com> wrote:
>>> >
>>> > This is the pool
>>>
>>> > pool 41 'ec82_pool' erasure size 10 min_size 8 crush_rule 1 object_hash rjenkins pg_num 512 pgp_num 512 last_change 63794 lfor 21731/21731 flags hashpspool,ec_overwrites stripe_width 32768 application cephfs
>>> >        removed_snaps [1~5]
>>> >
>>> > Here is the relevant crush rule:
>>>
>>> > rule ec_pool {
>>> >         id 1
>>> >         type erasure
>>> >         min_size 3
>>> >         max_size 10
>>> >         step set_chooseleaf_tries 5
>>> >         step set_choose_tries 100
>>> >         step take default class hdd
>>> >         step choose indep 5 type host
>>> >         step choose indep 2 type osd
>>> >         step emit
>>> > }
>>> >
>>>
>>> > Both OSD 23 and 123 are in the same host, so this change should be perfectly acceptable under the rule set.
>>>
>>> > Something must be blocking the change, but I can't find anything about it in any logs.
>>> >
>>> > - Kári
>>> >
>>> > On Thu, Feb 28, 2019 at 8:07 AM Dan van der Ster <dan at vanderster.com> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> pg-upmap-items became more strict in v12.2.11 when validating upmaps.
>>> >> E.g., it now won't let you put two PGs in the same rack if the crush
>>> >> rule doesn't allow it.
>>> >>
>>>
>>> >> Where are OSDs 23 and 123 in your cluster? What is the relevant crush rule?
>>> >>
>>> >> -- dan
>>> >>
>>> >>
>>> >> On Wed, Feb 27, 2019 at 9:17 PM Kári Bertilsson <karibertils at gmail.com> wrote:
>>> >> >
>>> >> > Hello
>>> >> >
>>>
>>> >> > I am trying to diagnose why upmap stopped working where it was previously working fine.
>>> >> >
>>> >> > Trying to move pg 41.1 from OSD 23 to OSD 123 has no effect and seems to be ignored.
>>> >> >
>>> >> > # ceph osd pg-upmap-items 41.1 23 123
>>> >> > set 41.1 pg_upmap_items mapping to [23->123]
>>> >> >
>>>
>>> >> > No rebalancing happens, and if I run it again it shows the same output every time.
>>> >> >
>>> >> > I have in config
>>> >> >         debug mgr = 4/5
>>> >> >         debug mon = 4/5
>>> >> >
>>> >> > Paste from mon & mgr logs. Also output from "ceph osd dump"
>>> >> > https://pastebin.com/9VrT4YcU
>>> >> >
>>> >> >
>>>
>>> >> > I ran "ceph osd set-require-min-compat-client luminous" a long time ago, and all servers running Ceph have been rebooted numerous times since then.
>>>
>>> >> > But somehow I am still seeing "min_compat_client jewel". I believe upmap was previously working anyway with that "jewel" line present.
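>>> >> >
>>> >> > For reference, the min_compat_client setting shows up in "ceph osd dump";
>>> >> > what the connected clients actually support can be cross-checked with,
>>> >> > for example:
>>> >> >
>>> >> > # ceph features
>>> >> > # ceph osd dump | grep compat_client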
>>> >> >
>>>
>>> >> > I see no indication in any logs why the upmap commands are being ignored.
>>> >> >
>>> >> > Any suggestions on how to debug further, or what could be the issue?
>>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>