[ceph-users] Balancer module not balancing perfectly

Steve Taylor steve.taylor at storagecraft.com
Tue Nov 6 08:02:12 PST 2018


I ended up balancing my osdmap myself offline to figure out why the balancer couldn't do better. I had similar issues with osdmaptool, which of course is what I expected, but it's a lot easier to run osdmaptool in a debugger to see what's happening. When I dug into the upmap code I discovered that my problem was due to the way that code balances OSDs. In my case the average PG count per OSD is 56.882, so as soon as any OSD had 56 PGs it wouldn't get any more no matter what I used as my max deviation. I got into a state where each OSD had 56-61 PGs, and the upmap code wouldn't do any better because there were no "underfull" OSDs onto which to move PGs.
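For illustration, here is a toy Python model of that rounding behavior (my sketch, not the actual OsdMap C++ code; the `classify` helper and the PG counts are made up):

```python
import math

# Toy model of the effect described above: with an average of ~56.9 PGs
# per OSD, any OSD already at floor(avg) = 56 PGs no longer counts as
# "underfull", so no PG can be moved onto it even though other OSDs sit
# at 58-61. Hypothetical sketch, not the real upmap implementation.

def classify(pgs_per_osd):
    avg = sum(pgs_per_osd.values()) / len(pgs_per_osd)
    underfull = sorted(o for o, n in pgs_per_osd.items() if n < math.floor(avg))
    overfull = sorted(o for o, n in pgs_per_osd.items() if n > math.ceil(avg))
    return underfull, overfull

# 8 OSDs, 455 PGs total -> average 56.875
counts = {0: 56, 1: 56, 2: 56, 3: 56, 4: 57, 5: 57, 6: 57, 7: 60}
print(classify(counts))  # ([], [7]): OSD 7 is overfull but has no move target
```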

I made some changes to the osdmap code to ensure the computed "overfull" and "underfull" OSD lists are the same size, even if the least- or most-full OSDs are within the expected deviation, in order to give those outside the expected deviation some relief, and it worked nicely. I have two independent production pools that were both in this state, and now every OSD across both pools has 56 or 57 PGs as expected.
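Roughly, the change amounts to the following (again a toy Python sketch, not the actual C++ diff; the padding logic is the idea, not the literal code):

```python
import math

# Sketch of the fix: pad the underfull list with the least-full OSDs,
# even when they are within the expected deviation, so every overfull
# OSD has somewhere to shed PGs. Hypothetical illustration only.

def classify_padded(pgs_per_osd):
    avg = sum(pgs_per_osd.values()) / len(pgs_per_osd)
    overfull = sorted((o for o, n in pgs_per_osd.items() if n > math.ceil(avg)),
                      key=lambda o: -pgs_per_osd[o])
    underfull = sorted((o for o, n in pgs_per_osd.items() if n < math.floor(avg)),
                       key=lambda o: pgs_per_osd[o])
    if len(underfull) < len(overfull):
        # OSDs that were previously "balanced enough" now become
        # valid move targets, least-full first.
        rest = sorted((o for o in pgs_per_osd
                       if o not in underfull and o not in overfull),
                      key=lambda o: pgs_per_osd[o])
        underfull += rest[:len(overfull) - len(underfull)]
    return underfull, overfull

counts = {0: 56, 1: 56, 2: 56, 3: 56, 4: 57, 5: 57, 6: 57, 7: 60}
print(classify_padded(counts))  # ([0], [7]): OSD 0 can now receive a PG
```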

I intend to put together a pull request to push this upstream. I haven't reviewed the balancer module code to see how it's doing things, but assuming it uses osdmaptool or the same upmap code as osdmaptool this should also improve the balancer module.


________________________________



Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |


________________________________
If you are not the intended recipient of this message or received it erroneously, please notify the sender and delete it, together with any attachments, and be advised that any dissemination or copying of this message is prohibited.
________________________________


On Tue, 2018-11-06 at 12:23 +0700, Konstantin Shalygin wrote:

From the balancer module's code for v12.2.7 I noticed [1] these lines, which reference [2] these two config options for upmap. You might try using more max iterations or a smaller max deviation to see if you can get a better balance in your cluster. I would start with [3] these commands/values and see if they improve your balance and/or allow you to generate a better map.

[1] https://github.com/ceph/ceph/blob/v12.2.7/src/pybind/mgr/balancer/module.py#L671-L672

[2] upmap_max_iterations (default 10)
    upmap_max_deviation (default .01)

[3] ceph config-key set mgr/balancer/upmap_max_iterations 50
    ceph config-key set mgr/balancer/upmap_max_deviation .005


This did not help my 12.2.8 cluster. While the first balancing iterations were running I decreased max_misplaced from the default 0.05 to 0.01, after which the balancing operations stopped.

After the cluster returned to HEALTH_OK, I have not seen any further balancer runs. I tried lowering the balancer variables and restarting the mgr, but the message is still: "Error EALREADY: Unable to find further optimization,or distribution is already perfect"

# ceph config-key dump | grep balancer
    "mgr/balancer/active": "1",
    "mgr/balancer/max_misplaced": ".50",
    "mgr/balancer/mode": "upmap",
    "mgr/balancer/upmap_max_deviation": ".001",
    "mgr/balancer/upmap_max_iterations": "100",


So maybe I need to delete the upmaps and start over?


ID  CLASS WEIGHT    REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS TYPE NAME
 -1       414.00000        -  445TiB  129TiB  316TiB 29.01 1.00   - root default
 -7       414.00000        -  445TiB  129TiB  316TiB 29.01 1.00   -     datacenter rtcloud
 -8       138.00000        -  148TiB 42.9TiB  105TiB 28.93 1.00   -         rack rack2
 -2        69.00000        - 74.2TiB 21.5TiB 52.7TiB 28.93 1.00   -             host ceph-osd0
  0   hdd   5.00000  1.00000 5.46TiB 1.64TiB 3.82TiB 30.06 1.04  62                 osd.0
  4   hdd   5.00000  1.00000 5.46TiB 1.65TiB 3.80TiB 30.29 1.04  64                 osd.4
  7   hdd   5.00000  1.00000 5.46TiB 1.61TiB 3.85TiB 29.44 1.01  63                 osd.7
  9   hdd   5.00000  1.00000 5.46TiB 1.68TiB 3.78TiB 30.77 1.06  63                 osd.9
 46   hdd   5.00000  1.00000 5.46TiB 1.68TiB 3.77TiB 30.86 1.06  65                 osd.46
 47   hdd   5.00000  1.00000 5.46TiB 1.68TiB 3.78TiB 30.73 1.06  66                 osd.47
 48   hdd   5.00000  1.00000 5.46TiB 1.65TiB 3.81TiB 30.22 1.04  66                 osd.48
 49   hdd   5.00000  1.00000 5.46TiB 1.71TiB 3.74TiB 31.41 1.08  65                 osd.49
 54   hdd   5.00000  1.00000 5.46TiB 1.64TiB 3.82TiB 30.08 1.04  65                 osd.54
 55   hdd   5.00000  1.00000 5.46TiB 1.65TiB 3.80TiB 30.30 1.04  64                 osd.55
 56   hdd   5.00000  1.00000 5.46TiB 1.66TiB 3.80TiB 30.35 1.05  64                 osd.56
 57   hdd   5.00000  1.00000 5.46TiB 1.63TiB 3.83TiB 29.81 1.03  64                 osd.57
 24  nvme   3.00000  1.00000 2.89TiB  559GiB 2.34TiB 18.88 0.65  63                 osd.24
 74  nvme   3.00000  1.00000 2.89TiB  526GiB 2.38TiB 17.76 0.61  67                 osd.74
 84  nvme   3.00000  1.00000 2.89TiB  522GiB 2.38TiB 17.63 0.61  66                 osd.84
 -6        69.00000        - 74.2TiB 21.5TiB 52.7TiB 28.94 1.00   -             host ceph-osd2
 12   hdd   5.00000  1.00000 5.46TiB 1.66TiB 3.80TiB 30.35 1.05  66                 osd.12
 15   hdd   5.00000  1.00000 5.46TiB 1.67TiB 3.79TiB 30.58 1.05  68                 osd.15
 18   hdd   5.00000  1.00000 5.46TiB 1.64TiB 3.82TiB 30.05 1.04  65                 osd.18
 19   hdd   5.00000  1.00000 5.46TiB 1.59TiB 3.86TiB 29.21 1.01  64                 osd.19
 50   hdd   5.00000  1.00000 5.46TiB 1.63TiB 3.83TiB 29.84 1.03  65                 osd.50
 51   hdd   5.00000  1.00000 5.46TiB 1.72TiB 3.74TiB 31.44 1.08  66                 osd.51
 52   hdd   5.00000  1.00000 5.46TiB 1.70TiB 3.75TiB 31.24 1.08  64                 osd.52
 53   hdd   5.00000  1.00000 5.46TiB 1.60TiB 3.86TiB 29.36 1.01  64                 osd.53
 58   hdd   5.00000  1.00000 5.46TiB 1.71TiB 3.74TiB 31.38 1.08  64                 osd.58
 59   hdd   5.00000  1.00000 5.46TiB 1.66TiB 3.80TiB 30.37 1.05  66                 osd.59
 60   hdd   5.00000  1.00000 5.46TiB 1.60TiB 3.85TiB 29.38 1.01  66                 osd.60
 61   hdd   5.00000  1.00000 5.46TiB 1.71TiB 3.75TiB 31.31 1.08  66                 osd.61
 75  nvme   3.00000  1.00000 2.89TiB  535GiB 2.37TiB 18.06 0.62  62                 osd.75
 85  nvme   3.00000  1.00000 2.89TiB  510GiB 2.39TiB 17.24 0.59  63                 osd.85
 86  nvme   3.00000  1.00000 2.89TiB  560GiB 2.34TiB 18.92 0.65  66                 osd.86
 -9       138.00000        -  148TiB 43.2TiB  105TiB 29.10 1.00   -         rack rack3
 -3        69.00000        - 74.2TiB 21.6TiB 52.5TiB 29.18 1.01   -             host ceph-osd3
 20   hdd   5.00000  1.00000 5.46TiB 1.66TiB 3.79TiB 30.50 1.05  69                 osd.20
 21   hdd   5.00000  1.00000 5.46TiB 1.68TiB 3.77TiB 30.85 1.06  64                 osd.21
 22   hdd   5.00000  1.00000 5.46TiB 1.72TiB 3.74TiB 31.43 1.08  65                 osd.22
 23   hdd   5.00000  1.00000 5.46TiB 1.62TiB 3.83TiB 29.75 1.03  64                 osd.23
 34   hdd   5.00000  1.00000 5.46TiB 1.66TiB 3.80TiB 30.33 1.05  65                 osd.34
 35   hdd   5.00000  1.00000 5.46TiB 1.66TiB 3.79TiB 30.47 1.05  65                 osd.35
 36   hdd   5.00000  1.00000 5.46TiB 1.67TiB 3.79TiB 30.54 1.05  65                 osd.36
 37   hdd   5.00000  1.00000 5.46TiB 1.69TiB 3.76TiB 31.03 1.07  64                 osd.37
 62   hdd   5.00000  1.00000 5.46TiB 1.71TiB 3.74TiB 31.42 1.08  67                 osd.62
 63   hdd   5.00000  1.00000 5.46TiB 1.66TiB 3.80TiB 30.43 1.05  66                 osd.63
 64   hdd   5.00000  1.00000 5.46TiB 1.66TiB 3.80TiB 30.38 1.05  65                 osd.64
 65   hdd   5.00000  1.00000 5.46TiB 1.64TiB 3.82TiB 30.10 1.04  64                 osd.65
 30  nvme   3.00000  1.00000 2.89TiB  562GiB 2.34TiB 19.00 0.65  65                 osd.30
 76  nvme   3.00000  1.00000 2.89TiB  531GiB 2.37TiB 17.96 0.62  65                 osd.76
 88  nvme   3.00000  1.00000 2.89TiB  546GiB 2.36TiB 18.43 0.64  68                 osd.88
-11        69.00000        - 74.2TiB 21.5TiB 52.6TiB 29.01 1.00   -             host ceph-osd5
 10   hdd   5.00000  1.00000 5.46TiB 1.71TiB 3.74TiB 31.39 1.08  64                 osd.10
 13   hdd   5.00000  1.00000 5.46TiB 1.68TiB 3.78TiB 30.77 1.06  65                 osd.13
 16   hdd   5.00000  1.00000 5.46TiB 1.71TiB 3.75TiB 31.32 1.08  64                 osd.16
 17   hdd   5.00000  1.00000 5.46TiB 1.71TiB 3.74TiB 31.42 1.08  69                 osd.17
 27   hdd   5.00000  1.00000 5.46TiB 1.62TiB 3.84TiB 29.62 1.02  62                 osd.27
 31   hdd   5.00000  1.00000 5.46TiB 1.72TiB 3.74TiB 31.53 1.09  64                 osd.31
 32   hdd   5.00000  1.00000 5.46TiB 1.63TiB 3.83TiB 29.78 1.03  65                 osd.32
 33   hdd   5.00000  1.00000 5.46TiB 1.68TiB 3.78TiB 30.73 1.06  67                 osd.33
 70   hdd   5.00000  1.00000 5.46TiB 1.65TiB 3.80TiB 30.31 1.05  65                 osd.70
 71   hdd   5.00000  1.00000 5.46TiB 1.64TiB 3.82TiB 30.06 1.04  65                 osd.71
 72   hdd   5.00000  1.00000 5.46TiB 1.56TiB 3.90TiB 28.61 0.99  61                 osd.72
 73   hdd   5.00000  1.00000 5.46TiB 1.60TiB 3.86TiB 29.27 1.01  65                 osd.73
 29  nvme   3.00000  1.00000 2.89TiB  541GiB 2.36TiB 18.27 0.63  65                 osd.29
 78  nvme   3.00000  1.00000 2.89TiB  541GiB 2.36TiB 18.28 0.63  64                 osd.78
 89  nvme   3.00000  1.00000 2.89TiB  562GiB 2.34TiB 19.00 0.66  63                 osd.89
-10       138.00000        -  148TiB 43.0TiB  105TiB 28.99 1.00   -         rack rack4
 -4        69.00000        - 74.2TiB 21.2TiB 52.9TiB 28.65 0.99   -             host ceph-osd1
  1   hdd   5.00000  1.00000 5.46TiB 1.65TiB 3.81TiB 30.16 1.04  67                 osd.1
  2   hdd   5.00000  1.00000 5.46TiB 1.63TiB 3.83TiB 29.82 1.03  64                 osd.2
  3   hdd   5.00000  1.00000 5.46TiB 1.62TiB 3.83TiB 29.74 1.03  62                 osd.3
  5   hdd   5.00000  1.00000 5.46TiB 1.59TiB 3.86TiB 29.20 1.01  63                 osd.5
 38   hdd   5.00000  1.00000 5.46TiB 1.64TiB 3.82TiB 30.02 1.03  63                 osd.38
 39   hdd   5.00000  1.00000 5.46TiB 1.62TiB 3.83TiB 29.77 1.03  63                 osd.39
 40   hdd   5.00000  1.00000 5.46TiB 1.68TiB 3.78TiB 30.79 1.06  63                 osd.40
 41   hdd   5.00000  1.00000 5.46TiB 1.69TiB 3.76TiB 31.04 1.07  67                 osd.41
 80   hdd   5.00000  1.00000 5.46TiB 1.65TiB 3.81TiB 30.17 1.04  69                 osd.80
 81   hdd   5.00000  1.00000 5.46TiB 1.61TiB 3.84TiB 29.56 1.02  68                 osd.81
 82   hdd   5.00000  1.00000 5.46TiB 1.70TiB 3.76TiB 31.06 1.07  65                 osd.82
 83   hdd   5.00000  1.00000 5.46TiB 1.56TiB 3.90TiB 28.58 0.99  65                 osd.83
 25  nvme   3.00000  1.00000 2.89TiB  558GiB 2.34TiB 18.87 0.65  65                 osd.25
 79  nvme   3.00000  1.00000 2.89TiB  541GiB 2.36TiB 18.29 0.63  63                 osd.79
 87  nvme   3.00000  1.00000 2.89TiB  540GiB 2.36TiB 18.26 0.63  65                 osd.87
 -5        69.00000        - 74.2TiB 21.8TiB 52.4TiB 29.34 1.01   -             host ceph-osd4
  6   hdd   5.00000  1.00000 5.46TiB 1.67TiB 3.79TiB 30.62 1.06  66                 osd.6
  8   hdd   5.00000  1.00000 5.46TiB 1.64TiB 3.82TiB 30.04 1.04  63                 osd.8
 11   hdd   5.00000  1.00000 5.46TiB 1.66TiB 3.80TiB 30.41 1.05  66                 osd.11
 14   hdd   5.00000  1.00000 5.46TiB 1.71TiB 3.75TiB 31.36 1.08  66                 osd.14
 42   hdd   5.00000  1.00000 5.46TiB 1.69TiB 3.77TiB 30.95 1.07  65                 osd.42
 43   hdd   5.00000  1.00000 5.46TiB 1.71TiB 3.75TiB 31.36 1.08  65                 osd.43
 44   hdd   5.00000  1.00000 5.46TiB 1.67TiB 3.78TiB 30.66 1.06  67                 osd.44
 45   hdd   5.00000  1.00000 5.46TiB 1.66TiB 3.80TiB 30.38 1.05  62                 osd.45
 66   hdd   5.00000  1.00000 5.46TiB 1.66TiB 3.80TiB 30.33 1.05  65                 osd.66
 67   hdd   5.00000  1.00000 5.46TiB 1.72TiB 3.74TiB 31.43 1.08  67                 osd.67
 68   hdd   5.00000  1.00000 5.46TiB 1.71TiB 3.75TiB 31.32 1.08  65                 osd.68
 69   hdd   5.00000  1.00000 5.46TiB 1.64TiB 3.82TiB 30.10 1.04  62                 osd.69
 26  nvme   3.00000  1.00000 2.89TiB  559GiB 2.34TiB 18.89 0.65  66                 osd.26
 28  nvme   3.00000  1.00000 2.89TiB  563GiB 2.34TiB 19.01 0.66  66                 osd.28
 77  nvme   3.00000  1.00000 2.89TiB  541GiB 2.36TiB 18.30 0.63  66                 osd.77
                       TOTAL  445TiB  129TiB  316TiB 29.01
MIN/MAX VAR: 0.59/1.09  STDDEV: 4.96




k