[ceph-users] OSD is near full and slow in accessing storage from client

David Turner drakonstein at gmail.com
Wed Nov 22 06:31:26 PST 2017


Yes, increasing the PG count for the data pool is what you will want to do
when you add OSDs to your cluster.
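
For example, with the downloads_data pool discussed in this thread, the
change would look something like the following (pg_num first, then pgp_num;
on a busy cluster it is safer to step up in smaller increments than to jump
straight to the target, and 512 here is just the figure from the PG
calculation mentioned below):

    ceph osd pool set downloads_data pg_num 512
    ceph osd pool set downloads_data pgp_num 512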

On Wed, Nov 22, 2017, 9:25 AM gjprabu <gjprabu at zohocorp.com> wrote:

> Hi David,
>
>          Thanks, we will check the OSD weight settings. We are not using
> rbd, so we will delete that pool. As per the PG calculation, for 8 OSDs we
> should keep 512 PGs, but in our case we have unfortunately set 256 for
> metadata and 256 for data. Is it OK to increase the PG count for the data
> pool alone? And if we need to add more OSDs, will we also be required to
> increase the PG count? Please suggest.
>
> Regards
> Prabu GJ
>
>
> ---- On Tue, 21 Nov 2017 21:43:13 +0530 *David Turner
> <drakonstein at gmail.com>* wrote ----
>
> Your rbd pool can be removed (unless you're planning to use it), which will
> delete those PGs from your cluster/OSDs.  Also, all of your backfilling has
> finished and settled.  Now you just need to work on balancing the weights
> for the OSDs in your cluster.
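>
> For reference, dropping that unused rbd pool is a one-liner; the pool name
> is given twice and the long flag is required as a safety check, so make
> absolutely sure nothing is using it first:
>
>     ceph osd pool delete rbd rbd --yes-i-really-really-mean-it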
>
> There are multiple ways to balance the usage of the cluster: changing the
> crush weight of the OSD, changing the reweight of the OSD, doing that with
> `ceph osd reweight-by-utilization`, doing that with CERN's modified version
> of it (which can weight things up as well as down), etc.  I use a method
> that changes the crush weight of the OSD, but does so by downloading the
> crush map and using crushtool to generate a balanced map and apply it in
> one go.  A very popular method on the list is to run a cron job that makes
> very small modifications in the background and keeps things balanced by
> utilization.
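>
> As a rough sketch of two of those options (the threshold below is only an
> example, and this is not exactly the script I use): the built-in balancer
> can be invoked as
>
>     ceph osd reweight-by-utilization 110   # only touch OSDs >10% above mean
>
> and the crush-map round trip looks like
>
>     ceph osd getcrushmap -o crushmap.bin
>     crushtool -d crushmap.bin -o crushmap.txt
>     # edit the item weights in crushmap.txt, then recompile and inject it
>     crushtool -c crushmap.txt -o crushmap.new
>     ceph osd setcrushmap -i crushmap.new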
>
> You should be able to find a lot of references on the ML or in blog posts
> about these various options.  The takeaway is that the CRUSH algorithm is
> putting too much data on osd.4 and not enough data on osd.2 (those are the
> extremes, but there are others not quite as extreme), and you need to
> modify the weight and/or reweight of those OSDs to help the algorithm
> balance that out.
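>
> In practice that means something along these lines (the values are only
> examples; move in small steps and let backfill settle in between):
>
>     ceph osd reweight 4 0.90             # 0.0-1.0 override on the overfull OSD
>     ceph osd crush reweight osd.2 3.35   # or nudge the underfull crush weight up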
>
> On Tue, Nov 21, 2017 at 12:11 AM gjprabu <gjprabu at zohocorp.com> wrote:
>
>
> Hi David,
>
>            This is our current status.
>
>
> ~]# ceph status
>     cluster b466e09c-f7ae-4e89-99a7-99d30eba0a13
>      health HEALTH_WARN
>             mds0: Client integ-hm3 failing to respond to cache pressure
>             mds0: Client integ-hm9-bkp failing to respond to cache pressure
>             mds0: Client me-build1-bkp failing to respond to cache pressure
>      monmap e2: 3 mons at {intcfs-mon1=192.168.113.113:6789/0,intcfs-mon2=192.168.113.114:6789/0,intcfs-mon3=192.168.113.72:6789/0}
>             election epoch 16, quorum 0,1,2 intcfs-mon3,intcfs-mon1,intcfs-mon2
>       fsmap e177798: 1/1/1 up {0=intcfs-osd1=up:active}, 1 up:standby
>      osdmap e4388: 8 osds: 8 up, 8 in
>             flags sortbitwise
>       pgmap v24129785: 564 pgs, 3 pools, 6885 GB data, 17138 kobjects
>             14023 GB used, 12734 GB / 26757 GB avail
>                  560 active+clean
>                    3 active+clean+scrubbing
>                    1 active+clean+scrubbing+deep
>   client io 47187 kB/s rd, 965 kB/s wr, 125 op/s rd, 525 op/s wr
>
> ]# ceph df
> GLOBAL:
>     SIZE       AVAIL      RAW USED     %RAW USED
>     26757G     12735G       14022G         52.41
> POOLS:
>     NAME                   ID     USED       %USED     MAX AVAIL      OBJECTS
>     rbd                    0           0         0         3787G            0
>     downloads_data         3       6885G     51.46         3787G     16047944
>     downloads_metadata     4      84773k         0         3787G      1501805
>
>
> Regards
> Prabu GJ
>
> ---- On Mon, 20 Nov 2017 21:35:17 +0530 *David Turner
> <drakonstein at gmail.com>* wrote ----
>
>
> What is your current `ceph status` and `ceph df`? The status of your
> cluster has likely changed a bit in the last week.
>
> On Mon, Nov 20, 2017 at 6:00 AM gjprabu <gjprabu at zohocorp.com> wrote:
>
>
> Hi David,
>
>             Sorry for the late reply. The OSD sync has completed, but even
> so the fourth OSD's available space still keeps shrinking. Is there any way
> to check or fix this?
>
>
> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
>
> 0 3.29749  1.00000  3376G  2320G  1056G 68.71 1.10 144
> 1 3.26869  1.00000  3347G  1871G  1475G 55.92 0.89 134
> 2 3.27339  1.00000  3351G  1699G  1652G 50.69 0.81 134
> 3 3.24089  1.00000  3318G  1865G  1452G 56.22 0.90 142
> 4 3.24089  1.00000  3318G  2839G   478G 85.57 1.37 158
> 5 3.32669  1.00000  3406G  2249G  1156G 66.04 1.06 136
> 6 3.27800  1.00000  3356G  1924G  1432G 57.33 0.92 139
> 7 3.20470  1.00000  3281G  1949G  1331G 59.42 0.95 141
>               TOTAL 26757G 16720G 10037G 62.49
> MIN/MAX VAR: 0.81/1.37  STDDEV: 10.26
>
>
> Regards
> Prabu GJ
>
>
>
> ---- On Mon, 13 Nov 2017 00:27:47 +0530 *David Turner
> <drakonstein at gmail.com>* wrote ----
>
> You cannot reduce the PG count for a pool.  So there isn't anything you
> can really do for this unless you create a new FS with better PG counts and
> migrate your data into it.
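>
> If you do go that route, the rough shape of it is below (pool names and PG
> counts are placeholders, and running more than one filesystem is still
> flagged experimental on Jewel, so read the docs before enabling it):
>
>     ceph fs flag set enable_multiple true --yes-i-really-mean-it
>     ceph osd pool create newfs_data 512
>     ceph osd pool create newfs_metadata 64
>     ceph fs new newfs newfs_metadata newfs_data
>
> and then copy the data across at the file level before retiring the old
> pools.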
>
> The problem with having more PGs than you need is in the memory footprint
> for the osd daemon. There are warning thresholds for having too many PGs
> per osd.  Also in future expansions, if you need to add pools, you might
> not be able to create the pools with the proper amount of PGs due to older
> pools that have way too many PGs.
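>
> As a rough worked example for your cluster: 564 PGs across the three pools
> at size 2 is 564 x 2 = 1128 PG copies, which spread over your 7 OSDs comes
> out to roughly 160 per OSD, in line with the PGS column of your
> `ceph osd df`. The mons start warning once that per-OSD figure crosses the
> configured threshold (mon_pg_warn_max_per_osd, 300 by default on Jewel if I
> remember right), so you have headroom today, but every extra pool eats into
> it.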
>
> It would still be nice to see the output from those commands I asked about.
>
> The built-in reweighting scripts might help your data distribution.
> reweight-by-utilization
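>
> There is a dry-run variant as well (on Jewel and newer, if I recall
> correctly), so you can see what it would change before committing:
>
>     ceph osd test-reweight-by-utilization
>     ceph osd reweight-by-utilization        # apply it if the plan looks sane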
>
> On Sun, Nov 12, 2017, 11:41 AM gjprabu <gjprabu at zohocorp.com> wrote:
>
>
> Hi David,
>
> Thanks for your valuable reply. Once the backfilling for the new OSD
> completes, we will consider increasing the replica value as soon as
> possible. Is it possible to decrease the metadata PG count? If the metadata
> PG count is the same as the data PG count, what kind of issue may occur?
>
> Regards
> PrabuGJ
>
>
> ---- On Sun, 12 Nov 2017 21:25:05 +0530 David Turner<drakonstein at gmail.com>
> wrote ----
>
> What's the output of `ceph df` to see if your PG counts are good or not?
> Like everyone else has said, the space on the original osds can't be
> expected to free up until the backfill from adding the new osd has finished.
>
> You don't have anything in your cluster health to indicate that your
> cluster will not be able to finish this backfilling operation on its own.
>
> You might find this URL helpful in calculating your PG counts.
> http://ceph.com/pgcalc/  As a side note. It is generally better to keep
> your PG counts as base 2 numbers (16, 64, 256, etc). When you do not have a
> base 2 number then some of your PGs will take up twice as much space as
> others. In your case with 250, you have 244 PGs that are the same size and
> 6 PGs that are twice the size of those 244 PGs.  Bumping that up to 256
> will even things out.
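>
> Since pg_num can only ever be raised, bumping the data pool to the next
> power of two is a small change, roughly:
>
>     ceph osd pool set downloads_data pg_num 256
>     ceph osd pool set downloads_data pgp_num 256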
>
> Assuming that the metadata pool is for a CephFS volume, you do not need
> nearly so many PGs for that pool. Also, I would recommend changing at least
> the metadata pool to 3 replica_size. If we can talk you into 3 replica for
> everything else, great! But if not, at least do the metadata pool. If you
> lose an object in the data pool, you just lose that file. If you lose an
> object in the metadata pool, you might lose access to the entire CephFS
> volume.
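>
> For the metadata pool that would be something like the following (min_size
> 2 is the usual companion setting so a write is never acknowledged with only
> one copy; expect a little backfill while the third copies are created):
>
>     ceph osd pool set downloads_metadata size 3
>     ceph osd pool set downloads_metadata min_size 2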
>
> On Sun, Nov 12, 2017, 9:39 AM gjprabu <gjprabu at zohocorp.com> wrote:
>
>
> Hi Cassiano,
>
>        Thanks for your valuable feedback; we will wait until the new OSD
> sync completes. Also, will increasing the PG count solve the issue? In our
> setup the PG number for both the data and metadata pools is 250. Is this
> correct for 7 OSDs with 2 replicas? The currently stored data size is 17 TB.
>
> ceph osd df
> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL %USE  VAR  PGS
> 0 3.29749  1.00000  3376G  2814G  562G 83.35 1.23 165
> 1 3.26869  1.00000  3347G  1923G 1423G 57.48 0.85 152
> 2 3.27339  1.00000  3351G  1980G 1371G 59.10 0.88 161
> 3 3.24089  1.00000  3318G  2131G 1187G 64.23 0.95 168
> 4 3.24089  1.00000  3318G  2998G  319G 90.36 1.34 176
> 5 3.32669  1.00000  3406G  2476G  930G 72.68 1.08 165
> 6 3.27800  1.00000  3356G  1518G 1838G 45.24 0.67 166
>               TOTAL 23476G 15843G 7632G 67.49
> MIN/MAX VAR: 0.67/1.34  STDDEV: 14.53
>
> ceph osd tree
> ID WEIGHT   TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 22.92604 root default
> -2  3.29749     host intcfs-osd1
> 0  3.29749         osd.0             up  1.00000          1.00000
> -3  3.26869     host intcfs-osd2
> 1  3.26869         osd.1             up  1.00000          1.00000
> -4  3.27339     host intcfs-osd3
> 2  3.27339         osd.2             up  1.00000          1.00000
> -5  3.24089     host intcfs-osd4
> 3  3.24089         osd.3             up  1.00000          1.00000
> -6  3.24089     host intcfs-osd5
> 4  3.24089         osd.4             up  1.00000          1.00000
> -7  3.32669     host intcfs-osd6
> 5  3.32669         osd.5             up  1.00000          1.00000
> -8  3.27800     host intcfs-osd7
> 6  3.27800         osd.6             up  1.00000          1.00000
>
> ceph osd pool ls detail
>
> pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
> pool 3 'downloads_data' replicated size 2 min_size 1 crush_ruleset 0
> object_hash rjenkins pg_num 250 pgp_num 250 last_change 39 flags
> hashpspool crash_replay_interval 45 stripe_width 0
> pool 4 'downloads_metadata' replicated size 2 min_size 1 crush_ruleset 0
> object_hash rjenkins pg_num 250 pgp_num 250 last_change 36 flags
> hashpspool stripe_width 0
>
> Regards
> Prabu GJ
>
> ---- On Sun, 12 Nov 2017 19:20:34 +0530 *Cassiano Pilipavicius
> <cassiano at tips.com.br>* wrote ----
>
>
> I am also not an expert, but it looks like you have large data volumes on
> few PGs. From what I've seen, the PG data is only deleted from the old OSD
> once it has been completely copied to the new OSD.
>
> So if one PG holds 100G, for example, the space will only be released on
> the old OSD once that PG has been fully copied to the new OSD.
>
> If you have a busy cluster/network, it may take a good while. Maybe just
> wait a little and check from time to time, and the space will eventually be
> released.
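>
> Checking something like the following from time to time is usually enough
> to confirm it is still moving (the grep is just a quick way to count PGs
> still involved in backfill):
>
>     ceph -s
>     ceph osd df
>     ceph pg dump pgs_brief | grep -c backfill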
>
> On 11/12/2017 11:44 AM, Sébastien VIGNERON wrote:
>
>
> I’m not an expert either, so if someone on the list has some ideas on this
> problem, don’t be shy, share them with us.
>
> For now, my only hypothesis is that the OSD space will be recovered as
> soon as the recovery process is complete.
> Hope everything gets back in order soon (before reaching 95% or above).
>
> I saw some messages on the list about the fstrim tool, which can help
> reclaim unused free space, but I don’t know if it applies to your case.
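>
> (For what it’s worth, fstrim is simply run against a mounted filesystem,
> for example something like `fstrim -v /var/lib/ceph/osd/ceph-0` on an OSD
> data mount, but it only reclaims space on thin-provisioned or SSD-backed
> storage, so it may well not apply to your setup.)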
>
> Cordialement / Best regards,
>
> Sébastien VIGNERON
> CRIANN,
> Ingénieur / Engineer
> Technopôle du Madrillet
> 745, avenue de l'Université
> 76800 Saint-Etienne du Rouvray - France
>
> tél. +33 2 32 91 42 91
> fax. +33 2 32 91 42 92
> http://www.criann.fr
> sebastien.vigneron at criann.fr
> support: support at criann.fr
>
> On 12 Nov 2017, at 13:29, gjprabu <gjprabu at zohocorp.com> wrote:
>
> Hi Sebastien,
>
>     Below are the requested details. I am not much of an expert and am
> still learning. The PGs were not in a stuck state before adding the OSD,
> and they are slowly clearing to active+clean. This morning there were
> around 53 PGs in active+undersized+degraded+remapped+wait_backfill and now
> there are only 21, so I hope it is progressing, and I can see the used
> space keep increasing on the newly added OSD (osd.6).
>
>
> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL %USE  VAR  PGS
> 0 3.29749  1.00000  3376G  2814G  562G 83.35 1.23 165  (available space not reduced after adding new OSD)
> 1 3.26869  1.00000  3347G  1923G 1423G 57.48 0.85 152
> 2 3.27339  1.00000  3351G  1980G 1371G 59.10 0.88 161
> 3 3.24089  1.00000  3318G  2131G 1187G 64.23 0.95 168
> 4 3.24089  1.00000  3318G  2998G  319G 90.36 1.34 176  (available space not reduced after adding new OSD)
> 5 3.32669  1.00000  3406G  2476G  930G 72.68 1.08 165  (available space not reduced after adding new OSD)
> 6 3.27800  1.00000  3356G  1518G 1838G 45.24 0.67 166
>               TOTAL 23476G 15843G 7632G 67.49
> MIN/MAX VAR: 0.67/1.34  STDDEV: 14.53
>
> ...
>
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>

