[ceph-users] OSD is near full and slow in accessing storage from client

gjprabu gjprabu at zohocorp.com
Sun Nov 12 01:59:28 PST 2017


Hi Sébastien,



 Thanks for your reply. Yes, there are undersized PGs and a recovery in progress, because we added a new OSD after getting the "2 OSDs near full" warning. Yes, the newly added OSD is rebalancing the data.
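
To confirm that the backfill is actually making progress, the recovery counters can be watched (a minimal sketch using the standard CLI; the 30-second interval is arbitrary):

watch -n 30 ceph -s      # degraded/misplaced object counts should keep dropping
ceph -w                  # or stream cluster events live instead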





[root@intcfs-osd6 ~]# ceph osd df

ID WEIGHT  REWEIGHT SIZE   USE    AVAIL %USE  VAR  PGS

0 3.29749  1.00000  3376G  2875G  501G 85.15 1.26 165

1 3.26869  1.00000  3347G  1923G 1423G 57.46 0.85 152

2 3.27339  1.00000  3351G  1980G 1371G 59.08 0.88 161

3 3.24089  1.00000  3318G  2130G 1187G 64.21 0.95 168

4 3.24089  1.00000  3318G  2997G  320G 90.34 1.34 176

5 3.32669  1.00000  3406G  2466G  939G 72.42 1.07 165

6 3.27800  1.00000  3356G  1463G 1893G 43.60 0.65 166  
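
Since osd.0 and osd.4 are still well above the rest, one interim option is to reweight them so backfill moves some PGs off (a sketch only, using standard Jewel commands; the weight values below are purely illustrative, not a recommendation):

ceph osd reweight 0 0.90                 # lower osd.0's override weight slightly
ceph osd reweight 4 0.85                 # osd.4 is fuller, so push it a bit harder
ceph osd reweight-by-utilization 120     # or let Ceph pick OSDs above 120% of average use
ceph osd df                              # re-check utilization afterwards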



ceph osd crush rule dump



[
    {
        "rule_id": 0,
        "rule_name": "replicated_ruleset",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "default"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    }
]
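
To double-check what that rule really maps to, the CRUSH map can be exported and tested offline (a sketch; crushmap.bin and crushmap.txt are arbitrary file names):

ceph osd getcrushmap -o crushmap.bin                                    # export the compiled map
crushtool -d crushmap.bin -o crushmap.txt                               # decompile it for reading
crushtool -i crushmap.bin --test --rule 0 --num-rep 2 --show-mappings   # simulate size-2 placements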





We are running a mix of ceph version 10.2.2 and ceph version 10.2.9.





ceph osd pool ls detail



pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0

pool 3 'downloads_data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 250 pgp_num 250 last_change 39 flags hashpspool crash_replay_interval 45 stripe_width 0

pool 4 'downloads_metadata' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 250 pgp_num 250 last_change 36 flags hashpspool stripe_width 0
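
As a rough sanity check on those numbers (a back-of-the-envelope calculation, not part of the original output): 64 + 250 + 250 = 564 PGs at size 2 spread over 7 OSDs gives about 161 PG replicas per OSD, which matches the PGS column in ceph osd df above.

# expected PG replicas per OSD: (64 + 250 + 250) PGs x size 2 / 7 OSDs
echo $(( (64 + 250 + 250) * 2 / 7 ))     # prints 161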





---- On Sun, 12 Nov 2017 15:04:02 +0530 Sébastien VIGNERON <sebastien.vigneron at criann.fr> wrote ----




Hi,



Can you share:

 - your placement rules: ceph osd crush rule dump

 - your CEPH version: ceph versions

 - your pools definitions: ceph osd pool ls detail



With these we can determine whether your PGs are stuck because of a misconfiguration or something else.



You seem to have some undersized PGs and a recovery in progress. Do your OSDs show some rebalancing of your data? Does your OSDs' usage percentage change over time? (changes in "ceph osd df")
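
One simple way to answer that last question is to sample ceph osd df on a timer (a sketch; the 10-minute interval and log file name are arbitrary):

while true; do date; ceph osd df; sleep 600; done >> osd_df.log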



Cordialement / Best regards,



Sébastien VIGNERON 

CRIANN, 

Ingénieur / Engineer

Technopôle du Madrillet 

745, avenue de l'Université 

76800 Saint-Etienne du Rouvray - France 

tél. +33 2 32 91 42 91 

fax. +33 2 32 91 42 92 

http://www.criann.fr 

mailto:sebastien.vigneron at criann.fr

support: support at criann.fr




On 12 Nov 2017, at 10:04, gjprabu <gjprabu at zohocorp.com> wrote:



Hi Team,



         We have a Ceph setup with 6 OSDs and we got an alert that 2 OSDs are near full. We faced slow access to Ceph from the client. So I added a 7th OSD, but 2 OSDs (osd.0 and osd.4) are still showing near full; I have restarted the Ceph service on osd.0 and osd.4. Kindly check the Ceph OSD status below and please provide us with solutions.
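
One interim measure often used in this situation, sketched here under the assumption of Jewel's default option names (verify before use, and restore the 0.85 defaults once recovery completes), is to raise the near-full and backfill thresholds slightly so backfill can keep flowing to the new OSD:

ceph tell mon.* injectargs '--mon-osd-nearfull-ratio 0.90'     # raise the "near full" warning threshold
ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.92'    # let backfill continue onto fuller OSDs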





# ceph health detail

HEALTH_WARN 46 pgs backfill_wait; 1 pgs backfilling; 32 pgs degraded; 50 pgs stuck unclean; 32 pgs undersized; recovery 1098780/40253637 objects degraded (2.730%); recovery 3401433/40253637 objects misplaced (8.450%); 2 near full osd(s); mds0: Client integ-hm3 failing to respond to cache pressure; mds0: Client integ-hm8 failing to respond to cache pressure; mds0: Client integ-hm2 failing to respond to cache pressure; mds0: Client integ-hm9 failing to respond to cache pressure; mds0: Client integ-hm5 failing to respond to cache pressure; mds0: Client integ-hm9-bkp failing to respond to cache pressure; mds0: Client me-build1-bkp failing to respond to cache pressure



pg 3.f6 is stuck unclean for 511223.069161, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]

pg 4.f6 is stuck unclean for 511232.770419, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]

pg 3.ec is stuck unclean for 510902.815668, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]

pg 3.eb is stuck unclean for 511285.576487, current state active+remapped+wait_backfill, last acting [3,0]

pg 4.17 is stuck unclean for 511235.326709, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]

pg 4.2f is stuck unclean for 511232.356371, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]

pg 4.3d is stuck unclean for 511300.446982, current state active+remapped, last acting [3,0]

pg 4.93 is stuck unclean for 511295.539229, current state active+undersized+degraded+remapped+wait_backfill, last acting [3]

pg 3.47 is stuck unclean for 511288.104965, current state active+remapped+wait_backfill, last acting [3,0]

pg 4.d5 is stuck unclean for 510916.509825, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]

pg 3.31 is stuck unclean for 511221.542878, current state active+remapped+wait_backfill, last acting [0,3]

pg 3.62 is stuck unclean for 511221.551662, current state active+undersized+degraded+remapped+wait_backfill, last acting [4]

pg 4.4d is stuck unclean for 511232.279602, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]

pg 4.48 is stuck unclean for 510911.095367, current state active+remapped+wait_backfill, last acting [5,4]

pg 3.4f is stuck unclean for 511226.712285, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]

pg 3.78 is stuck unclean for 511221.531199, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]

pg 3.24 is stuck unclean for 510903.483324, current state active+remapped+backfilling, last acting [1,2]

pg 4.8c is stuck unclean for 511231.668693, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]

pg 3.b4 is stuck unclean for 511222.612012, current state active+undersized+degraded+remapped+wait_backfill, last acting [0]

pg 4.41 is stuck unclean for 511287.031264, current state active+remapped+wait_backfill, last acting [3,2]

pg 3.d1 is stuck unclean for 510903.797329, current state active+remapped+wait_backfill, last acting [0,3]

pg 3.7f is stuck unclean for 511222.929722, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]

pg 4.af is stuck unclean for 511262.494659, current state active+undersized+degraded+remapped, last acting [0]

pg 3.66 is stuck unclean for 510903.296711, current state active+remapped+wait_backfill, last acting [3,0]

pg 3.76 is stuck unclean for 511224.615144, current state active+undersized+degraded+remapped+wait_backfill, last acting [3]

pg 4.57 is stuck unclean for 511234.514343, current state active+remapped, last acting [0,4]

pg 3.69 is stuck unclean for 511224.672085, current state active+undersized+degraded+remapped+wait_backfill, last acting [4]

pg 3.9a is stuck unclean for 510967.300000, current state active+remapped+wait_backfill, last acting [3,2]

pg 4.50 is stuck unclean for 510903.825565, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]

pg 4.53 is stuck unclean for 510921.975268, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]

pg 3.e7 is stuck unclean for 511221.530592, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]

pg 4.6a is stuck unclean for 510911.284877, current state active+undersized+degraded+remapped+wait_backfill, last acting [0]

pg 4.16 is stuck unclean for 511232.702762, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]

pg 3.2c is stuck unclean for 511222.443893, current state active+remapped+wait_backfill, last acting [2,3]

pg 4.89 is stuck unclean for 511228.846614, current state active+undersized+degraded+remapped+wait_backfill, last acting [4]

pg 4.39 is stuck unclean for 511239.544231, current state active+remapped+wait_backfill, last acting [3,2]

pg 4.ce is stuck unclean for 511232.294586, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]

pg 3.91 is stuck unclean for 511232.341380, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]

pg 3.96 is stuck unclean for 510904.043900, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]

pg 4.c0 is stuck unclean for 510904.253281, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]

pg 4.9c is stuck unclean for 511237.612850, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]

pg 3.ab is stuck unclean for 510960.756324, current state active+remapped+wait_backfill, last acting [3,2]

pg 4.aa is stuck unclean for 511229.307559, current state active+remapped+wait_backfill, last acting [0,3]

pg 3.ad is stuck unclean for 510903.764157, current state active+remapped+wait_backfill, last acting [0,3]

pg 3.b5 is stuck unclean for 511226.560774, current state active+undersized+degraded+remapped+wait_backfill, last acting [3]

pg 4.58 is stuck unclean for 510919.273667, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]

pg 4.b9 is stuck unclean for 511232.760066, current state active+remapped+wait_backfill, last acting [5,4]

pg 3.be is stuck unclean for 511224.422931, current state active+remapped+wait_backfill, last acting [0,4]

pg 4.d4 is stuck unclean for 510962.810416, current state active+undersized+degraded+remapped+wait_backfill, last acting [3]

pg 4.da is stuck unclean for 511259.506962, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]

pg 4.8c is active+undersized+degraded+remapped+wait_backfill, acting [1]

pg 3.7f is active+undersized+degraded+remapped+wait_backfill, acting [1]

pg 3.78 is active+undersized+degraded+remapped+wait_backfill, acting [2]

pg 3.76 is active+undersized+degraded+remapped+wait_backfill, acting [3]

pg 4.6a is active+undersized+degraded+remapped+wait_backfill, acting [0]

pg 3.69 is active+undersized+degraded+remapped+wait_backfill, acting [4]

pg 3.66 is active+remapped+wait_backfill, acting [3,0]

pg 3.62 is active+undersized+degraded+remapped+wait_backfill, acting [4]

pg 4.58 is active+undersized+degraded+remapped+wait_backfill, acting [1]

pg 4.50 is active+undersized+degraded+remapped+wait_backfill, acting [1]

pg 4.53 is active+undersized+degraded+remapped+wait_backfill, acting [2]

pg 3.4f is active+undersized+degraded+remapped+wait_backfill, acting [1]

pg 4.48 is active+remapped+wait_backfill, acting [5,4]

pg 4.4d is active+undersized+degraded+remapped+wait_backfill, acting [2]

pg 3.47 is active+remapped+wait_backfill, acting [3,0]

pg 4.41 is active+remapped+wait_backfill, acting [3,2]

pg 3.31 is active+remapped+wait_backfill, acting [0,3]

pg 4.2f is active+undersized+degraded+remapped+wait_backfill, acting [2]

pg 3.24 is active+remapped+backfilling, acting [1,2]

pg 4.17 is active+undersized+degraded+remapped+wait_backfill, acting [1]

pg 4.16 is active+undersized+degraded+remapped+wait_backfill, acting [1]

pg 3.2c is active+remapped+wait_backfill, acting [2,3]

pg 4.39 is active+remapped+wait_backfill, acting [3,2]

pg 4.89 is active+undersized+degraded+remapped+wait_backfill, acting [4]

pg 3.91 is active+undersized+degraded+remapped+wait_backfill, acting [2]

pg 4.93 is active+undersized+degraded+remapped+wait_backfill, acting [3]

pg 3.96 is active+undersized+degraded+remapped+wait_backfill, acting [2]

pg 3.9a is active+remapped+wait_backfill, acting [3,2]

pg 4.9c is active+undersized+degraded+remapped+wait_backfill, acting [1]

pg 4.af is active+undersized+degraded+remapped, acting [0]

pg 3.ab is active+remapped+wait_backfill, acting [3,2]

pg 4.aa is active+remapped+wait_backfill, acting [0,3]

pg 3.ad is active+remapped+wait_backfill, acting [0,3]

pg 3.b4 is active+undersized+degraded+remapped+wait_backfill, acting [0]

pg 3.b5 is active+undersized+degraded+remapped+wait_backfill, acting [3]

pg 4.b9 is active+remapped+wait_backfill, acting [5,4]

pg 3.be is active+remapped+wait_backfill, acting [0,4]

pg 4.c0 is active+undersized+degraded+remapped+wait_backfill, acting [2]

pg 4.ce is active+undersized+degraded+remapped+wait_backfill, acting [1]

pg 3.d1 is active+remapped+wait_backfill, acting [0,3]

pg 4.d5 is active+undersized+degraded+remapped+wait_backfill, acting [2]

pg 4.d4 is active+undersized+degraded+remapped+wait_backfill, acting [3]

pg 4.da is active+undersized+degraded+remapped+wait_backfill, acting [2]

pg 3.e7 is active+undersized+degraded+remapped+wait_backfill, acting [2]

pg 3.eb is active+remapped+wait_backfill, acting [3,0]

pg 3.ec is active+undersized+degraded+remapped+wait_backfill, acting [2]

pg 4.f6 is active+undersized+degraded+remapped+wait_backfill, acting [2]

pg 3.f6 is active+undersized+degraded+remapped+wait_backfill, acting [2]

recovery 1098780/40253637 objects degraded (2.730%)

recovery 3401433/40253637 objects misplaced (8.450%)

osd.0 is near full at 85%

osd.4 is near full at 90%

mds0: Client integ-hm3 failing to respond to cache pressure(client_id: 733998)

mds0: Client integ-hm8 failing to respond to cache pressure(client_id: 843866)

mds0: Client integ-hm2 failing to respond to cache pressure(client_id: 844939)

mds0: Client integ-hm9 failing to respond to cache pressure(client_id: 845065)

mds0: Client integ-hm5 failing to respond to cache pressure(client_id: 845068)

mds0: Client integ-hm9-bkp failing to respond to cache pressure(client_id: 895898)

mds0: Client me-build1-bkp failing to respond to cache pressure(client_id: 888666)





hm ~]# ceph osd tree

ID WEIGHT   TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY

-1 22.92604 root default                                          

-2  3.29749     host intcfs-osd1                                  

0  3.29749         osd.0             up  1.00000          1.00000

-3  3.26869     host intcfs-osd2                                  

1  3.26869         osd.1             up  1.00000          1.00000

-4  3.27339     host intcfs-osd3                                  

2  3.27339         osd.2             up  1.00000          1.00000

-5  3.24089     host intcfs-osd4                                  

3  3.24089         osd.3             up  1.00000          1.00000

-6  3.24089     host intcfs-osd5                                  

4  3.24089         osd.4             up  1.00000          1.00000

-7  3.32669     host intcfs-osd6                                  

5  3.32669         osd.5             up  1.00000          1.00000

-8  3.27800     host intcfs-osd7                                  

6  3.27800         osd.6             up  1.00000          1.00000





hm5 ~]# ceph osd df

ID WEIGHT  REWEIGHT SIZE   USE    AVAIL %USE  VAR  PGS

0 3.29749  1.00000  3376G  2874G  502G 85.13 1.26 165

1 3.26869  1.00000  3347G  1922G 1424G 57.44 0.85 152

2 3.27339  1.00000  3351G  2009G 1342G 59.95 0.89 162

3 3.24089  1.00000  3318G  2130G 1188G 64.19 0.95 168

4 3.24089  1.00000  3318G  2996G  321G 90.30 1.34 176

5 3.32669  1.00000  3406G  2465G  940G 72.39 1.07 165

6 3.27800  1.00000  3356G  1435G 1921G 42.76 0.63 166

              TOTAL 23476G 15834G 7641G 67.45         

MIN/MAX VAR: 0.63/1.34  STDDEV: 15.29





Regards

Prabu GJ



_______________________________________________

ceph-users mailing list

ceph-users at lists.ceph.com

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com








