[ceph-users] OSD is near full and slow in accessing storage from client

Sébastien VIGNERON sebastien.vigneron at criann.fr
Sun Nov 12 01:34:02 PST 2017


Hi,

Can you share:
 - your placement rules: ceph osd crush rule dump
 - your Ceph version: ceph versions
 - your pool definitions: ceph osd pool ls detail

With these we can determine whether your PGs are stuck because of a misconfiguration or something else.
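
As a rough aid for that triage, the "stuck unclean" lines of the quoted "ceph health detail" output can be tallied by state and by acting-set size. This is only a sketch: the function name tally_stuck_pgs and the regular expression are my own, written against the line format shown in this thread, and other Ceph releases may word these lines differently.

```python
import re
from collections import Counter

# Matches lines in the format quoted in this thread, for example:
#   pg 3.f6 is stuck unclean for 511223.069161, current state
#   active+undersized+degraded+remapped+wait_backfill, last acting [2]
STUCK_RE = re.compile(
    r"pg (?P<pgid>\S+) is stuck \w+ for [\d.]+, "
    r"current state (?P<state>[\w+]+), last acting \[(?P<acting>[\d,]*)\]"
)


def tally_stuck_pgs(health_detail):
    """Count stuck PGs per state string and per acting-set size."""
    by_state = Counter()
    by_acting_size = Counter()
    for line in health_detail.splitlines():
        match = STUCK_RE.search(line)
        if match is None:
            continue  # not a "stuck" line (summary, OSD or MDS messages)
        by_state[match["state"]] += 1
        acting = [int(osd) for osd in match["acting"].split(",") if osd]
        by_acting_size[len(acting)] += 1
    return by_state, by_acting_size


# Three lines copied from the output quoted below.
SAMPLE = """\
pg 3.f6 is stuck unclean for 511223.069161, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
pg 3.eb is stuck unclean for 511285.576487, current state active+remapped+wait_backfill, last acting [3,0]
pg 4.3d is stuck unclean for 511300.446982, current state active+remapped, last acting [3,0]
"""

if __name__ == "__main__":
    states, sizes = tally_stuck_pgs(SAMPLE)
    for state, count in states.most_common():
        print(count, state)
    for size, count in sorted(sizes.items()):
        print(count, "PG(s) with", size, "OSD(s) in the acting set")
```

For the three sample lines, one PG has a single OSD in its acting set and two PGs have two. Whether a single-OSD acting set is a problem depends on the pool size and the CRUSH rule, which is why the outputs requested above are needed.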

You seem to have some undersized PGs and a recovery in progress. Have your OSDs shown any rebalancing of your data? Does the usage percentage of your OSDs change over time (compare successive runs of "ceph osd df")?
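
To make the "ceph osd df" comparison concrete, here is a minimal sketch that reads the plain-text table in the layout quoted below, lists the OSDs at or above a near-full threshold, and computes the change in %USE between two snapshots. The helper names (parse_osd_df, near_full, usage_delta) are hypothetical, the 85% threshold is assumed to match the default near-full ratio, and the parser only fits the 9-column layout shown in this thread; for anything serious, the JSON output ("ceph osd df -f json") is probably a safer input.

```python
def parse_osd_df(text):
    """Return {osd_id: %USE} from `ceph osd df` plain-text output.

    Assumes the 9-column layout quoted in this thread:
    ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS
    """
    usage = {}
    for line in text.splitlines():
        fields = line.lstrip("> ").split()
        # Data rows have 9 columns and start with a numeric OSD id;
        # the header, TOTAL and MIN/MAX lines are skipped.
        if len(fields) == 9 and fields[0].isdigit():
            usage[int(fields[0])] = float(fields[6])
    return usage


def near_full(usage, threshold=85.0):
    """OSD ids whose %USE is at or above the (assumed) threshold."""
    return sorted(osd for osd, pct in usage.items() if pct >= threshold)


def usage_delta(before, after):
    """Change in %USE per OSD between two snapshots."""
    return {
        osd: round(after[osd] - before[osd], 2)
        for osd in after
        if osd in before
    }


# Table copied from the message quoted below.
SAMPLE = """\
ID WEIGHT  REWEIGHT SIZE   USE    AVAIL %USE  VAR  PGS
0 3.29749  1.00000  3376G  2874G  502G 85.13 1.26 165
1 3.26869  1.00000  3347G  1922G 1424G 57.44 0.85 152
2 3.27339  1.00000  3351G  2009G 1342G 59.95 0.89 162
3 3.24089  1.00000  3318G  2130G 1188G 64.19 0.95 168
4 3.24089  1.00000  3318G  2996G  321G 90.30 1.34 176
5 3.32669  1.00000  3406G  2465G  940G 72.39 1.07 165
6 3.27800  1.00000  3356G  1435G 1921G 42.76 0.63 166
              TOTAL 23476G 15834G 7641G 67.45
MIN/MAX VAR: 0.63/1.34  STDDEV: 15.29
"""

if __name__ == "__main__":
    snapshot = parse_osd_df(SAMPLE)
    print(near_full(snapshot))  # [0, 4]
```

Saving the output a few minutes apart and passing both parsed snapshots to usage_delta shows whether osd.0 and osd.4 are draining while osd.6 fills, which is what a working backfill should look like.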

Best regards,

Sébastien VIGNERON 
CRIANN, 
Engineer
Technopôle du Madrillet 
745, avenue de l'Université 
76800 Saint-Etienne du Rouvray - France 
tél. +33 2 32 91 42 91 
fax. +33 2 32 91 42 92 
http://www.criann.fr 
mailto:sebastien.vigneron at criann.fr
support: support at criann.fr

> On 12 Nov 2017, at 10:04, gjprabu <gjprabu at zohocorp.com> wrote:
> 
> Hi Team,
> 
> We have a Ceph setup with 6 OSDs and received an alert that 2 OSDs are near full. We are also seeing slow access to Ceph from the clients. I added a 7th OSD, but the same 2 OSDs (osd.0 and osd.4) still show as near full. I have restarted the Ceph service on osd.0 and osd.4. Kindly check the Ceph status output below and suggest a solution.
> 
> 
> # ceph health detail
> HEALTH_WARN 46 pgs backfill_wait; 1 pgs backfilling; 32 pgs degraded; 50 pgs stuck unclean; 32 pgs undersized; recovery 1098780/40253637 objects degraded (2.730%); recovery 3401433/40253637 objects misplaced (8.450%); 2 near full osd(s); mds0: Client integ-hm3 failing to respond to cache pressure; mds0: Client integ-hm8 failing to respond to cache pressure; mds0: Client integ-hm2 failing to respond to cache pressure; mds0: Client integ-hm9 failing to respond to cache pressure; mds0: Client integ-hm5 failing to respond to cache pressure; mds0: Client integ-hm9-bkp failing to respond to cache pressure; mds0: Client me-build1-bkp failing to respond to cache pressure
> 
> pg 3.f6 is stuck unclean for 511223.069161, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 4.f6 is stuck unclean for 511232.770419, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 3.ec is stuck unclean for 510902.815668, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 3.eb is stuck unclean for 511285.576487, current state active+remapped+wait_backfill, last acting [3,0]
> pg 4.17 is stuck unclean for 511235.326709, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
> pg 4.2f is stuck unclean for 511232.356371, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 4.3d is stuck unclean for 511300.446982, current state active+remapped, last acting [3,0]
> pg 4.93 is stuck unclean for 511295.539229, current state active+undersized+degraded+remapped+wait_backfill, last acting [3]
> pg 3.47 is stuck unclean for 511288.104965, current state active+remapped+wait_backfill, last acting [3,0]
> pg 4.d5 is stuck unclean for 510916.509825, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 3.31 is stuck unclean for 511221.542878, current state active+remapped+wait_backfill, last acting [0,3]
> pg 3.62 is stuck unclean for 511221.551662, current state active+undersized+degraded+remapped+wait_backfill, last acting [4]
> pg 4.4d is stuck unclean for 511232.279602, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 4.48 is stuck unclean for 510911.095367, current state active+remapped+wait_backfill, last acting [5,4]
> pg 3.4f is stuck unclean for 511226.712285, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
> pg 3.78 is stuck unclean for 511221.531199, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 3.24 is stuck unclean for 510903.483324, current state active+remapped+backfilling, last acting [1,2]
> pg 4.8c is stuck unclean for 511231.668693, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
> pg 3.b4 is stuck unclean for 511222.612012, current state active+undersized+degraded+remapped+wait_backfill, last acting [0]
> pg 4.41 is stuck unclean for 511287.031264, current state active+remapped+wait_backfill, last acting [3,2]
> pg 3.d1 is stuck unclean for 510903.797329, current state active+remapped+wait_backfill, last acting [0,3]
> pg 3.7f is stuck unclean for 511222.929722, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
> pg 4.af is stuck unclean for 511262.494659, current state active+undersized+degraded+remapped, last acting [0]
> pg 3.66 is stuck unclean for 510903.296711, current state active+remapped+wait_backfill, last acting [3,0]
> pg 3.76 is stuck unclean for 511224.615144, current state active+undersized+degraded+remapped+wait_backfill, last acting [3]
> pg 4.57 is stuck unclean for 511234.514343, current state active+remapped, last acting [0,4]
> pg 3.69 is stuck unclean for 511224.672085, current state active+undersized+degraded+remapped+wait_backfill, last acting [4]
> pg 3.9a is stuck unclean for 510967.300000, current state active+remapped+wait_backfill, last acting [3,2]
> pg 4.50 is stuck unclean for 510903.825565, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
> pg 4.53 is stuck unclean for 510921.975268, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 3.e7 is stuck unclean for 511221.530592, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 4.6a is stuck unclean for 510911.284877, current state active+undersized+degraded+remapped+wait_backfill, last acting [0]
> pg 4.16 is stuck unclean for 511232.702762, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
> pg 3.2c is stuck unclean for 511222.443893, current state active+remapped+wait_backfill, last acting [2,3]
> pg 4.89 is stuck unclean for 511228.846614, current state active+undersized+degraded+remapped+wait_backfill, last acting [4]
> pg 4.39 is stuck unclean for 511239.544231, current state active+remapped+wait_backfill, last acting [3,2]
> pg 4.ce is stuck unclean for 511232.294586, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
> pg 3.91 is stuck unclean for 511232.341380, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 3.96 is stuck unclean for 510904.043900, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 4.c0 is stuck unclean for 510904.253281, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 4.9c is stuck unclean for 511237.612850, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
> pg 3.ab is stuck unclean for 510960.756324, current state active+remapped+wait_backfill, last acting [3,2]
> pg 4.aa is stuck unclean for 511229.307559, current state active+remapped+wait_backfill, last acting [0,3]
> pg 3.ad is stuck unclean for 510903.764157, current state active+remapped+wait_backfill, last acting [0,3]
> pg 3.b5 is stuck unclean for 511226.560774, current state active+undersized+degraded+remapped+wait_backfill, last acting [3]
> pg 4.58 is stuck unclean for 510919.273667, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
> pg 4.b9 is stuck unclean for 511232.760066, current state active+remapped+wait_backfill, last acting [5,4]
> pg 3.be is stuck unclean for 511224.422931, current state active+remapped+wait_backfill, last acting [0,4]
> pg 4.d4 is stuck unclean for 510962.810416, current state active+undersized+degraded+remapped+wait_backfill, last acting [3]
> pg 4.da is stuck unclean for 511259.506962, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
> pg 4.8c is active+undersized+degraded+remapped+wait_backfill, acting [1]
> pg 3.7f is active+undersized+degraded+remapped+wait_backfill, acting [1]
> pg 3.78 is active+undersized+degraded+remapped+wait_backfill, acting [2]
> pg 3.76 is active+undersized+degraded+remapped+wait_backfill, acting [3]
> pg 4.6a is active+undersized+degraded+remapped+wait_backfill, acting [0]
> pg 3.69 is active+undersized+degraded+remapped+wait_backfill, acting [4]
> pg 3.66 is active+remapped+wait_backfill, acting [3,0]
> pg 3.62 is active+undersized+degraded+remapped+wait_backfill, acting [4]
> pg 4.58 is active+undersized+degraded+remapped+wait_backfill, acting [1]
> pg 4.50 is active+undersized+degraded+remapped+wait_backfill, acting [1]
> pg 4.53 is active+undersized+degraded+remapped+wait_backfill, acting [2]
> pg 3.4f is active+undersized+degraded+remapped+wait_backfill, acting [1]
> pg 4.48 is active+remapped+wait_backfill, acting [5,4]
> pg 4.4d is active+undersized+degraded+remapped+wait_backfill, acting [2]
> pg 3.47 is active+remapped+wait_backfill, acting [3,0]
> pg 4.41 is active+remapped+wait_backfill, acting [3,2]
> pg 3.31 is active+remapped+wait_backfill, acting [0,3]
> pg 4.2f is active+undersized+degraded+remapped+wait_backfill, acting [2]
> pg 3.24 is active+remapped+backfilling, acting [1,2]
> pg 4.17 is active+undersized+degraded+remapped+wait_backfill, acting [1]
> pg 4.16 is active+undersized+degraded+remapped+wait_backfill, acting [1]
> pg 3.2c is active+remapped+wait_backfill, acting [2,3]
> pg 4.39 is active+remapped+wait_backfill, acting [3,2]
> pg 4.89 is active+undersized+degraded+remapped+wait_backfill, acting [4]
> pg 3.91 is active+undersized+degraded+remapped+wait_backfill, acting [2]
> pg 4.93 is active+undersized+degraded+remapped+wait_backfill, acting [3]
> pg 3.96 is active+undersized+degraded+remapped+wait_backfill, acting [2]
> pg 3.9a is active+remapped+wait_backfill, acting [3,2]
> pg 4.9c is active+undersized+degraded+remapped+wait_backfill, acting [1]
> pg 4.af is active+undersized+degraded+remapped, acting [0]
> pg 3.ab is active+remapped+wait_backfill, acting [3,2]
> pg 4.aa is active+remapped+wait_backfill, acting [0,3]
> pg 3.ad is active+remapped+wait_backfill, acting [0,3]
> pg 3.b4 is active+undersized+degraded+remapped+wait_backfill, acting [0]
> pg 3.b5 is active+undersized+degraded+remapped+wait_backfill, acting [3]
> pg 4.b9 is active+remapped+wait_backfill, acting [5,4]
> pg 3.be is active+remapped+wait_backfill, acting [0,4]
> pg 4.c0 is active+undersized+degraded+remapped+wait_backfill, acting [2]
> pg 4.ce is active+undersized+degraded+remapped+wait_backfill, acting [1]
> pg 3.d1 is active+remapped+wait_backfill, acting [0,3]
> pg 4.d5 is active+undersized+degraded+remapped+wait_backfill, acting [2]
> pg 4.d4 is active+undersized+degraded+remapped+wait_backfill, acting [3]
> pg 4.da is active+undersized+degraded+remapped+wait_backfill, acting [2]
> pg 3.e7 is active+undersized+degraded+remapped+wait_backfill, acting [2]
> pg 3.eb is active+remapped+wait_backfill, acting [3,0]
> pg 3.ec is active+undersized+degraded+remapped+wait_backfill, acting [2]
> pg 4.f6 is active+undersized+degraded+remapped+wait_backfill, acting [2]
> pg 3.f6 is active+undersized+degraded+remapped+wait_backfill, acting [2]
> recovery 1098780/40253637 objects degraded (2.730%)
> recovery 3401433/40253637 objects misplaced (8.450%)
> osd.0 is near full at 85%
> osd.4 is near full at 90%
> mds0: Client integ-hm3 failing to respond to cache pressure(client_id: 733998)
> mds0: Client integ-hm8 failing to respond to cache pressure(client_id: 843866)
> mds0: Client integ-hm2 failing to respond to cache pressure(client_id: 844939)
> mds0: Client integ-hm9 failing to respond to cache pressure(client_id: 845065)
> mds0: Client integ-hm5 failing to respond to cache pressure(client_id: 845068)
> mds0: Client integ-hm9-bkp failing to respond to cache pressure(client_id: 895898)
> mds0: Client me-build1-bkp failing to respond to cache pressure(client_id: 888666)
> 
> 
> hm ~]# ceph osd tree
> ID WEIGHT   TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 22.92604 root default                                          
> -2  3.29749     host intcfs-osd1                                  
> 0  3.29749         osd.0             up  1.00000          1.00000
> -3  3.26869     host intcfs-osd2                                  
> 1  3.26869         osd.1             up  1.00000          1.00000
> -4  3.27339     host intcfs-osd3                                  
> 2  3.27339         osd.2             up  1.00000          1.00000
> -5  3.24089     host intcfs-osd4                                  
> 3  3.24089         osd.3             up  1.00000          1.00000
> -6  3.24089     host intcfs-osd5                                  
> 4  3.24089         osd.4             up  1.00000          1.00000
> -7  3.32669     host intcfs-osd6                                  
> 5  3.32669         osd.5             up  1.00000          1.00000
> -8  3.27800     host intcfs-osd7                                  
> 6  3.27800         osd.6             up  1.00000          1.00000
> 
> 
> hm5 ~]# ceph osd df
> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL %USE  VAR  PGS
> 0 3.29749  1.00000  3376G  2874G  502G 85.13 1.26 165
> 1 3.26869  1.00000  3347G  1922G 1424G 57.44 0.85 152
> 2 3.27339  1.00000  3351G  2009G 1342G 59.95 0.89 162
> 3 3.24089  1.00000  3318G  2130G 1188G 64.19 0.95 168
> 4 3.24089  1.00000  3318G  2996G  321G 90.30 1.34 176
> 5 3.32669  1.00000  3406G  2465G  940G 72.39 1.07 165
> 6 3.27800  1.00000  3356G  1435G 1921G 42.76 0.63 166
>               TOTAL 23476G 15834G 7641G 67.45         
> MIN/MAX VAR: 0.63/1.34  STDDEV: 15.29
> 
> 
> Regards
> Prabu GJ
