[ceph-users] How to repair active+clean+inconsistent?

K.C. Wong kcwong at verseon.com
Sun Nov 11 23:02:34 PST 2018


Thanks, Ashley.

Should I expect the deep-scrubbing to start immediately?

[root at mgmt01 ~]# ceph pg deep-scrub 1.65
instructing pg 1.65 on osd.62 to deep-scrub
[root at mgmt01 ~]# ceph pg ls deep_scrub
pg_stat	objects	mip	degr	misp	unf	bytes	log	disklog	state	state_stamp	v	reported	up	up_primary	acting	acting_primary	last_scrub	scrub_stamp	last_deep_scrub	deep_scrub_stamp
16.75	430657	0	0	0	0	30754735820	3007	3007	active+clean+scrubbing+deep	2018-11-11 11:05:11.572325	39934'549067	39934:1311893	[4,64,35]	4	[4,64,35]	4	28743'539264	2018-11-07 02:17:53.293336	28743'539264	2018-11-03 14:39:44.837702
16.86	430617	0	0	0	0	30316842298	3048	3048	active+clean+scrubbing+deep	2018-11-11 15:56:30.148527	39934'548012	39934:1038058	[18,2,62]	18	[18,2,62]	18	26347'529815	2018-10-28 01:06:55.526624	26347'529815	2018-10-28 01:06:55.526624
16.eb	432196	0	0	0	0	30612459543	3071	3071	active+clean+scrubbing+deep	2018-11-11 11:02:46.993022	39934'550340	39934:3662047	[56,44,42]	56	[56,44,42]	56	28507'540255	2018-11-02 03:28:28.013949	28507'540255	2018-11-02 03:28:28.013949
16.f3	431399	0	0	0	0	30672009253	3067	3067	active+clean+scrubbing+deep	2018-11-11 17:40:55.732162	39934'549240	39934:2212192	[69,82,6]	69	[69,82,6]	69	28743'539336	2018-11-02 17:22:05.745972	28743'539336	2018-11-02 17:22:05.745972
16.f7	430885	0	0	0	0	30796505272	3100	3100	active+clean+scrubbing+deep	2018-11-11 22:50:05.231599	39934'548910	39934:683169	[59,63,119]	59	[59,63,119]	59	28743'539167	2018-11-03 07:24:43.776341	26347'530830	2018-10-28 04:44:12.276982
16.14c	430565	0	0	0	0	31177011073	3042	3042	active+clean+scrubbing+deep	2018-11-11 20:11:31.107313	39934'550564	39934:1545200	[41,12,70]	41	[41,12,70]	41	28743'540758	2018-11-03 23:04:49.155741	28743'540758	2018-11-03 23:04:49.155741
16.156	430356	0	0	0	0	31021738479	3006	3006	active+clean+scrubbing+deep	2018-11-11 20:44:14.019537	39934'549241	39934:2958053	[83,47,1]	83	[83,47,1]	83	28743'539462	2018-11-04 14:46:56.890822	28743'539462	2018-11-04 14:46:56.890822
16.19f	431613	0	0	0	0	30746145827	3063	3063	active+clean+scrubbing+deep	2018-11-11 19:06:40.693002	39934'549429	39934:1189872	[14,54,37]	14	[14,54,37]	14	28743'539660	2018-11-04 18:25:13.225962	26347'531345	2018-10-28 20:08:45.286421
16.1b1	431225	0	0	0	0	30988996529	3048	3048	active+clean+scrubbing+deep	2018-11-11 20:12:35.367935	39934'549604	39934:778127	[34,106,11]	34	[34,106,11]	34	26347'531560	2018-10-27 16:49:46.944748	26347'531560	2018-10-27 16:49:46.944748
16.1e2	431724	0	0	0	0	30247732969	3070	3070	active+clean+scrubbing+deep	2018-11-11 20:55:17.591646	39934'550105	39934:1428341	[103,48,3]	103	[103,48,3]	103	28743'540270	2018-11-06 03:36:30.531106	28507'539840	2018-11-02 01:08:23.268409
16.1f3	430604	0	0	0	0	30633545866	3039	3039	active+clean+scrubbing+deep	2018-11-11 20:15:28.557464	39934'548804	39934:1354817	[66,102,33]	66	[66,102,33]	66	28743'538896	2018-11-04 04:59:33.118414	28743'538896	2018-11-04 04:59:33.118414
[root at mgmt01 ~]# ceph pg ls inconsistent
pg_stat	objects	mip	degr	misp	unf	bytes	log	disklog	state	state_stamp	v	reported	up	up_primary	acting	acting_primary	last_scrub	scrub_stamp	last_deep_scrub	deep_scrub_stamp
1.65	12806	0	0	0	0	30010463024	3008	3008	active+clean+inconsistent	2018-11-10 00:16:43.965966	39934'184512	39934:388820	[62,67,47]	62	[62,67,47]	62	28743'183853	2018-11-04 01:31:27.042458	28743'183853	2018-11-04 01:31:27.042458

This is similar to what happened when I issued “ceph pg repair 1.65”: the
command reported that it was instructing osd.62 to repair the PG, and then
nothing seemed to happen.
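
My understanding, which may well be wrong, is that scrubs are queued rather
than started on the spot: the PG has to reserve a scrub slot on every OSD in
its acting set, and osd_max_scrubs (default 1) limits how many scrubs each
OSD will run at once, so the deep scrub on 1.65 may simply be waiting behind
the ones already running on osd.62/67/47. A rough way to check whether it
has actually run yet (a sketch, assuming the usual JSON layout of
“ceph pg query”):

  # timestamp of the last completed deep scrub of pg 1.65
  ceph pg 1.65 query | grep deep_scrub_stamp
  # or re-check the listing until 1.65 shows up as scrubbing+deep
  ceph pg ls deep_scrub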

-kc

K.C. Wong
kcwong at verseon.com
M: +1 (408) 769-8235

-----------------------------------------------------
Confidentiality Notice:
This message contains confidential information. If you are not the
intended recipient and received this message in error, any use or
distribution is strictly prohibited. Please also notify us
immediately by return e-mail, and delete this message from your
computer system. Thank you.
-----------------------------------------------------
4096R/B8995EDE <https://sks-keyservers.net/pks/lookup?op=get&search=0x23A692E9B8995EDE>  E527 CBE8 023E 79EA 8BBB  5C77 23A6 92E9 B899 5EDE
hkps://hkps.pool.sks-keyservers.net

> On Nov 11, 2018, at 10:22 PM, Ashley Merrick <singapore at amerrick.co.uk> wrote:
> 
> You'll need to run "ceph pg deep-scrub 1.65" first.
> 
> On Mon, Nov 12, 2018 at 2:20 PM K.C. Wong <kcwong at verseon.com> wrote:
> Hi Brad,
> 
> I got the following:
> 
> [root at mgmt01 ~]# ceph health detail
> HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
> pg 1.65 is active+clean+inconsistent, acting [62,67,47]
> 1 scrub errors
> [root at mgmt01 ~]# rados list-inconsistent-obj 1.65
> No scrub information available for pg 1.65
> error 2: (2) No such file or directory
> [root at mgmt01 ~]# rados list-inconsistent-snapset 1.65
> No scrub information available for pg 1.65
> error 2: (2) No such file or directory
> 
> Rather odd output, I’d say; not that I understand what
> it means. I also tried rados list-inconsistent-pg:
> 
> [root at mgmt01 ~]# rados lspools
> rbd
> cephfs_data
> cephfs_metadata
> .rgw.root
> default.rgw.control
> default.rgw.data.root
> default.rgw.gc
> default.rgw.log
> ctrl-p
> prod
> corp
> camp
> dev
> default.rgw.users.uid
> default.rgw.users.keys
> default.rgw.buckets.index
> default.rgw.buckets.data
> default.rgw.buckets.non-ec
> [root at mgmt01 ~]# for i in $(rados lspools); do rados list-inconsistent-pg $i; done
> []
> ["1.65"]
> []
> []
> []
> []
> []
> []
> []
> []
> []
> []
> []
> []
> []
> []
> []
> []
> 
> So, that’d put the inconsistency in the cephfs_data pool.
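> 
> As a side note, echoing the pool name in the same loop makes the mapping
> easier to read at a glance; a minimal variant, using nothing beyond the
> stock rados CLI:
> 
>   for p in $(rados lspools); do echo "== $p =="; rados list-inconsistent-pg $p; done
> 
> which prints each pool name above its (possibly empty) list of
> inconsistent PGs.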
> 
> Thank you for your help,
> 
> -kc
> 
> K.C. Wong
> kcwong at verseon.com
> M: +1 (408) 769-8235
> 
> 
>> On Nov 11, 2018, at 5:43 PM, Brad Hubbard <bhubbard at redhat.com> wrote:
>> 
>> What does "rados list-inconsistent-obj <pg>" say?
>> 
>> Note that you may have to do a deep scrub to populate the output.
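>> 
>> For reference, the usual sequence once the deep scrub completes is
>> roughly as follows (a sketch; the exact output format varies between
>> releases):
>> 
>>   ceph pg deep-scrub 1.65
>>   # wait for the scrub on 1.65 to finish, then inspect what it found
>>   rados list-inconsistent-obj 1.65 --format=json-pretty
>>   # if the damaged replica can be identified, ask the primary to repair
>>   ceph pg repair 1.65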
>> On Mon, Nov 12, 2018 at 5:10 AM K.C. Wong <kcwong at verseon.com> wrote:
>>> 
>>> Hi folks,
>>> 
>>> I would appreciate any pointers on how to resolve a
>>> PG stuck in the “active+clean+inconsistent” state. This has
>>> resulted in HEALTH_ERR status for the last 5 days with no
>>> end in sight. The state was triggered when one of the drives
>>> backing the PG returned an I/O error. I’ve since replaced the
>>> failed drive.
>>> 
>>> I’m running Jewel (out of centos-release-ceph-jewel) on
>>> CentOS 7. I’ve tried “ceph pg repair <pg>” and it didn’t seem
>>> to do anything. I’ve also tried more drastic measures, such as
>>> comparing all the files (the OSDs use filestore) under that
>>> PG’s _head directory on all 3 copies and then nuking the
>>> outlier. Nothing worked.
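>>> 
>>> For what it’s worth, that comparison was done roughly along these
>>> lines (a sketch, assuming the default filestore layout; osd.62, osd.67
>>> and osd.47 are the acting set reported by ceph health detail):
>>> 
>>>   # on the host holding osd.62; repeat for osd.67 and osd.47, then
>>>   # compare the checksums per object file across the three hosts
>>>   find /var/lib/ceph/osd/ceph-62/current/1.65_head -type f -exec md5sum {} +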
>>> 
>>> Many thanks,
>>> 
>>> -kc
>>> 
>>> K.C. Wong
>>> kcwong at verseon.com
>>> M: +1 (408) 769-8235
>>> 
>>> 
>> 
>> 
>> 
>> --
>> Cheers,
>> Brad
> 
