[ceph-users] fixing another remapped+incomplete EC 4+2 pg

Graham Allan gta at umn.edu
Thu Oct 11 15:22:04 PDT 2018


As the point at which the osds crash implies (it happens when backfill 
starts), setting "nobackfill" appears to let all the osds keep running; 
the pg stays active and can apparently serve data.
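
For anyone following along, that's just the cluster-wide osd flag, which 
can be cleared again whenever I want to retry backfill:

    ceph osd set nobackfill     # pause all backfill cluster-wide
    ceph osd unset nobackfill   # clear the flag to let backfill resume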

If I track down the object referenced below in the object store, I can 
download it without error via s3... though as I can't generate a 
matching etag, it may well be corrupt.
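
(For reference, the etag check is just comparing the S3 ETag against a 
local md5 of the downloaded copy; BUCKET and KEY below are placeholders, 
and if this was a multipart upload its ETag would be an md5-of-part-md5s 
rather than a plain md5sum, so a mismatch on its own isn't conclusive:)

    # BUCKET/KEY are placeholders; s3cmd reports the object's ETag as "MD5 sum"
    s3cmd info s3://BUCKET/KEY | grep -i md5
    s3cmd get s3://BUCKET/KEY ./obj && md5sum ./obj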

Still, I do wonder if deleting this object (either via s3, or maybe more 
likely directly within filestore) might permit backfill to continue.
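
I haven't tried either route yet, but roughly what I have in mind, with 
the osd/pg ids taken from the crash logs below and the object argument 
left as a placeholder for the JSON that "--op list" prints:

    # s3 route (BUCKET/KEY placeholders):
    radosgw-admin object rm --bucket=BUCKET --object=KEY

    # filestore route, per shard, with that osd stopped first (destructive):
    systemctl stop ceph-osd@190
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-190 \
        --pgid 70.82ds2 --op list | grep NDARINV6BVVAY29
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-190 \
        --pgid 70.82ds2 '<json from --op list>' remove
    systemctl start ceph-osd@190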

Of the other objects referenced in "last_backfill" for each osd in the 
pg, both also download via s3. One I have seen implicated in similar 
crash logs for other OSDs, and its etag again does not match; the other 
I have not seen in crash logs, and it does generate a matching etag.
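
(For reference, the per-osd last_backfill values are visible in the 
"ceph pg query" output; something like this pulls them out, assuming jq 
is available and using pgid 70.82d from the logs below:)

    ceph pg 70.82d query | \
        jq '.info.last_backfill, (.peer_info[] | {peer, last_backfill})'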

Opened a tracker issue for this: http://tracker.ceph.com/issues/36411

Graham

On 10/09/2018 06:55 PM, Graham Allan wrote:
> 
> On 10/09/2018 01:14 PM, Graham Allan wrote:
>>
>> On 10/9/2018 12:19 PM, Gregory Farnum wrote:
>>>
>>> I think unfortunately the easiest thing for you to fix this will be 
>>> to set the min_size back to 4 until the PG is recovered (or at least 
>>> has 5 shards done). This will be fixed in a later version of Ceph and 
>>> probably backported, but sadly it's not done yet.
>>> -Greg
> 
>> Thanks Greg, though sadly I've tried that; whatever I do, one of the 4 
>> osds involved will simply crash (not just the ones I previously tried 
>> to re-import via ceph-objectstore-tool). I just spent time chasing 
>> them around, never succeeding in having a complete set run long 
>> enough to make progress. They seem to crash when starting backfill on 
>> the next object. There has to be something in the current set of 
>> shards which it can't handle.
>>
>> Since then I've been focusing on trying to get the pg to revert to an 
>> earlier interval using osd_find_best_info_ignore_history_les, though 
>> the information I can find about it is minimal.
> 
> Since my experiments with osd_find_best_info_ignore_history_les have not 
> borne any results, I'm looking again at the osd crashes when I get 
> enough of them running for backfill to start.
> 
> They all crash in the same way; with "debug osd=10", the very last bit is:
> 
>>     -2> 2018-10-09 16:27:25.425004 7faa866bd700 10 osd.190 pg_epoch: 
>> 710808 pg[70.82ds2( v 710799'704745 (586066'698574,710799'704745] 
>> local-lis/les=710807/710808 n=102929 ec=21494/21494 lis/c 
>> 710807/588565 les/c/f 710808/588566/0 710711/710807/710807) 
>> [820,761,105,789,562,485]/[2147483647,2147483647,190,448,61,315]p190(2) r=2 
>> lpr=710807 pi=[588565,710807)/39 rops=1 
>> bft=105(2),485(5),562(4),761(1),789(3),820(0) crt=710799'70
>> 4745 lcod 0'0 mlcod 0'0 
>> active+undersized+degraded+remapped+backfilling] continue_recovery_op: 
>> continuing 
>> RecoveryOp(hoid=70:b415ca12:::default.10341087.1__shadow_Subjects%2fsub-NDARINV6BVVAY29%2fses-baselineYear1Arm1%2ffunc%2frun-04_AROMA%2fdenoised_func_data_nonaggr.nii.2~VfI6rAYnU4XtzUcFAoecJDU5TpVR8AP.16_3:head 
>> v=579167'695462 missing_on=105(2),485(5),562(4),761(1),789(3),820(0) 
>> missing_on_shards=0,1,2,3,4,5 recovery_inf
>> o=ObjectRecoveryInfo(70:b415ca12:::default.10341087.1__shadow_Subjects%2fsub-NDARINV6BVVAY29%2fses-baselineYear1Arm1%2ffunc%2frun-04_AROMA%2fdenoised_func_data_nonaggr.nii.2~VfI6rAYnU4XtzUcFAoecJDU5TpVR8AP.16_3:head at 579167'695462, 
>> size: 0, copy_subset: [], clone_subset: {}, snapset: 0=[]:[]) 
>> recovery_progress=ObjectRecoveryProgress(first, data_recovered_to:0, 
>> data_complete:false, omap_recovered_to:, omap_complete:true, error
>> :false) obc refcount=3 state=READING waiting_on_pushes= 
>> extent_requested=0,8388608)
>>     -1> 2018-10-09 16:27:25.425105 7faa866bd700 10 osd.190 pg_epoch: 
>> 710808 pg[70.82ds2( v 710799'704745 (586066'698574,710799'704745] 
>> local-lis/les=710807/710808 n=102929 ec=21494/21494 lis/c 
>> 710807/588565 les/c/f 710808/588566/0 710711/710807/710807) 
>> [820,761,105,789,562,485]/[2147483647,2147483647,190,448,61,315]p190(2) r=2 
>> lpr=710807 pi=[588565,710807)/39 rops=1 
>> bft=105(2),485(5),562(4),761(1),789(3),820(0) crt=710799'70
>> 4745 lcod 0'0 mlcod 0'0 
>> active+undersized+degraded+remapped+backfilling] continue_recovery_op: 
>> before_progress=ObjectRecoveryProgress(first, data_recovered_to:0, 
>> data_complete:false, omap_recovered_to:, omap_complete:true, 
>> error:false), after_progress=ObjectRecoveryProgress(!first, 
>> data_recovered_to:0, data_complete:true, omap_recovered_to:, 
>> omap_complete:true, error:false), pop.data.length()=786432, size=0
>>      0> 2018-10-09 16:27:25.430239 7faa866bd700 -1 
>> /build/ceph-12.2.8/src/osd/ECBackend.cc: In function 'void 
>> ECBackend::continue_recovery_op(ECBackend::RecoveryOp&, 
>> RecoveryMessages*)' thread 7faa866bd700 time 2018-10-09 16:27:25.425135
>> /build/ceph-12.2.8/src/osd/ECBackend.cc: 619: FAILED 
>> assert(pop.data.length() == 
>> sinfo.aligned_logical_offset_to_chunk_offset( 
>> after_progress.data_recovered_to - 
>> op.recovery_progress.data_recovered_to))
> 
> It sounds like we're getting an unexpected data chunk size, presumably 
> due to some kind of data corruption.
> 
> Though if I search on all acting OSDs (up or down) for the object on 
> which backfill crashed, I do find all shards, and they do at least have 
> a consistent size... perhaps that size itself differs from what's expected.
> 
> osd 233:
>> root at ceph01:/var/lib/ceph/osd/ceph-233/current# find 70.82ds1_head 
>> -name '*NDARINV6BVVAY29*' -type f -ls 23624833062  772 -rw-r--r--   1 
>> root     root       786432 Sep 26 15:32 
>> 70.82ds1_head/DIR_D/DIR_2/DIR_8/DIR_A/DIR_3/default.10341087.1\\u\\ushadow\\uSubjects\\ssub-NDARINV6BVVAY29\\sses-baselineYear1Arm1\\sfunc\\srun-04\\uAROMA\\sdenoised\\ufunc\\udata\\unonaggr.nii.2~VfI6rAYnU4XtzUcFAoecJDU5TpVR8AP.16\\u3__head_4853A82D__46_ffffffffffffffff_1 
>>
> osd 190:
>> root at ceph07:/var/lib/ceph/osd/ceph-190/current# find 70.82ds2_head 
>> -name '*NDARINV6BVVAY29*' -type f -ls 23710021684  772 -rw-r--r--   1 
>> root     root       786432 Jun 27 07:13 
>> 70.82ds2_head/DIR_D/DIR_2/DIR_8/DIR_A/DIR_3/DIR_5/default.10341087.1\\u\\ushadow\\uSubjects\\ssub-NDARINV6BVVAY29\\sses-baselineYear1Arm1\\sfunc\\srun-04\\uAROMA\\sdenoised\\ufunc\\udata\\unonaggr.nii.2~VfI6rAYnU4XtzUcFAoecJDU5TpVR8AP.16\\u3__head_4853A82D__46_ffffffffffffffff_2 
>>
> osd 448:
>> root at ceph09:/var/lib/ceph/osd/ceph-448/current# find 70.82ds3_head 
>> -name '*NDARINV6BVVAY29*' -type f -ls 3259920428  772 -rw-r--r--   1 
>> root     root       786432 Jun 27 07:13 
>> 70.82ds3_head/DIR_D/DIR_2/DIR_8/DIR_A/DIR_3/DIR_5/default.10341087.1\\u\\ushadow\\uSubjects\\ssub-NDARINV6BVVAY29\\sses-baselineYear1Arm1\\sfunc\\srun-04\\uAROMA\\sdenoised\\ufunc\\udata\\unonaggr.nii.2~VfI6rAYnU4XtzUcFAoecJDU5TpVR8AP.16\\u3__head_4853A82D__46_ffffffffffffffff_3 
>>
> osd 315:
>> root at ceph03:/var/lib/ceph/osd/ceph-315/current# find 70.82ds5_head 
>> -name '*NDARINV6BVVAY29*' -type f -ls 64439371813  772 -rw-r--r--   1 
>> root     root       786432 Jun 27 07:13 
>> 70.82ds5_head/DIR_D/DIR_2/DIR_8/DIR_A/DIR_3/DIR_5/default.10341087.1\\u\\ushadow\\uSubjects\\ssub-NDARINV6BVVAY29\\sses-baselineYear1Arm1\\sfunc\\srun-04\\uAROMA\\sdenoised\\ufunc\\udata\\unonaggr.nii.2~VfI6rAYnU4XtzUcFAoecJDU5TpVR8AP.16\\u3__head_4853A82D__46_ffffffffffffffff_5 
>>
> osd 98:
>> root at ceph05:/var/lib/ceph/osd/ceph-98/current# find 70.82ds0_head 
>> -name '*NDARINV6BVVAY29*' -type f -ls 2428799806  768 -rw-r--r--   1 
>> root     root       786432 Jun 27 07:13 
>> 70.82ds0_head/DIR_D/DIR_2/DIR_8/DIR_A/DIR_3/DIR_5/default.10341087.1\\u\\ushadow\\uSubjects\\ssub-NDARINV6BVVAY29\\sses-baselineYear1Arm1\\sfunc\\srun-04\\uAROMA\\sdenoised\\ufunc\\udata\\unonaggr.nii.2~VfI6rAYnU4XtzUcFAoecJDU5TpVR8AP.16\\u3__head_4853A82D__46_ffffffffffffffff_0 
>>
> osd 61:
>> root at ceph04:~# ls -l 
>> /var/lib/ceph/osd/ceph-61/current/70.82ds4_head/DIR_D/DIR_2/DIR_8/DIR_A/DIR_3/DIR_5/*NDARINV6BVVAY29* 
>>
>> -rw-r--r-- 1 root root 786432 Jun 27 07:13 
>> /var/lib/ceph/osd/ceph-61/current/70.82ds4_head/DIR_D/DIR_2/DIR_8/DIR_A/DIR_3/DIR_5/default.10341087.1\u\ushadow\uSubjects\ssub-NDARINV6BVVAY29\sses-baselineYear1Arm1\sfunc\srun-04\uAROMA\sdenoised\ufunc\udata\unonaggr.nii.2~VfI6rAYnU4XtzUcFAoecJDU5TpVR8AP.16\u3__head_4853A82D__46_ffffffffffffffff_4 
>>
> 
> 
> I'm highly motivated to keep looking for a solution... there's around 
> 1PB of data in this pool!
> 
> Graham

-- 
Graham Allan
Minnesota Supercomputing Institute - gta at umn.edu

