[ceph-users] Some pgs stuck unclean in active+remapped state

Burkhard Linke Burkhard.Linke at computational.bio.uni-giessen.de
Mon Nov 19 04:22:45 PST 2018


Hi,

On 11/19/18 12:49 PM, Thomas Klute wrote:
> Hi,
>
> we have a production cluster (3 nodes) stuck unclean after we had to
> replace one osd.
> Cluster recovered fine except some pgs that are stuck unclean for about
> 2-3 days now:


*snipsnap*


> [root@ceph1 ~]# fgrep remapp /tmp/pgdump.txt
> 3.83    5423    0       0       5423    0       22046870528     3065    3065    active+remapped 2018-11-16 04:08:22.365825      85711'8469810   85711:8067280   [5,11]  5       [5,11,13]       5       83827'8450839


This PG is currently served by OSDs 5, 11 and 13 (the acting set), but the 
reshuffling caused by replacing the OSD has led to a problem with CRUSH 
finding three OSDs that satisfy the CRUSH rule. CRUSH only comes up with 
OSDs 5 and 11 for this PG (the up set [5,11]); a third OSD is missing, so 
the PG stays remapped onto its old acting set.
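
You can double-check this with the pg id from your output (these commands 
are just an example of how I would inspect it):

  ceph pg map 3.83
  ceph pg 3.83 query | less

'ceph pg map' should report an up set of [5,11] and an acting set of 
[5,11,13]; that mismatch is exactly what active+remapped means.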


You only have three nodes, so this is a corner case of the CRUSH 
algorithm and its pseudo-random nature: with only three hosts to choose 
from, CRUSH can run out of retries before it finds a third distinct host. 
To solve this you can either add more nodes, or change some of the CRUSH 
tunables, e.g. the number of tries (choose_total_tries).
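
If you want to go the tunables route, a rough sketch (untested here, file 
names are arbitrary, adjust the rule id to your pool) would be to dump and 
decompile the crush map, raise choose_total_tries, test the result with 
crushtool, and inject it again:

  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt
  # edit crushmap.txt, e.g. raise "tunable choose_total_tries 50"
  # to a higher value (your current value may differ)
  crushtool -c crushmap.txt -o crushmap.new
  # check that all PGs get three OSDs before injecting the map;
  # --rule 0 is a placeholder for your pool's crush rule id
  crushtool -i crushmap.new --test --show-bad-mappings --rule 0 --num-rep 3
  ceph osd setcrushmap -i crushmap.new

Keep in mind that injecting a modified crush map can trigger additional 
data movement, so do it during a quiet period.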


Regards,

Burkhard



