[ceph-users] tunable question

mj lists at merit.unu.edu
Thu Oct 5 01:17:19 PDT 2017


Hi,

For the record: we changed tunables from "hammer" to "optimal" yesterday
at 14:00, and it finished this morning at 09:00, so the rebalancing took
19 hours.
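
For anyone who wants to do the same, the commands involved are basically
the standard tunables ones; shown here from memory, so please double-check
the exact syntax against the docs for your release:

  # see which tunables profile the cluster currently uses
  ceph osd crush show-tunables

  # switch to the new profile; this starts the rebalance
  ceph osd crush tunables optimal

  # watch rebalance progress (misplaced objects, backfilling PGs)
  ceph -s

  # if the impact is too high, you can switch back, but note that this
  # moves all the data back again
  ceph osd crush tunables hammer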

This was on a small Ceph cluster: 24 x 4 TB OSDs spread over three hosts,
connected over 10G Ethernet. Total amount of data: 32730 GB used,
56650 GB / 89380 GB avail.

We set the noscrub and nodeep-scrub flags during the rebalance, and our
VMs experienced basically no impact.
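
In case it is useful to anyone, setting and clearing those flags is just
the usual osd set/unset commands; again from memory, so verify before use:

  # pause scrubbing for the duration of the rebalance
  ceph osd set noscrub
  ceph osd set nodeep-scrub

  # re-enable scrubbing once the cluster has settled down again
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub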

MJ


On 10/03/2017 05:37 PM, lists wrote:
> Thanks Jake, for your extensive reply. :-)
> 
> MJ
> 
> On 3-10-2017 15:21, Jake Young wrote:
>>
>> On Tue, Oct 3, 2017 at 8:38 AM lists <lists at merit.unu.edu> wrote:
>>
>>     Hi,
>>
>>     What would make the decision easier: if we knew that we could easily
>>     revert the
>>       > "ceph osd crush tunables optimal"
>>     once it has begun rebalancing data?
>>
>>     Meaning: if we notice that impact is too high, or it will take too 
>> long,
>>     that we could simply again say
>>       > "ceph osd crush tunables hammer"
>>     and the cluster would calm down again?
>>
>>
>> Yes, you can revert the tunables back, but it will then move all the
>> data back to where it was, so be prepared for that.
>>
>> Verify you have the following values in ceph.conf. Note that these are 
>> the defaults in Jewel, so if they aren’t defined, you’re probably good:
>> osd_max_backfills=1
>> osd_recovery_threads=1
>>
>> You can try setting these (using ceph injectargs) if you notice a large
>> impact on your client performance:
>> osd_recovery_op_priority=1
>> osd_recovery_max_active=1
>> osd_recovery_threads=1
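[MJ adding for anyone reading along in the archive: the running values can
be checked and, if needed, changed at runtime. This is a sketch from
memory rather than something we ran ourselves, so verify the syntax for
your release:

  # ask a running OSD what it is currently using (osd.0 is just an
  # example id; this goes via the OSD's admin socket)
  ceph daemon osd.0 config get osd_max_backfills

  # push lower recovery settings to all OSDs without restarting them
  ceph tell osd.* injectargs '--osd_recovery_op_priority 1 --osd_recovery_max_active 1'
]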
>>
>> I recall this tunables change when we went from hammer to jewel last
>> year. It took over 24 hours to rebalance 122 TB on our 110-OSD cluster.
>>
>> Jake
>>
>>
>>
>>     MJ
>>
>>     On 2-10-2017 9:41, Manuel Lausch wrote:
>>      > Hi,
>>      >
>>      > We have similar issues.
>>      > After upgrading from hammer to jewel, the tunable
>>      > "chooseleaf_stable" was introduced. If we activate it, nearly all
>>      > data will be moved. The cluster has 2400 OSDs on 40 nodes across
>>      > two datacenters and is filled with 2.5 PB of data.
>>      >
>>      > We tried to enable it, but the backfill traffic is too high to be
>>      > handled without impacting other services on the network.
>>      >
>>      > Does someone know if it is necessary to enable this tunable? And
>>      > could it be a problem in the future if we want to upgrade to newer
>>      > versions without it enabled?
>>      >
>>      > Regards,
>>      > Manuel Lausch
>>      >
>>     _______________________________________________
>>     ceph-users mailing list
>>     ceph-users at lists.ceph.com
>>     http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>

