[ceph-users] RocksDB and WAL migration to new block device

Igor Fedotov ifedotov at suse.de
Tue Nov 20 09:54:42 PST 2018


FYI: https://github.com/ceph/ceph/pull/25187


On 11/20/2018 8:13 PM, Igor Fedotov wrote:
>
> On 11/20/2018 7:05 PM, Florian Engelmann wrote:
>> On 11/20/18 at 4:59 PM, Igor Fedotov wrote:
>>>
>>>
>>> On 11/20/2018 6:42 PM, Florian Engelmann wrote:
>>>> Hi Igor,
>>>>
>>>>>
>>>>> what's your Ceph version?
>>>>
>>>> 12.2.8 (SES 5.5 - patched to the latest version)
>>>>
>>>>>
>>>>> Can you also check the output for
>>>>>
>>>>> ceph-bluestore-tool show-label -p <path to osd>
>>>>
>>>> ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
>>>> infering bluefs devices from bluestore path
>>>> {
>>>>     "/var/lib/ceph/osd/ceph-0//block": {
>>>>         "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
>>>>         "size": 8001457295360,
>>>>         "btime": "2018-06-29 23:43:12.088842",
>>>>         "description": "main",
>>>>         "bluefs": "1",
>>>>         "ceph_fsid": "a2222146-6561-307e-b032-c5cee2ee520c",
>>>>         "kv_backend": "rocksdb",
>>>>         "magic": "ceph osd volume v026",
>>>>         "mkfs_done": "yes",
>>>>         "ready": "ready",
>>>>         "whoami": "0"
>>>>     },
>>>>     "/var/lib/ceph/osd/ceph-0//block.wal": {
>>>>         "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
>>>>         "size": 524288000,
>>>>         "btime": "2018-06-29 23:43:12.098690",
>>>>         "description": "bluefs wal"
>>>>     },
>>>>     "/var/lib/ceph/osd/ceph-0//block.db": {
>>>>         "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
>>>>         "size": 524288000,
>>>>         "btime": "2018-06-29 23:43:12.098023",
>>>>         "description": "bluefs db"
>>>>     }
>>>> }
>>>>
>>>>
>>>>>
>>>>>
>>>>> It should report 'size' labels for every volume, please check they 
>>>>> contain new values.
>>>>>
>>>>
>>>> That's exactly the problem: neither "ceph-bluestore-tool 
>>>> show-label" nor "ceph daemon osd.0 perf dump|jq '.bluefs'" 
>>>> recognizes the new sizes. But we are 100% sure the new devices are 
>>>> in use, as we have already deleted the old ones...
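>>>>
>>>> A quick cross-check like the following (just a sketch, assuming the 
>>>> util-linux "blockdev" tool is available) should show that the 
>>>> mismatch is only in the label metadata:
>>>>
>>>>     # actual size of the new LV backing block.db
>>>>     blockdev --getsize64 /var/lib/ceph/osd/ceph-0/block.db
>>>>     # size still recorded in the BlueStore label
>>>>     ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block.db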
>>>>
>>>> We tried to delete the "size" key so we could re-add it with the new value, but:
>>>>
>>>> ceph-bluestore-tool rm-label-key --dev 
>>>> /var/lib/ceph/osd/ceph-0/block.db -k size
>>>> key 'size' not present
>>>>
>>>> even if:
>>>>
>>>> ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block.db
>>>> {
>>>>     "/var/lib/ceph/osd/ceph-0/block.db": {
>>>>         "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
>>>>         "size": 524288000,
>>>>         "btime": "2018-06-29 23:43:12.098023",
>>>>         "description": "bluefs db"
>>>>     }
>>>> }
>>>>
>>>> So it looks like the key "size" is "read-only"?
>>>>
>>> There was a bug in updating specific keys, see
>>> https://github.com/ceph/ceph/pull/24352
>>>
>>> This PR also eliminates the need to set sizes manually on bdev-expand.
>>>
>>> I thought it had been backported to Luminous, but it looks like it 
>>> hasn't.
>>> Will submit a PR shortly.
>>>
>>>
>>
>> Thank you so much, Igor! So we have to decide how to proceed. Maybe 
>> you could help us here as well.
>>
>> Option A: Wait for this fix to be available. -> could take weeks or 
>> even months
> If you can build a custom version of ceph-bluestore-tool, then this is 
> a short path. I'll submit a patch today or tomorrow that you would need 
> to integrate into your private build.
> Then you only need to upgrade the tool and apply the new sizes.
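>
> Applying the new sizes with the patched tool should then boil down to 
> something like this (just a sketch - the exact behaviour depends on the 
> backported fix, and the value must be exactly the byte size of the new 
> partition):
>
>     # example values (60GiB DB, 2GiB WAL) - use your real partition sizes
>     ceph-bluestore-tool set-label-key --dev /var/lib/ceph/osd/ceph-0/block.db \
>         -k size -v 64424509440
>     ceph-bluestore-tool set-label-key --dev /var/lib/ceph/osd/ceph-0/block.wal \
>         -k size -v 2147483648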
>
>>
>> Option B: Recreate OSDs "one-by-one". -> will take a very long time 
>> as well
> No need for that IMO.
>>
>> Option C: Is there some "low-level" command that would allow us to fix 
>> those sizes?
> Well, a hex editor might help here as well. What you need is just to 
> update the 64-bit size value in the block.db and block.wal files. In my 
> lab I can find it at offset 0x52. Most probably this is a fixed 
> location, but it's better to check beforehand - the existing value 
> should match the one reported by show-label. Or I can do that for 
> you - please send me the first 4K chunks along with the corresponding 
> label reports.
> Then update them with the new values - the field has to contain exactly 
> the same size as your new partition.
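>
> If you prefer the command line over a hex editor, something along these 
> lines should work (again just a sketch - it assumes a little-endian host 
> and that the size really sits at offset 0x52, so verify the read-back 
> value against show-label first, and stop the OSD before writing):
>
>     # read the 8-byte value at offset 0x52 (= 82) and compare it with
>     # the "size" reported by show-label
>     dd if=/var/lib/ceph/osd/ceph-0/block.db bs=1 skip=82 count=8 \
>         2>/dev/null | od -An -tu8
>
>     # write the new size (example: 64424509440 = 60GiB) as a
>     # little-endian 64-bit value at the same offset
>     printf '%016x\n' 64424509440 | fold -w2 | tac | tr -d '\n' | \
>         xxd -r -p | dd of=/var/lib/ceph/osd/ceph-0/block.db bs=1 \
>         seek=82 count=8 conv=notrunc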
>
>>
>>
>>>
>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Igor
>>>>>
>>>>>
>>>>> On 11/20/2018 5:29 PM, Florian Engelmann wrote:
>>>>>> Hi,
>>>>>>
>>>>>> today we migrated all of our RocksDB and WAL devices to new ones. 
>>>>>> The new ones are much bigger (500MB each for WAL/DB -> 60GB DB 
>>>>>> and 2GB WAL) and LVM based.
>>>>>>
>>>>>> We migrated like:
>>>>>>
>>>>>>     export OSD=x
>>>>>>
>>>>>>     systemctl stop ceph-osd@$OSD
>>>>>>
>>>>>>     lvcreate -n db-osd$OSD -L60g data || exit 1
>>>>>>     lvcreate -n wal-osd$OSD -L2g data || exit 1
>>>>>>
>>>>>>     dd if=/var/lib/ceph/osd/ceph-$OSD/block.wal 
>>>>>> of=/dev/data/wal-osd$OSD bs=1M || exit 1
>>>>>>     dd if=/var/lib/ceph/osd/ceph-$OSD/block.db 
>>>>>> of=/dev/data/db-osd$OSD bs=1M  || exit 1
>>>>>>
>>>>>>     rm -v /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
>>>>>>     rm -v /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
>>>>>>     ln -vs /dev/data/db-osd$OSD 
>>>>>> /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
>>>>>>     ln -vs /dev/data/wal-osd$OSD 
>>>>>> /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
>>>>>>
>>>>>>
>>>>>>     chown -c ceph:ceph $(realpath /dev/data/db-osd$OSD) || exit 1
>>>>>>     chown -c ceph:ceph $(realpath /dev/data/wal-osd$OSD) || exit 1
>>>>>>     chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.db || 
>>>>>> exit 1
>>>>>>     chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.wal || 
>>>>>> exit 1
>>>>>>
>>>>>>
>>>>>>     ceph-bluestore-tool bluefs-bdev-expand --path 
>>>>>> /var/lib/ceph/osd/ceph-$OSD/ || exit 1
>>>>>>
>>>>>>     systemctl start ceph-osd@$OSD
>>>>>>
>>>>>>
>>>>>> Everything went fine, but it looks like the DB and WAL sizes are 
>>>>>> still the old ones:
>>>>>>
>>>>>> ceph daemon osd.0 perf dump|jq '.bluefs'
>>>>>> {
>>>>>>   "gift_bytes": 0,
>>>>>>   "reclaim_bytes": 0,
>>>>>>   "db_total_bytes": 524279808,
>>>>>>   "db_used_bytes": 330301440,
>>>>>>   "wal_total_bytes": 524283904,
>>>>>>   "wal_used_bytes": 69206016,
>>>>>>   "slow_total_bytes": 320058949632,
>>>>>>   "slow_used_bytes": 13606322176,
>>>>>>   "num_files": 220,
>>>>>>   "log_bytes": 44204032,
>>>>>>   "log_compactions": 0,
>>>>>>   "logged_bytes": 31145984,
>>>>>>   "files_written_wal": 1,
>>>>>>   "files_written_sst": 1,
>>>>>>   "bytes_written_wal": 37753489,
>>>>>>   "bytes_written_sst": 238992
>>>>>> }
>>>>>>
>>>>>>
>>>>>> Even though the new block devices are recognized correctly:
>>>>>>
>>>>>> 2018-11-20 11:40:34.653524 7f70219b8d00  1 bdev(0x5647ea9ce200 
>>>>>> /var/lib/ceph/osd/ceph-0/block.db) open size 64424509440 
>>>>>> (0xf00000000, 60GiB) block_size 4096 (4KiB) non-rotational
>>>>>> 2018-11-20 11:40:34.653532 7f70219b8d00  1 bluefs 
>>>>>> add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db 
>>>>>> size 60GiB
>>>>>>
>>>>>>
>>>>>> 2018-11-20 11:40:34.662385 7f70219b8d00  1 bdev(0x5647ea9ce600 
>>>>>> /var/lib/ceph/osd/ceph-0/block.wal) open size 2147483648 
>>>>>> (0x80000000, 2GiB) block_size 4096 (4KiB) non-rotational
>>>>>> 2018-11-20 11:40:34.662406 7f70219b8d00  1 bluefs 
>>>>>> add_block_device bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal 
>>>>>> size 2GiB
>>>>>>
>>>>>>
>>>>>> Are we missing some command to "notify" rocksdb about the new 
>>>>>> device size?
>>>>>>
>>>>>> All the best,
>>>>>> Florian
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>


