[ceph-users] RocksDB and WAL migration to new block device

Igor Fedotov ifedotov at suse.de
Wed Nov 21 00:34:47 PST 2018


Actually (given that your devices are already expanded) you don't need 
to expand them once again - one can just update the size labels with my new PR.

For new migrations you can use the updated bluefs-bdev-expand command, 
which sets the size label automatically.
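
For reference, a minimal sketch of that path for a new migration, assuming 
the updated ceph-bluestore-tool described above (the OSD id and paths are 
just this thread's example):

    # stop the OSD before touching its BlueFS devices
    systemctl stop ceph-osd@0

    # with the updated tool this is expected to grow BlueFS onto the
    # enlarged partitions and rewrite the size labels in one step
    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-0/

    # verify that the size labels now match the new partition sizes
    ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/

    systemctl start ceph-osd@0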


Thanks,
Igor
On 11/21/2018 11:11 AM, Florian Engelmann wrote:
> Great support Igor!!!! Both thumbs up! We will try to build the tool 
> today and expand those bluefs devices once again.
>
>
> Am 11/20/18 um 6:54 PM schrieb Igor Fedotov:
>> FYI: https://github.com/ceph/ceph/pull/25187
>>
>>
>> On 11/20/2018 8:13 PM, Igor Fedotov wrote:
>>>
>>> On 11/20/2018 7:05 PM, Florian Engelmann wrote:
>>>> Am 11/20/18 um 4:59 PM schrieb Igor Fedotov:
>>>>>
>>>>>
>>>>> On 11/20/2018 6:42 PM, Florian Engelmann wrote:
>>>>>> Hi Igor,
>>>>>>
>>>>>>>
>>>>>>> what's your Ceph version?
>>>>>>
>>>>>> 12.2.8 (SES 5.5 - patched to the latest version)
>>>>>>
>>>>>>>
>>>>>>> Can you also check the output for
>>>>>>>
>>>>>>> ceph-bluestore-tool show-label -p <path to osd>
>>>>>>
>>>>>> ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
>>>>>> infering bluefs devices from bluestore path
>>>>>> {
>>>>>>     "/var/lib/ceph/osd/ceph-0//block": {
>>>>>>         "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
>>>>>>         "size": 8001457295360,
>>>>>>         "btime": "2018-06-29 23:43:12.088842",
>>>>>>         "description": "main",
>>>>>>         "bluefs": "1",
>>>>>>         "ceph_fsid": "a2222146-6561-307e-b032-c5cee2ee520c",
>>>>>>         "kv_backend": "rocksdb",
>>>>>>         "magic": "ceph osd volume v026",
>>>>>>         "mkfs_done": "yes",
>>>>>>         "ready": "ready",
>>>>>>         "whoami": "0"
>>>>>>     },
>>>>>>     "/var/lib/ceph/osd/ceph-0//block.wal": {
>>>>>>         "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
>>>>>>         "size": 524288000,
>>>>>>         "btime": "2018-06-29 23:43:12.098690",
>>>>>>         "description": "bluefs wal"
>>>>>>     },
>>>>>>     "/var/lib/ceph/osd/ceph-0//block.db": {
>>>>>>         "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
>>>>>>         "size": 524288000,
>>>>>>         "btime": "2018-06-29 23:43:12.098023",
>>>>>>         "description": "bluefs db"
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> It should report 'size' labels for every volume, please check 
>>>>>>> they contain new values.
>>>>>>>
>>>>>>
>>>>>> That's exactly the problem: neither "ceph-bluestore-tool 
>>>>>> show-label" nor "ceph daemon osd.0 perf dump | jq '.bluefs'" 
>>>>>> recognized the new sizes. But we are 100% sure the new devices are 
>>>>>> used, as we already deleted the old ones...
>>>>>>
>>>>>> We tried to delete the key "size" and re-add it with the new 
>>>>>> value, but:
>>>>>>
>>>>>> ceph-bluestore-tool rm-label-key --dev 
>>>>>> /var/lib/ceph/osd/ceph-0/block.db -k size
>>>>>> key 'size' not present
>>>>>>
>>>>>> even if:
>>>>>>
>>>>>> ceph-bluestore-tool show-label --dev 
>>>>>> /var/lib/ceph/osd/ceph-0/block.db
>>>>>> {
>>>>>>     "/var/lib/ceph/osd/ceph-0/block.db": {
>>>>>>         "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
>>>>>>         "size": 524288000,
>>>>>>         "btime": "2018-06-29 23:43:12.098023",
>>>>>>         "description": "bluefs db"
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> So it looks like the key "size" is "read-only"?
>>>>>>
>>>>> There was a bug in updating specific keys, see
>>>>> https://github.com/ceph/ceph/pull/24352
>>>>>
>>>>> This PR also eliminates the need to set sizes manually on 
>>>>> bdev-expand.
>>>>>
>>>>> I thought it had been backported to Luminous but it looks like it 
>>>>> hasn't.
>>>>> Will submit a backport PR shortly.
>>>>>
>>>>>
>>>>
>>>> Thank you so much Igor! So we have to decide how to proceed. Maybe 
>>>> you could help us here as well.
>>>>
>>>> Option A: Wait for this fix to be available. -> could take weeks or 
>>>> even months
>>> If you can build a custom version of ceph-bluestore-tool then this is 
>>> a short path. I'll submit a patch today or tomorrow which you can 
>>> integrate into your private build. Then you only need to upgrade the 
>>> tool and apply the new sizes.
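
A sketch of what "apply new sizes" could look like with the rebuilt tool - 
it assumes the patch makes the "size" key writable via set-label-key, and 
the values are the 60GiB/2GiB sizes from this thread:

    # hypothetical: rewrite the size labels to the exact new partition sizes
    ceph-bluestore-tool set-label-key --dev /var/lib/ceph/osd/ceph-0/block.db \
        -k size -v 64424509440     # 60GiB
    ceph-bluestore-tool set-label-key --dev /var/lib/ceph/osd/ceph-0/block.wal \
        -k size -v 2147483648      # 2GiB

    # check the result
    ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/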
>>>
>>>>
>>>> Option B: Recreate OSDs "one-by-one". -> will take a very long time 
>>>> as well
>>> No need for that IMO.
>>>>
>>>> Option C: Is there some "low-level" command allowing us to fix those 
>>>> sizes?
>>> Well, a hex editor might help here as well. What you need is just to 
>>> update the 64-bit size value in the block.db and block.wal files. In 
>>> my lab I find it at offset 0x52. Most probably this is a fixed 
>>> location, but it's better to check beforehand - the existing value 
>>> should match the one reported with show-label. Or I can do that for 
>>> you - please send me the first 4K chunks along with the corresponding 
>>> label report.
>>> Then update with the new values - the field has to contain exactly the 
>>> same size as your new partition.
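
A rough sketch of that low-level route in shell - the 0x52 offset is only 
what Igor saw in his lab, so verify the current value first and keep a copy 
of the first 4K before writing anything:

    DEV=/var/lib/ceph/osd/ceph-0/block.db
    NEWSIZE=64424509440              # must equal the new partition size exactly

    # save the label area (the same 4K chunk Igor asks for above)
    dd if="$DEV" of=osd-0-block.db-first4k.bin bs=4096 count=1

    # inspect the 8 bytes at offset 0x52; read as little-endian they should
    # match the "size" value reported by show-label before you overwrite them
    xxd -s 0x52 -l 8 "$DEV"

    # write the new size as a little-endian 64-bit value at the same offset
    printf '%016x\n' "$NEWSIZE" | fold -w2 | tac | tr -d '\n' | xxd -r -p |
        dd of="$DEV" bs=1 seek=$((0x52)) conv=notrunc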
>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Igor
>>>>>>>
>>>>>>>
>>>>>>> On 11/20/2018 5:29 PM, Florian Engelmann wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> today we migrated all of our RocksDB and WAL devices to new 
>>>>>>>> ones. The new ones are much bigger (500MB for WAL/DB -> 60GB DB 
>>>>>>>> and 2GB WAL) and LVM based.
>>>>>>>>
>>>>>>>> We migrated like:
>>>>>>>>
>>>>>>>>     export OSD=x
>>>>>>>>
>>>>>>>>     systemctl stop ceph-osd@$OSD
>>>>>>>>
>>>>>>>>     lvcreate -n db-osd$OSD -L60g data || exit 1
>>>>>>>>     lvcreate -n wal-osd$OSD -L2g data || exit 1
>>>>>>>>
>>>>>>>>     dd if=/var/lib/ceph/osd/ceph-$OSD/block.wal of=/dev/data/wal-osd$OSD bs=1M || exit 1
>>>>>>>>     dd if=/var/lib/ceph/osd/ceph-$OSD/block.db of=/dev/data/db-osd$OSD bs=1M || exit 1
>>>>>>>>
>>>>>>>>     rm -v /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
>>>>>>>>     rm -v /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
>>>>>>>>     ln -vs /dev/data/db-osd$OSD /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
>>>>>>>>     ln -vs /dev/data/wal-osd$OSD /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
>>>>>>>>
>>>>>>>>     chown -c ceph:ceph $(realpath /dev/data/db-osd$OSD) || exit 1
>>>>>>>>     chown -c ceph:ceph $(realpath /dev/data/wal-osd$OSD) || exit 1
>>>>>>>>     chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
>>>>>>>>     chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
>>>>>>>>
>>>>>>>>     ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-$OSD/ || exit 1
>>>>>>>>
>>>>>>>>     systemctl start ceph-osd@$OSD
>>>>>>>>
>>>>>>>> Everything went fine, but it looks like the DB and WAL sizes are 
>>>>>>>> still the old ones:
>>>>>>>>
>>>>>>>> ceph daemon osd.0 perf dump|jq '.bluefs'
>>>>>>>> {
>>>>>>>>   "gift_bytes": 0,
>>>>>>>>   "reclaim_bytes": 0,
>>>>>>>>   "db_total_bytes": 524279808,
>>>>>>>>   "db_used_bytes": 330301440,
>>>>>>>>   "wal_total_bytes": 524283904,
>>>>>>>>   "wal_used_bytes": 69206016,
>>>>>>>>   "slow_total_bytes": 320058949632,
>>>>>>>>   "slow_used_bytes": 13606322176,
>>>>>>>>   "num_files": 220,
>>>>>>>>   "log_bytes": 44204032,
>>>>>>>>   "log_compactions": 0,
>>>>>>>>   "logged_bytes": 31145984,
>>>>>>>>   "files_written_wal": 1,
>>>>>>>>   "files_written_sst": 1,
>>>>>>>>   "bytes_written_wal": 37753489,
>>>>>>>>   "bytes_written_sst": 238992
>>>>>>>> }
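
For a quick read of those counters (numfmt from coreutils is assumed to be 
available), the reported totals still correspond to the old ~500MB 
partitions rather than the new 60GB/2GB devices:

    ceph daemon osd.0 perf dump | jq '.bluefs.db_total_bytes, .bluefs.wal_total_bytes' |
        numfmt --to=iec     # prints roughly 500M for both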
>>>>>>>>
>>>>>>>>
>>>>>>>> Even though the new block devices are recognized correctly:
>>>>>>>>
>>>>>>>> 2018-11-20 11:40:34.653524 7f70219b8d00  1 bdev(0x5647ea9ce200 
>>>>>>>> /var/lib/ceph/osd/ceph-0/block.db) open size 64424509440 
>>>>>>>> (0xf00000000, 60GiB) block_size 4096 (4KiB) non-rotational
>>>>>>>> 2018-11-20 11:40:34.653532 7f70219b8d00  1 bluefs 
>>>>>>>> add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db 
>>>>>>>> size 60GiB
>>>>>>>>
>>>>>>>>
>>>>>>>> 2018-11-20 11:40:34.662385 7f70219b8d00  1 bdev(0x5647ea9ce600 
>>>>>>>> /var/lib/ceph/osd/ceph-0/block.wal) open size 2147483648 
>>>>>>>> (0x80000000, 2GiB) block_size 4096 (4KiB) non-rotational
>>>>>>>> 2018-11-20 11:40:34.662406 7f70219b8d00  1 bluefs 
>>>>>>>> add_block_device bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal 
>>>>>>>> size 2GiB
>>>>>>>>
>>>>>>>>
>>>>>>>> Are we missing some command to "notify" rocksdb about the new 
>>>>>>>> device size?
>>>>>>>>
>>>>>>>> All the best,
>>>>>>>> Florian
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> ceph-users mailing list
>>>>>>>> ceph-users at lists.ceph.com
>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>
>>
>


