[ceph-users] RocksDB and WAL migration to new block device

Florian Engelmann florian.engelmann at everyware.ch
Wed Nov 21 00:11:17 PST 2018


Great support, Igor! Both thumbs up! We will try to build the tool 
today and expand those BlueFS devices once again.


Am 11/20/18 um 6:54 PM schrieb Igor Fedotov:
> FYI: https://github.com/ceph/ceph/pull/25187
> 
> 
> On 11/20/2018 8:13 PM, Igor Fedotov wrote:
>>
>> On 11/20/2018 7:05 PM, Florian Engelmann wrote:
>>> Am 11/20/18 um 4:59 PM schrieb Igor Fedotov:
>>>>
>>>>
>>>> On 11/20/2018 6:42 PM, Florian Engelmann wrote:
>>>>> Hi Igor,
>>>>>
>>>>>>
>>>>>> what's your Ceph version?
>>>>>
>>>>> 12.2.8 (SES 5.5 - patched to the latest version)
>>>>>
>>>>>>
>>>>>> Can you also check the output for
>>>>>>
>>>>>> ceph-bluestore-tool show-label -p <path to osd>
>>>>>
>>>>> ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
>>>>> infering bluefs devices from bluestore path
>>>>> {
>>>>>     "/var/lib/ceph/osd/ceph-0//block": {
>>>>>         "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
>>>>>         "size": 8001457295360,
>>>>>         "btime": "2018-06-29 23:43:12.088842",
>>>>>         "description": "main",
>>>>>         "bluefs": "1",
>>>>>         "ceph_fsid": "a2222146-6561-307e-b032-c5cee2ee520c",
>>>>>         "kv_backend": "rocksdb",
>>>>>         "magic": "ceph osd volume v026",
>>>>>         "mkfs_done": "yes",
>>>>>         "ready": "ready",
>>>>>         "whoami": "0"
>>>>>     },
>>>>>     "/var/lib/ceph/osd/ceph-0//block.wal": {
>>>>>         "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
>>>>>         "size": 524288000,
>>>>>         "btime": "2018-06-29 23:43:12.098690",
>>>>>         "description": "bluefs wal"
>>>>>     },
>>>>>     "/var/lib/ceph/osd/ceph-0//block.db": {
>>>>>         "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
>>>>>         "size": 524288000,
>>>>>         "btime": "2018-06-29 23:43:12.098023",
>>>>>         "description": "bluefs db"
>>>>>     }
>>>>> }
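>>>>>
>>>>> (For a quick look at just the sizes, something along these lines 
>>>>> should work, reusing jq as above:
>>>>>
>>>>>     ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block.db | jq 'map_values(.size)'
>>>>>
>>>>> which prints only the "size" field per device.)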
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> It should report 'size' labels for every volume - please check that 
>>>>>> they contain the new values.
>>>>>>
>>>>>
>>>>> That's exactly the problem: neither "ceph-bluestore-tool 
>>>>> show-label" nor "ceph daemon osd.0 perf dump|jq '.bluefs'" 
>>>>> recognizes the new sizes. But we are 100% sure the new devices are 
>>>>> in use, as we have already deleted the old ones...
>>>>>
>>>>> We tried to delete the key "size" in order to add one with the new value, but:
>>>>>
>>>>> ceph-bluestore-tool rm-label-key --dev 
>>>>> /var/lib/ceph/osd/ceph-0/block.db -k size
>>>>> key 'size' not present
>>>>>
>>>>> even if:
>>>>>
>>>>> ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block.db
>>>>> {
>>>>>     "/var/lib/ceph/osd/ceph-0/block.db": {
>>>>>         "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
>>>>>         "size": 524288000,
>>>>>         "btime": "2018-06-29 23:43:12.098023",
>>>>>         "description": "bluefs db"
>>>>>     }
>>>>> }
>>>>>
>>>>> So it looks like the key "size" is "read-only"?
>>>>>
>>>> There was a bug in updating specific keys, see
>>>> https://github.com/ceph/ceph/pull/24352
>>>>
>>>> This PR also eliminates the need to set sizes manually on bdev-expand.
>>>>
>>>> I thought it had been backported to Luminous, but it looks like it 
>>>> hasn't been.
>>>> Will submit a PR shortly.
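>>>>
>>>> Just to sketch what applying the sizes would look like once you have 
>>>> a build with that fix (untested here, and the value must be exactly 
>>>> the new partition size in bytes):
>>>>
>>>>     ceph-bluestore-tool set-label-key --dev /var/lib/ceph/osd/ceph-0/block.db \
>>>>         -k size -v 64424509440
>>>>     ceph-bluestore-tool set-label-key --dev /var/lib/ceph/osd/ceph-0/block.wal \
>>>>         -k size -v 2147483648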
>>>>
>>>>
>>>
>>> Thank you so much Igor! So we have to decide how to proceed. Maybe 
>>> you could help us here as well.
>>>
>>> Option A: Wait for this fix to be available. -> could take weeks or 
>>> even months
>> If you can build a custom version of ceph-bluestore-tool, then this is 
>> a short path. I'll submit a patch today or tomorrow, which you would 
>> need to integrate into your private build.
>> Then you only need to upgrade the tool and apply the new sizes.
>>
>>>
>>> Option B: Recreate OSDs "one-by-one". -> will take a very long time 
>>> as well
>> No need for that IMO.
>>>
>>> Option C: Is there some "low-level" command allowing us to fix those 
>>> sizes?
>> Well, a hex editor might help here as well. What you need is just to 
>> update the 64-bit size value in the block.db and block.wal files. In my 
>> lab I can find it at offset 0x52. Most probably this is a fixed 
>> location, but it's better to check beforehand - the existing value 
>> there should match the size reported by show-label. Or I can do that 
>> for you - please send me the first 4K of each file along with the 
>> corresponding label report. Then update the field with the new value - 
>> it has to contain exactly the same size as your new partition.
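>>
>> As a rough sketch of that low-level edit from the shell (assuming the 
>> offset really is 0x52 on your devices, that the value is stored as a 
>> little-endian 64-bit integer, and with the OSD stopped - please verify 
>> step 1 before writing anything):
>>
>>     DEV=/var/lib/ceph/osd/ceph-0/block.db
>>     NEWSIZE=64424509440   # exact size of the new partition in bytes
>>
>>     # 1) the 8 bytes at 0x52 should match the old size from show-label
>>     xxd -s $((0x52)) -l 8 "$DEV"
>>
>>     # 2) write the new size as a little-endian 64-bit value at 0x52
>>     printf '%016x\n' "$NEWSIZE" | fold -w2 | tac | tr -d '\n' | xxd -r -p \
>>         | dd of="$DEV" bs=1 seek=$((0x52)) count=8 conv=notrunc
>>
>>     # 3) confirm the label now shows the new size
>>     ceph-bluestore-tool show-label --dev "$DEV"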
>>
>>>
>>>
>>>>
>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Igor
>>>>>>
>>>>>>
>>>>>> On 11/20/2018 5:29 PM, Florian Engelmann wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> today we migrated all of our RocksDB and WAL devices to new ones. 
>>>>>>> The new ones are much bigger (500MB for WAL/DB -> 60GB DB and 2GB 
>>>>>>> WAL) and LVM-based.
>>>>>>>
>>>>>>> We migrated like:
>>>>>>>
>>>>>>>     export OSD=x
>>>>>>>
>>>>>>>     systemctl stop ceph-osd@$OSD
>>>>>>>
>>>>>>>     lvcreate -n db-osd$OSD -L60g data || exit 1
>>>>>>>     lvcreate -n wal-osd$OSD -L2g data || exit 1
>>>>>>>
>>>>>>>     dd if=/var/lib/ceph/osd/ceph-$OSD/block.wal 
>>>>>>> of=/dev/data/wal-osd$OSD bs=1M || exit 1
>>>>>>>     dd if=/var/lib/ceph/osd/ceph-$OSD/block.db 
>>>>>>> of=/dev/data/db-osd$OSD bs=1M  || exit 1
>>>>>>>
>>>>>>>     rm -v /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
>>>>>>>     rm -v /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
>>>>>>>     ln -vs /dev/data/db-osd$OSD 
>>>>>>> /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
>>>>>>>     ln -vs /dev/data/wal-osd$OSD 
>>>>>>> /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
>>>>>>>
>>>>>>>
>>>>>>>     chown -c ceph:ceph $(realpath /dev/data/db-osd$OSD) || exit 1
>>>>>>>     chown -c ceph:ceph $(realpath /dev/data/wal-osd$OSD) || exit 1
>>>>>>>     chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.db || 
>>>>>>> exit 1
>>>>>>>     chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.wal || 
>>>>>>> exit 1
>>>>>>>
>>>>>>>
>>>>>>>     ceph-bluestore-tool bluefs-bdev-expand --path 
>>>>>>> /var/lib/ceph/osd/ceph-$OSD/ || exit 1
>>>>>>>
>>>>>>>     systemctl start ceph-osd@$OSD
>>>>>>>
>>>>>>>
>>>>>>> Everything went fine, but it looks like the DB and WAL sizes are 
>>>>>>> still the old ones:
>>>>>>>
>>>>>>> ceph daemon osd.0 perf dump|jq '.bluefs'
>>>>>>> {
>>>>>>>   "gift_bytes": 0,
>>>>>>>   "reclaim_bytes": 0,
>>>>>>>   "db_total_bytes": 524279808,
>>>>>>>   "db_used_bytes": 330301440,
>>>>>>>   "wal_total_bytes": 524283904,
>>>>>>>   "wal_used_bytes": 69206016,
>>>>>>>   "slow_total_bytes": 320058949632,
>>>>>>>   "slow_used_bytes": 13606322176,
>>>>>>>   "num_files": 220,
>>>>>>>   "log_bytes": 44204032,
>>>>>>>   "log_compactions": 0,
>>>>>>>   "logged_bytes": 31145984,
>>>>>>>   "files_written_wal": 1,
>>>>>>>   "files_written_sst": 1,
>>>>>>>   "bytes_written_wal": 37753489,
>>>>>>>   "bytes_written_sst": 238992
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> Even though the new block devices are recognized correctly:
>>>>>>>
>>>>>>> 2018-11-20 11:40:34.653524 7f70219b8d00  1 bdev(0x5647ea9ce200 
>>>>>>> /var/lib/ceph/osd/ceph-0/block.db) open size 64424509440 
>>>>>>> (0xf00000000, 60GiB) block_size 4096 (4KiB) non-rotational
>>>>>>> 2018-11-20 11:40:34.653532 7f70219b8d00  1 bluefs 
>>>>>>> add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db 
>>>>>>> size 60GiB
>>>>>>>
>>>>>>>
>>>>>>> 2018-11-20 11:40:34.662385 7f70219b8d00  1 bdev(0x5647ea9ce600 
>>>>>>> /var/lib/ceph/osd/ceph-0/block.wal) open size 2147483648 
>>>>>>> (0x80000000, 2GiB) block_size 4096 (4KiB) non-rotational
>>>>>>> 2018-11-20 11:40:34.662406 7f70219b8d00  1 bluefs 
>>>>>>> add_block_device bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal 
>>>>>>> size 2GiB
>>>>>>>
>>>>>>>
>>>>>>> Are we missing some command to "notify" RocksDB about the new 
>>>>>>> device sizes?
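>>>>>>>
>>>>>>> Would something like the following (if bluefs-bdev-sizes is 
>>>>>>> available in this ceph-bluestore-tool version) be the right way to 
>>>>>>> check what BlueFS currently thinks the device sizes are?
>>>>>>>
>>>>>>>     ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-$OSD/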
>>>>>>>
>>>>>>> All the best,
>>>>>>> Florian
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> ceph-users mailing list
>>>>>>> ceph-users at lists.ceph.com
>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>
> 

-- 

EveryWare AG
Florian Engelmann
Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich

tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: mailto:florian.engelmann at everyware.ch
web: http://www.everyware.ch