[ceph-users] RocksDB and WAL migration to new block device

Florian Engelmann florian.engelmann at everyware.ch
Wed Nov 21 08:01:33 PST 2018


Hi Igor,

sad to say, but I failed to build the tool. I tried to build the whole
project as documented here:

http://docs.ceph.com/docs/mimic/install/build-ceph/

But as my workstation is running Ubuntu, the resulting binary fails on SLES:

./ceph-bluestore-tool --help
./ceph-bluestore-tool: symbol lookup error: ./ceph-bluestore-tool: 
undefined symbol: _ZNK7leveldb6Status8ToStringB5cxx11Ev

I copied all the libraries to ~/lib and exported LD_LIBRARY_PATH, but
that did not solve the problem.

Is there any simple method to build just the bluestore-tool, standalone
and statically linked?
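Just guessing on my side: if there is a dedicated cmake target for the
tool, maybe something like this on a matching SLES build host (or
container) would already do - untested:

    git clone -b v12.2.8 https://github.com/ceph/ceph.git
    cd ceph && ./install-deps.sh && ./do_cmake.sh
    cd build && make ceph-bluestore-tool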

All the best,
Florian


Am 11/21/18 um 9:34 AM schrieb Igor Fedotov:
> Actually (given that your devices are already expanded) you don't need
> to expand them again - you can just update the size labels with my new PR.
> 
> For new migrations, though, you can use the updated bluefs expand command,
> which sets the size label automatically.
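> I.e. for a fresh migration, something like this (the same invocation you
> already used) should then be enough on its own:
> 
>     ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-$OSD/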
> 
> 
> Thanks,
> Igor
> On 11/21/2018 11:11 AM, Florian Engelmann wrote:
>> Great support Igor!!!! Both thumbs up! We will try to build the tool 
>> today and expand those bluefs devices once again.
>>
>>
>> Am 11/20/18 um 6:54 PM schrieb Igor Fedotov:
>>> FYI: https://github.com/ceph/ceph/pull/25187
>>>
>>>
>>> On 11/20/2018 8:13 PM, Igor Fedotov wrote:
>>>>
>>>> On 11/20/2018 7:05 PM, Florian Engelmann wrote:
>>>>> Am 11/20/18 um 4:59 PM schrieb Igor Fedotov:
>>>>>>
>>>>>>
>>>>>> On 11/20/2018 6:42 PM, Florian Engelmann wrote:
>>>>>>> Hi Igor,
>>>>>>>
>>>>>>>>
>>>>>>>> what's your Ceph version?
>>>>>>>
>>>>>>> 12.2.8 (SES 5.5 - patched to the latest version)
>>>>>>>
>>>>>>>>
>>>>>>>> Can you also check the output for
>>>>>>>>
>>>>>>>> ceph-bluestore-tool show-label -p <path to osd>
>>>>>>>
>>>>>>> ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
>>>>>>> infering bluefs devices from bluestore path
>>>>>>> {
>>>>>>>     "/var/lib/ceph/osd/ceph-0//block": {
>>>>>>>         "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
>>>>>>>         "size": 8001457295360,
>>>>>>>         "btime": "2018-06-29 23:43:12.088842",
>>>>>>>         "description": "main",
>>>>>>>         "bluefs": "1",
>>>>>>>         "ceph_fsid": "a2222146-6561-307e-b032-c5cee2ee520c",
>>>>>>>         "kv_backend": "rocksdb",
>>>>>>>         "magic": "ceph osd volume v026",
>>>>>>>         "mkfs_done": "yes",
>>>>>>>         "ready": "ready",
>>>>>>>         "whoami": "0"
>>>>>>>     },
>>>>>>>     "/var/lib/ceph/osd/ceph-0//block.wal": {
>>>>>>>         "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
>>>>>>>         "size": 524288000,
>>>>>>>         "btime": "2018-06-29 23:43:12.098690",
>>>>>>>         "description": "bluefs wal"
>>>>>>>     },
>>>>>>>     "/var/lib/ceph/osd/ceph-0//block.db": {
>>>>>>>         "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
>>>>>>>         "size": 524288000,
>>>>>>>         "btime": "2018-06-29 23:43:12.098023",
>>>>>>>         "description": "bluefs db"
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> It should report a 'size' label for every volume; please check 
>>>>>>>> that they contain the new values.
>>>>>>>>
>>>>>>>
>>>>>>> That's exactly the problem: neither "ceph-bluestore-tool 
>>>>>>> show-label" nor "ceph daemon osd.0 perf dump|jq '.bluefs'" 
>>>>>>> recognizes the new sizes. But we are 100% sure the new devices are 
>>>>>>> in use, as we already deleted the old ones...
>>>>>>>
>>>>>>> We tried to delete the "size" key in order to add it again with the 
>>>>>>> new value, but:
>>>>>>>
>>>>>>> ceph-bluestore-tool rm-label-key --dev 
>>>>>>> /var/lib/ceph/osd/ceph-0/block.db -k size
>>>>>>> key 'size' not present
>>>>>>>
>>>>>>> even though:
>>>>>>>
>>>>>>> ceph-bluestore-tool show-label --dev 
>>>>>>> /var/lib/ceph/osd/ceph-0/block.db
>>>>>>> {
>>>>>>>     "/var/lib/ceph/osd/ceph-0/block.db": {
>>>>>>>         "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
>>>>>>>         "size": 524288000,
>>>>>>>         "btime": "2018-06-29 23:43:12.098023",
>>>>>>>         "description": "bluefs db"
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> So it looks like the key "size" is "read-only"?
>>>>>>>
>>>>>> There was a bug in updating specific keys, see
>>>>>> https://github.com/ceph/ceph/pull/24352
>>>>>>
>>>>>> This PR also eliminates the need to set sizes manually on 
>>>>>> bdev-expand.
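>>>>>> With a fixed tool you should also be able to just set the label 
>>>>>> directly, roughly like this (the value being the new partition size 
>>>>>> in bytes; untested sketch):
>>>>>>
>>>>>>     ceph-bluestore-tool set-label-key --dev /var/lib/ceph/osd/ceph-0/block.db -k size -v 64424509440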
>>>>>>
>>>>>> I thought it had been backported to Luminous, but it looks like it 
>>>>>> hasn't.
>>>>>> Will submit a PR shortly.
>>>>>>
>>>>>>
>>>>>
>>>>> Thank you so much Igor! So we have to decide how to proceed. Maybe 
>>>>> you could help us here as well.
>>>>>
>>>>> Option A: Wait for this fix to be available. -> could take weeks or 
>>>>> even months
>>>> If you can build a custom version of ceph-bluestore-tool then this 
>>>> is a short path. I'll submit a patch today or tomorrow which you 
>>>> will need to integrate into your private build.
>>>> Then you only need to upgrade the tool and apply the new sizes.
>>>>
>>>>>
>>>>> Option B: Recreate OSDs "one-by-one". -> will take a very long time 
>>>>> as well
>>>> No need for that IMO.
>>>>>
>>>>> Option C: Is there some "low-level" command allowing us to fix those 
>>>>> sizes?
>>>> Well, a hex editor might help here as well. What you need is just to 
>>>> update the 64-bit size value in the block.db and block.wal files. In 
>>>> my lab I find it at offset 0x52. Most probably this is a fixed 
>>>> location, but it's better to check beforehand - the existing value 
>>>> should match the one reported with show-label. Or I can do that for 
>>>> you - please send me the first 4K chunks along with the corresponding 
>>>> label report.
>>>> Then update with the new values - the field has to contain exactly the 
>>>> same size as your new partition.
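>>>> For illustration only, a rough sketch of such a low-level patch (stop 
>>>> the OSD first, and verify that the offset really holds the old size 
>>>> before writing anything):
>>>>
>>>>     # dump the first 4K and check that the old size (little-endian
>>>>     # 64-bit) really sits at offset 0x52
>>>>     dd if=/var/lib/ceph/osd/ceph-0/block.db of=/tmp/db-label.bin bs=4096 count=1
>>>>     xxd -s 0x52 -l 8 /tmp/db-label.bin
>>>>     # write the new size in bytes (here 64424509440 = 60GiB) back, little-endian
>>>>     printf '%016x' 64424509440 | fold -w2 | tac | tr -d '\n' | xxd -r -p | dd of=/var/lib/ceph/osd/ceph-0/block.db bs=1 seek=$((0x52)) conv=notrunc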
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Igor
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11/20/2018 5:29 PM, Florian Engelmann wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> today we migrated all of our RocksDB and WAL devices to new 
>>>>>>>>> ones. The new ones are much bigger (500MB each for WAL/DB -> 60GB 
>>>>>>>>> DB and 2GB WAL) and LVM-based.
>>>>>>>>>
>>>>>>>>> We migrated like this:
>>>>>>>>>
>>>>>>>>>     export OSD=x
>>>>>>>>>
>>>>>>>>>     systemctl stop ceph-osd@$OSD
>>>>>>>>>
>>>>>>>>>     lvcreate -n db-osd$OSD -L60g data || exit 1
>>>>>>>>>     lvcreate -n wal-osd$OSD -L2g data || exit 1
>>>>>>>>>
>>>>>>>>>     dd if=/var/lib/ceph/osd/ceph-$OSD/block.wal 
>>>>>>>>> of=/dev/data/wal-osd$OSD bs=1M || exit 1
>>>>>>>>>     dd if=/var/lib/ceph/osd/ceph-$OSD/block.db 
>>>>>>>>> of=/dev/data/db-osd$OSD bs=1M  || exit 1
>>>>>>>>>
>>>>>>>>>     rm -v /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
>>>>>>>>>     rm -v /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
>>>>>>>>>     ln -vs /dev/data/db-osd$OSD 
>>>>>>>>> /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
>>>>>>>>>     ln -vs /dev/data/wal-osd$OSD 
>>>>>>>>> /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     chown -c ceph:ceph $(realpath /dev/data/db-osd$OSD) || exit 1
>>>>>>>>>     chown -c ceph:ceph $(realpath /dev/data/wal-osd$OSD) || exit 1
>>>>>>>>>     chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.db || 
>>>>>>>>> exit 1
>>>>>>>>>     chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.wal 
>>>>>>>>> || exit 1
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     ceph-bluestore-tool bluefs-bdev-expand --path 
>>>>>>>>> /var/lib/ceph/osd/ceph-$OSD/ || exit 1
>>>>>>>>>
>>>>>>>>>     systemctl start ceph-osd@$OSD
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Everything went fine, but it looks like the DB and WAL sizes are 
>>>>>>>>> still the old ones:
>>>>>>>>>
>>>>>>>>> ceph daemon osd.0 perf dump|jq '.bluefs'
>>>>>>>>> {
>>>>>>>>>   "gift_bytes": 0,
>>>>>>>>>   "reclaim_bytes": 0,
>>>>>>>>>   "db_total_bytes": 524279808,
>>>>>>>>>   "db_used_bytes": 330301440,
>>>>>>>>>   "wal_total_bytes": 524283904,
>>>>>>>>>   "wal_used_bytes": 69206016,
>>>>>>>>>   "slow_total_bytes": 320058949632,
>>>>>>>>>   "slow_used_bytes": 13606322176,
>>>>>>>>>   "num_files": 220,
>>>>>>>>>   "log_bytes": 44204032,
>>>>>>>>>   "log_compactions": 0,
>>>>>>>>>   "logged_bytes": 31145984,
>>>>>>>>>   "files_written_wal": 1,
>>>>>>>>>   "files_written_sst": 1,
>>>>>>>>>   "bytes_written_wal": 37753489,
>>>>>>>>>   "bytes_written_sst": 238992
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Even though the new block devices are recognized correctly:
>>>>>>>>>
>>>>>>>>> 2018-11-20 11:40:34.653524 7f70219b8d00  1 bdev(0x5647ea9ce200 
>>>>>>>>> /var/lib/ceph/osd/ceph-0/block.db) open size 64424509440 
>>>>>>>>> (0xf00000000, 60GiB) block_size 4096 (4KiB) non-rotational
>>>>>>>>> 2018-11-20 11:40:34.653532 7f70219b8d00  1 bluefs 
>>>>>>>>> add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db 
>>>>>>>>> size 60GiB
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2018-11-20 11:40:34.662385 7f70219b8d00  1 bdev(0x5647ea9ce600 
>>>>>>>>> /var/lib/ceph/osd/ceph-0/block.wal) open size 2147483648 
>>>>>>>>> (0x80000000, 2GiB) block_size 4096 (4KiB) non-rotational
>>>>>>>>> 2018-11-20 11:40:34.662406 7f70219b8d00  1 bluefs 
>>>>>>>>> add_block_device bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal 
>>>>>>>>> size 2GiB
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Are we missing some command to "notify" RocksDB about the new 
>>>>>>>>> device sizes?
>>>>>>>>>
>>>>>>>>> All the best,
>>>>>>>>> Florian
>>>>>>>>>
>>>>>>>>>
>>>
>>
> 

-- 

EveryWare AG
Florian Engelmann
Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich

tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: florian.engelmann at everyware.ch
web: http://www.everyware.ch