[ceph-users] RocksDB and WAL migration to new block device

Igor Fedotov ifedotov at suse.de
Thu Nov 22 00:38:52 PST 2018


Hi Florian,


On 11/21/2018 7:01 PM, Florian Engelmann wrote:
> Hi Igor,
>
> sad to say, but I failed to build the tool. I tried to build the whole 
> project as documented here:
>
> http://docs.ceph.com/docs/mimic/install/build-ceph/
>
> But as my workstation is running Ubuntu, the binary fails on SLES:
>
> ./ceph-bluestore-tool --help
> ./ceph-bluestore-tool: symbol lookup error: ./ceph-bluestore-tool: 
> undefined symbol: _ZNK7leveldb6Status8ToStringB5cxx11Ev
>
> I copied all the libraries to ~/lib and exported LD_LIBRARY_PATH, but 
> it did not solve the problem.
>
> Is there any simple method to build just the bluestore-tool, standalone 
> and statically linked?
>
Unfortunately I don't know of such a method.

Maybe try hex editing instead?
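
A very rough sketch of that edit from the shell (untested - it assumes the 
size field really sits at offset 0x52, stored as a 64-bit little-endian 
integer, as in my lab; stop the OSD first, back up the first 4K of each 
label, and note that a checksummed label would need more than this):

    # 60GiB = 64424509440 = 0xf00000000 -> little-endian bytes on disk
    printf '\x00\x00\x00\x00\x0f\x00\x00\x00' | \
      dd of=/var/lib/ceph/osd/ceph-0/block.db bs=1 seek=$((0x52)) count=8 conv=notrunc

    # 2GiB = 2147483648 = 0x80000000 -> little-endian bytes on disk
    printf '\x00\x00\x00\x80\x00\x00\x00\x00' | \
      dd of=/var/lib/ceph/osd/ceph-0/block.wal bs=1 seek=$((0x52)) count=8 conv=notrunc

    # verify the labels are still readable and show the new sizes
    ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/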

> All the best,
> Florian
>
>
> On 11/21/18 at 9:34 AM, Igor Fedotov wrote:
>> Actually (given that your devices are already expanded) you don't 
>> need to expand them once again - you can just update the size labels 
>> with my new PR.
>>
>> For new migrations you can use the updated bluefs expand command, 
>> which sets the size label automatically.
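>>
>> For instance (a rough sketch with the patched tool; the expand should 
>> then refresh the size labels on its own, which show-label can confirm):
>>
>>     ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-$OSD/
>>     ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-$OSD/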
>>
>>
>> Thanks,
>> Igor
>> On 11/21/2018 11:11 AM, Florian Engelmann wrote:
>>> Great support Igor!!!! Both thumbs up! We will try to build the tool 
>>> today and expand those bluefs devices once again.
>>>
>>>
>>> On 11/20/18 at 6:54 PM, Igor Fedotov wrote:
>>>> FYI: https://github.com/ceph/ceph/pull/25187
>>>>
>>>>
>>>> On 11/20/2018 8:13 PM, Igor Fedotov wrote:
>>>>>
>>>>> On 11/20/2018 7:05 PM, Florian Engelmann wrote:
>>>>>> On 11/20/18 at 4:59 PM, Igor Fedotov wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 11/20/2018 6:42 PM, Florian Engelmann wrote:
>>>>>>>> Hi Igor,
>>>>>>>>
>>>>>>>>>
>>>>>>>>> what's your Ceph version?
>>>>>>>>
>>>>>>>> 12.2.8 (SES 5.5 - patched to the latest version)
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Can you also check the output for
>>>>>>>>>
>>>>>>>>> ceph-bluestore-tool show-label -p <path to osd>
>>>>>>>>
>>>>>>>> ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
>>>>>>>> infering bluefs devices from bluestore path
>>>>>>>> {
>>>>>>>>     "/var/lib/ceph/osd/ceph-0//block": {
>>>>>>>>         "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
>>>>>>>>         "size": 8001457295360,
>>>>>>>>         "btime": "2018-06-29 23:43:12.088842",
>>>>>>>>         "description": "main",
>>>>>>>>         "bluefs": "1",
>>>>>>>>         "ceph_fsid": "a2222146-6561-307e-b032-c5cee2ee520c",
>>>>>>>>         "kv_backend": "rocksdb",
>>>>>>>>         "magic": "ceph osd volume v026",
>>>>>>>>         "mkfs_done": "yes",
>>>>>>>>         "ready": "ready",
>>>>>>>>         "whoami": "0"
>>>>>>>>     },
>>>>>>>>     "/var/lib/ceph/osd/ceph-0//block.wal": {
>>>>>>>>         "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
>>>>>>>>         "size": 524288000,
>>>>>>>>         "btime": "2018-06-29 23:43:12.098690",
>>>>>>>>         "description": "bluefs wal"
>>>>>>>>     },
>>>>>>>>     "/var/lib/ceph/osd/ceph-0//block.db": {
>>>>>>>>         "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
>>>>>>>>         "size": 524288000,
>>>>>>>>         "btime": "2018-06-29 23:43:12.098023",
>>>>>>>>         "description": "bluefs db"
>>>>>>>>     }
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It should report 'size' labels for every volume, please check 
>>>>>>>>> they contain new values.
>>>>>>>>>
>>>>>>>>
>>>>>>>> That's exactly the problem: neither "ceph-bluestore-tool 
>>>>>>>> show-label" nor "ceph daemon osd.0 perf dump|jq '.bluefs'" 
>>>>>>>> recognized the new sizes. But we are 100% sure the new devices 
>>>>>>>> are used, as we already deleted the old ones...
>>>>>>>>
>>>>>>>> We tried to delete the key "size" in order to re-add it with 
>>>>>>>> the new value, but:
>>>>>>>>
>>>>>>>> ceph-bluestore-tool rm-label-key --dev 
>>>>>>>> /var/lib/ceph/osd/ceph-0/block.db -k size
>>>>>>>> key 'size' not present
>>>>>>>>
>>>>>>>> even though:
>>>>>>>>
>>>>>>>> ceph-bluestore-tool show-label --dev 
>>>>>>>> /var/lib/ceph/osd/ceph-0/block.db
>>>>>>>> {
>>>>>>>>     "/var/lib/ceph/osd/ceph-0/block.db": {
>>>>>>>>         "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
>>>>>>>>         "size": 524288000,
>>>>>>>>         "btime": "2018-06-29 23:43:12.098023",
>>>>>>>>         "description": "bluefs db"
>>>>>>>>     }
>>>>>>>> }
>>>>>>>>
>>>>>>>> So it looks like the key "size" is "read-only"?
>>>>>>>>
>>>>>>> There was a bug in updating specific keys, see
>>>>>>> https://github.com/ceph/ceph/pull/24352
>>>>>>>
>>>>>>> This PR also eliminates the need to set sizes manually on 
>>>>>>> bdev-expand.
>>>>>>>
>>>>>>> I thought it had been backported to Luminous, but it looks like 
>>>>>>> it wasn't.
>>>>>>> I will submit a PR shortly.
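>>>>>>>
>>>>>>> Once the fixed tool is in place, rewriting the label directly 
>>>>>>> should work along these lines (a sketch - the value must be the 
>>>>>>> exact new partition size in bytes):
>>>>>>>
>>>>>>>     ceph-bluestore-tool set-label-key --dev /var/lib/ceph/osd/ceph-0/block.db -k size -v 64424509440
>>>>>>>     ceph-bluestore-tool set-label-key --dev /var/lib/ceph/osd/ceph-0/block.wal -k size -v 2147483648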
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Thank you so much Igor! So we have to decide how to proceed. 
>>>>>> Maybe you could help us here as well.
>>>>>>
>>>>>> Option A: Wait for this fix to be available. -> could take weeks 
>>>>>> or even months
>>>>> If you can build a custom version of ceph-bluestore-tool then this 
>>>>> is a short path. I'll submit a patch today or tomorrow which you 
>>>>> will need to integrate into your private build.
>>>>> Then you only need to upgrade the tool and apply the new sizes.
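>>>>>
>>>>> Roughly like this (a sketch, assuming the usual cmake targets and 
>>>>> the luminous branch - and built on a SLES host or container so the 
>>>>> binary does not hit the library mismatch you saw):
>>>>>
>>>>>     git clone -b luminous --recursive https://github.com/ceph/ceph.git && cd ceph
>>>>>     ./install-deps.sh
>>>>>     ./do_cmake.sh && cd build
>>>>>     make -j$(nproc) ceph-bluestore-tool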
>>>>>
>>>>>>
>>>>>> Option B: Recreate OSDs "one-by-one". -> will take a very long 
>>>>>> time as well
>>>>> No need for that IMO.
>>>>>>
>>>>>> Option C: Is there some "low-level" command allowing us to fix 
>>>>>> those sizes?
>>>>> Well, a hex editor might help here as well. What you need is just 
>>>>> to update the 64-bit size value in the block.db and block.wal 
>>>>> labels. In my lab it sits at offset 0x52. Most probably this is a 
>>>>> fixed location, but it's better to check beforehand - the existing 
>>>>> value should match the one reported by show-label. Or I can do 
>>>>> that for you - please send me the first 4K chunks along with the 
>>>>> corresponding label report.
>>>>> Then update it with the new values - the field has to contain 
>>>>> exactly the same size as your new partition.
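>>>>>
>>>>> Something like this should do for grabbing the chunks (a sketch; 
>>>>> the /tmp paths are just examples, and the expected bytes assume 
>>>>> the old 500MB label, 524288000 = 0x1f400000):
>>>>>
>>>>>     # copy the first 4K of each label for inspection / backup
>>>>>     dd if=/var/lib/ceph/osd/ceph-0/block.db of=/tmp/block.db.label bs=4096 count=1
>>>>>     dd if=/var/lib/ceph/osd/ceph-0/block.wal of=/tmp/block.wal.label bs=4096 count=1
>>>>>     # the size field should read 00 00 40 1f 00 00 00 00 (little-endian) at 0x52
>>>>>     xxd -s 0x52 -l 8 /tmp/block.db.label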
>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Igor
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 11/20/2018 5:29 PM, Florian Engelmann wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> today we migrated all of our RocksDB and WAL devices to new 
>>>>>>>>>> ones. The new ones are much bigger (500MB each for WAL/DB -> 
>>>>>>>>>> 60GB DB and 2GB WAL) and LVM based.
>>>>>>>>>>
>>>>>>>>>> We migrated like:
>>>>>>>>>>
>>>>>>>>>>     export OSD=x
>>>>>>>>>>
>>>>>>>>>>     systemctl stop ceph-osd@$OSD
>>>>>>>>>>
>>>>>>>>>>     lvcreate -n db-osd$OSD -L60g data || exit 1
>>>>>>>>>>     lvcreate -n wal-osd$OSD -L2g data || exit 1
>>>>>>>>>>
>>>>>>>>>>     dd if=/var/lib/ceph/osd/ceph-$OSD/block.wal 
>>>>>>>>>> of=/dev/data/wal-osd$OSD bs=1M || exit 1
>>>>>>>>>>     dd if=/var/lib/ceph/osd/ceph-$OSD/block.db 
>>>>>>>>>> of=/dev/data/db-osd$OSD bs=1M  || exit 1
>>>>>>>>>>
>>>>>>>>>>     rm -v /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
>>>>>>>>>>     rm -v /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
>>>>>>>>>>     ln -vs /dev/data/db-osd$OSD 
>>>>>>>>>> /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
>>>>>>>>>>     ln -vs /dev/data/wal-osd$OSD 
>>>>>>>>>> /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>     chown -c ceph:ceph $(realpath /dev/data/db-osd$OSD) || 
>>>>>>>>>> exit 1
>>>>>>>>>>     chown -c ceph:ceph $(realpath /dev/data/wal-osd$OSD) || 
>>>>>>>>>> exit 1
>>>>>>>>>>     chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.db 
>>>>>>>>>> || exit 1
>>>>>>>>>>     chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.wal 
>>>>>>>>>> || exit 1
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>     ceph-bluestore-tool bluefs-bdev-expand --path 
>>>>>>>>>> /var/lib/ceph/osd/ceph-$OSD/ || exit 1
>>>>>>>>>>
>>>>>>>>>>     systemctl start ceph-osd@$OSD
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Everything went fine, but it looks like the DB and WAL sizes 
>>>>>>>>>> are still the old ones:
>>>>>>>>>>
>>>>>>>>>> ceph daemon osd.0 perf dump|jq '.bluefs'
>>>>>>>>>> {
>>>>>>>>>>   "gift_bytes": 0,
>>>>>>>>>>   "reclaim_bytes": 0,
>>>>>>>>>>   "db_total_bytes": 524279808,
>>>>>>>>>>   "db_used_bytes": 330301440,
>>>>>>>>>>   "wal_total_bytes": 524283904,
>>>>>>>>>>   "wal_used_bytes": 69206016,
>>>>>>>>>>   "slow_total_bytes": 320058949632,
>>>>>>>>>>   "slow_used_bytes": 13606322176,
>>>>>>>>>>   "num_files": 220,
>>>>>>>>>>   "log_bytes": 44204032,
>>>>>>>>>>   "log_compactions": 0,
>>>>>>>>>>   "logged_bytes": 31145984,
>>>>>>>>>>   "files_written_wal": 1,
>>>>>>>>>>   "files_written_sst": 1,
>>>>>>>>>>   "bytes_written_wal": 37753489,
>>>>>>>>>>   "bytes_written_sst": 238992
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Even though the new block devices are recognized correctly:
>>>>>>>>>>
>>>>>>>>>> 2018-11-20 11:40:34.653524 7f70219b8d00  1 
>>>>>>>>>> bdev(0x5647ea9ce200 /var/lib/ceph/osd/ceph-0/block.db) open 
>>>>>>>>>> size 64424509440 (0xf00000000, 60GiB) block_size 4096 (4KiB) 
>>>>>>>>>> non-rotational
>>>>>>>>>> 2018-11-20 11:40:34.653532 7f70219b8d00  1 bluefs 
>>>>>>>>>> add_block_device bdev 1 path 
>>>>>>>>>> /var/lib/ceph/osd/ceph-0/block.db size 60GiB
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2018-11-20 11:40:34.662385 7f70219b8d00  1 
>>>>>>>>>> bdev(0x5647ea9ce600 /var/lib/ceph/osd/ceph-0/block.wal) open 
>>>>>>>>>> size 2147483648 (0x80000000, 2GiB) block_size 4096 (4KiB) 
>>>>>>>>>> non-rotational
>>>>>>>>>> 2018-11-20 11:40:34.662406 7f70219b8d00  1 bluefs 
>>>>>>>>>> add_block_device bdev 0 path 
>>>>>>>>>> /var/lib/ceph/osd/ceph-0/block.wal size 2GiB
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Are we missing some command to "notify" RocksDB about the new 
>>>>>>>>>> device size?
>>>>>>>>>>
>>>>>>>>>> All the best,
>>>>>>>>>> Florian
>>>>>>>>>>
>>>>>>>>>>
>>>>
>>>
>>
>


