[ceph-users] RocksDB and WAL migration to new block device

Igor Fedotov ifedotov at suse.de
Tue Nov 20 07:59:11 PST 2018



On 11/20/2018 6:42 PM, Florian Engelmann wrote:
> Hi Igor,
>
>>
>> what's your Ceph version?
>
> 12.2.8 (SES 5.5 - patched to the latest version)
>
>>
>> Can you also check the output for
>>
>> ceph-bluestore-tool show-label -p <path to osd>
>
> ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
> infering bluefs devices from bluestore path
> {
>     "/var/lib/ceph/osd/ceph-0//block": {
>         "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
>         "size": 8001457295360,
>         "btime": "2018-06-29 23:43:12.088842",
>         "description": "main",
>         "bluefs": "1",
>         "ceph_fsid": "a2222146-6561-307e-b032-c5cee2ee520c",
>         "kv_backend": "rocksdb",
>         "magic": "ceph osd volume v026",
>         "mkfs_done": "yes",
>         "ready": "ready",
>         "whoami": "0"
>     },
>     "/var/lib/ceph/osd/ceph-0//block.wal": {
>         "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
>         "size": 524288000,
>         "btime": "2018-06-29 23:43:12.098690",
>         "description": "bluefs wal"
>     },
>     "/var/lib/ceph/osd/ceph-0//block.db": {
>         "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
>         "size": 524288000,
>         "btime": "2018-06-29 23:43:12.098023",
>         "description": "bluefs db"
>     }
> }
>
>
>>
>>
>> It should report a 'size' label for every volume; please check that 
>> they contain the new values.
>>
>
> That's exactly the problem: neither "ceph-bluestore-tool show-label" 
> nor "ceph daemon osd.0 perf dump|jq '.bluefs'" recognizes the new 
> sizes. But we are 100% sure the new devices are in use, as we already 
> deleted the old ones...
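Side note: a quick way to double-check which backing devices those 
symlinks actually resolve to, using the paths from your listing above 
(just standard tools, nothing Ceph-specific):

    ls -l /var/lib/ceph/osd/ceph-0/block.db /var/lib/ceph/osd/ceph-0/block.wal
    readlink -f /var/lib/ceph/osd/ceph-0/block.db
    readlink -f /var/lib/ceph/osd/ceph-0/block.wal

If those resolve to the new LVM volumes, the OSD is indeed running on 
the new devices.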
>
> We tried to delete the "size" key so that we could re-add it with the new value, but:
>
> ceph-bluestore-tool rm-label-key --dev /var/lib/ceph/osd/ceph-0/block.db -k size
> key 'size' not present
>
> even though:
>
> ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block.db
> {
>     "/var/lib/ceph/osd/ceph-0/block.db": {
>         "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
>         "size": 524288000,
>         "btime": "2018-06-29 23:43:12.098023",
>         "description": "bluefs db"
>     }
> }
>
> So it looks like the key "size" is "read-only"?
>
There was a bug in updating specific label keys, see
https://github.com/ceph/ceph/pull/24352

This PR also eliminates the need to set sizes manually on bdev-expand.
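
Once you have a build with that fix, you should be able to correct the 
labels by hand with set-label-key, roughly like this (untested sketch; 
the values are the new sizes in bytes from your OSD log below):

    ceph-bluestore-tool set-label-key --dev /var/lib/ceph/osd/ceph-0/block.db -k size -v 64424509440
    ceph-bluestore-tool set-label-key --dev /var/lib/ceph/osd/ceph-0/block.wal -k size -v 2147483648
    ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/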

I thought it had been backported to Luminous, but it looks like it hasn't been.
Will submit a PR shortly.
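
Once you are on a build with the backport, re-running the expand step 
from your script and then re-checking should be enough to bring the 
labels and the bluefs perf counters in sync (same commands you already 
used):

    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-0/
    ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
    ceph daemon osd.0 perf dump | jq '.bluefs'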


Thanks,
Igor


>>
>> Thanks,
>>
>> Igor
>>
>>
>> On 11/20/2018 5:29 PM, Florian Engelmann wrote:
>>> Hi,
>>>
>>> today we migrated all of our RocksDB and WAL devices to new ones. 
>>> The new ones are much bigger (500MB each for WAL/DB -> 60GB DB and 
>>> 2GB WAL) and they are LVM based.
>>>
>>> We migrated like:
>>>
>>>     export OSD=x
>>>
>>>     systemctl stop ceph-osd@$OSD
>>>
>>>     lvcreate -n db-osd$OSD -L60g data || exit 1
>>>     lvcreate -n wal-osd$OSD -L2g data || exit 1
>>>
>>>     dd if=/var/lib/ceph/osd/ceph-$OSD/block.wal of=/dev/data/wal-osd$OSD bs=1M || exit 1
>>>     dd if=/var/lib/ceph/osd/ceph-$OSD/block.db of=/dev/data/db-osd$OSD bs=1M || exit 1
>>>
>>>     rm -v /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
>>>     rm -v /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
>>>     ln -vs /dev/data/db-osd$OSD /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
>>>     ln -vs /dev/data/wal-osd$OSD /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
>>>
>>>
>>>     chown -c ceph:ceph $(realpath /dev/data/db-osd$OSD) || exit 1
>>>     chown -c ceph:ceph $(realpath /dev/data/wal-osd$OSD) || exit 1
>>>     chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
>>>     chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
>>>
>>>
>>>     ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-$OSD/ || exit 1
>>>
>>>     systemctl start ceph-osd@$OSD
>>>
>>>
>>> Everything went fine, but it looks like the DB and WAL sizes are 
>>> still the old ones:
>>>
>>> ceph daemon osd.0 perf dump|jq '.bluefs'
>>> {
>>>   "gift_bytes": 0,
>>>   "reclaim_bytes": 0,
>>>   "db_total_bytes": 524279808,
>>>   "db_used_bytes": 330301440,
>>>   "wal_total_bytes": 524283904,
>>>   "wal_used_bytes": 69206016,
>>>   "slow_total_bytes": 320058949632,
>>>   "slow_used_bytes": 13606322176,
>>>   "num_files": 220,
>>>   "log_bytes": 44204032,
>>>   "log_compactions": 0,
>>>   "logged_bytes": 31145984,
>>>   "files_written_wal": 1,
>>>   "files_written_sst": 1,
>>>   "bytes_written_wal": 37753489,
>>>   "bytes_written_sst": 238992
>>> }
>>>
>>>
>>> Even though the new block devices are recognized correctly in the OSD log:
>>>
>>> 2018-11-20 11:40:34.653524 7f70219b8d00  1 bdev(0x5647ea9ce200 /var/lib/ceph/osd/ceph-0/block.db) open size 64424509440 (0xf00000000, 60GiB) block_size 4096 (4KiB) non-rotational
>>> 2018-11-20 11:40:34.653532 7f70219b8d00  1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db size 60GiB
>>>
>>>
>>> 2018-11-20 11:40:34.662385 7f70219b8d00  1 bdev(0x5647ea9ce600 /var/lib/ceph/osd/ceph-0/block.wal) open size 2147483648 (0x80000000, 2GiB) block_size 4096 (4KiB) non-rotational
>>> 2018-11-20 11:40:34.662406 7f70219b8d00  1 bluefs add_block_device bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal size 2GiB
>>>
>>>
>>> Are we missing some command to "notify" RocksDB about the new 
>>> device sizes?
>>>
>>> All the best,
>>> Florian
>>>
>>>
