[ceph-users] ceph all-nvme mysql performance tuning

German Anders ganders at despegar.com
Mon Dec 4 04:03:58 PST 2017


Could anyone run the tests and share some results?

Thanks in advance,

Best,


*German*

2017-11-30 14:25 GMT-03:00 German Anders <ganders at despegar.com>:

> That's correct: IPoIB for the backend (I already configured the IRQ
> affinity), and 10GbE on the frontend. I would love to try RDMA, but like
> you said it's not stable for production, so I think I'll have to wait for
> that. Yeah, the thing is that it's not my decision to go for 50GbE or
> 100GbE... :( so 10GbE for the front-end it will be...
>
> It would be really helpful if someone could run the following sysbench test
> on a MySQL DB so I could make some comparisons:
>
> *my.cnf *configuration file:
>
> [mysqld_safe]
> nice                                    = 0
> pid-file                                = /home/test_db/mysql/mysql.pid
>
> [client]
> port                                    = 33033
> socket                                  = /home/test_db/mysql/mysql.sock
>
> [mysqld]
> user                                    = test_db
> port                                    = 33033
> socket                                  = /home/test_db/mysql/mysql.sock
> pid-file                                = /home/test_db/mysql/mysql.pid
> log-error                               = /home/test_db/mysql/mysql.err
> datadir                                 = /home/test_db/mysql/data
> tmpdir                                  = /tmp
> server-id                               = 1
>
> # ** Binlogging **
> #log-bin                                = /home/test_db/mysql/binlog/mysql-bin
> #log_bin_index                          = /home/test_db/mysql/binlog/mysql-bin.index
> expire_logs_days                        = 1
> max_binlog_size                         = 512MB
>
> thread_handling                         = pool-of-threads
> thread_pool_max_threads                 = 300
>
>
> # ** Slow query log **
> slow_query_log                          = 1
> slow_query_log_file                     = /home/test_db/mysql/mysql-slow.log
> long_query_time                         = 10
> log_output                              = FILE
> log_slow_slave_statements               = 1
> log_slow_verbosity                      = query_plan,innodb,explain
>
> # ** INNODB Specific options **
> transaction_isolation                   = READ-COMMITTED
> innodb_buffer_pool_size                 = 12G
> innodb_data_file_path                   = ibdata1:256M:autoextend
> innodb_thread_concurrency               = 16
> innodb_log_file_size                    = 256M
> innodb_log_files_in_group               = 3
> innodb_file_per_table
> innodb_log_buffer_size                  = 16M
> innodb_stats_on_metadata                = 0
> innodb_lock_wait_timeout                = 30
> # innodb_flush_method                   = O_DSYNC
> innodb_flush_method                     = O_DIRECT
> max_connections                         = 10000
> max_connect_errors                      = 999999
> max_allowed_packet                      = 128M
> skip-host-cache
> skip-name-resolve
> explicit_defaults_for_timestamp         = 1
> performance_schema                      = OFF
> log_warnings                            = 2
> event_scheduler                         = ON
>
> # ** Specific Galera Cluster Settings **
> binlog_format                           = ROW
> default-storage-engine                  = innodb
> query_cache_size                        = 0
> query_cache_type                        = 0
>
>
> The volume is just an RBD image (on an RF=3 pool) with the default 22-bit
> object order, mounted on */home/test_db/mysql/data*
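For reference, a volume like that could be set up along these lines (a sketch only, assuming a running cluster and the krbd client; the pool name `rbd`, image name `mysql-data`, and 10G size are placeholders, not from the thread):

```shell
# Create a 10G image with the default 22-bit object order (4 MiB objects)
# on a pool assumed here to be named "rbd" with replication size 3.
rbd create rbd/mysql-data --size 10G --order 22

# Map it via krbd, make a filesystem, and mount it at the MySQL datadir.
rbd map rbd/mysql-data                   # prints the device, e.g. /dev/rbd0
mkfs.xfs /dev/rbd0
mount /dev/rbd0 /home/test_db/mysql/data
```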
>
> Commands for the test:
>
> sysbench --test=/usr/share/sysbench/tests/include/oltp_legacy/parallel_prepare.lua \
>     --mysql-host=<hostname> --mysql-port=33033 --mysql-user=sysbench \
>     --mysql-password=sysbench --mysql-db=sysbench --mysql-table-engine=innodb \
>     --db-driver=mysql --oltp_tables_count=10 --oltp-test-mode=complex \
>     --oltp-read-only=off --oltp-table-size=200000 --threads=10 \
>     --rand-type=uniform --rand-init=on cleanup > /dev/null 2>/dev/null
>
> sysbench --test=/usr/share/sysbench/tests/include/oltp_legacy/parallel_prepare.lua \
>     --mysql-host=<hostname> --mysql-port=33033 --mysql-user=sysbench \
>     --mysql-password=sysbench --mysql-db=sysbench --mysql-table-engine=innodb \
>     --db-driver=mysql --oltp_tables_count=10 --oltp-test-mode=complex \
>     --oltp-read-only=off --oltp-table-size=200000 --threads=10 \
>     --rand-type=uniform --rand-init=on prepare > /dev/null 2>/dev/null
>
> sysbench --test=/usr/share/sysbench/tests/include/oltp_legacy/oltp.lua \
>     --mysql-host=<hostname> --mysql-port=33033 --mysql-user=sysbench \
>     --mysql-password=sysbench --mysql-db=sysbench --mysql-table-engine=innodb \
>     --db-driver=mysql --oltp_tables_count=10 --oltp-test-mode=complex \
>     --oltp-read-only=off --oltp-table-size=200000 --threads=20 \
>     --rand-type=uniform --rand-init=on --time=120 run \
>     > result_sysbench_perf_test.out 2>/dev/null
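The three figures of interest can be pulled out of the run's output with a bit of awk. A minimal sketch, assuming the sysbench 1.0 report layout; the heredoc below is a fabricated sample purely to show the shape, so point `OUT` at the real result_sysbench_perf_test.out instead:

```shell
# Sketch: extract tps, qps, and the 95th-percentile latency from a
# sysbench report. The heredoc is a made-up sample in the sysbench 1.0
# output format; replace it by pointing OUT at the real results file.
OUT=sample_sysbench.out
cat > "$OUT" <<'EOF'
    transactions:                        10000  (83.32 per sec.)
    queries:                             200000 (1666.41 per sec.)
Latency (ms):
         95th percentile:                       21.89
EOF

# Strip the parentheses around the per-second rates, then grab field 3.
tps=$(awk '/transactions:/    {gsub(/[()]/, ""); print $3}' "$OUT")
qps=$(awk '/queries:/         {gsub(/[()]/, ""); print $3}' "$OUT")
p95=$(awk '/95th percentile:/ {print $3}' "$OUT")
echo "tps=$tps qps=$qps p95=${p95}ms"
```

On the sample above this prints `tps=83.32 qps=1666.41 p95=21.89ms`.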
>
> I'm looking at tps, qps, and the 95th percentile. Could anyone with an
> all-NVMe cluster run the test and share the results? I would really
> appreciate the help :)
>
> Thanks in advance,
>
> Best,
>
>
> *German *
>
> 2017-11-29 19:14 GMT-03:00 Zoltan Arnold Nagy <zoltan at linux.vnet.ibm.com>:
>
>> On 2017-11-27 14:02, German Anders wrote:
>>
>>> 4x 2U servers:
>>>   1x 82599ES 10-Gigabit SFI/SFP+ Network Connection
>>>   1x Mellanox ConnectX-3 InfiniBand FDR 56Gb/s Adapter (dual port)
>>>
>> so I assume you are using IPoIB as the cluster network for the
>> replication...
>>
>> 1x OneConnect 10Gb NIC (quad-port) - in a bond configuration
>>> (active/active) with 3 vlans
>>>
>> ... and the 10GbE network for the front-end network?
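(For readers following along: that front/back split is expressed in ceph.conf roughly as below; the subnets here are made-up placeholders, not from the thread.)

```ini
# Hypothetical ceph.conf fragment for the setup described above:
# client traffic on the 10GbE subnet, replication on the IPoIB subnet.
[global]
public network  = 10.0.0.0/24       ; 10GbE front-end (clients, MONs)
cluster network = 192.168.0.0/24    ; IPoIB back-end (OSD replication)
```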
>>
>> At 4k writes your network latency will be very high (see the flame graphs
>> in the Intel NVMe presentation from the Boston OpenStack Summit - not sure
>> if there is a newer deck that somebody could link ;)) and the time will be
>> spent in the kernel. You could give RDMAMessenger a try, but it's not
>> stable in the current LTS release.
>>
>> If I were you I'd be looking at 100GbE - we've recently pulled in a bunch
>> of 100GbE links and it's been wonderful to see 100+GB/s going over the
>> network for just storage.
>>
>> Some people suggested mounting multiple RBD volumes - but unless I'm
>> mistaken, unless you're using very recent qemu/libvirt combinations with
>> the proper libvirt disk settings, all IO will still be single-threaded
>> towards librbd, so you won't see any speedup.
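(The "proper libvirt disk settings" above refers to dedicated qemu IO threads per disk. A sketch of what that wiring looks like in the domain XML - image names, counts, and the pool name are placeholders; the `iothread` attribute on virtio-blk disks needs libvirt >= 1.2.8:)

```xml
<!-- Hypothetical libvirt domain fragment: one dedicated qemu iothread
     per RBD-backed virtio disk, so each volume gets its own IO thread
     instead of all IO funnelling through a single one. -->
<domain type='kvm'>
  <iothreads>2</iothreads>
  <devices>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none' iothread='1'/>
      <source protocol='rbd' name='rbd/mysql-data-1'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none' iothread='2'/>
      <source protocol='rbd' name='rbd/mysql-data-2'/>
      <target dev='vdc' bus='virtio'/>
    </disk>
  </devices>
</domain>
```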
>>
>>
>


More information about the ceph-users mailing list