[ceph-users] Ceph cluster on AMD based system.

Mark Nelson mnelson at redhat.com
Tue Mar 5 07:26:21 PST 2019


Hi,


I've got a Ryzen 7 1700 box that I regularly run tests on, along with the 
upstream community performance test nodes that have Intel Xeon E5-2650v3 
processors in them.  The Ryzen is 3.0GHz/3.7GHz turbo while the Xeons 
are 2.3GHz/3.0GHz.  The Xeons are quite a bit faster clock-for-clock in the 
tests I've done with Ceph.  Typically I see a single OSD using fewer 
cores on the Xeon processors vs Ryzen to hit similar performance numbers, 
despite being clocked lower (though I haven't verified the turbo 
frequencies of both under load).  On the other hand, the Ryzen processor 
is significantly cheaper per core.  If you only looked at cores you'd 
think something like Ryzen would be the way to go, but there are other 
things to consider.  The number of PCIe lanes, memory configuration, 
cache configuration, and CPU interconnect (in multi-socket 
configurations) all start becoming really important if you are targeting 
multiple NVMe drives like you are talking about below.  The EPYC 
processors give you more of all of that, but also cost a lot more than 
Ryzen.  Ultimately the CPU is only a small part of the price for nodes 
like this, so I wouldn't skimp if your goal is to maximize IOPS.


With 10 NVMe drives per node, I'm guessing that a single EPYC 7451 is 
going to be CPU bound for small IO workloads (only ~2.4 cores / 4.8 
threads per OSD), but will be network bound for large IO workloads 
unless you are sticking 2x100GbE in.  You might want to consider jumping 
up to the 7601.  That would get you closer to where you want to be for 
10 NVMe drives (3.2 cores / 6.4 threads per OSD).  Another option might 
be dual 7351s in this chassis:

https://www.supermicro.com/Aplus/system/1U/1123/AS-1123US-TN10RT.cfm
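
If it helps, here's the back-of-the-envelope math behind those per-OSD 
numbers as a small Python sketch.  The core/thread counts are the 
published specs for those EPYC SKUs; the split is just total cores 
divided by the 10 OSDs per node:

# Rough per-OSD CPU budget for a 10x NVMe node (1 OSD per drive).
# Core/thread counts are the published specs for these EPYC SKUs.
epyc_options = {
    "1x EPYC 7451": (24, 48),
    "1x EPYC 7601": (32, 64),
    "2x EPYC 7351": (2 * 16, 2 * 32),
}

osds_per_node = 10

for sku, (cores, threads) in epyc_options.items():
    c = cores / osds_per_node
    t = threads / osds_per_node
    print(f"{sku}: {c:.1f} cores / {t:.1f} threads per OSD")

# 1x EPYC 7451: 2.4 cores / 4.8 threads per OSD
# 1x EPYC 7601: 3.2 cores / 6.4 threads per OSD
# 2x EPYC 7351: 3.2 cores / 6.4 threads per OSD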


Figure that with sufficient client parallelism/load you'll get about 
3000-6000 read IOPS/core and about 1500-3000 write IOPS/core (before 
replication), with OSDs typically topping out at about 6-8 cores each.  
Doubling up OSDs on each NVMe drive might improve or hurt performance 
depending on what the limitations are (typically it seems to help most 
when the kv sync thread is the primary bottleneck in BlueStore, which 
most likely happens with tons of slow cores and very fast NVMe drives).  
Those are all very rough hand-wavy numbers and depend on a huge variety 
of factors, so take them with a grain of salt.  Doing things like 
disabling authentication, disabling logging, forcing high-performance 
P/C-states, and tweaking the RocksDB WAL and compaction settings, the 
number of OSD shards/threads, and the system NUMA configuration might 
get you higher performance per core, though it's all pretty hard to 
predict without outright testing it.
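
Purely to show how those hand-wavy per-core numbers combine at the node 
level, here's a rough sketch.  The per-core ranges come straight from 
the paragraph above; the 24-core count and the 3x replicated pool are 
just example assumptions, not measurements of any particular hardware:

# Very rough small-IO estimate per node from the per-core ranges above.
# Illustrative only -- not a benchmark of any particular hardware.
cores_per_node = 24                    # e.g. a single EPYC 7451
read_iops_per_core = (3000, 6000)
write_iops_per_core = (1500, 3000)     # before replication
replication = 3                        # assumed 3x replicated pool

read_lo, read_hi = (cores_per_node * r for r in read_iops_per_core)
write_lo, write_hi = (cores_per_node * w / replication
                      for w in write_iops_per_core)

print(f"small random reads:  ~{read_lo:,.0f}-{read_hi:,.0f} IOPS/node")
print(f"small random writes: ~{write_lo:,.0f}-{write_hi:,.0f} client IOPS/node")

# small random reads:  ~72,000-144,000 IOPS/node
# small random writes: ~12,000-24,000 client IOPS/node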


Though you didn't ask about it, probably the most important thing you 
can spend money on with NVMe drives is getting high write endurance 
(DWPD) if you expect even a moderately high write workload.
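
A quick way to sanity-check that is to turn your expected ingest into a 
required DWPD figure.  The drive size, daily write volume, and warranty 
length below are made-up example numbers, purely for illustration:

# Hypothetical endurance check: does a drive's DWPD rating cover the
# expected write load?  All inputs are example values, not recommendations.
drive_capacity_tb = 3.84
warranty_years = 5
writes_per_drive_per_day_tb = 4.0      # including replication and
                                       # bluestore/rocksdb write amplification

required_dwpd = writes_per_drive_per_day_tb / drive_capacity_tb
tbw_needed = writes_per_drive_per_day_tb * 365 * warranty_years

print(f"required DWPD: ~{required_dwpd:.2f}")
print(f"total writes over {warranty_years} years: ~{tbw_needed:,.0f} TB")

# required DWPD: ~1.04
# total writes over 5 years: ~7,300 TB
# A ~1 DWPD drive would be marginal here; ~3 DWPD leaves comfortable headroom.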


Mark


On 3/5/19 3:49 AM, Darius Kasparavičius wrote:
> Hello,
>
>
> I was thinking of using an AMD-based system for my new NVMe-based
> cluster. In particular I'm looking at
> https://www.supermicro.com/Aplus/system/1U/1113/AS-1113S-WN10RT.cfm
> and https://www.amd.com/en/products/cpu/amd-epyc-7451 CPUs. Has
> anyone tried running it on this particular hardware?
>
> The general idea is 6 nodes with 10 NVMe drives and 2 OSDs per NVMe drive.
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

