[ceph-users] Performance, and how much wiggle room there is with tunables

Mark Nelson mnelson at redhat.com
Fri Nov 10 10:36:14 PST 2017



On 11/10/2017 12:21 PM, Maged Mokhtar wrote:
> Hi Mark,
>
> It would be interesting to know:
>
> The impact of replication: I would guess performance drops by a higher
> factor than the replica count.
>
> I assume you mean that the 30K IOPS per OSD is what the client sees; if
> so, the raw disk behind the OSD will be doing more IOPS.  Is that
> correct, and if so, what is the factor?  (The lower the factor, the
> better the efficiency.)

In those tests it's 1x replication with 1 OSD.  You do lose more than 3X 
for 3X replication, but exactly how much isn't easy to pin down; it 
depends on the network, kernel, etc.
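
As a rough back-of-the-envelope, assuming size=3 and ignoring WAL and
metadata traffic (which add more on top):

  30K client write IOPS x 3 replicas = ~90K raw writes landing on OSDs

so the per-device write amplification ends up above the replica count,
and the exact factor depends on the hardware and workload.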

>
> Are you running 1 OSD per physical drive or multiple?  Any recommendations?

In those tests it was 1 OSD per NVMe device.  You can do better if you 
put multiple OSDs on the same drive, with both filestore and bluestore.
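
If you want to experiment with that, one way is to give each OSD its own
LV on the device.  Sketch only: the device and VG/LV names here are just
examples, and the exact tooling depends on your release (ceph-volume is
Luminous and later):

  # carve one NVMe into two logical volumes, one per OSD
  pvcreate /dev/nvme0n1
  vgcreate ceph-nvme0 /dev/nvme0n1
  lvcreate -l 50%VG -n osd0 ceph-nvme0
  lvcreate -l 100%FREE -n osd1 ceph-nvme0

  # create a bluestore OSD on each LV
  ceph-volume lvm create --bluestore --data ceph-nvme0/osd0
  ceph-volume lvm create --bluestore --data ceph-nvme0/osd1

The tradeoff is more CPU and memory per drive, so it mainly pays off on
NVMe devices that a single OSD can't keep busy.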

Mark

>
> Cheers /Maged
>
> On 2017-11-10 18:51, Mark Nelson wrote:
>
>> FWIW, on very fast drives you can achieve at least 1.4GB/s and 30K+
>> write IOPS per OSD (before replication).  It's quite possible to do
>> better, but those are recent numbers on a mostly default bluestore
>> configuration that I'm fairly confident sharing.  It takes a lot of
>> CPU, but it's possible.
>>
>> Mark
>>
>> On 11/10/2017 10:35 AM, Robert Stanford wrote:
>>>
>>>  Thank you for that excellent observation.  Are there any rumors, or
>>> has anyone had experience with faster clusters on faster networks?  I
>>> wonder how fast Ceph can get ("it depends", of course), but I'd like
>>> to hear what numbers people have seen.
>>>
>>> On Fri, Nov 10, 2017 at 10:31 AM, Denes Dolhay <denke at denkesys.com> wrote:
>>>
>>>     So you are using a 40 / 100 gbit connection all the way to your client?
>>>
>>>     John's question is valid because 10 gbit = 1.25 GB/s; subtract some
>>>     Ethernet, IP, TCP and protocol overhead, take into account some
>>>     additional network factors, and you are about there...
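>>>
>>>     Roughly, assuming a standard 1500-byte MTU:
>>>
>>>         10 Gbit/s ~= 1.25 GB/s on the wire
>>>         - ~5-6% Ethernet + IP + TCP framing overhead
>>>         ~= ~1.17 GB/s of usable payload, before Ceph's own protocol
>>>            overhead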
>>>
>>>
>>>     Denes
>>>
>>>
>>>     On 11/10/2017 05:10 PM, Robert Stanford wrote:
>>>>
>>>>      The bandwidth of the network is much higher than that.  The
>>>>     bandwidth I mentioned came from "rados bench" output, under the
>>>>     "Bandwidth (MB/sec)" row.  I see from comparing mine to others
>>>>     online that mine is pretty good (relatively).  But I'd like to get
>>>>     much more than that.
>>>>
>>>>     Does "rados bench" show a near maximum of what a cluster can do?
>>>>     Or is it possible that I can tune it to get more bandwidth?
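>>>>
>>>>     For example, would raising the client-side concurrency, or running
>>>>     several bench clients in parallel, make a difference?  Something
>>>>     along these lines (pool name is just an example):
>>>>
>>>>     rados bench -p rbd 60 write -t 32 --no-cleanup
>>>>
>>>>     where -t raises the number of concurrent ops above the default of 16.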
>>>>
>>>>     On Fri, Nov 10, 2017 at 3:43 AM, John Spray <jspray at redhat.com> wrote:
>>>>
>>>>         On Fri, Nov 10, 2017 at 4:29 AM, Robert Stanford
>>>>         <rstanford8896 at gmail.com> wrote:
>>>>         >
>>>>         > In my cluster, rados bench shows about 1GB/s bandwidth.  I've
>>>>         > done some tuning:
>>>>         >
>>>>         > [osd]
>>>>         > osd op threads = 8
>>>>         > osd disk threads = 4
>>>>         > osd recovery max active = 7
>>>>         >
>>>>         >
>>>>         > I was hoping to get much better bandwidth.  My network can
>>>>         > handle it, and my disks are pretty fast as well.  Are there
>>>>         > any major tunables I can play with to increase what will be
>>>>         > reported by "rados bench"?  Am I pretty much stuck around the
>>>>         > bandwidth it reported?
>>>>
>>>>         Are you sure your 1GB/s isn't just the NIC bandwidth limit of the
>>>>         client you're running rados bench from?
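>>>>
>>>>         A quick way to rule that out (interface and host names below
>>>>         are just placeholders) would be something like:
>>>>
>>>>         ethtool eth0 | grep -i speed     # link speed on the bench client
>>>>         iperf3 -s                        # on one of the OSD nodes
>>>>         iperf3 -c osd-host               # from the bench client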
>>>>
>>>>         John
>>>>
>>>>         >
>>>>         >  Thank you
>>>>         >

