[ceph-users] Cluster network slower than public network

Wido den Hollander wido at 42on.com
Thu Nov 16 07:31:25 PST 2017


> On 16 November 2017 at 16:20, David Turner <drakonstein at gmail.com> wrote:
> 
> 
> There is another thread in the ML right now covering this exact topic.  The
> general consensus is that for most deployments, a separate network for
> public and cluster is wasted complexity.
> 

Indeed. Just for the record (for when people search): if you have very specific needs, a separate public/cluster network can work.

However, for most setups a 2x10Gbit LACP trunk into a server is more than sufficient. Saturating 20Gbit with Ceph is difficult. In total a Ceph cluster can do much more than 20Gbit, but a single server pushing 2.5GB/sec is a lot of traffic.
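
The 20Gbit and 2.5GB/sec figures above are the same number in different units. A minimal sketch of the conversion, assuming 8 bits per byte and ignoring protocol overhead and the way LACP distributes flows over the links (the function and variable names are only illustrative):

```python
def gbit_to_gbyte_per_sec(gbit_per_sec: float) -> float:
    """Convert a line rate in Gbit/s to GB/s (8 bits per byte, no overhead)."""
    return gbit_per_sec / 8


links = 2               # number of links in the LACP trunk
link_speed_gbit = 10    # speed of each link in Gbit/s

trunk_gbit = links * link_speed_gbit
trunk_gbyte = gbit_to_gbyte_per_sec(trunk_gbit)

print(f"{links}x{link_speed_gbit}Gbit trunk = {trunk_gbit} Gbit/s = {trunk_gbyte} GB/s")
```

This is the theoretical ceiling for one server; real throughput will be somewhat lower.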

Usually IOps are the main bottleneck, not bandwidth. IOps are latency bound.

I checked a busy cluster with 250 OSDs (SSD):

client io 365 MB/s rd, 138 MB/s wr, 15555 op/s rd, 13044 op/s wr

So this is doing roughly 30k IOps, but only a few hundred MB/sec.
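
To make the arithmetic explicit, here is a rough sketch that adds up the numbers from the status line above. It assumes the line has exactly this format and treats MB as decimal megabytes; `parse_client_io` is a hypothetical helper, not part of any Ceph tooling.

```python
import re

line = "client io 365 MB/s rd, 138 MB/s wr, 15555 op/s rd, 13044 op/s wr"


def parse_client_io(text: str) -> dict:
    """Extract read/write throughput and op rates from a 'client io' line."""
    m = re.search(
        r"(\d+) MB/s rd, (\d+) MB/s wr, (\d+) op/s rd, (\d+) op/s wr", text
    )
    if m is None:
        raise ValueError("unexpected client io format")
    rd_mb, wr_mb, rd_ops, wr_ops = map(int, m.groups())
    return {"rd_mb": rd_mb, "wr_mb": wr_mb, "rd_ops": rd_ops, "wr_ops": wr_ops}


stats = parse_client_io(line)
total_mb = stats["rd_mb"] + stats["wr_mb"]      # combined MB/s
total_ops = stats["rd_ops"] + stats["wr_ops"]   # combined op/s

# Average payload per operation, in kB (assumes 1 MB = 1000 kB).
avg_kb_per_op = total_mb * 1000 / total_ops

# Fraction of a single 2.5 GB/s (2x10Gbit) trunk this traffic would fill.
share_of_trunk = total_mb / 2500

print(f"{total_ops} op/s, {total_mb} MB/s, ~{avg_kb_per_op:.1f} kB per op")
print(f"{share_of_trunk:.0%} of one 20Gbit trunk")
```

So the client traffic of the whole 250-OSD cluster adds up to about a fifth of what a single 20Gbit trunk can carry, with small operations dominating.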

Don't waste your time or money on a cluster network. Just use a single flat network.

Wido

> On Thu, Nov 16, 2017 at 9:59 AM Jake Young <jak3kaj at gmail.com> wrote:
> 
> > On Wed, Nov 15, 2017 at 1:07 PM Ronny Aasen <ronny+ceph-users at aasen.cx>
> > wrote:
> >
> >> On 15.11.2017 13:50, Gandalf Corvotempesta wrote:
> >>
> >> As 10Gb switches are expensive, what would happen if I used a gigabit
> >> cluster network and a 10Gb public network?
> >>
> >> Replication and rebalancing would be slow, but what about public I/O?
> >> When a client writes to a file, does the write go over the public network
> >> and Ceph then automatically replicates it over the cluster network, or is
> >> the whole I/O done over the public network?
> >>
> >>
> >>
> >> Public I/O would be slow.
> >> Each write goes from the client to the primary OSD over the public network,
> >> then is replicated 2 times to the secondary OSDs over the cluster network,
> >> and only then is the client informed that the block is written.
> >> The cluster network would see 2x the write traffic of the public
> >> network when things are OK, and many times the traffic of the public network
> >> when things are recovering or backfilling, so I would prioritize the
> >> cluster network for the highest speed if one could not have 10Gbps on
> >> everything.
> >>
> >
> >
> > I would seriously consider combining the cluster and public network. It
> > will simplify your configuration.   It really takes a lot to saturate a 10G
> > network with Ceph.
> >
> > If you find that you need to separate your public and cluster networks
> > later, you can do that in the future.
> >
> > Jake
> >
> >> _______________________________________________
> > ceph-users mailing list
> > ceph-users at lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >

