[ceph-users] read performance, separate client CRUSH maps or limit osd read access from each client

Gregory Farnum gfarnum at redhat.com
Mon Nov 26 05:47:43 PST 2018


On Tue, Nov 20, 2018 at 9:50 PM Vlad Kopylov <vladkopy at gmail.com> wrote:

> I see the point, but not for the read case: there is no overhead in just
> choosing a read replica, or in letting a mount option choose it.
>
> This is a simple feature that can be implemented, and it would save many
> people bandwidth in really distributed cases.
>

This is actually much more complicated than it sounds. Allowing reads from
the replica OSDs while still routing writes through a different primary OSD
introduces a great many consistency issues. We've tried adding very limited
support for this read-from-replica scenario in special cases, but have had
to roll them all back due to edge cases where they don't work.
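
To illustrate the kind of problem in general terms, here is a toy Python
model. It is not Ceph's replication protocol: the `Copy` class and
`write_in_flight` helper are invented for this sketch, and the only point is
that reads spread across replicas can observe a write and then appear to lose
it again.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    """One OSD's copy of a single object (toy model, not Ceph code)."""
    name: str
    value: str = "old"


def write_in_flight(primary, replicas, value, delivered_to):
    """Apply a write on the primary and only on the replicas it has reached so far."""
    primary.value = value
    for replica in replicas:
        if replica.name in delivered_to:
            replica.value = value


primary = Copy("primary")
near = Copy("near-replica")
far = Copy("far-replica")

# The write has reached the primary and one replica, but not yet the other.
write_in_flight(primary, [near, far], "new", delivered_to={"near-replica"})

# A client allowed to read from any replica can see the new value first...
first_read = near.value
# ...and then the old value again if its next read lands on the lagging replica.
second_read = far.value

print(first_read, second_read)  # prints: new old
```

When every read and write goes through the one primary, that single OSD
orders them, so a client cannot see the value go backwards like this.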

I understand why you want it, but it's definitely not a simple feature. :(
-Greg


>
> The main issue this surfaces is that RADOS maps ignore clients - they just
> see the cluster. There should be a part of the RADOS map that is unique, or
> possibly unique, for each client connection.
>
> Let's file a feature request?
>
> p.s. honestly, I don't see why anyone would use ceph for local network
> RAID setups, there are other simple solutions out there even in your
> own RedHat shop.
>
> On Tue, Nov 20, 2018 at 8:38 PM Patrick Donnelly <pdonnell at redhat.com> wrote:
> >
> > You need to accept one of the following: reads/writes will land on
> > different data centers; the primary OSD for a given pool is always in the
> > desired data center; or you use some other non-Ceph solution, which will
> > have either expensive, eventual, or false consistency.
> >
> > On Fri, Nov 16, 2018, 10:07 AM Vlad Kopylov <vladkopy at gmail.com> wrote:
> >>
> >> This is what Jean suggested. I understand it and it works with primary.
> >> But what I need is for all clients to access the same files, not
> >> separate sets (like red, blue, green).
> >>
> >> Thanks Konstantin.
> >>
> >> On Fri, Nov 16, 2018 at 3:43 AM Konstantin Shalygin <k0ste at k0ste.ru> wrote:
> >>>
> >>> On 11/16/18 11:57 AM, Vlad Kopylov wrote:
> >>> > Exactly. But write operations should go to all nodes.
> >>>
> >>> This can be set via primary affinity [1]: when a Ceph client reads or
> >>> writes data, it always contacts the primary OSD in the acting set.
> >>>
> >>>
> >>> If you want to totally segregate IO, you can use device classes:
> >>>
> >>> Just create OSDs with different classes:
> >>>
> >>> dc1
> >>>    host1
> >>>      red osd.0 primary
> >>>      blue osd.1
> >>>      green osd.2
> >>>
> >>> dc2
> >>>    host2
> >>>      red osd.3
> >>>      blue osd.4 primary
> >>>      green osd.5
> >>>
> >>> dc3
> >>>    host3
> >>>      red osd.6
> >>>      blue osd.7
> >>>      green osd.8 primary
> >>>
> >>>
> >>> Create 3 CRUSH rules:
> >>>
> >>> ceph osd crush rule create-replicated red default host red
> >>> ceph osd crush rule create-replicated blue default host blue
> >>> ceph osd crush rule create-replicated green default host green
> >>>
> >>> and 3 pools:
> >>>
> >>> ceph osd pool create red 64 64 replicated red
> >>> ceph osd pool create blue 64 64 replicated blue
> >>> ceph osd pool create green 64 64 replicated green
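
A rough sketch of how the layout above could be turned into a command list,
written in Python so it only prints the commands and does not run them. The
`LAYOUT` table and `commands_for` helper are hypothetical names for this
example. The `ceph osd primary-affinity` lines are an assumption about how the
"primary" labels in the layout would be enforced (affinity 0 on the
non-primary OSDs of each colour); the message itself only points to the
primary affinity docs [1].

```python
# Colour (device class) -> OSD ids from the layout above, plus the OSD marked "primary".
LAYOUT = {
    "red": {"primary": 0, "osds": [0, 3, 6]},
    "blue": {"primary": 4, "osds": [1, 4, 7]},
    "green": {"primary": 8, "osds": [2, 5, 8]},
}


def commands_for(colour, primary, osds, pg_num=64):
    """Build the CRUSH rule, pool and primary-affinity commands for one colour."""
    cmds = [
        # One replicated rule per device class, failure domain "host".
        f"ceph osd crush rule create-replicated {colour} default host {colour}",
        # One pool per colour, using the rule of the same name.
        f"ceph osd pool create {colour} {pg_num} {pg_num} replicated {colour}",
    ]
    for osd in osds:
        # Assumption: keep full affinity on the intended primary, 0 elsewhere.
        weight = 1 if osd == primary else 0
        cmds.append(f"ceph osd primary-affinity osd.{osd} {weight}")
    return cmds


for colour, spec in LAYOUT.items():
    for cmd in commands_for(colour, spec["primary"], spec["osds"]):
        print(cmd)
```

Because each OSD belongs to exactly one colour here, a per-OSD affinity
setting only affects that colour's pool.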
> >>>
> >>>
> >>> [1] http://docs.ceph.com/docs/master/rados/operations/crush-map/#primary-affinity
> >>>
> >>>
> >>>
> >>> k
> >>>
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users at lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com