[ceph-users] Cephfs Hadoop Plugin and CEPH integration

Orit Wasserman owasserm at redhat.com
Wed Nov 29 07:55:42 PST 2017


On Wed, Nov 29, 2017 at 5:32 PM, Aristeu Gil Alves Jr
<aristeu.jr at gmail.com> wrote:
> Orit,
> As I mentioned, I have had CephFS in production for almost two years.
> Can I use this installed filesystem, or do I need to start from scratch? If the
> former, is there any tutorial that you recommend on adding S3 to an
> installed base, or to Ceph in general?

Radosgw is the service that provides S3- and Swift-compatible object storage.
You can use your existing Ceph cluster (monitors and OSDs), but you will
need to add the radosgw daemon.
It will have its own separate pools.
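For reference, here is a minimal sketch of the two pieces you would typically add on top of the existing cluster: a ceph.conf section for the gateway and an S3 user for Hadoop. The host name `gw1`, port 7480, user `hadoop` and the helper names are hypothetical; check the option spelling against the radosgw docs for your release. The code only builds the command line, it does not run it.

```python
import configparser
import io
import shlex


def rgw_conf_section(host: str, port: int = 7480) -> str:
    """Render a ceph.conf section for one radosgw instance (hypothetical host)."""
    cfg = configparser.ConfigParser()
    # One [client.rgw.<host>] section per gateway; MONs and OSDs stay as they are.
    cfg[f"client.rgw.{host}"] = {
        # civetweb was the embedded web server shipped with 2017-era releases.
        "rgw_frontends": f"civetweb port={port}",
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


def s3_user_command(uid: str, display_name: str) -> list[str]:
    """Build (but do not run) the radosgw-admin call that creates an S3 user."""
    return [
        "radosgw-admin", "user", "create",
        f"--uid={uid}",
        f"--display-name={display_name}",
    ]


if __name__ == "__main__":
    print(rgw_conf_section("gw1"))
    print(shlex.join(s3_user_command("hadoop", "Hadoop jobs")))
```

`radosgw-admin user create` prints the new user as JSON, including the generated access and secret keys that the Hadoop side will need.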

> Do S3 or swifta (for Hadoop or Spark) have integrated data-layout APIs for
> local data processing, as the CephFS Hadoop plugin has?
With S3 and Swift you won't have data locality, as they were designed for
the public cloud.
We recommend disabling locality-based scheduling in Hadoop when running
with those connectors.
There is ongoing work to optimize those connectors for object
storage.
The Hadoop community is working on the s3a connector.
There is also https://github.com/SparkTC/stocator, which is a Swift-based
connector IBM wrote for their cloud.
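A rough sketch of the Hadoop/Spark side, assuming the s3a connector talks to radosgw. The `fs.s3a.*` names are s3a connector properties; the endpoint and keys are placeholders. Which knobs correspond to "disable locality-based scheduling" is an assumption here (`spark.locality.wait` for Spark, `yarn.scheduler.capacity.node-locality-delay` for the YARN CapacityScheduler), so verify them against the docs for your Hadoop/Spark versions.

```python
import xml.etree.ElementTree as ET


def s3a_properties(endpoint: str, access_key: str, secret_key: str,
                   use_ssl: bool = False) -> dict[str, str]:
    """fs.s3a.* settings that point the s3a connector at a radosgw endpoint."""
    return {
        "fs.s3a.endpoint": endpoint,  # host:port of the gateway
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        # Address buckets as http://host/bucket instead of bucket.host,
        # which avoids needing wildcard DNS for the gateway.
        "fs.s3a.path.style.access": "true",
        "fs.s3a.connection.ssl.enabled": "true" if use_ssl else "false",
    }


# Knobs assumed to correspond to "disable locality-based scheduling":
# an object store reports no useful block locations, so waiting for a
# node-local slot only delays tasks.
YARN_LOCALITY_OFF = {"yarn.scheduler.capacity.node-locality-delay": "-1"}
SPARK_LOCALITY_OFF = {"spark.locality.wait": "0s"}


def to_hadoop_xml(props: dict[str, str]) -> str:
    """Render a dict as a Hadoop <configuration> document (e.g. core-site.xml)."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder endpoint and keys; use the keys radosgw-admin generated.
    core_site = s3a_properties("rgw.example.internal:7480", "ACCESS", "SECRET")
    print(to_hadoop_xml(core_site))
    print(to_hadoop_xml(YARN_LOCALITY_OFF))
    for key, value in SPARK_LOCALITY_OFF.items():
        print(f"--conf {key}={value}")
```

Jobs would then read and write paths such as s3a://bucket/path instead of ceph:// paths.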

> Sorry for my lack of knowledge on the matter. As I was exclusively a CephFS
> user, I didn't touch RGW yet. Gonna learn everything now. Any hint is going
> to be welcome.

CephFS is great, and depending on your dataset and workload it may be
the right storage for you :)


> Thanks and regards,
> Aristeu
> 2017-11-29 4:19 GMT-02:00 Orit Wasserman <owasserm at redhat.com>:
>> On Tue, Nov 28, 2017 at 7:26 PM, Aristeu Gil Alves Jr
>> <aristeu.jr at gmail.com> wrote:
>> > Greg and Donny,
>> >
>> > Thanks for the answers. It helped a lot!
>> >
>> > I just watched the swifta presentation and it looks quite good!
>> >
>> I would highly recommend using s3a rather than swifta, as it is much more
>> mature and more widely used.
>> Cheers,
>> Orit
>> > Due to the lack of updates/development, and the fact that we can also
>> > choose Spark, I think swift/swifta with Ceph may be a good strategy too.
>> > I need to study it more, tho.
>> >
>> > Can I get the same results (performance and integrated data-layout APIs)
>> > with it?
>> >
>> > Are there any migration cases/tutorials for moving from CephFS to Swift
>> > with Ceph that you could suggest?
>> >
>> > Best regards,
>> > --
>> > Aristeu
