[ceph-users] Cephfs Hadoop Plugin and CEPH integration

Aristeu Gil Alves Jr aristeu.jr at gmail.com
Wed Nov 29 08:52:01 PST 2017

> > Does s3 or swift (for hadoop or spark) have integrated data-layout APIs
> > for local data processing, as the cephfs hadoop plugin has?
> >
> With s3 and swift you won't have data locality, as they were designed for
> public cloud.
> We recommend disabling locality-based scheduling in Hadoop when running
> with those connectors.
> There is ongoing work to optimize those connectors to work with
> object storage.
> The Hadoop community works on the s3a connector.
> There is also https://github.com/SparkTC/stocator which is a swift
> based connector IBM wrote for their cloud.

Assuming these cases, how would a mapreduce process work without data locality?
How do the processors get the data? Still there's the need to split the data,
Doesn't it severely impact the performance of big files (not just the
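
For what it's worth, here is a minimal sketch of how splitting can still work
without locality. It assumes (my assumption, not something confirmed in this
thread) that the job still cuts the input into byte ranges and that each task
reads its own range over the network, for example with an HTTP ranged GET.
The names Split, compute_splits and range_header are hypothetical and are not
part of Hadoop, s3a or stocator:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Split:
    offset: int  # first byte of the range
    length: int  # number of bytes in the range


def compute_splits(object_size: int, split_size: int) -> List[Split]:
    """Cut an object of object_size bytes into consecutive byte ranges.

    No host list is attached to a split: without locality, any worker
    can be handed any range.
    """
    if object_size < 0 or split_size <= 0:
        raise ValueError("object_size must be >= 0 and split_size > 0")
    return [
        Split(offset, min(split_size, object_size - offset))
        for offset in range(0, object_size, split_size)
    ]


def range_header(split: Split) -> str:
    """Value of the HTTP Range header a task could send to fetch its split."""
    last_byte = split.offset + split.length - 1  # Range end is inclusive
    return f"bytes={split.offset}-{last_byte}"


if __name__ == "__main__":
    for s in compute_splits(300, 128):
        print(s, range_header(s))
```

With a 300-byte object and 128-byte splits this gives three ranges, the last
one 44 bytes long. The data still gets split, but every byte travels over the
network to whichever task picks up the range.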
