[ceph-users] Cephfs Hadoop Plugin and CEPH integration
Aristeu Gil Alves Jr
aristeu.jr at gmail.com
Wed Nov 29 08:52:01 PST 2017
> > Do S3 or Swift (for Hadoop or Spark) have integrated data-layout APIs for
> > processing data locally, as the CephFS Hadoop plugin has?
> With S3 and Swift you won't have data locality, as they were designed for
> the public cloud.
> We recommend disabling locality-based scheduling in Hadoop when running
> with those connectors.
> There is ongoing work to optimize those connectors for
> object storage.
> The Hadoop community is working on the s3a connector.
> There is also https://github.com/SparkTC/stocator, which is a Swift-based
> connector IBM wrote for their cloud.
Assuming these cases, how would a MapReduce process work without data
locality? How do the processors get the data? There is still the need to split the data,
Doesn't it severely impact the performance of big files (not just the
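As a rough illustration of the quoted advice to disable locality-based scheduling, here is a minimal sketch for a Spark job reading Ceph RGW through the s3a connector. It assumes Spark's `spark.hadoop.*` pass-through to the Hadoop configuration and the standard `fs.s3a.*` and `spark.locality.wait` keys. The helper name `s3a_spark_conf`, the endpoint URL, and the credentials are hypothetical placeholders, not part of the thread.

```python
def s3a_spark_conf(endpoint: str, access_key: str, secret_key: str) -> dict:
    """Hypothetical helper: Spark settings for reading Ceph RGW via s3a
    with locality-based scheduling turned off."""
    return {
        # Point the s3a connector at the RGW endpoint instead of AWS.
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Use path-style bucket URLs (bucket in the path, not the hostname).
        "spark.hadoop.fs.s3a.path.style.access": "true",
        # Object storage reports no useful block locations, so do not
        # wait for a data-local executor before launching a task.
        "spark.locality.wait": "0",
    }


# Placeholder endpoint and credentials, for illustration only.
conf = s3a_spark_conf("http://rgw.example.com:7480", "ACCESS_KEY", "SECRET_KEY")

# With pyspark installed, the settings could be applied like this:
# from pyspark.sql import SparkSession
# builder = SparkSession.builder
# for key, value in conf.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
```

Keeping the settings in a plain dictionary lets the same values be reused in `spark-defaults.conf` or passed as `--conf` flags to `spark-submit`.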