[ceph-users] hardware heterogeneous in same pool

Jonathan D. Proulx jon at csail.mit.edu
Wed Oct 3 15:34:36 PDT 2018


On Wed, Oct 03, 2018 at 07:09:30PM -0300, Bruno Carvalho wrote:
:Hi Cephers, I would like to know how you are growing the cluster.
:
:Using dissimilar hardware in the same pool or creating a pool for each
:different hardware group.
:
:What problems would I have using different hardware (CPU,
:memory, disk) in the same pool?

I've been growing with new hardware in old pools.

Due to the way RBD gets smeared across the disks, your performance is
almost always bottlenecked by the slowest storage location.

If you're just adding slightly newer, slightly faster hardware, this is
OK, as most of the performance gain in that case comes from spreading
wider, not so much from individual drive performance.

But if you are adding a faster technology, like going from
spinning disk to SSD, you do want to think about how to transition.
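
As a rough illustration of the point above, here is a minimal sketch,
assuming I/O is spread evenly over every OSD in the pool. The function
name and the per-drive IOPS figures are made up for the example, and
the model ignores CRUSH weights, caching, and replication.

```python
def cluster_iops_ceiling(per_osd_iops):
    """Rough ceiling when I/O is spread evenly over every OSD.

    Each OSD receives 1/N of the requests, so the pool saturates as soon
    as the slowest OSD does: ceiling = N * min(per-OSD IOPS).
    """
    return len(per_osd_iops) * min(per_osd_iops)


# Hypothetical per-drive numbers, chosen only for illustration.
old_hdd = [150] * 10
newer_hdd = [180] * 10
ssd = [20000] * 10

print(cluster_iops_ceiling(old_hdd))              # 1500
print(cluster_iops_ceiling(old_hdd + newer_hdd))  # 3000
print(cluster_iops_ceiling(old_hdd + ssd))        # 3000
```

In this toy model, doubling the number of drives doubles the ceiling
whether the new drives are slightly faster HDDs or SSDs, because the
old HDDs still set the pace. That is why dropping SSDs into an HDD pool
unchanged wastes most of what they can do.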

I recently added SSD to a previously all-HDD cluster (well, HDD data
with SSD WAL/DB).  For this I did fiddle with CRUSH rules.  First I made
the existing rules require HDD class devices, which should have been a
no-op in my mind but actually moved 90% of my data.  The folks at CERN
made a similar discovery before me and even (I think) worked out a way
to avoid it; see
http://lists.ceph.com/pipermail/ceph-large-ceph.com/2018-June/000113.html
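
One likely reason the "no-op" moved data: device classes are
implemented as per-class shadow copies of the CRUSH hierarchy, and the
shadow buckets carry different bucket IDs than the originals. CRUSH
hashes those IDs when it descends the tree, so a rule that takes the
"hdd" shadow tree makes different choices even though the set of disks
is unchanged. The sketch below is a toy model of that effect, not the
real CRUSH algorithm; the hash, the host names, and the bucket IDs are
all assumptions made for illustration.

```python
import hashlib


def draw(pg, bucket_id):
    """Toy pseudo-random draw for one (placement group, bucket ID) pair."""
    digest = hashlib.sha256(f"{pg}:{bucket_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pick_host(pg, host_bucket_ids):
    """Return the host whose bucket ID yields the highest draw for this PG."""
    return max(host_bucket_ids, key=lambda host: draw(pg, host_bucket_ids[host]))


def fraction_moved(num_pgs, old_ids, new_ids):
    """Share of PGs that land on a different host after the IDs change."""
    moved = sum(
        pick_host(pg, old_ids) != pick_host(pg, new_ids) for pg in range(num_pgs)
    )
    return moved / num_pgs


# Hypothetical hosts and bucket IDs: same 12 hosts, different IDs in the
# shadow ("class hdd") tree.
hosts = [f"host{i}" for i in range(12)]
plain_ids = {h: -(i + 2) for i, h in enumerate(hosts)}
shadow_ids = {h: -(i + 14) for i, h in enumerate(hosts)}

print(fraction_moved(2048, plain_ids, plain_ids))   # nothing moves
print(fraction_moved(2048, plain_ids, shadow_ids))  # most PGs move
```

With the same hosts but new IDs, each PG in this model is effectively
re-placed from scratch, so most of the data moves. The exact fraction
on a real cluster depends on the hierarchy and the replica count.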

After that I made new rules that took one SSD and two HDDs for each
replica set (in addition to spreading across racks or servers or
whatever), and after applying the new rule to the pools I use for Nova
ephemeral storage and Cinder volumes I set the SSD OSDs to have high
"primary affinity" and the HDDs to have low "primary affinity".
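
The two steps above can be sketched as follows. This is not the exact
rule or the values used on this cluster: the rule name, rule id, root
name, failure domain, OSD ids, and the affinity values 1.0 and 0.1 are
all assumptions for illustration. The helper only builds the text of a
CRUSH rule and the "ceph osd primary-affinity" command lines; it does
not run anything.

```python
def mixed_rule_text(name, rule_id, root="default", failure_domain="host"):
    """Render a CRUSH rule: first replica from SSD, the rest from HDD."""
    return "\n".join([
        f"rule {name} {{",
        f"    id {rule_id}",
        "    type replicated",
        "    min_size 1",
        "    max_size 10",
        f"    step take {root} class ssd",
        f"    step chooseleaf firstn 1 type {failure_domain}",
        "    step emit",
        f"    step take {root} class hdd",
        # firstn -1 means "pool size minus one", i.e. two HDDs at size 3
        f"    step chooseleaf firstn -1 type {failure_domain}",
        "    step emit",
        "}",
    ])


def primary_affinity_commands(osd_classes, ssd_affinity=1.0, hdd_affinity=0.1):
    """Build one 'ceph osd primary-affinity' argv per OSD, by device class."""
    affinity = {"ssd": ssd_affinity, "hdd": hdd_affinity}
    return [
        ["ceph", "osd", "primary-affinity", f"osd.{osd_id}", str(affinity[cls])]
        for osd_id, cls in sorted(osd_classes.items())
    ]


# Hypothetical rule name, id, and OSD inventory.
print(mixed_rule_text("ssd_primary", 5))
for argv in primary_affinity_commands({0: "ssd", 1: "hdd", 2: "hdd"}):
    print(" ".join(argv))
```

Because the SSD step emits first, the SSD is listed first in each
replica set, and the affinity values then bias primary selection toward
it. Note that the two "take" steps choose independently, so on hosts
holding both SSDs and HDDs the rule as sketched does not by itself
guarantee that the SSD and HDD replicas sit in different failure
domains.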

In the end this means the SSDs serve reads and writes, while writes to
the HDD replicas are buffered by the SSD WAL, so both reads and writes
are relatively fast (we'd previously been suffering on reads due to
IO load).

I left Glance images on HDD only, as those don't require much
performance in my world; same with RGW object storage, though for some
that may be performance sensitive.

The plan forward is more SSD to replace HDD, probably by first
getting enough to transition ephemeral drives, then a set to move
block storage, then the rest over the next year or two.

The mixed SSD/HDD setup was a big win for us, though, so we're happy
with that for now.

Scale matters with this, so for reference we have:
245 OSDs in 12 servers
627 TiB RAW storage (267 TiB used)
19.44 M objects


hope that helps,
-Jon

