[ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times

Ashley Merrick singapore at amerrick.co.uk
Sat Nov 10 21:24:29 PST 2018


I've just worked out that I had the same issue; I've been trying to find
the cause for the past few days!

However, I am using brand new enterprise Toshiba drives with 256MB write
cache, and was seeing I/O wait peaks of 40% even during a small write
operation to Ceph, with commit / apply latencies of 40ms+.

I just went through and disabled the write cache on each drive, then ran a
few tests. Write performance was exactly the same, but I/O wait was <1%
and commit / apply latencies were 1-3ms at most.

Something somewhere definitely doesn't seem to like the write cache being
enabled on the disks. This is an EC pool on the latest Mimic version.

On Sun, Nov 11, 2018 at 5:34 AM Vitaliy Filippov <vitalif at yourcmc.ru> wrote:

> Hi
>
> A weird thing happens in my test cluster made from desktop hardware.
>
> The command `for i in /dev/sd?; do hdparm -W 0 $i; done` increases
> single-thread write iops (reduces latency) 7 times!
>
> It is a 3-node cluster with Ryzen 2700 CPUs, 3x SATA 7200rpm HDDs + 1x
> SATA desktop SSD for system and ceph-mon + 1x SATA server SSD for
> block.db/wal in each host. Hosts are linked by 10gbit ethernet (not the
> fastest one though, average RTT according to flood-ping is 0.098ms). Ceph
> and OpenNebula are installed on the same hosts, OSDs are prepared with
> ceph-volume and bluestore with default options. SSDs have capacitors
> ('power-loss protection'), write cache is turned off for them since the
> very beginning (hdparm -W 0 /dev/sdb). They're quite old, but each of
> them is capable of delivering ~22000 iops in journal mode (fio -sync=1
> -direct=1 -iodepth=1 -bs=4k -rw=write).
>
> However, RBD single-threaded random-write benchmark originally gave awful
> results - when testing with `fio -ioengine=libaio -size=10G -sync=1
> -direct=1 -name=test -bs=4k -iodepth=1 -rw=randwrite -runtime=60
> -filename=./testfile` from inside a VM, the result was only 58 iops
> average (17ms latency). This was not what I expected from the HDD+SSD
> setup.
>
> But today I tried to play with cache settings for data disks. And I was
> really surprised to discover that just disabling HDD write cache (hdparm
> -W 0 /dev/sdX for all HDD devices) increases single-threaded performance
> ~7 times! The result from the same VM (without even rebooting it) is
> iops=405, avg lat=2.47ms. That's almost an order of magnitude faster,
> and in fact 2.5ms seems like roughly the expected number.
>
> As I understand it, 4k writes are always deferred at the default setting
> of prefer_deferred_size_hdd=32768, which means they should only need to
> be written to the journal device before the OSD acks the write operation.
>
> So my question is WHY? Why does HDD write cache affect commit latency
> with WAL on an SSD?
>
> I would also appreciate it if anybody with a similar setup (HDD+SSD with
> desktop SATA controllers or HBA) could test the same thing...
>
> --
> With best regards,
>    Vitaliy Filippov
>
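
For what it's worth, the figures quoted above are consistent with each
other: at iodepth=1 only one write is in flight at a time, so mean latency
is roughly 1000/iops milliseconds. A small sketch of that arithmetic; the
function names are illustrative and not part of fio or Ceph.

```shell
#!/bin/sh
# With a queue depth of 1, IOPS and mean latency are reciprocals.

# iops_to_latency_ms IOPS -> mean latency in ms, two decimals
iops_to_latency_ms() {
    awk -v iops="$1" 'BEGIN { printf "%.2f\n", 1000 / iops }'
}

# speedup BEFORE_IOPS AFTER_IOPS -> ratio, one decimal
speedup() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", after / before }'
}

iops_to_latency_ms 58    # 17.24 (write cache on, quoted as 17ms)
iops_to_latency_ms 405   # 2.47  (write cache off, quoted as 2.47ms)
speedup 58 405           # 7.0   (the "7 times" in the subject)
```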