[ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times

Marc Roos M.Roos at f1-outsourcing.eu
Sun Nov 11 02:55:10 PST 2018



I just did a very short test and don’t see any difference with this 
cache on or off, so I am leaving it on for now. 





-----Original Message-----
From: Ashley Merrick [mailto:singapore at amerrick.co.uk] 
Sent: Sunday, 11 November 2018 11:43
To: Marc Roos
Cc: ceph-users; vitalif
Subject: Re: [ceph-users] Disabling write cache on SATA HDDs reduces 
write latency 7 times

I don’t have any SSDs in the cluster to test.

Also, without knowing the exact reason why enabling it has such a 
negative effect, I wouldn’t be sure whether the same would apply to SSDs.

On Sun, 11 Nov 2018 at 6:41 PM, Marc Roos <M.Roos at f1-outsourcing.eu> 
wrote:


	 
	
	Does it make sense to test disabling this on an HDD cluster only?
	
	
	-----Original Message-----
	From: Ashley Merrick [mailto:singapore at amerrick.co.uk] 
	Sent: Sunday, 11 November 2018 6:24
	To: vitalif at yourcmc.ru
	Cc: ceph-users at lists.ceph.com
	Subject: Re: [ceph-users] Disabling write cache on SATA HDDs reduces
	write latency 7 times
	
	I've just worked out that I had the same issue; I've been trying to
	find the cause for the past few days!
	
	However, I am using brand new enterprise Toshiba drives with 256MB
	write cache, and was seeing I/O wait peaks of 40% even during a small
	write operation to Ceph, and commit / apply latencies of 40ms+.
	
	I just went through and disabled the write cache on each drive, and
	ran a few tests: exactly the same write performance, but I/O wait
	below 1% and commit / apply latencies of 1-3ms at most.
	
	Something somewhere definitely doesn't seem to like the write cache
	being enabled on the disks. This is an EC pool on the latest Mimic
	version.
	
	On Sun, Nov 11, 2018 at 5:34 AM Vitaliy Filippov <vitalif at yourcmc.ru>
	wrote:
	
	
	        Hi
	
	        A weird thing happens in my test cluster made from desktop
	        hardware.
	
	        The command `for i in /dev/sd?; do hdparm -W 0 $i; done`
	        increases single-thread write iops (reduces latency) 7 times!
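
The one-liner above touches every /dev/sd? device, SSDs included. A more cautious variant might restrict itself to rotational drives and default to printing the commands instead of running them. This is only a sketch: `DRY_RUN` and `SYSFS_ROOT` are hypothetical switches introduced for this example, and it assumes the kernel's `queue/rotational` flag correctly distinguishes the HDDs from the SSDs on the host.

```shell
#!/usr/bin/env bash
# Sketch: disable the volatile write cache on rotational sd* drives only.
# DRY_RUN and SYSFS_ROOT are hypothetical names used for illustration.

# is_rotational NAME: succeeds if the kernel reports the drive as rotational.
is_rotational() {
    local sysfs_root="${SYSFS_ROOT:-/sys}"
    [ "$(cat "$sysfs_root/block/$1/queue/rotational" 2>/dev/null)" = "1" ]
}

# set_write_cache DEV VALUE: print the hdparm command (default) or run it.
set_write_cache() {
    local dev="$1" value="$2"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "hdparm -W $value $dev"
    else
        hdparm -W "$value" "$dev"
    fi
}

# disable_hdd_write_cache: walk all sd* block devices, skip non-rotational ones.
disable_hdd_write_cache() {
    local sysfs_root="${SYSFS_ROOT:-/sys}" path name
    for path in "$sysfs_root"/block/sd*; do
        [ -e "$path" ] || continue      # glob matched nothing
        name="${path##*/}"
        if is_rotational "$name"; then
            set_write_cache "/dev/$name" 0
        fi
    done
}

# Dry run by default: only prints what would be executed.
disable_hdd_write_cache
```

Running it with `DRY_RUN=0` as root would actually issue the `hdparm -W 0` calls for the drives it lists.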
	
	        It is a 3-node cluster with Ryzen 2700 CPUs, 3x SATA 7200rpm
	        HDDs + 1x SATA desktop SSD for system and ceph-mon + 1x SATA
	        server SSD for block.db/wal in each host. Hosts are linked by
	        10gbit ethernet (not the fastest one though, average RTT
	        according to flood-ping is 0.098ms). Ceph and OpenNebula are
	        installed on the same hosts, and OSDs are prepared with
	        ceph-volume and bluestore with default options. The SSDs have
	        capacitors ('power-loss protection'), and write cache has been
	        turned off for them since the very beginning (hdparm -W 0
	        /dev/sdb). They're quite old, but each of them is capable of
	        delivering ~22000 iops in journal mode (fio -sync=1 -direct=1
	        -iodepth=1 -bs=4k -rw=write).
	
	        However, RBD single-threaded random-write benchmark originally
	        gave awful results - when testing with
	        `fio -ioengine=libaio -size=10G -sync=1 -direct=1 -name=test
	        -bs=4k -iodepth=1 -rw=randwrite -runtime=60
	        -filename=./testfile` from inside a VM, the result was only
	        58 iops average (17ms latency). This was not what I expected
	        from the HDD+SSD setup.
	
	        But today I tried to play with cache settings for the data
	        disks. And I was really surprised to discover that just
	        disabling the HDD write cache (hdparm -W 0 /dev/sdX for all HDD
	        devices) increases single-threaded performance ~7 times! The
	        result from the same VM (without even rebooting it) is
	        iops=405, avg lat=2.47ms. That's nearly an order of magnitude
	        faster, and in fact 2.5ms seems sort of an expected number.
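
Because both fio runs use -iodepth=1, only one write is in flight at a time, so average latency and iops are roughly reciprocal. A small sketch of that arithmetic, using only the numbers quoted above (the function names are made up for this example):

```shell
# latency_ms IOPS: average per-write latency in ms at queue depth 1.
latency_ms() {
    LC_ALL=C awk -v iops="$1" 'BEGIN { printf "%.2f\n", 1000 / iops }'
}

# speedup BEFORE_IOPS AFTER_IOPS: ratio of the two results.
speedup() {
    LC_ALL=C awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", after / before }'
}

latency_ms 58      # prints 17.24 (write cache on)
latency_ms 405     # prints 2.47  (write cache off)
speedup 58 405     # prints 6.98
```

This matches the reported 17ms and 2.47ms averages and the "~7 times" figure.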
	
	        As I understand it, 4k writes are always deferred at the
	        default setting of prefer_deferred_size_hdd=32768, which means
	        they should only need to be written to the journal device
	        before the OSD acks the write operation.
	
	        So my question is WHY? Why does HDD write cache affect commit
	        latency with WAL on an SSD?
	
	        I would also appreciate it if anybody with a similar setup
	        (HDD+SSD with desktop SATA controllers or an HBA) could test
	        the same thing...
	
	        -- 
	        With best regards,
	           Vitaliy Filippov
	        _______________________________________________
	        ceph-users mailing list
	        ceph-users at lists.ceph.com
	        http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
	
	
	
	



