[ceph-users] Huge latency spikes

John Petrini jpetrini at coredial.com
Tue Nov 20 04:12:58 PST 2018


I would disable the cache on the controller for your journals. Use
write-through and no read-ahead. Did you make sure the disk cache is disabled?
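
As a rough illustration of the check I have in mind, here is a minimal Python sketch that takes the cache policy line quoted further down in this thread and flags the two settings I would change for a journal volume. The function names and the POLICY_LINE constant are made up for this example; only the policy string itself comes from the thread. It does not talk to the controller, it only inspects text you paste in.

```python
# Sketch only: inspects a pasted cache policy line, does not query the controller.
# The policy string is copied from the output quoted in this thread; the
# function and constant names are hypothetical.

POLICY_LINE = (
    "Current Cache Policy: WriteBack, ReadAdaptive, Direct, "
    "No Write Cache if Bad BBU"
)


def parse_cache_policy(line):
    """Split 'Current Cache Policy: A, B, C' into a list of setting names."""
    _, _, value = line.partition(":")
    return [part.strip() for part in value.split(",") if part.strip()]


def journal_policy_warnings(settings):
    """Return one warning per setting that conflicts with the advice above."""
    warnings = []
    if "WriteBack" in settings:
        warnings.append("WriteBack is enabled; try write-through for the journals")
    if "ReadAdaptive" in settings:
        warnings.append("ReadAdaptive is enabled; try no read-ahead for the journals")
    return warnings


if __name__ == "__main__":
    for warning in journal_policy_warnings(parse_cache_policy(POLICY_LINE)):
        print(warning)
```

With the policy line from this thread it reports both the write cache and the read-ahead setting. Whether changing them actually removes the spikes is something you would have to measure.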

On Tuesday, November 20, 2018, Alex Litvak <alexander.v.litvak at gmail.com>
wrote:
> I went through a RAID controller firmware update. I replaced a pair of
> SSDs with new ones. Nothing has changed. Per the controller card utility,
> no patrol reads are happening and the battery backup is in good
> shape. Cache policy is WriteBack. I am aware of the bad-battery effect,
> but it doesn't seem to be the case unless the controller is lying to me.
>
>
> On 11/19/2018 2:39 PM, Brendan Moloney wrote:
>>
>> Hi,
>>
>>> RAID card for journal disks is Perc H730 (Megaraid), RAID 1,
>>> battery-backed cache is on
>>>
>>> Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
>>> Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
>>>
>>> I have 2 other nodes with older Perc H710 cards and similar SSDs with
>>> slightly higher wear (6.3% vs 5.18%), but from observation they barely hit
>>> 1.5 ms, and only on rare occasions.
>>> Cache, RAID, and battery situation is the same.
>>
>> I would take a closer look at the RAID card. Are you sure the BBU is
>> OK? In the past I noticed the Megaraid cards would do periodic battery
>> tests that would completely drain the battery and thus disable the write
>> cache until they reached some threshold of charge again. They can also do
>> periodic "patrol reads" and "consistency checks" that can hurt performance.
>> Or the card could just be failing; I have gone through almost more RAID
>> cards than HDDs. The unreliability and black-box nature of hardware RAID
>> cards is one of the things that first got me looking into Ceph (although
>> even mdadm is a big improvement in my opinion).
>>
>> For journals you are better off putting half your OSDs on one SSD and
>> half on the other instead of RAID1.
>>
>> -Brendan
>>
>
>

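On Brendan's suggestion of splitting the journals across the two SSDs instead of mirroring them, the layout could be sketched roughly as below. This is only an illustration: the device names and the OSD count are hypothetical, and the helper is not part of any Ceph tooling. Keep in mind that without RAID1, losing one SSD takes down the half of the OSDs that journal to it.

```python
# Sketch only: plans which OSD journals to where; it changes nothing on disk.
# Device names and the OSD count below are hypothetical.

def assign_journals(osd_ids, journal_devices):
    """Round-robin OSD ids over the journal devices so each gets about half."""
    layout = {dev: [] for dev in journal_devices}
    for index, osd_id in enumerate(sorted(osd_ids)):
        dev = journal_devices[index % len(journal_devices)]
        layout[dev].append(osd_id)
    return layout


if __name__ == "__main__":
    # Assumed example: 8 OSDs on the node and two journal SSDs.
    layout = assign_journals(range(8), ["/dev/sda", "/dev/sdb"])
    for dev, osds in layout.items():
        print(dev, osds)
```
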
-- 


John Petrini
Platforms Engineer
