[ceph-users] Huge latency spikes

Alex Litvak alexander.v.litvak at gmail.com
Sat Nov 17 13:07:43 PST 2018


John,

Thank you for the suggestions.

I looked into the journal SSDs.  The one below is close to 3 years old
and shows 5.17% wear (352941 GB written to disk, against an endurance
spec of 3.6 PB over 5 years).

It could be that SMART is not telling the whole story, but that is what
I see.

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       29054
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       4
170 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
174 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       3
175 Power_Loss_Cap_Test     0x0033   100   100   010    Pre-fail  Always       -       5130 (117 3127)
183 SATA_Downshift_Count    0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Temperature_Case        0x0022   074   064   000    Old_age   Always       -       26 (Min/Max 23/36)
192 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       3
194 Temperature_Internal    0x0022   100   100   000    Old_age   Always       -       26
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       10518704
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       5304
227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       0
228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       1743266
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   095   095   000    Old_age   Always       -       0
234 Thermal_Throttle        0x0032   100   100   000    Old_age   Always       -       0/0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       10518704
242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       6034

SMART Error Log Version: 1
No Errors Logged
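
For anyone who wants to cross-check the wear figures against the raw
values in the table above, here is a rough sketch of the arithmetic. The
function names are made up for illustration. It assumes that the raw
value of attribute 241 counts units of 32 MiB (as the attribute name
suggests), that the normalized value of attribute 233 counts down from
100, and that the raw value of attribute 226 is in units of 1/1024 of a
percent, which is my understanding of the Intel convention and may not
hold for other drives. Host writes measured against the rated endurance
and the drive's own wear indicators are different measures, so they need
not agree with each other.

```python
MIB = 1024 ** 2  # bytes in one MiB


def host_writes_gb(raw_32mib_units: int) -> float:
    """Convert a Host_Writes_32MiB raw value to decimal gigabytes."""
    return raw_32mib_units * 32 * MIB / 1e9


def wear_percent_from_normalized(value: int) -> int:
    """Wear implied by Media_Wearout_Indicator (assumed to start at 100)."""
    return 100 - value


def timed_wear_percent(raw: int) -> float:
    """Wear implied by Workld_Media_Wear_Indic (assumed raw / 1024 = percent)."""
    return raw / 1024


def share_of_endurance(written_gb: float, rated_pb: float) -> float:
    """Host writes as a fraction of the rated endurance (decimal PB assumed)."""
    return written_gb / (rated_pb * 1e6)


if __name__ == "__main__":
    written = host_writes_gb(10518704)          # attribute 241 raw value
    print(f"host writes:              {written:.0f} GB")
    print(f"share of 3.6 PB rating:   {share_of_endurance(written, 3.6):.1%}")
    print(f"wear per attribute 233:   {wear_percent_from_normalized(95)}%")
    print(f"wear per attribute 226:   {timed_wear_percent(5304):.2f}%")
```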

How do you look at cstates?
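
One possible way to look, as far as I can tell, is the Linux cpuidle
sysfs interface. The sketch below assumes the kernel exposes
/sys/devices/system/cpu/cpuN/cpuidle/stateM/ with the files name,
latency, usage and time. Whether that directory exists depends on the
kernel and the idle driver in use, and read_cstates is just a name I
picked for the helper. It lists each idle state of one CPU with its exit
latency, how often it was entered and the time spent in it.

```python
from pathlib import Path


def read_cstates(cpu_dir):
    """Return one dict per idle state found under <cpu_dir>/cpuidle."""
    state_dirs = sorted(
        Path(cpu_dir, "cpuidle").glob("state[0-9]*"),
        key=lambda p: int(p.name[len("state"):]),  # numeric order, not lexical
    )
    states = []
    for state_dir in state_dirs:
        def field(name, d=state_dir):
            return (d / name).read_text().strip()

        states.append({
            "name": field("name"),
            "latency_us": int(field("latency")),  # exit latency, microseconds
            "usage": int(field("usage")),         # times the state was entered
            "time_us": int(field("time")),        # total time in the state
        })
    return states


if __name__ == "__main__":
    # cpu0 only; an empty result means no cpuidle data is exposed here.
    for s in read_cstates("/sys/devices/system/cpu/cpu0"):
        print(f'{s["name"]:<10} latency={s["latency_us"]}us '
              f'usage={s["usage"]} time={s["time_us"]}us')
```

Running it twice a few minutes apart and comparing the usage and time
counters should show which states the CPU actually sits in during the
quiet periods. `cpupower idle-info` reports similar information if the
tool is installed.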

On 11/17/2018 2:37 PM, John Petrini wrote:
> I'd take a look at cstates if it's only happening during periods of
> low activity. If your journals are on SSD you should also check their
> health. They may have exceeded their write endurance - high apply
> latency is a telltale sign of this and you'd see high iowait on those
> disks.
> 



