[ceph-users] High osd cpu usage

Alon Avrahami alonavrahami.isr at gmail.com
Wed Nov 8 05:13:19 PST 2017


Hello Guys

We have a fresh Luminous cluster: 12.2.0
(32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc),
installed using ceph-ansible.

The cluster consists of 6 nodes (Intel server board S2600WTTR), each with
64 GB RAM and an Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz (32 cores).
Each server has 16 * 1.6TB Dell SSD drives (SSDSC2BB016T7R), for a total
of 96 OSDs and 3 mons.

The main usage is RBD volumes for our OpenStack environment (Ocata).

We're at the beginning of our production tests and the OSDs already look
too busy, even though we generate almost no IOPS at this stage.
Every ceph-osd process sits at around 50% CPU and I can't figure out why
they are so busy:

top - 07:41:55 up 49 days,  2:54,  2 users,  load average: 6.85, 6.40, 6.37

Tasks: 518 total,   1 running, 517 sleeping,   0 stopped,   0 zombie
%Cpu(s): 14.8 us,  4.3 sy,  0.0 ni, 80.3 id,  0.0 wa,  0.0 hi,  0.6 si,  0.0 st
KiB Mem : 65853584 total, 23953788 free, 40342680 used,  1557116 buff/cache
KiB Swap:  3997692 total,  3997692 free,        0 used. 18020584 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  36713 ceph      20   0 3869588 2.826g  28896 S  47.2  4.5   6079:20 ceph-osd
  53981 ceph      20   0 3998732 2.666g  28628 S  45.8  4.2   5939:28 ceph-osd
  55879 ceph      20   0 3707004 2.286g  28844 S  44.2  3.6   5854:29 ceph-osd
  46026 ceph      20   0 3631136 1.930g  29100 S  43.2  3.1   6008:50 ceph-osd
  39021 ceph      20   0 4091452 2.698g  28936 S  42.9  4.3   5687:39 ceph-osd
  47210 ceph      20   0 3598572 1.871g  29092 S  42.9  3.0   5759:19 ceph-osd
  52763 ceph      20   0 3843216 2.410g  28896 S  42.2  3.8   5540:11 ceph-osd
  49317 ceph      20   0 3794760 2.142g  28932 S  41.5  3.4   5872:24 ceph-osd
  42653 ceph      20   0 3915476 2.489g  28840 S  41.2  4.0   5605:13 ceph-osd
  41560 ceph      20   0 3460900 1.801g  28660 S  38.5  2.9   5128:01 ceph-osd
  50675 ceph      20   0 3590288 1.827g  28840 S  37.9  2.9   5196:58 ceph-osd
  37897 ceph      20   0 4034180 2.814g  29000 S  34.9  4.5   4789:10 ceph-osd
  50237 ceph      20   0 3379780 1.930g  28892 S  34.6  3.1   4846:36 ceph-osd
  48608 ceph      20   0 3893684 2.721g  28880 S  33.9  4.3   4752:43 ceph-osd
  40323 ceph      20   0 4227864 2.959g  28800 S  33.6  4.7   4712:36 ceph-osd
  44638 ceph      20   0 3656780 2.437g  28896 S  33.2  3.9   4793:58 ceph-osd
  61639 ceph      20   0  527512 114300  20988 S   2.7  0.2   2722:03 ceph-mgr
  31586 ceph      20   0  765672 304140  21816 S   0.7  0.5 409:06.09 ceph-mon
     68 root      20   0       0      0      0 S   0.3  0.0   3:09.69 ksoftirqd/12
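As a sanity check on the top output, the aggregate CPU load of all OSD
daemons on a host can be summed with plain procps (a sketch, nothing
Ceph-specific needed):

```shell
# Sum %CPU across all ceph-osd processes on this host; prints 0.0 if none run.
total=$(ps -C ceph-osd -o %cpu= | awk '{s += $1} END {printf "%.1f\n", s + 0}')
echo "ceph-osd total CPU: ${total}%"
```

With 16 OSDs each around 40% that comes to roughly 6.5 cores busy per host,
which matches the load average of ~6.8 above despite the near-zero client I/O.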

strace doesn't show anything suspicious:

root@ecprdbcph10-opens:~# strace -p 36713
strace: Process 36713 attached
futex(0x563343c56764, FUTEX_WAIT_PRIVATE, 1, NUL
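Since strace only shows the main thread parked on a futex, the OSD admin
socket seems like a better window into what the daemon is doing. A sketch
(osd.0 and the socket path are placeholders for one of the busy OSDs, and it
has to run on that OSD's host):

```shell
# Hypothetical OSD id; adjust to one of the busy daemons on this host.
sock=/var/run/ceph/ceph-osd.0.asok
if [ -S "$sock" ]; then
    # Dump the daemon's internal perf counters; rising scrub/recovery/op
    # counters with near-zero client I/O would point at background work.
    ceph daemon osd.0 perf dump
else
    echo "no admin socket at $sock (run this on an OSD host)"
fi
```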

The Ceph logs don't reveal anything either.
Is this "normal" behavior in Luminous?
Looking through older threads, I can only find one about time gaps, which
is not our case.

Thanks,
Alon

