[ceph-users] High osd cpu usage

Vy Nguyen Tan vynt.kenshiro at gmail.com
Wed Nov 8 23:37:25 PST 2017


Hello,

I don't think this is normal behavior in Luminous. I'm testing with 3 nodes; each
node has 3 x 1TB HDDs, 1 SSD for WAL + DB, an E5-2620 v3, 32GB of RAM, and a 10Gbps NIC.

I use fio for I/O performance measurements. When I run "fio --randrepeat=1
--ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test
--bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75" I get the
%CPU of each ceph-osd process, as shown below:

   2452 ceph      20   0 2667088 1.813g  15724 S  22.8  5.8  34:41.02
/usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph
   2178 ceph      20   0 2872152 2.005g  15916 S  22.2  6.4  43:22.80
/usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
   1820 ceph      20   0 2713428 1.865g  15064 S  13.2  5.9  34:19.56
/usr/bin/ceph-osd -f --cluster ceph --id 2 --setuser ceph --setgroup ceph

Are you using BlueStore? How many IOPS and how much disk throughput do you get
with your cluster?
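
If you want to compare numbers, this is roughly what I would run (just a
sketch; I'm assuming the default "rbd" pool exists and that osd.0 is a valid
OSD id on your cluster, so adjust to your setup):

    # check which object store an OSD is using (bluestore vs filestore)
    ceph osd metadata 0 | grep osd_objectstore

    # rough 4K IOPS / throughput numbers from the cluster side
    rados bench -p rbd 30 write -b 4096 -t 16 --no-cleanup
    rados bench -p rbd 30 rand -t 16
    rados -p rbd cleanup

rados bench prints average IOPS and MB/s at the end of each run, which should
give us comparable numbers.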


Regards,

On Wed, Nov 8, 2017 at 8:13 PM, Alon Avrahami <alonavrahami.isr at gmail.com>
wrote:

> Hello Guys
>
> We have a fresh 'luminous' cluster (12.2.0
> (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)), installed using
> ceph-ansible.
>
> The cluster consists of 6 nodes (Intel server board S2600WTTR), each with 64G
> of RAM and an Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz (32 cores).
> Each server has 16 * 1.6TB Dell SSD drives (SSDSC2BB016T7R), for a total of
> 96 osds and 3 mons.
>
> The main usage is RBDs for our OpenStack environment (Ocata).
>
> We're at the beginning of our production tests and it looks like the OSDs are
> too busy, although we aren't generating many IOPS at this stage (almost
> nothing).
> All ceph-osd processes are using ~50% CPU and I can't figure out why they are
> so busy:
>
> top - 07:41:55 up 49 days,  2:54,  2 users,  load average: 6.85, 6.40, 6.37
>
> Tasks: 518 total,   1 running, 517 sleeping,   0 stopped,   0 zombie
> %Cpu(s): 14.8 us,  4.3 sy,  0.0 ni, 80.3 id,  0.0 wa,  0.0 hi,  0.6 si,  0.0 st
> KiB Mem : 65853584 total, 23953788 free, 40342680 used,  1557116 buff/cache
> KiB Swap:  3997692 total,  3997692 free,        0 used. 18020584 avail Mem
>
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>   36713 ceph      20   0 3869588 2.826g  28896 S  47.2  4.5   6079:20 ceph-osd
>   53981 ceph      20   0 3998732 2.666g  28628 S  45.8  4.2   5939:28 ceph-osd
>   55879 ceph      20   0 3707004 2.286g  28844 S  44.2  3.6   5854:29 ceph-osd
>   46026 ceph      20   0 3631136 1.930g  29100 S  43.2  3.1   6008:50 ceph-osd
>   39021 ceph      20   0 4091452 2.698g  28936 S  42.9  4.3   5687:39 ceph-osd
>   47210 ceph      20   0 3598572 1.871g  29092 S  42.9  3.0   5759:19 ceph-osd
>   52763 ceph      20   0 3843216 2.410g  28896 S  42.2  3.8   5540:11 ceph-osd
>   49317 ceph      20   0 3794760 2.142g  28932 S  41.5  3.4   5872:24 ceph-osd
>   42653 ceph      20   0 3915476 2.489g  28840 S  41.2  4.0   5605:13 ceph-osd
>   41560 ceph      20   0 3460900 1.801g  28660 S  38.5  2.9   5128:01 ceph-osd
>   50675 ceph      20   0 3590288 1.827g  28840 S  37.9  2.9   5196:58 ceph-osd
>   37897 ceph      20   0 4034180 2.814g  29000 S  34.9  4.5   4789:10 ceph-osd
>   50237 ceph      20   0 3379780 1.930g  28892 S  34.6  3.1   4846:36 ceph-osd
>   48608 ceph      20   0 3893684 2.721g  28880 S  33.9  4.3   4752:43 ceph-osd
>   40323 ceph      20   0 4227864 2.959g  28800 S  33.6  4.7   4712:36 ceph-osd
>   44638 ceph      20   0 3656780 2.437g  28896 S  33.2  3.9   4793:58 ceph-osd
>   61639 ceph      20   0  527512 114300  20988 S   2.7  0.2   2722:03 ceph-mgr
>   31586 ceph      20   0  765672 304140  21816 S   0.7  0.5 409:06.09 ceph-mon
>      68 root      20   0       0      0      0 S   0.3  0.0   3:09.69 ksoftirqd/12
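>
> (I can also dump the per-OSD counters through the admin socket, e.g. "ceph
> daemon osd.0 perf dump" on the host that carries osd.0, or "ceph osd perf"
> for the commit/apply latencies, if that output would be useful to anyone.)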
>
> strace doesn't show anything suspicious:
>
> root at ecprdbcph10-opens:~# strace -p 36713
> strace: Process 36713 attached
> futex(0x563343c56764, FUTEX_WAIT_PRIVATE, 1, NUL
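>
> (By default strace with -p attaches only to that one thread, so the OSD worker
> threads aren't traced. Something like "strace -f -c -p 36713" to follow all
> threads and summarize their syscalls, or "perf top -p 36713", might show
> better where the CPU time actually goes.)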
>
> The Ceph logs don't reveal anything either.
> Is this "normal" behavior in Luminous?
> Looking through older threads, I could only find one about time gaps, which is
> not our case.
>
> Thanks,
> Alon
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>

