[ceph-users] High osd cpu usage

Alon Avrahami alonavrahami.isr at gmail.com
Thu Nov 9 00:32:15 PST 2017


Hi,

Yes, I'm using BlueStore.
There is no I/O on the Ceph cluster; it's totally idle.
All the CPU usage is from OSDs that don't have any workload on them.
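
For reference, the standard status commands can confirm there is no client I/O
(nothing cluster-specific is assumed in the commands below):

    ceph -s               # the "client:" line under io should show ~0 op/s when idle
    ceph osd pool stats   # per-pool client I/O rates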

Thanks!

On Thu, Nov 9, 2017 at 9:37 AM, Vy Nguyen Tan <vynt.kenshiro at gmail.com>
wrote:

> Hello,
>
> I don't think this is normal behavior in Luminous. I'm testing 3 nodes; each
> node has 3 x 1TB HDDs, 1 SSD for WAL + DB, an E5-2620 v3, 32GB of RAM, and a
> 10Gbps NIC.
>
> I use fio for I/O performance measurements. When I ran "fio --randrepeat=1
> --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test
> --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75", I got the
> CPU usage per ceph-osd shown below:
>
>    2452 ceph      20   0 2667088 1.813g  15724 S  22.8  5.8  34:41.02 /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph
>    2178 ceph      20   0 2872152 2.005g  15916 S  22.2  6.4  43:22.80 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
>    1820 ceph      20   0 2713428 1.865g  15064 S  13.2  5.9  34:19.56 /usr/bin/ceph-osd -f --cluster ceph --id 2 --setuser ceph --setgroup ceph
>
> Are you using BlueStore? How many IOPS and how much disk throughput did you
> get with your cluster?
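>
> If you haven't measured it yet, rados bench gives a quick baseline (the pool
> name below is just a placeholder for any test pool you can write to):
>
>    rados bench -p testpool 60 write -b 4M -t 16 --no-cleanup
>    rados bench -p testpool 60 rand -t 16
>    rados -p testpool cleanup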
>
>
> Regards,
>
> On Wed, Nov 8, 2017 at 8:13 PM, Alon Avrahami <alonavrahami.isr at gmail.com>
> wrote:
>
>> Hello Guys
>>
>> We have a fresh Luminous cluster: 12.2.0
>> (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), installed using
>> ceph-ansible.
>>
>> The cluster has 6 nodes (Intel server board S2600WTTR), each with 64GB of
>> RAM and an Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz (32 cores). Each
>> server has 16 x 1.6TB Dell SSD drives (SSDSC2BB016T7R), for a total of
>> 96 OSDs and 3 mons.
>>
>> The main usage is RBDs for our OpenStack environment (Ocata).
>>
>> We're at the beginning of our production tests and it looks like the OSDs
>> are too busy, although we generate very few IOPS at this stage (almost
>> nothing). All ceph-osd processes are using ~50% CPU and I can't figure out
>> why they are so busy:
>>
>> top - 07:41:55 up 49 days,  2:54,  2 users,  load average: 6.85, 6.40, 6.37
>> Tasks: 518 total,   1 running, 517 sleeping,   0 stopped,   0 zombie
>> %Cpu(s): 14.8 us,  4.3 sy,  0.0 ni, 80.3 id,  0.0 wa,  0.0 hi,  0.6 si,  0.0 st
>> KiB Mem : 65853584 total, 23953788 free, 40342680 used,  1557116 buff/cache
>> KiB Swap:  3997692 total,  3997692 free,        0 used. 18020584 avail Mem
>>
>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>>   36713 ceph      20   0 3869588 2.826g  28896 S  47.2  4.5   6079:20 ceph-osd
>>   53981 ceph      20   0 3998732 2.666g  28628 S  45.8  4.2   5939:28 ceph-osd
>>   55879 ceph      20   0 3707004 2.286g  28844 S  44.2  3.6   5854:29 ceph-osd
>>   46026 ceph      20   0 3631136 1.930g  29100 S  43.2  3.1   6008:50 ceph-osd
>>   39021 ceph      20   0 4091452 2.698g  28936 S  42.9  4.3   5687:39 ceph-osd
>>   47210 ceph      20   0 3598572 1.871g  29092 S  42.9  3.0   5759:19 ceph-osd
>>   52763 ceph      20   0 3843216 2.410g  28896 S  42.2  3.8   5540:11 ceph-osd
>>   49317 ceph      20   0 3794760 2.142g  28932 S  41.5  3.4   5872:24 ceph-osd
>>   42653 ceph      20   0 3915476 2.489g  28840 S  41.2  4.0   5605:13 ceph-osd
>>   41560 ceph      20   0 3460900 1.801g  28660 S  38.5  2.9   5128:01 ceph-osd
>>   50675 ceph      20   0 3590288 1.827g  28840 S  37.9  2.9   5196:58 ceph-osd
>>   37897 ceph      20   0 4034180 2.814g  29000 S  34.9  4.5   4789:10 ceph-osd
>>   50237 ceph      20   0 3379780 1.930g  28892 S  34.6  3.1   4846:36 ceph-osd
>>   48608 ceph      20   0 3893684 2.721g  28880 S  33.9  4.3   4752:43 ceph-osd
>>   40323 ceph      20   0 4227864 2.959g  28800 S  33.6  4.7   4712:36 ceph-osd
>>   44638 ceph      20   0 3656780 2.437g  28896 S  33.2  3.9   4793:58 ceph-osd
>>   61639 ceph      20   0  527512 114300  20988 S   2.7  0.2   2722:03 ceph-mgr
>>   31586 ceph      20   0  765672 304140  21816 S   0.7  0.5 409:06.09 ceph-mon
>>      68 root      20   0       0      0      0 S   0.3  0.0   3:09.69 ksoftirqd/12
>>
>> strace doesn't show anything suspicious:
>>
>> root at ecprdbcph10-opens:~# strace -p 36713
>> strace: Process 36713 attached
>> futex(0x563343c56764, FUTEX_WAIT_PRIVATE, 1, NUL
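>>
>> The main thread just sitting in a futex wait is probably expected, since the
>> real work happens in worker threads. I suppose the next step would be to
>> follow the threads and look at the OSD's own counters, something like this
>> (the pid and osd id below are only examples taken from the top output above):
>>
>>   strace -f -c -p 36713        # follow all threads and summarize syscalls
>>   perf top -p 36713            # see where the process actually spends CPU
>>   ceph daemon osd.0 perf dump  # internal OSD perf counters via the admin socket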
>>
>> Ceph logs don't reveal anything either.
>> Is this "normal" behavior in Luminous?
>> Looking through older threads, I could only find one about time gaps, which
>> is not our case.
>>
>> Thanks,
>> Alon
>>