[ceph-users] OSD killed by OOM when many cache available

Eric Nelson ericnelson at gmail.com
Fri Nov 17 19:10:04 PST 2017


As I understand it, the memory shows up in /proc/meminfo or vmstat as
cached, but because it's tied up in the page cache (likely from cached
reads/writes to your storage devices) the kernel can't reclaim it fast
enough at the moment the allocation fails and the oom-killer is invoked;
note the failing allocation in your log is order=2, i.e. it needs
physically contiguous pages. You could run "sync; echo 3 >
/proc/sys/vm/drop_caches" to free the cache immediately, but that's a
one-off manual workaround. Raising vm.vfs_cache_pressure and
vm.min_free_kbytes makes the kernel keep more memory free proactively and
keeps the system usable.
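For concreteness, here is a minimal shell sketch of those knobs. The read-only inspection is safe anywhere; the tuning commands (commented out, root-only) use the values from this thread as an example, not a recommendation, and min_free_kbytes in particular should be sized to your RAM:

```shell
# Inspect the current reclaim tunables (read-only).
cat /proc/sys/vm/vfs_cache_pressure   # default is 100
cat /proc/sys/vm/min_free_kbytes      # value is in kB

# Root-only tuning, using the example values from this thread:
#   sysctl -w vm.vfs_cache_pressure=400
#   sysctl -w vm.min_free_kbytes=4194304   # 4 GB floor; size to your RAM

# One-off cache flush (drops only clean page cache, dentries, and inodes):
#   sync; echo 3 > /proc/sys/vm/drop_caches
```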

I had a way of monitoring this, but I've forgotten it; there have been
quite a few fires to fight since then! If I find it I'll send it your way.
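For what it's worth, one simple way to watch for this (a sketch assuming Linux /proc; the 2x threshold is an arbitrary example) is to compare MemFree against the min_free_kbytes floor:

```shell
# Warn when free memory gets close to the kernel's reserve floor,
# i.e. when the next burst of allocations may force synchronous reclaim.
min_free=$(cat /proc/sys/vm/min_free_kbytes)
mem_free=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
cached=$(awk '/^Cached:/ {print $2}' /proc/meminfo)

echo "MemFree=${mem_free}kB Cached=${cached}kB floor=${min_free}kB"
if [ "$mem_free" -lt $((min_free * 2)) ]; then
    echo "WARNING: MemFree is within 2x of min_free_kbytes"
fi
```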

Cheers,
E

On Fri, Nov 17, 2017 at 6:03 PM, Sam Huracan <nowitzki.sammy at gmail.com>
wrote:

> @Eric: How can I check the status of the fs cache? Why could it be the root cause?
>
> Thanks
>
> 2017-11-18 7:30 GMT+07:00 Eric Nelson <ericnelson at gmail.com>:
>
>> One thing that doesn't show up is fs cache, which is likely the cause
>> here. We went through this on our SSDs and had to add the following
>> settings to stop the crashes. I believe vm.vfs_cache_pressure and
>> vm.min_free_kbytes were the ones that really helped. HTH!
>>
>> sysctl_param 'vm.vfs_cache_pressure' do
>>   value 400
>> end
>>
>> sysctl_param 'vm.dirty_ratio' do
>>   value 20
>> end
>>
>> sysctl_param 'vm.dirty_background_ratio' do
>>   value 2
>> end
>>
>> sysctl_param 'vm.min_free_kbytes' do
>>   value 4194304
>> end
>>
>>
>>
>> On Fri, Nov 17, 2017 at 4:24 PM, Sam Huracan <nowitzki.sammy at gmail.com>
>> wrote:
>>
>>> I see some more logs about memory in syslog:
>>>
>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553749] Node 0 DMA free:14828kB
>>> min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB
>>> active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
>>> isolated(file):0kB present:15996kB managed:15896kB mlocked:0kB dirty:0kB
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB
>>> slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB
>>> bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB
>>> pages_scanned:0 all_unreclaimable? yes
>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553753] lowmem_reserve[]: 0 1830
>>> 32011 32011 32011
>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553755] Node 0 DMA32
>>> free:134848kB min:3860kB low:4824kB high:5788kB active_anon:120628kB
>>> inactive_anon:121792kB active_file:653404kB inactive_file:399272kB
>>> unevictable:0kB isolated(anon):0kB isolated(file):36kB present:1981184kB
>>> managed:1900752kB mlocked:0kB dirty:344kB writeback:0kB mapped:1200kB
>>> shmem:0kB slab_reclaimable:239900kB slab_unreclaimable:154560kB
>>> kernel_stack:43376kB pagetables:1176kB unstable:0kB bounce:0kB free_pcp:0kB
>>> local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0
>>> all_unreclaimable? no
>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553758] lowmem_reserve[]: 0 0
>>> 30180 30180 30180
>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553760] Node 0 Normal
>>> free:237232kB min:63684kB low:79604kB high:95524kB active_anon:3075228kB
>>> inactive_anon:629052kB active_file:12544336kB inactive_file:12570716kB
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB present:31457280kB
>>> managed:30905084kB mlocked:0kB dirty:74368kB writeback:3796kB
>>> mapped:48516kB shmem:1276kB slab_reclaimable:713684kB
>>> slab_unreclaimable:416404kB kernel_stack:40896kB pagetables:25288kB
>>> unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>> writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553763] lowmem_reserve[]: 0 0 0 0
>>> 0
>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553765] Node 0 DMA: 1*4kB (U)
>>> 1*8kB (U) 0*16kB 1*32kB (U) 3*64kB (U) 2*128kB (U) 0*256kB 0*512kB 0*1024kB
>>> 1*2048kB (M) 3*4096kB (M) = 14828kB
>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553772] Node 0 DMA32: 10756*4kB
>>> (UME) 11442*8kB (UME) 25*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
>>> 0*1024kB 0*2048kB 0*4096kB = 134960kB
>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553777] Node 0 Normal: 59473*4kB
>>> (UME) 77*8kB (U) 10*16kB (H) 8*32kB (H) 6*64kB (H) 2*128kB (H) 1*256kB (H)
>>> 1*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 240332kB
>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553784] Node 0 hugepages_total=0
>>> hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553784] Node 0 hugepages_total=0
>>> hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553785] 6544612 total pagecache
>>> pages
>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553786] 2347 pages in swap cache
>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553787] Swap cache stats: add
>>> 2697126, delete 2694779, find 38874122/39241548
>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553788] Free swap  = 61498092kB
>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553788] Total swap = 62498812kB
>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553789] 8363615 pages RAM
>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553790] 0 pages
>>> HighMem/MovableOnly
>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553790] 158182 pages reserved
>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553791] 0 pages cma reserved
>>>
>>> Is it related to the page cache?
>>>
>>> 2017-11-18 7:22 GMT+07:00 Sam Huracan <nowitzki.sammy at gmail.com>:
>>>
>>>> Today one of our Ceph OSDs went down. I checked syslog and saw that the
>>>> OSD process was killed by the OOM killer:
>>>>
>>>>
>>>> Nov 17 10:01:06 ceph1 kernel: [2807926.762304] Out of memory: Kill
>>>> process 3330 (ceph-osd) score 7 or sacrifice child
>>>> Nov 17 10:01:06 ceph1 kernel: [2807926.763745] Killed process 3330
>>>> (ceph-osd) total-vm:2372392kB, anon-rss:559084kB, file-rss:7268kB
>>>> Nov 17 10:01:06 ceph1 kernel: [2807926.985830] init: ceph-osd (ceph/6)
>>>> main process (3330) killed by KILL signal
>>>> Nov 17 10:01:06 ceph1 kernel: [2807926.985844] init: ceph-osd (ceph/6)
>>>> main process ended, respawning
>>>> Nov 17 10:03:39 ceph1 bash: root [15524]: sudo ceph health detail [0]
>>>> Nov 17 10:03:40 ceph1 bash: root [15524]: sudo ceph health detail [0]
>>>> Nov 17 10:17:01 ceph1 CRON[75167]: (root) CMD (   cd / && run-parts
>>>> --report /etc/cron.hourly)
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553690] ceph-status.sh invoked
>>>> oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553693] ceph-status.sh cpuset=/
>>>> mems_allowed=0
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553697] CPU: 5 PID: 194271 Comm:
>>>> ceph-status.sh Not tainted 4.4.0-62-generic #83~14.04.1-Ubuntu
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553698] Hardware name: Dell Inc.
>>>> PowerEdge R730xd/072T6D, BIOS 2.5.5 08/16/2017
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553699]  0000000000000000
>>>> ffff88001a857b38 ffffffff813dc4ac ffff88001a857cf0
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553701]  0000000000000000
>>>> ffff88001a857bc8 ffffffff811fb066 ffff88083d29e200
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553703]  ffff88001a857cf0
>>>> ffff88001a857c00 ffff8808453cf000 0000000000000000
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553704] Call Trace:
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553709]  [<ffffffff813dc4ac>]
>>>> dump_stack+0x63/0x87
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553713]  [<ffffffff811fb066>]
>>>> dump_header+0x5b/0x1d5
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553717]  [<ffffffff81188b75>]
>>>> oom_kill_process+0x205/0x3d0
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553718]  [<ffffffff8118857e>] ?
>>>> oom_unkillable_task+0x9e/0xc0
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553720]  [<ffffffff811891ab>]
>>>> out_of_memory+0x40b/0x460
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553722]  [<ffffffff811fbb1f>]
>>>> __alloc_pages_slowpath.constprop.87+0x742/0x7ad
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553725]  [<ffffffff8118e1a7>]
>>>> __alloc_pages_nodemask+0x237/0x240
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553727]  [<ffffffff8118e36d>]
>>>> alloc_kmem_pages_node+0x4d/0xd0
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553730]  [<ffffffff8107c125>]
>>>> copy_process+0x185/0x1ce0
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553732]  [<ffffffff813320f3>] ?
>>>> security_file_alloc+0x33/0x50
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553734]  [<ffffffff8107de1a>]
>>>> _do_fork+0x8a/0x310
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553737]  [<ffffffff8108dc01>] ?
>>>> sigprocmask+0x51/0x80
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553738]  [<ffffffff8107e149>]
>>>> SyS_clone+0x19/0x20
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553743]  [<ffffffff81802d36>]
>>>> entry_SYSCALL_64_fastpath+0x16/0x75
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553744] Mem-Info:
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553747] active_anon:798964
>>>> inactive_anon:187711 isolated_anon:0
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553747]  active_file:3299435
>>>> inactive_file:3242497 isolated_file:9
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553747]  unevictable:0
>>>> dirty:18678 writeback:949 unstable:0
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553747]  slab_reclaimable:238396
>>>> slab_unreclaimable:142741
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553747]  mapped:12429 shmem:319
>>>> pagetables:6616 bounce:0
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553747]  free:96727 free_pcp:0
>>>> free_cma:0
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553749] Node 0 DMA free:14828kB
>>>> min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB
>>>> active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
>>>> isolated(file):0kB present:15996kB managed:15896kB mlocked:0kB dirty:0kB
>>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB
>>>> slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB
>>>> bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB
>>>> pages_scanned:0 all_unreclaimable? yes
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553753] lowmem_reserve[]: 0 1830
>>>> 32011 32011 32011
>>>> Nov 17 10:47:17 ceph1 kernel: [2810698.553755] Node 0 DMA32
>>>> free:134848kB min:3860kB low:4824kB high:5788kB active_anon:120628kB
>>>> inactive_anon:121792kB active_file:653404kB inactive_file:399272kB
>>>> unevictable:0kB isolated(anon):0kB isolated(file):36kB present:1981184kB
>>>> managed:1900752kB mlocked:0kB dirty:344kB writeback:0kB mapped:1200kB
>>>> shmem:0kB slab_reclaimable:239900kB slab_unreclaimable:154560kB
>>>> kernel_stack:43376kB pagetables:1176kB unstable:0kB bounce:0kB free_pcp:0kB
>>>> local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0
>>>> all_unreclaimable? no
>>>>
>>>>
>>>> We checked free memory: 24,705 MB is free once buffers/cache are excluded.
>>>>
>>>> root at ceph1:/var/log# free -m
>>>>              total       used       free     shared    buffers     cached
>>>> Mem:         32052      31729        323          1        125      24256
>>>> -/+ buffers/cache:       7347      24705
>>>> Swap:        61033        120      60913
>>>>
>>>>
>>>>
>>>> It's so weird. Could you help me solve this problem? I'm afraid it'll
>>>> happen again.
>>>>
>>>> Thanks in advance.
>>>>
>>>>
>>>>
>>>>
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users at lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>
>