nbrmgrd/buffermgrd were killed for using too much memory in the latest master image #2840

@keboliu

Description

Several tasks, such as buffermgrd and nbrmgrd, were killed for using too much memory:

Apr 30 07:12:11.435319 mtbc-sonic-01-2410 WARNING kernel: [  429.852782] buffermgrd: page allocation stalls for 10908ms, order:0, mode:0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD)
Apr 30 07:12:11.435330 mtbc-sonic-01-2410 WARNING kernel: [  429.852794] CPU: 1 PID: 10760 Comm: buffermgrd Tainted: G           O    4.9.0-8-2-amd64 #1 Debian 4.9.110-3+deb9u6
Apr 30 07:12:11.435333 mtbc-sonic-01-2410 WARNING kernel: [  429.852796] Hardware name: Mellanox Technologies Ltd. MSN2410/VMOD0001, BIOS 4.6.5 05/31/2018
Apr 30 07:12:11.435336 mtbc-sonic-01-2410 WARNING kernel: [  429.852798]  0000000000000000 ffffffffa5b312c4 ffffffffa62020a8 ffffbcc181513b70
Apr 30 07:12:11.435338 mtbc-sonic-01-2410 WARNING kernel: [  429.852803]  ffffffffa5989cca 024201caa66e7d00 ffffffffa62020a8 ffffbcc181513b10
Apr 30 07:12:11.435340 mtbc-sonic-01-2410 WARNING kernel: [  429.852807]  0000000000000010 ffffbcc181513b80 ffffbcc181513b30 ff283237d4d60199
Apr 30 07:12:11.435342 mtbc-sonic-01-2410 WARNING kernel: [  429.852811] Call Trace:
Apr 30 07:12:11.435344 mtbc-sonic-01-2410 WARNING kernel: [  429.852819]  [<ffffffffa5b312c4>] ? dump_stack+0x5c/0x78
Apr 30 07:12:11.435346 mtbc-sonic-01-2410 WARNING kernel: [  429.852824]  [<ffffffffa5989cca>] ? warn_alloc+0x13a/0x160
Apr 30 07:12:11.435348 mtbc-sonic-01-2410 WARNING kernel: [  429.852828]  [<ffffffffa598a6f5>] ? __alloc_pages_slowpath+0x995/0xbf0
Apr 30 07:12:11.435350 mtbc-sonic-01-2410 WARNING kernel: [  429.852832]  [<ffffffffa598ab51>] ? __alloc_pages_nodemask+0x201/0x260
Apr 30 07:12:11.435352 mtbc-sonic-01-2410 WARNING kernel: [  429.852835]  [<ffffffffa59dbe11>] ? alloc_pages_current+0x91/0x140
Apr 30 07:12:11.435353 mtbc-sonic-01-2410 WARNING kernel: [  429.852838]  [<ffffffffa5983686>] ? filemap_fault+0x326/0x5d0
Apr 30 07:12:11.435355 mtbc-sonic-01-2410 WARNING kernel: [  429.852864]  [<ffffffffc03e3a01>] ? ext4_filemap_fault+0x31/0x50 [ext4]
Apr 30 07:12:11.435357 mtbc-sonic-01-2410 WARNING kernel: [  429.852867]  [<ffffffffa59b4267>] ? __do_fault+0x87/0x170
Apr 30 07:12:11.435358 mtbc-sonic-01-2410 WARNING kernel: [  429.852870]  [<ffffffffa59b8b58>] ? handle_mm_fault+0xe78/0x12b0
Apr 30 07:12:11.435360 mtbc-sonic-01-2410 WARNING kernel: [  429.852874]  [<ffffffffa5861245>] ? __do_page_fault+0x255/0x4f0
Apr 30 07:12:11.435361 mtbc-sonic-01-2410 WARNING kernel: [  429.852879]  [<ffffffffa5e11772>] ? schedule+0x32/0x80
Apr 30 07:12:11.435363 mtbc-sonic-01-2410 WARNING kernel: [  429.852882]  [<ffffffffa5e173d8>] ? page_fault+0x28/0x30
Apr 30 07:12:11.435365 mtbc-sonic-01-2410 WARNING kernel: [  429.852884] Mem-Info:
Apr 30 07:12:11.435367 mtbc-sonic-01-2410 WARNING kernel: [  429.852890] active_anon:1808490 inactive_anon:5483 isolated_anon:0
Apr 30 07:12:11.435368 mtbc-sonic-01-2410 WARNING kernel: [  429.852890]  active_file:39 inactive_file:63 isolated_file:7
Apr 30 07:12:11.435370 mtbc-sonic-01-2410 WARNING kernel: [  429.852890]  unevictable:0 dirty:0 writeback:0 unstable:0
Apr 30 07:12:11.435372 mtbc-sonic-01-2410 WARNING kernel: [  429.852890]  slab_reclaimable:4197 slab_unreclaimable:166808
Apr 30 07:12:11.435373 mtbc-sonic-01-2410 WARNING kernel: [  429.852890]  mapped:3279 shmem:5805 pagetables:6361 bounce:0
Apr 30 07:12:11.435375 mtbc-sonic-01-2410 WARNING kernel: [  429.852890]  free:25271 free_pcp:0 free_cma:0
Apr 30 07:12:11.435412 mtbc-sonic-01-2410 WARNING kernel: [  431.208880] nbrmgrd: page allocation stalls for 11988ms, order:0, mode:0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD)
Apr 30 07:12:11.435414 mtbc-sonic-01-2410 WARNING kernel: [  431.208892] CPU: 0 PID: 12524 Comm: nbrmgrd Tainted: G           O    4.9.0-8-2-amd64 #1 Debian 4.9.110-3+deb9u6
Apr 30 07:12:11.435416 mtbc-sonic-01-2410 WARNING kernel: [  431.208894] Hardware name: Mellanox Technologies Ltd. MSN2410/VMOD0001, BIOS 4.6.5 05/31/2018
Apr 30 07:12:11.435418 mtbc-sonic-01-2410 WARNING kernel: [  431.208897]  0000000000000000 ffffffffa5b312c4 ffffffffa62020a8 ffffbcc18203bb70
Apr 30 07:12:11.435420 mtbc-sonic-01-2410 WARNING kernel: [  431.208901]  ffffffffa5989cca 024201ca00000006 ffffffffa62020a8 ffffbcc18203bb10
Apr 30 07:12:11.435422 mtbc-sonic-01-2410 WARNING kernel: [  431.208905]  0000000000000010 ffffbcc18203bb80 ffffbcc18203bb30 c34bb573a79c6442
Apr 30 07:12:11.435424 mtbc-sonic-01-2410 WARNING kernel: [  431.208909] Call Trace:
Apr 30 07:12:11.435425 mtbc-sonic-01-2410 WARNING kernel: [  431.208918]  [<ffffffffa5b312c4>] ? dump_stack+0x5c/0x78
Apr 30 07:12:11.435427 mtbc-sonic-01-2410 WARNING kernel: [  431.208923]  [<ffffffffa5989cca>] ? warn_alloc+0x13a/0x160
Apr 30 07:12:11.435429 mtbc-sonic-01-2410 WARNING kernel: [  431.208927]  [<ffffffffa598a6f5>] ? __alloc_pages_slowpath+0x995/0xbf0
Apr 30 07:12:11.435430 mtbc-sonic-01-2410 WARNING kernel: [  431.208930]  [<ffffffffa59dbe11>] ? alloc_pages_current+0x91/0x140
Apr 30 07:12:11.435432 mtbc-sonic-01-2410 WARNING kernel: [  431.208934]  [<ffffffffa598ab51>] ? __alloc_pages_nodemask+0x201/0x260
Apr 30 07:12:11.435434 mtbc-sonic-01-2410 WARNING kernel: [  431.208937]  [<ffffffffa59dbe11>] ? alloc_pages_current+0x91/0x140
Apr 30 07:12:11.435435 mtbc-sonic-01-2410 WARNING kernel: [  431.208939]  [<ffffffffa5983686>] ? filemap_fault+0x326/0x5d0
Apr 30 07:12:11.435437 mtbc-sonic-01-2410 WARNING kernel: [  431.208966]  [<ffffffffc03e3a01>] ? ext4_filemap_fault+0x31/0x50 [ext4]
Apr 30 07:12:11.435438 mtbc-sonic-01-2410 WARNING kernel: [  431.208969]  [<ffffffffa59b4267>] ? __do_fault+0x87/0x170
Apr 30 07:12:11.435440 mtbc-sonic-01-2410 WARNING kernel: [  431.208972]  [<ffffffffa59b8b58>] ? handle_mm_fault+0xe78/0x12b0
Apr 30 07:12:11.435441 mtbc-sonic-01-2410 WARNING kernel: [  431.208975]  [<ffffffffa5a5155f>] ? ep_poll+0x32f/0x350
Apr 30 07:12:11.435443 mtbc-sonic-01-2410 WARNING kernel: [  431.208979]  [<ffffffffa5861245>] ? __do_page_fault+0x255/0x4f0
Apr 30 07:12:11.435445 mtbc-sonic-01-2410 WARNING kernel: [  431.208983]  [<ffffffffa5e173d8>] ? page_fault+0x28/0x30
Apr 30 07:12:11.435447 mtbc-sonic-01-2410 WARNING kernel: [  431.208985] Mem-Info:
Apr 30 07:12:11.435455 mtbc-sonic-01-2410 WARNING kernel: [  431.208992] active_anon:1808490 inactive_anon:5483 isolated_anon:0
Apr 30 07:12:11.435457 mtbc-sonic-01-2410 WARNING kernel: [  431.208992]  active_file:29 inactive_file:46 isolated_file:3
Apr 30 07:12:11.435459 mtbc-sonic-01-2410 WARNING kernel: [  431.208992]  unevictable:0 dirty:0 writeback:0 unstable:0
Apr 30 07:12:11.435460 mtbc-sonic-01-2410 WARNING kernel: [  431.208992]  slab_reclaimable:4194 slab_unreclaimable:166812
Apr 30 07:12:11.435462 mtbc-sonic-01-2410 WARNING kernel: [  431.208992]  mapped:3277 shmem:5805 pagetables:6361 bounce:0
Apr 30 07:12:11.435463 mtbc-sonic-01-2410 WARNING kernel: [  431.208992]  free:25266 free_pcp:0 free_cma:0
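
For scale, the Mem-Info counters above are page counts, not bytes. The following is a minimal sketch for converting them to MiB; it is not part of the original report. It assumes 4 KiB pages (the usual base page size on x86_64), and the helper names `parse_mem_info` and `pages_to_mib` are made up for illustration.

```python
import re

# Assumption: 4 KiB base pages, the usual default on x86_64 kernels.
PAGE_SIZE = 4096


def parse_mem_info(lines):
    """Collect `name:count` pairs from kernel Mem-Info lines into a dict."""
    counters = {}
    for line in lines:
        for name, value in re.findall(r"([a-z_]+):(\d+)", line):
            counters[name] = int(value)
    return counters


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to MiB."""
    return pages * page_size / (1024 * 1024)


# Message part of three Mem-Info lines from the buffermgrd stall above.
sample = [
    "active_anon:1808490 inactive_anon:5483 isolated_anon:0",
    " slab_reclaimable:4197 slab_unreclaimable:166808",
    " free:25271 free_pcp:0 free_cma:0",
]

counters = parse_mem_info(sample)
for name in ("active_anon", "slab_unreclaimable", "free"):
    print(f"{name}: {pages_to_mib(counters[name]):.0f} MiB")
```

Under that page-size assumption, `active_anon` comes to roughly 7064 MiB (about 6.9 GiB), `slab_unreclaimable` to about 652 MiB, and `free` to about 99 MiB, so nearly all of the memory accounted for here is anonymous process memory.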

Steps to reproduce the issue:
The issue is easy to observe with the latest image right after the switch starts up.
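
To see which processes hold the most memory after start-up, something like the sketch below could be run on the switch. It is not part of the original report: it only reads the standard `VmRSS` and `Name` fields from `/proc/<pid>/status`, and the helper names (`rss_kib`, `name_of`, `top_rss`) are hypothetical.

```python
import os


def rss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or 0 if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return 0  # kernel threads have no VmRSS line


def name_of(status_text):
    """Return the process name from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            parts = line.split(None, 1)
            return parts[1].strip() if len(parts) > 1 else "?"
    return "?"


def top_rss(limit=10, proc="/proc"):
    """Return up to `limit` (rss_kib, pid, name) tuples, largest RSS first."""
    if not os.path.isdir(proc):
        return []
    rows = []
    for pid in filter(str.isdigit, os.listdir(proc)):
        try:
            with open(os.path.join(proc, pid, "status")) as f:
                text = f.read()
        except OSError:
            continue  # process exited or is not readable
        rows.append((rss_kib(text), int(pid), name_of(text)))
    return sorted(rows, reverse=True)[:limit]


if __name__ == "__main__":
    for rss, pid, name in top_rss():
        print(f"{rss:>10} KiB  {pid:>6}  {name}")
```

Run on the host, this also lists processes that live inside the Docker containers, since they share the host kernel's `/proc`.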

Describe the results you received:

Describe the results you expected:

Additional information you deem important (e.g. issue happens only occasionally):

**Output of `show version`:**
SONiC Software Version: SONiC.HEAD.956-ad2c1b2
Distribution: Debian 9.9
Kernel: 4.9.0-8-2-amd64
Build commit: ad2c1b2
Build date: Sun Apr 28 08:38:17 UTC 2019
Built by: johnar@jenkins-worker-3

Platform: x86_64-mlnx_msn2410-r0
HwSKU: ACS-MSN2410
ASIC: mellanox
Serial Number: MT1848K10623
Uptime: 10:34:23 up  3:29,  1 user,  load average: 2.73, 2.55, 2.47

Docker images:
REPOSITORY                 TAG                 IMAGE ID            SIZE
docker-dhcp-relay          HEAD.956-ad2c1b2    4e8db672e338        256MB
docker-dhcp-relay          latest              4e8db672e338        256MB
docker-fpm-quagga          HEAD.956-ad2c1b2    b5d608c9c504        281MB
docker-fpm-quagga          latest              b5d608c9c504        281MB
docker-syncd-mlnx-rpc      HEAD.956-ad2c1b2    e59e740d124e        617MB
docker-syncd-mlnx-rpc      latest              e59e740d124e        617MB
docker-teamd               HEAD.956-ad2c1b2    184617224403        300MB
docker-teamd               latest              184617224403        300MB
docker-sonic-telemetry     HEAD.956-ad2c1b2    0b0b9f93bfd5        300MB
docker-sonic-telemetry     latest              0b0b9f93bfd5        300MB
docker-snmp-sv2            HEAD.956-ad2c1b2    dd7f19154668        317MB
docker-snmp-sv2            latest              dd7f19154668        317MB
docker-router-advertiser   HEAD.956-ad2c1b2    1e0782909d55        279MB
docker-router-advertiser   latest              1e0782909d55        279MB
docker-platform-monitor    HEAD.956-ad2c1b2    98dffb8c7efa        324MB
docker-platform-monitor    latest              98dffb8c7efa        324MB
docker-orchagent           HEAD.956-ad2c1b2    6f752e1afab4        319MB
docker-orchagent           latest              6f752e1afab4        319MB
docker-lldp-sv2            HEAD.956-ad2c1b2    199b4184f107        298MB
docker-lldp-sv2            latest              199b4184f107        298MB
docker-database            HEAD.956-ad2c1b2    6ff3d687146f        280MB
docker-database            latest              6ff3d687146f        280MB
**Attach debug file `sudo generate_dump`:**

syslog.zip
