osrm-routed being killed even with high memory #964

@marinho

Hi, I have an EC2 m3.xlarge instance with 15 GB of memory plus a 38 GB swap partition, on which I intend to run one instance of osrm-routed per continent.

The problem is that with only the Africa instance running (the smallest continent), when I start the Europe instance (the biggest), it is killed while "loading edge information".

This is what I get in syslog:

Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951371] Call Trace:
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951380]  [<ffffffff81119df1>] dump_header+0x91/0xe0
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951383]  [<ffffffff8111a175>] oom_kill_process+0x85/0xb0
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951386]  [<ffffffff8111a51a>] out_of_memory+0xfa/0x220
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951391]  [<ffffffff8111fef3>] __alloc_pages_nodemask+0x8c3/0x8e0
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951397]  [<ffffffff81157026>] alloc_pages_current+0xb6/0x120
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951402]  [<ffffffff81116d17>] __page_cache_alloc+0xb7/0xd0
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951406]  [<ffffffff81118ce2>] filemap_fault+0x212/0x3c0
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951410]  [<ffffffff811393c2>] __do_fault+0x72/0x550
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951413]  [<ffffffff8113ca7a>] handle_pte_fault+0xfa/0x200
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951417]  [<ffffffff810063fe>] ? xen_pmd_val+0xe/0x10
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951421]  [<ffffffff81005379>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951424]  [<ffffffff8113dd59>] handle_mm_fault+0x269/0x370
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951430]  [<ffffffff8165e684>] do_page_fault+0x184/0x550
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951433]  [<ffffffff81004dd2>] ? xen_mc_flush+0xb2/0x1c0
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951435]  [<ffffffff8100478d>] ? xen_clts+0x8d/0x190
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951439]  [<ffffffff8165b2b5>] page_fault+0x25/0x30
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951441] Mem-Info:
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951443] Node 0 DMA per-cpu:
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951445] CPU    0: hi:    0, btch:   1 usd:   0
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951447] CPU    1: hi:    0, btch:   1 usd:   0
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951449] CPU    2: hi:    0, btch:   1 usd:   0
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951450] CPU    3: hi:    0, btch:   1 usd:   0
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951452] Node 0 DMA32 per-cpu:
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951454] CPU    0: hi:  186, btch:  31 usd:   0
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951455] CPU    1: hi:  186, btch:  31 usd:   0
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951457] CPU    2: hi:  186, btch:  31 usd:   0
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951459] CPU    3: hi:  186, btch:  31 usd:   0
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951460] Node 0 Normal per-cpu:
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951462] CPU    0: hi:  186, btch:  31 usd:   0
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951464] CPU    1: hi:  186, btch:  31 usd:   0
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951466] CPU    2: hi:  186, btch:  31 usd:   2
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951468] CPU    3: hi:  186, btch:  31 usd:   0
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951471] active_anon:3049 inactive_anon:446 isolated_anon:91
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951472]  active_file:0 inactive_file:35 isolated_file:18
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951473]  unevictable:3752633 dirty:0 writeback:461 unstable:0
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951474]  free:16933 slab_reclaimable:2144 slab_unreclaimable:2419
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951475]  mapped:7758 shmem:8 pagetables:8193 bounce:0
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951477] Node 0 DMA free:7880kB min:4kB low:4kB high:4kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:7624kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951486] lowmem_reserve[]: 0 4016 15112 15112
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951489] Node 0 DMA32 free:48468kB min:4176kB low:5220kB high:6264kB active_anon:0kB inactive_anon:28kB active_file:0kB inactive_file:0kB unevictable:3990544kB isolated(anon):0kB isolated(file):0kB present:4112640kB mlocked:3990544kB dirty:0kB writeback:28kB mapped:4kB shmem:0kB slab_reclaimable:16kB slab_unreclaimable:48kB kernel_stack:0kB pagetables:7784kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1121 all_unreclaimable? yes
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951498] lowmem_reserve[]: 0 0 11095 11095
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951502] Node 0 Normal free:11384kB min:11548kB low:14432kB high:17320kB active_anon:12196kB inactive_anon:1756kB active_file:0kB inactive_file:140kB unevictable:11019988kB isolated(anon):364kB isolated(file):72kB present:11362176kB mlocked:11019980kB dirty:0kB writeback:1816kB mapped:31028kB shmem:32kB slab_reclaimable:8560kB slab_unreclaimable:9628kB kernel_stack:832kB pagetables:24988kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:696 all_unreclaimable? no
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951511] lowmem_reserve[]: 0 0 0 0
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951514] Node 0 DMA: 2*4kB 2*8kB 1*16kB 3*32kB 3*64kB 3*128kB 2*256kB 1*512kB 2*1024kB 2*2048kB 0*4096kB = 7880kB
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951523] Node 0 DMA32: 3*4kB 10*8kB 11*16kB 10*32kB 6*64kB 5*128kB 3*256kB 2*512kB 2*1024kB 3*2048kB 9*4096kB = 48460kB
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951533] Node 0 Normal: 292*4kB 120*8kB 79*16kB 19*32kB 19*64kB 4*128kB 2*256kB 3*512kB 0*1024kB 0*2048kB 1*4096kB = 11872kB
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951542] 8197 total pagecache pages
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951543] 517 pages in swap cache
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951545] Swap cache stats: add 6442, delete 5925, find 53/97
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951546] Free swap  = 39296888kB
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.951548] Total swap = 39321596kB
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.981973] 3934192 pages RAM
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.981975] 99286 pages reserved
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.981977] 20007 pages shared
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.981978] 3809731 pages non-shared
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.981979] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.981990] [  214]     0   214     6704      132   0       0             0 mountall
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.981994] [  306]     0   306     4309       47   3       0             0 upstart-udev-br
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.981998] [  310]     0   310     5401      154   0     -17         -1000 udevd
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982001] [  357]     0   357     5367       64   2     -17         -1000 udevd
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982004] [  358]     0   358     5367       73   1     -17         -1000 udevd
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982008] [  444]     0   444     3798        3   1       0             0 upstart-socket-
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982012] [  512]     0   512     1817        9   2       0             0 dhclient3
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982015] [  659]     0   659    12509      213   3     -17         -1000 sshd
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982018] [  671]   101   671    63430      159   1       0             0 rsyslogd
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982021] [  679]   102   679     5980       90   2       0             0 dbus-daemon
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982024] [  737]     0   737     4007      174   1       0             0 getty
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982026] [  746]     0   746     4007      174   1       0             0 getty
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982029] [  752]     0   752     4007      174   0       0             0 getty
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982033] [  753]     0   753     4007      174   0       0             0 getty
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982035] [  755]     0   755     4007      174   2       0             0 getty
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982037] [  758]     0   758     1083      124   0       0             0 acpid
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982039] [  760]     0   760     4779      131   0       0             0 cron
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982041] [  761]     0   761     4228       49   2       0             0 atd
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982043] [  781]   103   781    46919      206   1       0             0 whoopsie
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982046] [  783]     0   783    15740       20   1       0             0 nginx
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982048] [  784]    33   784    15822      100   1       0             0 nginx
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982050] [  786]    33   786    15822       92   1       0             0 nginx
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982052] [  787]    33   787    15822      102   3       0             0 nginx
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982054] [  788]    33   788    15822       91   2       0             0 nginx
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982056] [  790]     0   790    14945       96   1       0             0 supervisord
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982059] [  811]     0   811     4465      157   0       0             0 start-osrm.sh
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982061] [  814]     0   814   547030   487596   0       0             0 osrm-routed-afr
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982064] [  832]     0   832     4007      174   0       0             0 getty
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982066] [  838]     0   838    18360      238   0       0             0 sshd
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982069] [  928]  1000   928    18360       55   2       0             0 sshd
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982071] [  929]  1000   929     6857     1241   0       0             0 bash
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982073] [ 1055]  1000  1055    10858      305   0       0             0 sudo
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982075] [ 1056]     0  1056     3138      176   0       0             0 start-osrm.sh
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982077] [ 1057]     0  1057  3577996  3271966   0       0             0 osrm-routed-eur
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982079] [ 1058]     0  1058    18360      437   2       0             0 sshd
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982082] [ 1148]  1000  1148    18360      250   2       0             0 sshd
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982084] [ 1149]  1000  1149     6852     1694   1       0             0 bash
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982087] Out of memory: Kill process 1057 (osrm-routed-eur) score 209 or sacrifice child
Mar 24 10:13:15 ip-10-0-0-97 kernel: [  449.982096] Killed process 1057 (osrm-routed-eur) total-vm:14311984kB, anon-rss:13056836kB, file-rss:31028kB
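
The per-process and Mem-Info figures in the report above are page counts. As a rough cross-check, here is a minimal Python sketch that converts them, assuming the usual 4 kB page size on x86-64 (the helper name `pages_to_gib` is only illustrative). The numbers are copied from the log; nothing else is measured.

```python
PAGE_KB = 4  # assumed page size in kB (typical for x86-64)

def pages_to_gib(pages, page_kb=PAGE_KB):
    """Convert a page count from the OOM report to GiB."""
    return pages * page_kb / (1024 * 1024)

# Values copied from the kernel log above (all in pages).
ram_pages = 3934192          # "pages RAM"
unevictable_pages = 3752633  # "unevictable:" in Mem-Info
rss_africa = 487596          # rss column of osrm-routed-afr
rss_europe = 3271966         # rss column of osrm-routed-eur

for label, pages in [
    ("RAM", ram_pages),
    ("unevictable", unevictable_pages),
    ("osrm-routed-afr rss", rss_africa),
    ("osrm-routed-eur rss", rss_europe),
    ("both osrm-routed rss", rss_africa + rss_europe),
]:
    print(f"{label:22s} {pages_to_gib(pages):6.2f} GiB")

# Share of RAM that the kernel reports as unevictable.
print(f"unevictable share: {unevictable_pages / ram_pages:.1%}")
```

The Europe rss of 3271966 pages equals the anon-rss plus file-rss in the "Killed process" line (13056836 kB + 31028 kB), so the 4 kB assumption looks consistent. The two osrm-routed processes together account for almost all of the unevictable pages, and the per-zone lines show `mlocked` nearly equal to `unevictable`. That suggests the data is locked in memory, and the kernel does not swap out locked pages, which would explain why the swap partition stays almost unused.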

I also tried loading it into shared memory before that, but this is what I got straight away:

$ sudo ./osrm-datastore europe-latest.osrm
[info] load names from: "europe-latest.osrm.names"
[info] size: 2218242
[info] allocating shared memory of 16919150418 bytes
[warn] caught exception: Invalid argument, code 21
[warn] caught exception: Invalid argument
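
One thing that may be worth checking here: the requested 16919150418 bytes is about 15.76 GiB, which is more than the machine's physical RAM. In addition, if osrm-datastore allocates a System V shared memory segment (an assumption, not confirmed in this report), `shmget` returns "Invalid argument" when the requested size exceeds the `kernel.shmmax` limit. The Python sketch below compares the requested size against that limit; the function names are hypothetical helpers, not part of OSRM.

```python
from pathlib import Path

REQUESTED_BYTES = 16919150418  # size printed by osrm-datastore above

def fits_shmmax(requested, shmmax):
    """Return True if a segment of `requested` bytes is within the shmmax limit."""
    return requested <= shmmax

def read_shmmax(path="/proc/sys/kernel/shmmax"):
    """Read the System V segment size limit on Linux, or None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None

limit = read_shmmax()
if limit is None:
    print("could not read kernel.shmmax on this system")
else:
    print(f"shmmax={limit} bytes, request fits: {fits_shmmax(REQUESTED_BYTES, limit)}")
```

If the request does not fit, the limit can be inspected and raised with `sysctl kernel.shmmax`, although a segment larger than physical RAM would still have to be backed by swap.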

I am using the latest version available on GitHub as of 20.03.2014.

Any help? Thanks in advance!

Update: in case it helps, I ran free -m after the process was killed and it showed this:

$ free -m
             total       used       free     shared    buffers     cached
Mem:         14980      14907         72          0          0       1342
-/+ buffers/cache:      13564       1415
Swap:        38399         23      38376

That seems right, since the allocated memory was presumably freed after the process was killed. But even while the process is running, swap usage stays at the same value, which probably means it is not using swap. Does osrm-routed require some special setup to use swap?
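
To see whether a running process is actually using swap or holding locked memory, the `VmRSS`, `VmLck` and `VmSwap` fields in `/proc/<pid>/status` can be read while it loads. A minimal Python sketch of that check follows; `parse_vm_fields` is a hypothetical helper and the sample text is made up for illustration, not taken from this machine.

```python
def parse_vm_fields(status_text, fields=("VmRSS", "VmLck", "VmSwap")):
    """Extract selected 'Vm*' values (in kB) from /proc/<pid>/status text."""
    result = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in fields:
            # Lines look like "VmRSS:\t  1950384 kB"; keep the number only.
            result[key] = int(rest.split()[0])
    return result

# Hypothetical sample, not real output from the machine in this report.
sample = (
    "Name:\tosrm-routed\n"
    "VmLck:\t 1950000 kB\n"
    "VmRSS:\t 1950384 kB\n"
    "VmSwap:\t       0 kB\n"
)
print(parse_vm_fields(sample))
```

On the real machine this would be called as `parse_vm_fields(open(f"/proc/{pid}/status").read())` with the pid of the osrm-routed process. A large `VmLck` next to a `VmSwap` of zero would indicate that the pages are locked and cannot be moved to swap.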
