net-vm and nw-packet-forwarder fixes #1788

Merged
brianmcgillion merged 3 commits into tiiuae:main from vunnyso:vs-fixShutdown
Mar 6, 2026

Conversation

@vunnyso
Collaborator

@vunnyso vunnyso commented Feb 27, 2026

Description of Changes

  1. net-vm: Increase the default memory allocation from 512MB to 1024MB
    Some WiFi drivers require at least 1GB of memory to function properly. The default is updated to improve
    compatibility while still allowing overrides via vmConfig.

    Resolves the kernel crashes outlined below that are related to memory during net-vm boot:

    awk: page allocation failure: order:4, mode:0x40820(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=msFetchUrl.service,mems_allowed=0
    CPU: 1 UID: 994 PID: 918 Comm: awk Not tainted 6.18.8 #1-NixOS PREEMPT(voluntary)
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
    Call Trace:
    <TASK>
    dump_stack_lvl+0x5d/0x80
    warn_alloc+0x163/0x190
    ? wakeup_kswapd+0xa3/0x1d0
    __alloc_frozen_pages_noprof+0xc47/0x10f0
    ? get_page_from_freelist+0x1a06/0x1c20
    alloc_pages_mpol+0x86/0x170
    ? virtio_fs_enqueue_req+0x214/0x600 [virtiofs]
    ___kmalloc_large_node+0x99/0xb0
    __kmalloc_large_node_noprof+0x1d/0xb0
    __kmalloc_noprof+0x4dd/0x700
    ? __alloc_frozen_pages_noprof+0x478/0x10f0
    ? virtio_fs_enqueue_req+0x214/0x600 [virtiofs]
    virtio_fs_enqueue_req+0x214/0x600 [virtiofs]
    virtio_fs_send_req+0x51/0x110 [virtiofs]
    __fuse_simple_request+0x118/0x310 [fuse]
    fuse_readdir_uncached+0x16f/0x8c0 [fuse]
    ? virtqueue_add_sgs+0xb5/0xd0 [virtio_ring]
    ? vp_notify+0x16/0x20 [virtio_pci]
    ? virtqueue_notify+0x1f/0x40 [virtio_ring]
    ? virtio_fs_enqueue_req+0x50c/0x600 [virtiofs]
    iterate_dir+0xaa/0x270
    ovl_iterate+0x168/0x3a0 [overlay]
    ? __pfx_ovl_iterate+0x10/0x10 [overlay]
    wrap_directory_iterator+0x4b/0x70
    iterate_dir+0xaa/0x270
    __x64_sys_getdents64+0x7b/0x110
    ? __pfx_filldir64+0x10/0x10
    do_syscall_64+0xb6/0x7e0
    ? exc_page_fault+0x6a/0x150
  2. nw-packet-forwarder: wait for external interface IPv4 before startup
    Add a pre-start check to wait until the external network interface has a global IPv4 address before launching
    nw-pckt-fwd.

    This prevents the packet forwarder from starting prematurely when the interface (e.g., Wi-Fi) is not yet connected or
    configured, improving startup reliability.

    Fixes the service failure when no WiFi is connected:

    nw-packet-forwarder[4447]: Failed to assign interfaces: No IPv4 address found for interface wlp0s5f0
    systemd[1]: nw-packet-forwarder.service: Main process exited, code=exited, status=1/FAILURE
    systemd[1]: nw-packet-forwarder.service: Failed with result 'exit-code'.
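
    The pre-start check described in item 2 could look roughly like the sketch below. This is an illustration only, not the script from this PR: the function names, the default 10-second interval, and the optional attempt limit are assumptions. It relies on iproute2's `ip -4 -o addr show dev IFACE scope global` to list global IPv4 addresses.

```shell
# Hypothetical sketch of a "wait for IPv4" pre-start check.
# Function names and defaults are assumptions, not taken from the PR.

# Succeeds if the interface has at least one global-scope IPv4 address.
has_global_ipv4() {
    local iface="$1"
    ip -4 -o addr show dev "$iface" scope global 2>/dev/null | grep -q 'inet '
}

# Polls until the interface has an address.
# max_attempts=0 means wait indefinitely; a positive value gives up with status 1.
wait_for_ipv4() {
    local iface="$1" interval="${2:-10}" max_attempts="${3:-0}" attempt=0
    until has_global_ipv4 "$iface"; do
        attempt=$((attempt + 1))
        if [ "$max_attempts" -gt 0 ] && [ "$attempt" -ge "$max_attempts" ]; then
            echo "No IPv4 address on $iface after $attempt attempts" >&2
            return 1
        fi
        echo "Waiting for IPv4 address on interface $iface..."
        sleep "$interval"
    done
}

# Example (not executed here): wait_for_ipv4 wlp0s5f0 10 0
```

    With no attempt limit the check blocks until an address appears, so a unit that runs it in ExecStartPre stays in the activating state for as long as the interface has no address.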

Type of Change

  • New Feature
  • Bug Fix
  • Improvement / Refactor

Related Issues / Tickets

Checklist

  • Clear summary in PR description
  • Detailed and meaningful commit message(s)
  • Commits are logically organized and squashed if appropriate
  • Contribution guidelines followed
  • Ghaf documentation updated with the commit - https://tiiuae.github.io/ghaf/
  • Author has run make-checks and it passes
  • All automatic GitHub Action checks pass - see actions
  • Author has added reviewers and removed PR draft status

Testing Instructions

Applicable Targets

  • Orin AGX aarch64
  • Orin NX aarch64
  • Lenovo X1 x86_64
  • Dell Latitude x86_64
  • System 76 x86_64

Installation Method

  • Requires full re-installation
  • Can be updated with nixos-rebuild ... switch
  • Other:

Test Steps To Verify:

  1. It may fix https://jira.tii.ae/browse/SSRCSP-7512

@kajusnau
Collaborator

Might also fix https://jira.tii.ae/browse/SSRCSP-8070

Especially on suspend -> resume, it seems like net-vm needs some extra memory to properly restore wi-fi functionality sometimes..?

@vunnyso
Collaborator Author

vunnyso commented Feb 27, 2026

> Might also fix https://jira.tii.ae/browse/SSRCSP-8070
>
> Especially on suspend -> resume, it seems like net-vm needs some extra memory to properly restore wi-fi functionality sometimes..?

Yes, I think it is related. Although #1770 helped address the additional memory requirements, we are still seeing memory allocation crashes during net-vm boot, as mentioned in the PR description. This may also affect driver initialisation.
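
For context on the trace in the PR description: `order:4` means the kernel needed 2^4 = 16 physically contiguous pages (64 KiB with 4 KiB pages), and a GFP_ATOMIC allocation cannot sleep to wait for reclaim or compaction. A rough way to check whether such blocks are available inside net-vm is to read /proc/buddyinfo, where each row lists the number of free blocks per order, starting at order 0. The helper names below are hypothetical and not part of this PR.

```shell
# Hypothetical helpers for inspecting memory fragmentation; not part of the PR.

# Reads /proc/buddyinfo-formatted text on stdin and prints the total number
# of free blocks of the given order or larger, summed over all zones.
free_blocks_at_or_above() {
    local order="$1"
    awk -v min="$order" '
        $1 == "Node" {
            # Fields 1-4 are "Node", "N,", "zone", NAME; order 0 is field 5.
            for (i = 5 + min; i <= NF; i++) total += $i
        }
        END { print total + 0 }
    '
}

# Prints the size in bytes of one block of the given order.
# The page size defaults to 4096 bytes (an assumption; x86_64 default).
order_bytes() {
    echo $(( (1 << $1) * ${2:-4096} ))
}

# Example (not executed here):
#   free_blocks_at_or_above 4 < /proc/buddyinfo
#   order_bytes 4   # prints 65536
```

A count at or near zero while the VM is otherwise idle would suggest that fragmentation or overall memory pressure is the limiting factor, which fits the case for raising the default allocation.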

@milva-unikie

jenkins-pre-merge tests found one issue after the latest changes. Systemctl status in net-vm stays at starting because nw-packet-forwarder.service does not start. Test devices are using wired connections.

[ghaf@ghaf-0853589747:~]$ systemctl status nw-packet-forwarder.service
● nw-packet-forwarder.service - Network packet forwarder daemon
     Loaded: loaded (/etc/systemd/system/nw-packet-forwarder.service; enabled; preset: ignored)
     Active: activating (start-pre) since Fri 2026-02-27 11:37:19 UTC; 8min ago
        Job: 278
 Invocation: 675549b2cf2b431cb9e27c255d5960c1
  Cntrl PID: 516 (nw-pckt-wait-fo)
         IP: 0B in, 0B out
         IO: 0B read, 0B written
      Tasks: 2 (limit: 1121)
     Memory: 3.1M (peak: 8M)
        CPU: 821ms
     CGroup: /system.slice/nw-packet-forwarder.service
             ├─ 516 /nix/store/f15k3dpilmiyv6zgpib289rnjykgr1r4-bash-5.3p9/bin/bash /nix/store/m6v212y1wbkfjbj9fyamnfdb835zdd35-nw-pckt-wait-for-ipv4/bin/nw-pckt-wait-for-ipv4
             └─3738 sleep 10

Feb 27 11:44:32 ghaf-0853589747 nw-pckt-wait-for-ipv4[516]: Waiting for IPv4 address on interface wlp0s5f0...
Feb 27 11:44:42 ghaf-0853589747 nw-pckt-wait-for-ipv4[516]: Waiting for IPv4 address on interface wlp0s5f0...
Feb 27 11:44:52 ghaf-0853589747 nw-pckt-wait-for-ipv4[516]: Waiting for IPv4 address on interface wlp0s5f0...
Feb 27 11:45:02 ghaf-0853589747 nw-pckt-wait-for-ipv4[516]: Waiting for IPv4 address on interface wlp0s5f0...
Feb 27 11:45:12 ghaf-0853589747 nw-pckt-wait-for-ipv4[516]: Waiting for IPv4 address on interface wlp0s5f0...
Feb 27 11:45:22 ghaf-0853589747 nw-pckt-wait-for-ipv4[516]: Waiting for IPv4 address on interface wlp0s5f0...
Feb 27 11:45:32 ghaf-0853589747 nw-pckt-wait-for-ipv4[516]: Waiting for IPv4 address on interface wlp0s5f0...
Feb 27 11:45:42 ghaf-0853589747 nw-pckt-wait-for-ipv4[516]: Waiting for IPv4 address on interface wlp0s5f0...
Feb 27 11:45:52 ghaf-0853589747 nw-pckt-wait-for-ipv4[516]: Waiting for IPv4 address on interface wlp0s5f0...
Feb 27 11:46:02 ghaf-0853589747 nw-pckt-wait-for-ipv4[516]: Waiting for IPv4 address on interface wlp0s5f0...

@vunnyso
Collaborator Author

vunnyso commented Feb 27, 2026

> jenkins-pre-merge tests found one issue after the latest changes. Systemctl status in net-vm stays at starting because nw-packet-forwarder.service does not start. Test devices are using wired connections.
>
> [ghaf@ghaf-0853589747:~]$ systemctl status nw-packet-forwarder.service
> ● nw-packet-forwarder.service - Network packet forwarder daemon
>      Loaded: loaded (/etc/systemd/system/nw-packet-forwarder.service; enabled; preset: ignored)
>      Active: activating (start-pre) since Fri 2026-02-27 11:37:19 UTC; 8min ago
>         Job: 278
>  Invocation: 675549b2cf2b431cb9e27c255d5960c1
>   Cntrl PID: 516 (nw-pckt-wait-fo)
>          IP: 0B in, 0B out
>          IO: 0B read, 0B written
>       Tasks: 2 (limit: 1121)
>      Memory: 3.1M (peak: 8M)
>         CPU: 821ms
>      CGroup: /system.slice/nw-packet-forwarder.service
>              ├─ 516 /nix/store/f15k3dpilmiyv6zgpib289rnjykgr1r4-bash-5.3p9/bin/bash /nix/store/m6v212y1wbkfjbj9fyamnfdb835zdd35-nw-pckt-wait-for-ipv4/bin/nw-pckt-wait-for-ipv4
>              └─3738 sleep 10
>
> Feb 27 11:44:32 ghaf-0853589747 nw-pckt-wait-for-ipv4[516]: Waiting for IPv4 address on interface wlp0s5f0...
> Feb 27 11:44:42 ghaf-0853589747 nw-pckt-wait-for-ipv4[516]: Waiting for IPv4 address on interface wlp0s5f0...

Based on my observation, with mainline the behaviour appears as follows: the nw-packet-forwarder.service fails to start because it requires an IPv4 address on the Wi‑Fi interface wlp0s5f0 in order to forward packets. Since the service cannot obtain this address, it repeatedly fails. This was tested on a Gen12 Lenovo X1.

[ghaf@ghaf-2014721134:~]$ systemctl status nw-packet-forwarder.service
● nw-packet-forwarder.service - Network packet forwarder daemon
     Loaded: loaded (/etc/systemd/system/nw-packet-forwarder.service; enabled; preset: ignored)
     Active: activating (auto-restart) (Result: exit-code) since Fri 2026-02-27 12:13:47 UTC; 2s ago
 Invocation: d036ea0455d24d699ce744904c0cb2cc
    Process: 1912 ExecStart=/nix/store/cn0j80zwcvcbi9y06rrxkdpmvcx1mgxq-nw-pckt-fwd/bin/nw-pckt-fwd (code=exited, status=1/FAILURE)
   Main PID: 1912 (code=exited, status=1/FAILURE)
         IP: 0B in, 0B out
         IO: 0B read, 0B written
   Mem peak: 14.7M
        CPU: 32ms

Logs

ghaf-2014721134 systemd[1]: nw-packet-forwarder.service: Scheduled restart job, restart counter is at 12.
ghaf-2014721134 systemd[1]: Started Network packet forwarder daemon.
ghaf-2014721134 nw-packet-forwarder[1823]: Failed to assign interfaces: No IPv4 address found for interface wlp0s5f0
ghaf-2014721134 systemd[1]: nw-packet-forwarder.service: Main process exited, code=exited, status=1/FAILURE
ghaf-2014721134 systemd[1]: nw-packet-forwarder.service: Failed with result 'exit-code'.

@enesoztrk please correct me if I am wrong.

@kajusnau
Collaborator

kajusnau commented Feb 27, 2026

> jenkins-pre-merge tests found one issue after the latest changes. Systemctl status in net-vm stays at starting because nw-packet-forwarder.service does not start. Test devices are using wired connections.
>
> [ghaf@ghaf-0853589747:~]$ systemctl status nw-packet-forwarder.service
> ● nw-packet-forwarder.service - Network packet forwarder daemon
>      Loaded: loaded (/etc/systemd/system/nw-packet-forwarder.service; enabled; preset: ignored)
>      Active: activating (start-pre) since Fri 2026-02-27 11:37:19 UTC; 8min ago

If this is a blocker, we can scrap the idea of checking for the IP via ExecStartPre.. 🤷

@milva-unikie

> Based on my observation, with mainline the behaviour appears as follows: the nw-packet-forwarder.service fails to start because it requires an IPv4 address on the Wi‑Fi interface wlp0s5f0 in order to forward packets. Since the service cannot obtain this address, it repeatedly fails. This was tested on a Gen12 Lenovo X1.

I saw the same thing, however it is not keeping systemctl status hostage in mainline. The status is running even though that service is trying to restart.

> If this is a blocker, we can scrap the idea of checking for the IP via ExecStartPre.. 🤷

It does not have to be a blocker on our side, but it will need a few new checks and skips.

@vunnyso
Collaborator Author

vunnyso commented Feb 27, 2026

> > Based on my observation, with mainline the behaviour appears as follows: the nw-packet-forwarder.service fails to start because it requires an IPv4 address on the Wi‑Fi interface wlp0s5f0 in order to forward packets. Since the service cannot obtain this address, it repeatedly fails. This was tested on a Gen12 Lenovo X1.
>
> I saw the same thing, however it is not keeping systemctl status hostage in mainline. The status is running even though that service is trying to restart.

@kajusnau we can move the IP check from ExecStartPre to ExecStart; then the service will be active and running.
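
A minimal sketch of that alternative, assuming the unit uses a service type such as `Type=simple`, where systemd treats the service as started as soon as the ExecStart process has been launched. The wrapper name and the `WAIT_INTERVAL` variable are hypothetical; the sketch waits inside the main process and then replaces itself with the daemon via `exec`.

```shell
# Hypothetical ExecStart wrapper; not the implementation from this PR.
# Waits for a global IPv4 address on the interface, then execs the command
# given in the remaining arguments so it becomes the service's main process.
run_after_ipv4() {
    local iface="$1"
    shift
    until ip -4 -o addr show dev "$iface" scope global 2>/dev/null | grep -q 'inet '; do
        echo "Waiting for IPv4 address on interface $iface..."
        sleep "${WAIT_INTERVAL:-10}"
    done
    exec "$@"
}

# Example (not executed here): run_after_ipv4 wlp0s5f0 nw-pckt-fwd
```

The trade-off is that the unit reports active (running) even though no packets are forwarded yet, so anything ordered after it cannot rely on the forwarder being operational.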

@kajusnau
Collaborator

> > > Based on my observation, with mainline the behaviour appears as follows: the nw-packet-forwarder.service fails to start because it requires an IPv4 address on the Wi‑Fi interface wlp0s5f0 in order to forward packets. Since the service cannot obtain this address, it repeatedly fails. This was tested on a Gen12 Lenovo X1.
>
> > I saw the same thing, however it is not keeping systemctl status hostage in mainline. The status is running even though that service is trying to restart.
>
> @kajusnau we can move logic of checking for the IP in ExecStart instead of ExecStartPre then service will be active and running.

Yep, but if QA can adjust with a few checks, then I think it makes more sense to keep it in ExecStartPre. Anyway, I'll leave that to you guys to decide 😃

vunnyso added 3 commits March 2, 2026 09:34
…igs"

This reverts commit 6a25444.

Signed-off-by: Vunny Sodhi <vunny.sodhi@unikie.com>
Some WiFi drivers require at least 1GB of memory to function properly.
The default is updated to improve compatibility while still allowing
overrides via vmConfig.

Resolves the kernel crashes outlined below that are related to memory:

awk: page allocation failure: order:4, mode:0x40820(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=msFetchUrl.service,mems_allowed=0
CPU: 1 UID: 994 PID: 918 Comm: awk Not tainted 6.18.8 #1-NixOS PREEMPT(voluntary)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x5d/0x80
 warn_alloc+0x163/0x190
 ? wakeup_kswapd+0xa3/0x1d0
 __alloc_frozen_pages_noprof+0xc47/0x10f0
 ? get_page_from_freelist+0x1a06/0x1c20
 alloc_pages_mpol+0x86/0x170
 ? virtio_fs_enqueue_req+0x214/0x600 [virtiofs]
 ___kmalloc_large_node+0x99/0xb0
 __kmalloc_large_node_noprof+0x1d/0xb0
 __kmalloc_noprof+0x4dd/0x700
 ? __alloc_frozen_pages_noprof+0x478/0x10f0
 ? virtio_fs_enqueue_req+0x214/0x600 [virtiofs]
 virtio_fs_enqueue_req+0x214/0x600 [virtiofs]
 virtio_fs_send_req+0x51/0x110 [virtiofs]
 __fuse_simple_request+0x118/0x310 [fuse]
 fuse_readdir_uncached+0x16f/0x8c0 [fuse]
 ? virtqueue_add_sgs+0xb5/0xd0 [virtio_ring]
 ? vp_notify+0x16/0x20 [virtio_pci]
 ? virtqueue_notify+0x1f/0x40 [virtio_ring]
 ? virtio_fs_enqueue_req+0x50c/0x600 [virtiofs]
 iterate_dir+0xaa/0x270
 ovl_iterate+0x168/0x3a0 [overlay]
 ? __pfx_ovl_iterate+0x10/0x10 [overlay]
 wrap_directory_iterator+0x4b/0x70
 iterate_dir+0xaa/0x270
 __x64_sys_getdents64+0x7b/0x110
 ? __pfx_filldir64+0x10/0x10
 do_syscall_64+0xb6/0x7e0
 ? exc_page_fault+0x6a/0x150

Signed-off-by: Vunny Sodhi <vunny.sodhi@unikie.com>
Add a pre-start check to wait until the external network interface
has a global IPv4 address before launching nw-pckt-fwd.

This prevents the packet forwarder from starting prematurely when
the interface (e.g., Wi-Fi) is not yet connected or configured,
improving startup reliability.

Fixes:
nw-packet-forwarder[4447]: Failed to assign interfaces: No IPv4 address found for interface wlp0s5f0
systemd[1]: nw-packet-forwarder.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: nw-packet-forwarder.service: Failed with result 'exit-code'.

Signed-off-by: Vunny Sodhi <vunny.sodhi@unikie.com>
Contributor

@enesoztrk enesoztrk left a comment

There is a DCO error in the pipeline. Once that is fixed, this is good to go.

@vunnyso
Collaborator Author

vunnyso commented Mar 4, 2026

WiFi OOM-related crashes are still being reported in https://jira.tii.ae/browse/SSRCSP-8070 despite zram being enabled. @brianmcgillion, can you please review this PR?

@brianmcgillion brianmcgillion merged commit 2456ed3 into tiiuae:main Mar 6, 2026
32 checks passed
5 participants