Bump vhotplug to fix NIC reattachment on resume #1786

Merged
brianmcgillion merged 2 commits into tiiuae:main from nesteroff:fix-suspend on Mar 6, 2026

@nesteroff (Contributor) commented Feb 25, 2026

Description of Changes

This fixes an issue where Wi-Fi doesn’t reconnect after resuming from suspend. I couldn’t reproduce it myself but the logs suggest that some PCI devices aren’t fully detached from VMs during suspend, which prevents them from being reattached on resume.

The updated vhotplug now waits until a device is fully removed from the VM before continuing the suspend process.
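The wait-then-continue behaviour described above can be sketched as follows. This is a minimal illustration and not vhotplug's actual code: the names `wait_until_removed`, `detach_before_suspend`, `request_detach` and `is_present`, as well as the timeout and polling values, are hypothetical and chosen only for the example.

```python
import time
from typing import Callable, Iterable, List


def wait_until_removed(
    is_present: Callable[[], bool],
    timeout_s: float = 10.0,
    poll_interval_s: float = 0.1,
) -> bool:
    """Return True once is_present() reports False, or False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if not is_present():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


def detach_before_suspend(
    devices: Iterable[str],
    request_detach: Callable[[str], None],
    is_present: Callable[[str], bool],
    timeout_s: float = 10.0,
) -> List[str]:
    """Detach devices one by one; return those still attached after the timeout."""
    stuck = []
    for dev in devices:
        # Hypothetical callback: ask the VM to release the device.
        request_detach(dev)
        # Do not move on until the VM no longer reports the device.
        if not wait_until_removed(lambda: is_present(dev), timeout_s):
            stuck.append(dev)
    return stuck
```

Returning the list of stuck devices, rather than blocking forever, leaves it to the caller to decide whether to log the failure, abort the suspend, or continue.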

Type of Change

  • New Feature
  • Bug Fix
  • Improvement / Refactor

Related Issues / Tickets

https://jira.tii.ae/browse/SSRCSP-8058

Checklist

  • Clear summary in PR description
  • Detailed and meaningful commit message(s)
  • Commits are logically organized and squashed if appropriate
  • Contribution guidelines followed
  • Ghaf documentation updated with the commit - https://tiiuae.github.io/ghaf/
  • Author has run make-checks and it passes
  • All automatic GitHub Action checks pass - see actions
  • Author has added reviewers and removed PR draft status

Testing Instructions

Applicable Targets

  • Orin AGX aarch64
  • Orin NX aarch64
  • Lenovo X1 x86_64
  • Dell Latitude x86_64
  • System 76 x86_64

Installation Method

  • Requires full re-installation
  • Can be updated with nixos-rebuild ... switch
  • Other:

Test Steps To Verify:

Suspend, resume and make sure Wi-Fi works. I couldn’t reproduce the issue but SSRCSP-8058 reports that it was observed in 1 out of 5 attempts.
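Because the failure is intermittent, a single clean suspend/resume cycle says little. As a rough guide, and assuming each attempt fails independently at the reported 1-in-5 rate (an assumption, not something the ticket establishes), the number of consecutive clean cycles needed before a lingering bug becomes unlikely can be estimated like this. The function name and the 5% threshold are illustrative choices.

```python
import math


def clean_cycles_needed(failure_rate: float, max_miss_probability: float) -> int:
    """Smallest n such that (1 - failure_rate) ** n <= max_miss_probability.

    n is the number of consecutive clean cycles after which an unfixed bug
    would have gone unnoticed with at most the given probability.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < max_miss_probability < 1.0:
        raise ValueError("max_miss_probability must be strictly between 0 and 1")
    return math.ceil(math.log(max_miss_probability) / math.log(1.0 - failure_rate))
```

With a failure rate of 1/5 and a 5% threshold this gives 14 cycles, since 0.8^14 is about 0.044 while 0.8^13 is about 0.055.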

Signed-off-by: Yuri Nesterov <yuriy.nesterov@unikie.com>

@kajusnau (Collaborator)

Didn't see the original PR at vhotplug repo, but based on the description here, is there any possible edge-case where a PCI device might not be disconnected properly at all and so suspend would be stuck indefinitely? Is there a built-in timeout?

@nesteroff (Contributor, Author)

> Didn't see the original PR at vhotplug repo, but based on the description here, is there any possible edge-case where a PCI device might not be disconnected properly at all and so suspend would be stuck indefinitely? Is there a built-in timeout?

Yes, there’s a built-in timeout but I guess if something like that happens, we’d want the user to notice so we can investigate the root cause.

@vunnyso (Collaborator) commented Feb 27, 2026

Could this issue be related to low net-vm memory? I’ve noticed similar failures earlier with the storeDisk image, where Wi‑Fi doesn’t connect when the net-vm PCI device is reattached.

Maybe it’s worth trying with #1788, in which net-vm memory is increased from 512 to 1024?

@brianmcgillion (Collaborator)

> Could this issue be related to low net-vm memory? I’ve noticed similar failures earlier with the storeDisk image, where Wi‑Fi doesn’t connect when the net-vm PCI device is reattached.
>
> Maybe it’s worth trying with #1788, in which net-vm memory is increased from 512 to 1024?

We have ballooning and zram enabled now, so memory pressure should be greatly reduced.

@nesteroff (Contributor, Author)

> Could this issue be related to low net-vm memory? I’ve noticed similar failures earlier with the storeDisk image, where Wi‑Fi doesn’t connect when the net-vm PCI device is reattached.
>
> Maybe it’s worth trying with #1788, in which net-vm memory is increased from 512 to 1024?

In one of the logs Ilkka provided, the Wi-Fi service crashed with a no memory error but hopefully it won’t happen anymore with the latest updates. This patch fixes a different problem where the device is not attached to the net-vm after resume.

@vunnyso (Collaborator) commented Feb 27, 2026

> > Could this issue be related to low net-vm memory? I’ve noticed similar failures earlier with the storeDisk image, where Wi‑Fi doesn’t connect when the net-vm PCI device is reattached.
> > Maybe it’s worth trying with #1788, in which net-vm memory is increased from 512 to 1024?
>
> We have ballooning and zram enabled now, so memory pressure should be greatly reduced.

There are still memory‑related crashes visible in the early boot kernel logs, likely because zram is not yet enabled at that stage. Additionally, with zram enabled, the reported MemTotal does not deflate; see #1770 (comment).

@brianmcgillion merged commit aa64037 into tiiuae:main on Mar 6, 2026
32 checks passed
@nesteroff deleted the fix-suspend branch on March 6, 2026 19:33