Storage fixes #1538

Merged
brianmcgillion merged 8 commits into tiiuae:main from mbssrc:storage-fixes
Nov 10, 2025

@mbssrc
Collaborator

@mbssrc mbssrc commented Nov 5, 2025

Description of Changes

This PR unifies the storage backend for encrypted/unencrypted storage and removes guest machine-id generation
from the host.

Storage Backend:

  • Switch to volume-based VM persistent storage, unifying encrypted/unencrypted storage
  • Change home filesystem to ext4 for better recovery and failure handling
  • Configure default storage sizes across guest VMs to handle image-based storage properly

Note: as before, there is no error handling when running out of host disk space

A side effect should be faster disk I/O: a quick measurement showed small improvements for random I/O and roughly 3x for sequential I/O. There should also be less host overhead (no virtiofsd, permission mapping, etc.). Virtiofs shares could potentially be made faster with DAX, but that won't work with memory blinding.

User Management:

  • Remove host-based user removal
  • Users can now be removed via homectl remove <user>
  • Setup script auto-starts on boot if no user found

Machine-id:

  • Fix machine-id VM setup with initrd service ordering to avoid runtime conflicts
  • Remove manual machine-id generation on the host; the VM now uses the native systemd mechanism

TPM setup:

  • Add restart on failure for storagevm-enroll service to handle TPM communication failures
  • Clear existing persistent handles before creating new ones
    Note: TPM should still be cleared before first boot!

IMPORTANT for automated testing

If automated user removal is used, please switch to homectl-based removal as described below!

Type of Change

  • New Feature
  • Bug Fix
  • Improvement / Refactor

Related Issues / Tickets

Checklist

  • Clear summary in PR description
  • Detailed and meaningful commit message(s)
  • Commits are logically organized and squashed if appropriate
  • Contribution guidelines followed
  • Ghaf documentation updated with the commit - https://tiiuae.github.io/ghaf/
  • Author has run make-checks and it passes
  • All automatic GitHub Action checks pass - see actions
  • Author has added reviewers and removed PR draft status

Testing Instructions

Applicable Targets

  • Orin AGX aarch64
  • Orin NX aarch64
  • Lenovo X1 x86_64
  • Dell Latitude x86_64
  • System 76 x86_64

Installation Method

  • Requires full re-installation
  • Can be updated with nixos-rebuild ... switch
  • Other:

Test Steps To Verify:

  1. Full regression test, including:
  • full installation,
  • all VMs and apps run
  • shared folders work
  • home folders in VMs work as before
  • test the encrypted scenario (set virtualization.storagevm-encryption.enable = true; in mvp-user-trial.nix or use -extras). Don't forget to clear the TPM before first boot. Encrypted shares on System76 may not work (I had no TPM clear and could not test this in the last iteration)
  • reboot and verify user can login (machine-id is preserved across reboots)
  2. Test user removal:
    (a) install, start, and add a user as before
    (b) log out
    (c) use another tty (local) or ssh to log in to gui-vm as the ghaf user
    (d) run homectl (check the user is present and inactive), then homectl remove <username>, then homectl again (the user is gone)
    (e) reboot and verify the startup script runs (the COSMIC setup will also run again)
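
For test automation, step (d) of the user-removal check could be wrapped in a small helper. The following is only a sketch under assumptions: the function name remove_home_user is hypothetical and not part of this PR, and it relies solely on the homectl inspect and homectl remove subcommands. It must run as root or an admin inside the gui-vm, with the user logged out.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): remove a systemd-homed user
# after confirming that the user exists.
remove_home_user() {
  user="$1"

  # "homectl inspect" exits non-zero when the user is unknown to homed.
  if ! homectl inspect "$user" >/dev/null 2>&1; then
    echo "user '$user' not found" >&2
    return 1
  fi

  # Remove the user record and the associated home area.
  homectl remove "$user"
}

# Usage (inside the gui-vm, as root/admin):
#   remove_home_user <username>
```

After a successful removal, rebooting should trigger the user setup script again, as described in step (e).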

@brianmcgillion
Collaborator

Perfectly timed 🙂 nice one!

@brianmcgillion brianmcgillion added the Needs Testing CI Team to pre-verify label Nov 6, 2025
@milva-unikie

Tested on Darter Pro (new installation)

  • With storagevm encryption disabled
    • All automated regression tests pass (including checks that all VMs are running and all apps launch)
    • Shared folders work as before
    • No issues with rebooting
    • User can be removed and a new one created on the next boot (user removal is not tested automatically, but I will add it to a list of cases to be automated)
  • With storagevm encryption enabled
    • Boot & reboot ok
    • VMs have a TPM device
    • GuestStorage partition is encrypted

@milva-unikie milva-unikie added Tested on System76 and removed Needs Testing CI Team to pre-verify labels Nov 7, 2025
Clear a potentially existing persistent handle before creating the new one.

Signed-off-by: Manuel Bluhm <manuel@ssrc.tii.ae>
Restart storagevm-enroll service in case it fails, which
can currently happen due to TPM communication failures.

Signed-off-by: Manuel Bluhm <manuel@ssrc.tii.ae>
Fix per-vm machine-id generation with preservation, order tmpfiles to
assert /etc is mounted and machine-id file created before etc processing
to avoid runtime ordering conflicts where the file is not yet created.

Disable homed's builtin firstboot setup by disabling the service.

Signed-off-by: Manuel Bluhm <manuel@ssrc.tii.ae>
Change home fs to ext4 for better recovery and failure handling.

Signed-off-by: Manuel Bluhm <manuel@ssrc.tii.ae>
Switch storage backend to volume instead of shared folders. This unifies
encrypted and unencrypted storage options, and removes host dependencies
for setting up virtiofs shares.

Signed-off-by: Manuel Bluhm <manuel@ssrc.tii.ae>
@mbssrc
Collaborator Author

mbssrc commented Nov 10, 2025

rebased

Join storagevm location to work for both the encrypted and unencrypted cases,
and clean up host dependencies.

Signed-off-by: Manuel Bluhm <manuel@ssrc.tii.ae>
Remove the file lock for the user creation script, and change the user
setup start condition to check whether an identity file is present.

Users can be removed within the gui-vm (by root/admin) via homectl
remove <user>. On the next boot, the user setup is triggered
automatically.

Signed-off-by: Manuel Bluhm <manuel@ssrc.tii.ae>
Change default sizes for persistent storage across guests.

With the switch to images for guest storage, the size handling becomes
more important. Previously, shares were using disk space transparently.
The images have a max disk size, which can be over-committed as the
host increases image size on demand.

To make disk space handling generic, either a hardware definition or
a dynamic size definition during image creation (e.g. using percentages)
needs to be introduced.

Signed-off-by: Manuel Bluhm <manuel@ssrc.tii.ae>
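
The over-commit behaviour described in the storage-size commit message above can be illustrated with a sparse file. This is a minimal sketch, not the project's image creation code: it assumes a sparse file is a reasonable stand-in for a guest storage image (the PR does not state the image format), it needs GNU coreutils (truncate, stat -c), and the 64M size is arbitrary.

```shell
#!/bin/sh
# Assumption: a sparse file stands in for a guest storage image.
img="$(mktemp)"

# Set the maximum (apparent) size without allocating data blocks.
truncate -s 64M "$img"

# Apparent size in bytes: what the guest would see as its disk size.
apparent=$(stat -c %s "$img")

# Space actually consumed on the host: allocated blocks times block unit size.
allocated=$(( $(stat -c %b "$img") * $(stat -c %B "$img") ))

echo "apparent=${apparent} allocated=${allocated}"
rm -f "$img"
```

The sum of apparent sizes across guests can exceed the host disk because blocks are only allocated as guests write data, which is why the default sizes matter and why running out of host disk space remains unhandled.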
@brianmcgillion brianmcgillion merged commit fdda35e into tiiuae:main Nov 10, 2025
28 checks passed

3 participants