Skip to content

sysext: Skip refresh if no changes are found#39980

Merged
poettering merged 2 commits intosystemd:mainfrom
pothos:sysext-skip-refresh
Feb 4, 2026
Merged

sysext: Skip refresh if no changes are found#39980
poettering merged 2 commits intosystemd:mainfrom
pothos:sysext-skip-refresh

Conversation

@pothos
Copy link
Copy Markdown
Contributor

@pothos pothos commented Dec 3, 2025

When the extensions for the final system are already set up from the
initrd we should avoid disrupting the boot process with the remount
(which currently isn't atomic) and the daemon reload for
systemd-confext and systemd-sysext. Similarly, when sysupdate ran and
updated extensions it's best to avoid the remount and daemon reload if
no changes are found.
To do this, encode the current extension state in more detail than
before where only the names of the extensions where encoded in the
overlay mount. This can also be used to provide more details about the
extension origin in "systemd-sysext status (--json=)". During the
refresh add a check whether the old state matches the new state and in
this case skip the refresh unless the user provides a flag to always
refresh. Besides the extension name and the resolved path the best
method for identification is the verity hash but that is not available
for plain image files or directories. Therefore, also include data to
check for file/directory replacements. The creation/modification times
are not always real on reproducible images or extracted archive content.
The file handle together with the unique mount ID is the next best
identifier we can use when we have no verity hash. Fall back to an inode
when we get no handle. With the creation/modification time and the path
this should be good enough. Using a unique mount ID is important (with
a fallback to the regular non-unique mount ID) instead of st_dev because
st_dev gets reused too easily, e.g., by a loop device mount and the
mount ID helps to catch this. For the mount ID to be valid it has to be
resolved before we enter the new mount namespace. Thus, it gets provided
by the image dissect logic and handed over to the sysext subprocess
which runs in a new mount namespace.
Luckily, we can rule out online modification of directories or image
files because this is anyway not well supported with overlay mounts, so
we don't do a file checksum nor do we recurse into a directory to look
for the most recently touched files. But, as said, with the
always-refresh flag one can force a reload.

@github-actions github-actions bot added documentation util-lib tests mount sysext please-review PR is ready for (re-)review by a maintainer labels Dec 3, 2025
@pothos
Copy link
Copy Markdown
Contributor Author

pothos commented Dec 3, 2025

We could replace the "extensions" file which currently has a list of
extension names with the new "origin" file and also use this list when
stacking the extensions but it results in a slightly changed stacking
order.

Should I do this here or not?

@pothos pothos force-pushed the sysext-skip-refresh branch from a327462 to 18e8ded Compare December 3, 2025 06:42
@pothos
Copy link
Copy Markdown
Contributor Author

pothos commented Dec 3, 2025

FYI, even with AT_HANDLE_FID the handle from name_to_handle_at was noisy on btrfs and tmpfs when the parent directory got updated. We should probably also try to set AT_HANDLE_MNT_ID_UNIQUE for name_to_handle_at to make it more robust but that's another topic.

@pothos pothos force-pushed the sysext-skip-refresh branch 2 times, most recently from bc209d1 to 3c9f658 Compare December 3, 2025 07:14
@pothos pothos force-pushed the sysext-skip-refresh branch from 3c9f658 to 3db0ae4 Compare December 3, 2025 07:45
@poettering
Copy link
Copy Markdown
Member

FYI, even with AT_HANDLE_FID the handle from name_to_handle_at was noisy on btrfs and tmpfs when the parent directory got updated. We should probably also try to set AT_HANDLE_MNT_ID_UNIQUE for name_to_handle_at to make it more robust but that's another topic.

hm, i cannot parse this. "noisy"? what do you mean by that?

@poettering
Copy link
Copy Markdown
Member

hmm, you are using the inode nr, but that's not a great identifier? why not use the name_to_handle_at() FID handle?

@poettering
Copy link
Copy Markdown
Member

hmm, if i grok this properly, then you are formatting everything as string, and compare things as string.

given the complexity of the data, and the history (i.e. two different kinds of mnt ids for example) I think i'd really prefer if this is just encoded in json, we have all the functionality for that in sd-json after all. with that you can clearly say if things are a mnt id, or not, and that's much better.

mnt ids might change if people clone namespaces btw, hence, the "official" way to reference an inode via a stable identifier is via the FID in combination with the UUID of the superblock. That's in fact wat overlayfs references use too. It would be wise to follow their logic. i.e. use FS_IOC_GETUUID to acquire the uuid. not that no all fstypes have a uuid though, hence errors must be handled gracefully (squashfs doesn't have a uuid).

@poettering poettering added reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks and removed please-review PR is ready for (re-)review by a maintainer labels Dec 3, 2025
static bool arg_legend = true;
static bool arg_force = false;
static bool arg_no_reload = false;
static bool arg_always_refresh = false;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change the default so that the current behaviour stays the same, and backward compatibility is kept around and the new functionality that you need for your specific use case is opt-in

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here is that the state encoding is good enough to not break any (valid) use case. This behavior should be used in systemd-sysext.service ExecStart= and while one could add a flag there I thought it's best to have it on by default. What's the use case for doing a remount when nothing changed? Requiring the flag for that seemed to be ok for me and I can add call it out better in the docs together with the case where one modifies the lowerdir while it's loaded.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling the refresh verb means refresh, if it starts silently being a no-op because of some complicated logic tailored to your use case while it wasn't before, then it's a compat break. I understand you need it for your use case, and that's fine, but please make it opt-in, like the rest, and then you can select it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result is the same and any changes are picked up but when no changes were found it prints a message that the refresh was skipped and what flag to set to force it. If there is a case where the logic doesn't do a refresh but you think it should, then we can probably add this (and also add a test).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result is not the same, because before an operation with a lot of side effects happened, and now it doesn't anymore. Please switch the default to keep the current behaviour, so that there are no surprises. Thanks.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it's good to avoid breaking stuff but at the same time I would like to understand it more - it's like running meson compile and whether it does a full rebuild or incremental build as needed and in the end we don't care about the steps but the end result, which here is that everything is mounted. So what side effects do you rely on? I'm not totally opposed to doing a --skip-if-no-changes flag and changing systemd-sysext.service ExecStart= to use it but wouldn't do this based on a general gut feeling and rather on what contract this should provide to support which specific scenarios.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The service file also needs to stay as is. You can opt-in in your own images and deployments.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree with @pothos. i think suppressing reappication of sysexct/confext if nothing changes is safe enough.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this shuld be mentioned in NEWS of course, but otherwise i think this is a change we can make that might have minor behaviour changes as effect, but not prohibitively many

@pothos
Copy link
Copy Markdown
Contributor Author

pothos commented Dec 4, 2025

The name_to_handle_at() value (even with FID) changed directly during the systemd-sysext run for the parent directory /etc/extensions/ probably because the directory got a new timestamp. I didn't look into this in detail but since this also happened for the vpick inside it might be related. Anyway, because the file handle can encode the state of the parent directory it's not a good use case here and I had to resort to the inode. Ideally this would be fixed in the kernel but that won't help right now. We could include the inode generation by converting the path fd to a real one but since not all FSs support this and the inode is combined with the full path and ctime/mtime I think it's not worth adding.

As for the UUID, yes, that's an option for some filesystems but not for all and instead of it we can use the unique mount ID as a stronger identifier here and then there is no need to include the UUID, I would say. That the mound IT changes with bind mounts and namespaces is not a problem for the check here.

Using JSON sounds ok and gives a reason to keep the current /extensions//confexts special file in addition to a new JSON file. I'll look into it.

@poettering
Copy link
Copy Markdown
Member

The name_to_handle_at() value (even with FID) changed directly during the systemd-sysext run for the parent directory /etc/extensions/ probably because the directory got a new timestamp. I didn't look into this in detail but since this also happened for the vpick inside it might be related. Anyway, because the file handle can encode the state of the parent directory it's not a good use case here and I had to resort to the inode. Ideally this would be fixed in the kernel but that won't help right now. We could include the inode generation by converting the path fd to a real one but since not all FSs support this and the inode is combined with the full path and ctime/mtime I think it's not worth adding.

hmm, the fid should not change during an mtime fix.

To me this sounds as if you queried the fid of an overlayfs inode, and the mtime fix triggered a copy-up, and that of course resuled in a new inode being created.

but of course, we the images shold not be on an overlayfs, but on a real fs, and hence the issue should not apply

return log_error_errno(r, "Failed to create a directory '%s': %m", path_in_root);
return log_error_errno(r, "Failed to chase/create base directory '%s/%s': %m", strempty(root), skip_leading_slash(path));

chmod_fd = fd_reopen(path_fd, O_CLOEXEC|O_DIRECTORY);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, if you need the thing fully opened, and not just an O_PATH you can use chase_and_open()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think not here because it relies on chase to create the missing directories but with chase_and_open it missed creating the last one due to CHASE_PARENT being used in the internal chase call there.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add a comment

@poettering
Copy link
Copy Markdown
Member

looks pretty good, just some minor nits, and a rebase pls.

@poettering poettering added reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks and removed please-review PR is ready for (re-)review by a maintainer labels Jan 16, 2026
@pothos pothos force-pushed the sysext-skip-refresh branch from 16a3319 to f2e6a3d Compare January 16, 2026 17:04
@github-actions github-actions bot added please-review PR is ready for (re-)review by a maintainer and removed reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks needs-rebase labels Jan 16, 2026
pothos added a commit to flatcar/scripts that referenced this pull request Feb 2, 2026
For A/B-updated /etc contents we used a custom overlay mount that
provides the default files through a lowerdir loaded from /usr. Since
then we upstreamed mutable systemd-confext support and now we can switch
to it.
This pulls in flatcar/init#138 and
flatcar/bootengine#115 together with backported
systemd patches that have opened or merged upstream PRs to fix --root=
issues and add a refresh skip check to prevent boot disruptions due to
the multiple daemon reloads and - more important - the missing atomic
remount that would mean /etc is gone for a few milliseconds during boot.
The skip logic works best with verity hashes and thus the default
confext must be a verity extension image.
User-provided confext don't work well yet unless they use verity due to
the missing atomic remount and reliance on the skipping logic. We also
need to look into stacking order and other mutabiliy settings.

The backported systemd patches relate to the following upstream PRs:

systemd/systemd#39843 for
vpick-Don-t-use-openat-directly-but-resolve-symlinks
discover-image-Follow-symlinks-in-a-given-root
sysext-Use-correct-image-name-for-extension-release
test-Add-tests-for-handling-symlinks-with-systemd-sy
Note that the patch in the PR relies on
0859fe3f32774f1e0c787974cc252ff922a1b868 but the backport patch not.

systemd/systemd#39980 for
sysext-Create-mutable-directory-with-the-right-mode
sysext-Skip-refresh-if-no-changes-are-found

systemd/systemd#39991 for
sysext-Get-verity-user-certs-from-given-root

systemd/systemd#40063 for
sysext-Fix-config-file-support-with-root
which relies on systemd/systemd#38250 for
man-sysext.conf-add-systemd-sysext-config-files
sysext-introduce-global-config-file
sysext-support-ImagePolicy-global-config-option

Signed-off-by: Kai Lueke <kailuke@microsoft.com>
@yuwata
Copy link
Copy Markdown
Member

yuwata commented Feb 3, 2026

Please rebase again.

@yuwata yuwata added needs-rebase and removed please-review PR is ready for (re-)review by a maintainer labels Feb 3, 2026
@pothos pothos force-pushed the sysext-skip-refresh branch from f2e6a3d to 695bb38 Compare February 3, 2026 19:27
@github-actions github-actions bot added please-review PR is ready for (re-)review by a maintainer and removed needs-rebase labels Feb 3, 2026
When the mutable directory didn't exist but gets created with
--mutable=yes then it used to get mode 700 and later it got patched by
a chmod because it is the top layer and must match the target hierarchy.
This meant one could not call the function to resolve the mutable
directory twice before the mount because it has a check for a proper
mode when the directory exists which is the case for the second call.
Also, this resulted in /var/lib/extensions.mutable getting created with
mode 700 which is not really required.

Don't rely on the chmod for the upper dir but directly create the
directory with the right mode by first creating all missing directories
with 755 as a sane default and then changing the mode as needed for the
mutable directory.
@pothos pothos force-pushed the sysext-skip-refresh branch from 695bb38 to b75c031 Compare February 3, 2026 23:00
When the extensions for the final system are already set up from the
initrd we should avoid disrupting the boot process with the remount
(which currently isn't atomic) and the daemon reload for
systemd-confext and systemd-sysext. Similarly, when sysupdate ran and
updated extensions it's best to avoid the remount and daemon reload if
no changes are found.
To do this, encode the current extension state in more detail than
before where only the names of the extensions where encoded in the
overlay mount. This can also be used to provide more details about the
extension origin in "systemd-sysext status (--json=)". During the
refresh add a check whether the old state matches the new state and in
this case skip the refresh unless the user provides a flag to always
refresh. Besides the extension name and the resolved path the best
method for identification is the verity hash but that is not available
for plain image files or directories. Therefore, also include data to
check for file/directory replacements. The creation/modification times
are not always real on reproducible images or extracted archive content.
The file handle together with the unique mount ID is the next best
identifier we can use when we have no verity hash. Fall back to an inode
when we get no handle. With the creation/modification time and the path
this should be good enough. Using a unique mount ID is important (with
a fallback to the regular non-unique mount ID) instead of st_dev because
st_dev gets reused too easily, e.g., by a loop device mount and the
mount ID helps to catch this. For the mount ID to be valid it has to be
resolved before we enter the new mount namespace. Thus, it gets provided
by the image dissect logic and handed over to the sysext subprocess
which runs in a new mount namespace.
Luckily, we can rule out online modification of directories or image
files because this is anyway not well supported with overlay mounts, so
we don't do a file checksum nor do we recurse into a directory to look
for the most recently touched files.  But, as said, with the
always-refresh flag one can force a reload.
@pothos pothos force-pushed the sysext-skip-refresh branch from b75c031 to 23115ee Compare February 3, 2026 23:05
@poettering poettering merged commit 13d8287 into systemd:main Feb 4, 2026
49 of 57 checks passed
@github-actions github-actions bot removed the please-review PR is ready for (re-)review by a maintainer label Feb 4, 2026
pothos added a commit to flatcar/scripts that referenced this pull request Feb 28, 2026
For A/B-updated /etc contents we used a custom overlay mount that
provides the default files through a lowerdir loaded from /usr. Since
then we upstreamed mutable systemd-confext support and now we can switch
to it.
This pulls in flatcar/init#138 and
flatcar/bootengine#115 together with backported
systemd patches that have opened or merged upstream PRs to fix --root=
issues and add a refresh skip check to prevent boot disruptions due to
the multiple daemon reloads and - more important - the missing atomic
remount that would mean /etc is gone for a few milliseconds during boot.
The skip logic works best with verity hashes and thus the default
confext must be a verity extension image.
User-provided confext don't work well yet unless they use verity due to
the missing atomic remount and reliance on the skipping logic. We also
need to look into stacking order and other mutabiliy settings.

The backported systemd patches relate to the following upstream PRs:

systemd/systemd#39843 for
vpick-Don-t-use-openat-directly-but-resolve-symlinks
discover-image-Follow-symlinks-in-a-given-root
sysext-Use-correct-image-name-for-extension-release
test-Add-tests-for-handling-symlinks-with-systemd-sy
Note that the patch in the PR relies on
0859fe3f32774f1e0c787974cc252ff922a1b868 but the backport patch not.

systemd/systemd#39980 for
sysext-Create-mutable-directory-with-the-right-mode
sysext-Skip-refresh-if-no-changes-are-found

systemd/systemd#39991 for
sysext-Get-verity-user-certs-from-given-root

systemd/systemd#40063 for
sysext-Fix-config-file-support-with-root
which relies on systemd/systemd#38250 for
man-sysext.conf-add-systemd-sysext-config-files
sysext-introduce-global-config-file
sysext-support-ImagePolicy-global-config-option

Signed-off-by: Kai Lueke <kailuke@microsoft.com>
pothos added a commit to flatcar/scripts that referenced this pull request Feb 28, 2026
For A/B-updated /etc contents we used a custom overlay mount that
provides the default files through a lowerdir loaded from /usr. Since
then we upstreamed mutable systemd-confext support and now we can switch
to it.
This pulls in flatcar/init#138 and
flatcar/bootengine#115 together with backported
systemd patches that have opened or merged upstream PRs to fix --root=
issues and add a refresh skip check to prevent boot disruptions due to
the multiple daemon reloads and - more important - the missing atomic
remount that would mean /etc is gone for a few milliseconds during boot.
The skip logic works best with verity hashes and thus the default
confext must be a verity extension image.
User-provided confext don't work well yet unless they use verity due to
the missing atomic remount and reliance on the skipping logic. We also
need to look into stacking order and other mutabiliy settings.

The backported systemd patches relate to the following upstream PRs:

systemd/systemd#39843 for
vpick-Don-t-use-openat-directly-but-resolve-symlinks
discover-image-Follow-symlinks-in-a-given-root
sysext-Use-correct-image-name-for-extension-release
test-Add-tests-for-handling-symlinks-with-systemd-sy
Note that the patch in the PR relies on
0859fe3f32774f1e0c787974cc252ff922a1b868 but the backport patch not.

systemd/systemd#39980 for
sysext-Create-mutable-directory-with-the-right-mode
sysext-Skip-refresh-if-no-changes-are-found

systemd/systemd#39991 for
sysext-Get-verity-user-certs-from-given-root

systemd/systemd#40063 for
sysext-Fix-config-file-support-with-root
which relies on systemd/systemd#38250 for
man-sysext.conf-add-systemd-sysext-config-files
sysext-introduce-global-config-file
sysext-support-ImagePolicy-global-config-option

Signed-off-by: Kai Lueke <kailuke@microsoft.com>
pothos added a commit to flatcar/scripts that referenced this pull request Mar 3, 2026
For A/B-updated /etc contents we used a custom overlay mount that
provides the default files through a lowerdir loaded from /usr. Since
then we upstreamed mutable systemd-confext support and now we can switch
to it.
This pulls in flatcar/init#138 and
flatcar/bootengine#115 together with backported
systemd patches that have opened or merged upstream PRs to fix --root=
issues and add a refresh skip check to prevent boot disruptions due to
the multiple daemon reloads and - more important - the missing atomic
remount that would mean /etc is gone for a few milliseconds during boot.
The skip logic works best with verity hashes and thus the default
confext must be a verity extension image.
User-provided confext don't work well yet unless they use verity due to
the missing atomic remount and reliance on the skipping logic. We also
need to look into stacking order and other mutabiliy settings.

The backported systemd patches relate to the following upstream PRs:

systemd/systemd#39843 for
vpick-Don-t-use-openat-directly-but-resolve-symlinks
discover-image-Follow-symlinks-in-a-given-root
sysext-Use-correct-image-name-for-extension-release
test-Add-tests-for-handling-symlinks-with-systemd-sy
Note that the patch in the PR relies on
0859fe3f32774f1e0c787974cc252ff922a1b868 but the backport patch not.

systemd/systemd#39980 for
sysext-Create-mutable-directory-with-the-right-mode
sysext-Skip-refresh-if-no-changes-are-found

systemd/systemd#39991 for
sysext-Get-verity-user-certs-from-given-root

systemd/systemd#40063 for
sysext-Fix-config-file-support-with-root
which relies on systemd/systemd#38250 for
man-sysext.conf-add-systemd-sysext-config-files
sysext-introduce-global-config-file
sysext-support-ImagePolicy-global-config-option

Signed-off-by: Kai Lueke <kailuke@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

6 participants