sysext: Skip refresh if no changes are found#39980
Conversation
Should I do this here or not? |
a327462 to
18e8ded
Compare
|
FYI, even with |
bc209d1 to
3c9f658
Compare
3c9f658 to
3db0ae4
Compare
hm, i cannot parse this. "noisy"? what do you mean by that? |
|
hmm, you are using the inode nr, but that's not a great identifier? why not use the name_to_handle_at() FID handle? |
|
hmm, if i grok this properly, then you are formatting everything as string, and compare things as string. given the complexity of the data, and the history (i.e. two different kinds of mnt ids for example) I think i'd really prefer if this is just encoded in json, we have all the functionality for that in sd-json after all. with that you can clearly say if things are a mnt id, or not, and that's much better. mnt ids might change if people clone namespaces btw, hence, the "official" way to reference an inode via a stable identifier is via the FID in combination with the UUID of the superblock. That's in fact wat overlayfs references use too. It would be wise to follow their logic. i.e. use FS_IOC_GETUUID to acquire the uuid. not that no all fstypes have a uuid though, hence errors must be handled gracefully (squashfs doesn't have a uuid). |
| static bool arg_legend = true; | ||
| static bool arg_force = false; | ||
| static bool arg_no_reload = false; | ||
| static bool arg_always_refresh = false; |
There was a problem hiding this comment.
Please change the default so that the current behaviour stays the same, and backward compatibility is kept around and the new functionality that you need for your specific use case is opt-in
There was a problem hiding this comment.
The idea here is that the state encoding is good enough to not break any (valid) use case. This behavior should be used in systemd-sysext.service ExecStart= and while one could add a flag there I thought it's best to have it on by default. What's the use case for doing a remount when nothing changed? Requiring the flag for that seemed to be ok for me and I can add call it out better in the docs together with the case where one modifies the lowerdir while it's loaded.
There was a problem hiding this comment.
Calling the refresh verb means refresh, if it starts silently being a no-op because of some complicated logic tailored to your use case while it wasn't before, then it's a compat break. I understand you need it for your use case, and that's fine, but please make it opt-in, like the rest, and then you can select it.
There was a problem hiding this comment.
The result is the same and any changes are picked up but when no changes were found it prints a message that the refresh was skipped and what flag to set to force it. If there is a case where the logic doesn't do a refresh but you think it should, then we can probably add this (and also add a test).
There was a problem hiding this comment.
The result is not the same, because before an operation with a lot of side effects happened, and now it doesn't anymore. Please switch the default to keep the current behaviour, so that there are no surprises. Thanks.
There was a problem hiding this comment.
Yes, I think it's good to avoid breaking stuff but at the same time I would like to understand it more - it's like running meson compile and whether it does a full rebuild or incremental build as needed and in the end we don't care about the steps but the end result, which here is that everything is mounted. So what side effects do you rely on? I'm not totally opposed to doing a --skip-if-no-changes flag and changing systemd-sysext.service ExecStart= to use it but wouldn't do this based on a general gut feeling and rather on what contract this should provide to support which specific scenarios.
There was a problem hiding this comment.
The service file also needs to stay as is. You can opt-in in your own images and deployments.
There was a problem hiding this comment.
i agree with @pothos. i think suppressing reappication of sysexct/confext if nothing changes is safe enough.
There was a problem hiding this comment.
this shuld be mentioned in NEWS of course, but otherwise i think this is a change we can make that might have minor behaviour changes as effect, but not prohibitively many
|
The name_to_handle_at() value (even with FID) changed directly during the systemd-sysext run for the parent directory As for the UUID, yes, that's an option for some filesystems but not for all and instead of it we can use the unique mount ID as a stronger identifier here and then there is no need to include the UUID, I would say. That the mound IT changes with bind mounts and namespaces is not a problem for the check here. Using JSON sounds ok and gives a reason to keep the current |
hmm, the fid should not change during an mtime fix. To me this sounds as if you queried the fid of an overlayfs inode, and the mtime fix triggered a copy-up, and that of course resuled in a new inode being created. but of course, we the images shold not be on an overlayfs, but on a real fs, and hence the issue should not apply |
| return log_error_errno(r, "Failed to create a directory '%s': %m", path_in_root); | ||
| return log_error_errno(r, "Failed to chase/create base directory '%s/%s': %m", strempty(root), skip_leading_slash(path)); | ||
|
|
||
| chmod_fd = fd_reopen(path_fd, O_CLOEXEC|O_DIRECTORY); |
There was a problem hiding this comment.
hmm, if you need the thing fully opened, and not just an O_PATH you can use chase_and_open()
There was a problem hiding this comment.
I think not here because it relies on chase to create the missing directories but with chase_and_open it missed creating the last one due to CHASE_PARENT being used in the internal chase call there.
|
looks pretty good, just some minor nits, and a rebase pls. |
16a3319 to
f2e6a3d
Compare
For A/B-updated /etc contents we used a custom overlay mount that provides the default files through a lowerdir loaded from /usr. Since then we upstreamed mutable systemd-confext support and now we can switch to it. This pulls in flatcar/init#138 and flatcar/bootengine#115 together with backported systemd patches that have opened or merged upstream PRs to fix --root= issues and add a refresh skip check to prevent boot disruptions due to the multiple daemon reloads and - more important - the missing atomic remount that would mean /etc is gone for a few milliseconds during boot. The skip logic works best with verity hashes and thus the default confext must be a verity extension image. User-provided confext don't work well yet unless they use verity due to the missing atomic remount and reliance on the skipping logic. We also need to look into stacking order and other mutabiliy settings. The backported systemd patches relate to the following upstream PRs: systemd/systemd#39843 for vpick-Don-t-use-openat-directly-but-resolve-symlinks discover-image-Follow-symlinks-in-a-given-root sysext-Use-correct-image-name-for-extension-release test-Add-tests-for-handling-symlinks-with-systemd-sy Note that the patch in the PR relies on 0859fe3f32774f1e0c787974cc252ff922a1b868 but the backport patch not. systemd/systemd#39980 for sysext-Create-mutable-directory-with-the-right-mode sysext-Skip-refresh-if-no-changes-are-found systemd/systemd#39991 for sysext-Get-verity-user-certs-from-given-root systemd/systemd#40063 for sysext-Fix-config-file-support-with-root which relies on systemd/systemd#38250 for man-sysext.conf-add-systemd-sysext-config-files sysext-introduce-global-config-file sysext-support-ImagePolicy-global-config-option Signed-off-by: Kai Lueke <kailuke@microsoft.com>
|
Please rebase again. |
f2e6a3d to
695bb38
Compare
When the mutable directory didn't exist but gets created with --mutable=yes then it used to get mode 700 and later it got patched by a chmod because it is the top layer and must match the target hierarchy. This meant one could not call the function to resolve the mutable directory twice before the mount because it has a check for a proper mode when the directory exists which is the case for the second call. Also, this resulted in /var/lib/extensions.mutable getting created with mode 700 which is not really required. Don't rely on the chmod for the upper dir but directly create the directory with the right mode by first creating all missing directories with 755 as a sane default and then changing the mode as needed for the mutable directory.
695bb38 to
b75c031
Compare
When the extensions for the final system are already set up from the initrd we should avoid disrupting the boot process with the remount (which currently isn't atomic) and the daemon reload for systemd-confext and systemd-sysext. Similarly, when sysupdate ran and updated extensions it's best to avoid the remount and daemon reload if no changes are found. To do this, encode the current extension state in more detail than before where only the names of the extensions where encoded in the overlay mount. This can also be used to provide more details about the extension origin in "systemd-sysext status (--json=)". During the refresh add a check whether the old state matches the new state and in this case skip the refresh unless the user provides a flag to always refresh. Besides the extension name and the resolved path the best method for identification is the verity hash but that is not available for plain image files or directories. Therefore, also include data to check for file/directory replacements. The creation/modification times are not always real on reproducible images or extracted archive content. The file handle together with the unique mount ID is the next best identifier we can use when we have no verity hash. Fall back to an inode when we get no handle. With the creation/modification time and the path this should be good enough. Using a unique mount ID is important (with a fallback to the regular non-unique mount ID) instead of st_dev because st_dev gets reused too easily, e.g., by a loop device mount and the mount ID helps to catch this. For the mount ID to be valid it has to be resolved before we enter the new mount namespace. Thus, it gets provided by the image dissect logic and handed over to the sysext subprocess which runs in a new mount namespace. Luckily, we can rule out online modification of directories or image files because this is anyway not well supported with overlay mounts, so we don't do a file checksum nor do we recurse into a directory to look for the most recently touched files. But, as said, with the always-refresh flag one can force a reload.
b75c031 to
23115ee
Compare
For A/B-updated /etc contents we used a custom overlay mount that provides the default files through a lowerdir loaded from /usr. Since then we upstreamed mutable systemd-confext support and now we can switch to it. This pulls in flatcar/init#138 and flatcar/bootengine#115 together with backported systemd patches that have opened or merged upstream PRs to fix --root= issues and add a refresh skip check to prevent boot disruptions due to the multiple daemon reloads and - more important - the missing atomic remount that would mean /etc is gone for a few milliseconds during boot. The skip logic works best with verity hashes and thus the default confext must be a verity extension image. User-provided confext don't work well yet unless they use verity due to the missing atomic remount and reliance on the skipping logic. We also need to look into stacking order and other mutabiliy settings. The backported systemd patches relate to the following upstream PRs: systemd/systemd#39843 for vpick-Don-t-use-openat-directly-but-resolve-symlinks discover-image-Follow-symlinks-in-a-given-root sysext-Use-correct-image-name-for-extension-release test-Add-tests-for-handling-symlinks-with-systemd-sy Note that the patch in the PR relies on 0859fe3f32774f1e0c787974cc252ff922a1b868 but the backport patch not. systemd/systemd#39980 for sysext-Create-mutable-directory-with-the-right-mode sysext-Skip-refresh-if-no-changes-are-found systemd/systemd#39991 for sysext-Get-verity-user-certs-from-given-root systemd/systemd#40063 for sysext-Fix-config-file-support-with-root which relies on systemd/systemd#38250 for man-sysext.conf-add-systemd-sysext-config-files sysext-introduce-global-config-file sysext-support-ImagePolicy-global-config-option Signed-off-by: Kai Lueke <kailuke@microsoft.com>
For A/B-updated /etc contents we used a custom overlay mount that provides the default files through a lowerdir loaded from /usr. Since then we upstreamed mutable systemd-confext support and now we can switch to it. This pulls in flatcar/init#138 and flatcar/bootengine#115 together with backported systemd patches that have opened or merged upstream PRs to fix --root= issues and add a refresh skip check to prevent boot disruptions due to the multiple daemon reloads and - more important - the missing atomic remount that would mean /etc is gone for a few milliseconds during boot. The skip logic works best with verity hashes and thus the default confext must be a verity extension image. User-provided confext don't work well yet unless they use verity due to the missing atomic remount and reliance on the skipping logic. We also need to look into stacking order and other mutabiliy settings. The backported systemd patches relate to the following upstream PRs: systemd/systemd#39843 for vpick-Don-t-use-openat-directly-but-resolve-symlinks discover-image-Follow-symlinks-in-a-given-root sysext-Use-correct-image-name-for-extension-release test-Add-tests-for-handling-symlinks-with-systemd-sy Note that the patch in the PR relies on 0859fe3f32774f1e0c787974cc252ff922a1b868 but the backport patch not. systemd/systemd#39980 for sysext-Create-mutable-directory-with-the-right-mode sysext-Skip-refresh-if-no-changes-are-found systemd/systemd#39991 for sysext-Get-verity-user-certs-from-given-root systemd/systemd#40063 for sysext-Fix-config-file-support-with-root which relies on systemd/systemd#38250 for man-sysext.conf-add-systemd-sysext-config-files sysext-introduce-global-config-file sysext-support-ImagePolicy-global-config-option Signed-off-by: Kai Lueke <kailuke@microsoft.com>
For A/B-updated /etc contents we used a custom overlay mount that provides the default files through a lowerdir loaded from /usr. Since then we upstreamed mutable systemd-confext support and now we can switch to it. This pulls in flatcar/init#138 and flatcar/bootengine#115 together with backported systemd patches that have opened or merged upstream PRs to fix --root= issues and add a refresh skip check to prevent boot disruptions due to the multiple daemon reloads and - more important - the missing atomic remount that would mean /etc is gone for a few milliseconds during boot. The skip logic works best with verity hashes and thus the default confext must be a verity extension image. User-provided confext don't work well yet unless they use verity due to the missing atomic remount and reliance on the skipping logic. We also need to look into stacking order and other mutabiliy settings. The backported systemd patches relate to the following upstream PRs: systemd/systemd#39843 for vpick-Don-t-use-openat-directly-but-resolve-symlinks discover-image-Follow-symlinks-in-a-given-root sysext-Use-correct-image-name-for-extension-release test-Add-tests-for-handling-symlinks-with-systemd-sy Note that the patch in the PR relies on 0859fe3f32774f1e0c787974cc252ff922a1b868 but the backport patch not. systemd/systemd#39980 for sysext-Create-mutable-directory-with-the-right-mode sysext-Skip-refresh-if-no-changes-are-found systemd/systemd#39991 for sysext-Get-verity-user-certs-from-given-root systemd/systemd#40063 for sysext-Fix-config-file-support-with-root which relies on systemd/systemd#38250 for man-sysext.conf-add-systemd-sysext-config-files sysext-introduce-global-config-file sysext-support-ImagePolicy-global-config-option Signed-off-by: Kai Lueke <kailuke@microsoft.com>
When the extensions for the final system are already set up from the
initrd we should avoid disrupting the boot process with the remount
(which currently isn't atomic) and the daemon reload for
systemd-confext and systemd-sysext. Similarly, when sysupdate ran and
updated extensions it's best to avoid the remount and daemon reload if
no changes are found.
To do this, encode the current extension state in more detail than
before where only the names of the extensions where encoded in the
overlay mount. This can also be used to provide more details about the
extension origin in "systemd-sysext status (--json=)". During the
refresh add a check whether the old state matches the new state and in
this case skip the refresh unless the user provides a flag to always
refresh. Besides the extension name and the resolved path the best
method for identification is the verity hash but that is not available
for plain image files or directories. Therefore, also include data to
check for file/directory replacements. The creation/modification times
are not always real on reproducible images or extracted archive content.
The file handle together with the unique mount ID is the next best
identifier we can use when we have no verity hash. Fall back to an inode
when we get no handle. With the creation/modification time and the path
this should be good enough. Using a unique mount ID is important (with
a fallback to the regular non-unique mount ID) instead of st_dev because
st_dev gets reused too easily, e.g., by a loop device mount and the
mount ID helps to catch this. For the mount ID to be valid it has to be
resolved before we enter the new mount namespace. Thus, it gets provided
by the image dissect logic and handed over to the sysext subprocess
which runs in a new mount namespace.
Luckily, we can rule out online modification of directories or image
files because this is anyway not well supported with overlay mounts, so
we don't do a file checksum nor do we recurse into a directory to look
for the most recently touched files. But, as said, with the
always-refresh flag one can force a reload.