core/device: fix state transition on switching root with persistent database#23437
core/device: fix state transition on switching root with persistent database#23437yuwata wants to merge 4 commits intosystemd:mainfrom
Conversation
| * device_catchup() or uevents. */ | ||
|
|
||
| if (MANAGER_IS_SWITCHING_ROOT(m)) { | ||
| if (MANAGER_IS_SWITCHING_ROOT(m) && !FLAGS_SET(d->enumerated_found, DEVICE_FOUND_UDEV)) { |
There was a problem hiding this comment.
if you are able to reproduce the issue, can you confirm whether this fixes it for you?
There was a problem hiding this comment.
I'm not seeing these "dead->plugged" transitions anymore right after switching root for devices with the "db_persist" option set.
There was a problem hiding this comment.
I think this is wrong. We must reset the state to DEVICE_TENTATIVE to prevent premature device activation in the switching root case. The argument is the same as in #23208. Even with db_persist, the udev rules may have changed while switching root, and a device which was "plugged" in the initramfs may not be any more in the mounted root (SYSTEMD_READY="0" may be set on it). IMO DEVICE_READY should never be set after switching root until an actual "add" uevent for the device has been processed.
After looking at the dm-crypt example, I have to say this statement of mine was wrong. dm-crypt devices apparently need to be in "plugged" state early on. We can't wait for udev coldplug to establish this state, because cryptsetup@,service may need to be activated before udev after switching root, and that wouldn't happen if the device was only in tentative state.
There was a problem hiding this comment.
Well if db_persist is used it's clearly an indication that the device must remain plugged, see the doc. If SYSTEMD_READY=0 is used in the host then it seems actually expected that the device goes through the plugged->dead transition when the new set of rules from the host is processed by udev.
There was a problem hiding this comment.
That's one possible intepretation of the text in udev(7). But you're just inventing it :-)
db_persist dates back to 2011, before the systemd/udev merge. The man page paragraph was added much later. I don't think the relationship of db_persist and systemd's state attribute has ever been clearly defined.
db_persist was created in the past for LVM, because LVM uses "change" events for device activation rather than "add" events, and needs special treatment for coldplug "add" events for devices that are already active (see 10-dm.rules). LVM doesn't care about systemd's state, so as far as LVM is concerned, it doesn't matter whether we translate db_persist to DEVICE_READY or DEVICE_TENTATIVE.
I still think DEVICE_TENTATIVE is safer. It doesn't do harm, because the actual state will be resolved sooner or later either way, and it avoids premature activation.
There was a problem hiding this comment.
I think this is wrong. We must reset the state to
DEVICE_TENTATIVEto prevent premature device activation in the switching root case. The argument is the same as in #23208. Even withdb_persist, the udev rules may have changed while switching root, and a device which was "plugged" in the initramfs may not be any more in the mounted root (SYSTEMD_READY="0" may be set on it). IMODEVICE_READYshould never be set after switching root until an actual "add" uevent for the device has been processed.
@mwilck The host PID1 has used and honored the persistent database generated in initrd at least more than two years. Why do you want to change the behavior? Of course, we have not documented the behavior, but IIRC, there is no issue related to that.
As reported in #23429, after PR #23218, the device with persistent database has unnecessary state transition. It may be harmless in most cases, but such state transition did not occur previously. If we can avoid that, why not to do that?
Well if db_persist is used it's clearly an indication that the device must remain plugged, see the doc. If SYSTEMD_READY=0 is used in the host then it seems actually expected that the device goes through the
plugged->deadtransition when the new set of rules from the host is processed by udev.
@fbuihuu I agree with you. Even if the behavior is not documented, we have done so for a while. There is no reason to change it.
There was a problem hiding this comment.
@mwilck The host PID1 has used and honored the persistent database generated in initrd at least more than two years. Why do you want to change the behavior?
@yuwata, you're right that as a matter of fact, systemd kept DEVICE_READY state over switching root. It did this for all devices though, which was wrong, and was fixed in #23218. #23218 did indeed "change the behavior". I hope we agree it did so for the better.
The open question is how to deal with db_persist. We need to realize that we have two conflicting semantic rules:
- udev rules are allowed to be different between initrd and root fs, and thus need to be re-parsed and re-executed after switching root before switching device states and forwarding them to dependent units.
db_persistdevices keep state over switching root.
@fbuihuu and you think that rule 2) defines an exception to rule 1). I disagree. 1) is a general principle. Nothing prevents udev rules from changing between initrd and root FS. Thus if we run device_set_state(DEVICE_READY) on these devices, we'd open up new ways for the race described in #23208 to occur, which may cause boot failure.
db_persist is only used by dm and md. After switching root, the udev rules for these subsystems need to distinguish between "activation" events and other events, which requires certain udev properties to survive the cleanup (see e.g. this dracut commit). This distinction is made in udev rules and relies 100% on properties in the udev db. It has nothing to do with systemd's device activation. Thus for the current consumers of db_persist, it doesn't matter if we set the state to "tentative" rather than "plugged" here.
It does matter for the further processing of dependent devices, though. The processing of dm and md devices during boot relies heavily on (udev) coldplug. It's not in the interest of these subsystems to have dependent units activated before the (udev) coldplug events have been processed. Strictly speaking, after switching root, we should never call device_set_state(DEVICE_READY), except from device_dispatch_io().
However, I realize that this hunk is not the point. With this hunk, we'll set DEVICE_READY in device_coldplug() for db_persist devices, whereas without it, we set it in device_catchup(). If my argument above is correct, both calls would be premature. @fbuihuu, with this observation I don't understand how this patch fixes your issue. The extra "dead→plugged" transition should still observed, just a little bit earlier.
| * device_catchup() or uevents. */ | ||
|
|
||
| if (MANAGER_IS_SWITCHING_ROOT(m)) { | ||
| if (MANAGER_IS_SWITCHING_ROOT(m) && !FLAGS_SET(d->enumerated_found, DEVICE_FOUND_UDEV)) { |
There was a problem hiding this comment.
I think this is wrong. We must reset the state to DEVICE_TENTATIVE to prevent premature device activation in the switching root case. The argument is the same as in #23208. Even with db_persist, the udev rules may have changed while switching root, and a device which was "plugged" in the initramfs may not be any more in the mounted root (SYSTEMD_READY="0" may be set on it). IMO DEVICE_READY should never be set after switching root until an actual "add" uevent for the device has been processed.
After looking at the dm-crypt example, I have to say this statement of mine was wrong. dm-crypt devices apparently need to be in "plugged" state early on. We can't wait for udev coldplug to establish this state, because cryptsetup@,service may need to be activated before udev after switching root, and that wouldn't happen if the device was only in tentative state.
588ca27 to
1ec046e
Compare
|
Whatever we eventually do here, can we please separate the |
|
TL;DR: IMO this is no regression. The Long story@fbuihuu, this patch simply moves the spurious "dead→plugged" transition from During the Now I see the following with the code from #23218: With a5dedd6 applied, it becomes: You can see that the |
|
@mwilck Thank you for the investigation. Still I think this PR is OK, but let's postpone to v252. |
I hope the last commit in this PR addresses that. WDYT? |
I guess so, but I haven't taken a closer look. I've tested these patches one-by-one. I've been wondering if it'd be possible (in the switching-root case at least) to postpone the call to |
I see, thanks for the investigation, I've been fooled by the rate limiting in Then issue #23429 is not a regression. |
This is basically what the last commit of #23436 is all about. |
Well unfortunately it seems to be a regression for systems using dm-crypt for the root FS. Another example for the general theorem that whatever we do in this area, we overlook something 😭 I also have to revoke my statement above that "tentative" is always better than "plugged". I still think that we have a conflict between two principles here, as stated above, but at least the way things are set up today, we have to carry over "plugged" state from the initrd for |
Will be used by the following commit.
On switching root, a device may have a persistent databse. In that case, Device.enumerated_found may have DEVICE_FOUND_UDEV flag, and it is not necessary to downgrade the Device.deserialized_found and Device.deserialized_state. Otherwise, the state of the device unit may be changed plugged -> dead -> plugged, if the device has not been mounted. Fixes systemd#23429.
…as a swap Otherwise, the device state will changed as plugged -> dead -> tentative, and the corresponding .mount or .swap units will be stopped. Fixes systemd#23429 (comment).
1ec046e to
fb4a249
Compare
On switching root, a device may have a persistent databse. In that case, Device.enumerated_found may have DEVICE_FOUND_UDEV flag, and it is not necessary to downgrade the Device.deserialized_found and Device.deserialized_state. Otherwise, the state of the device unit may be changed plugged -> dead -> plugged, if the device has not been mounted. Fixes systemd#23429. [mwilck: cherry-picked from systemd#23437]
On switching root, a device may have a persistent databse. In that case, Device.enumerated_found may have DEVICE_FOUND_UDEV flag, and it is not necessary to downgrade the Device.deserialized_found and Device.deserialized_state. Otherwise, the state of the device unit may be changed plugged -> dead -> plugged, if the device has not been mounted. Fixes systemd#23429. [mwilck: cherry-picked from systemd#23437]
On switching root, a device may have a persistent databse. In that case, Device.enumerated_found may have DEVICE_FOUND_UDEV flag, and it is not necessary to downgrade the Device.deserialized_found and Device.deserialized_state. Otherwise, the state of the device unit may be changed plugged -> dead -> plugged, if the device has not been mounted. Fixes systemd#23429. [mwilck: cherry-picked from systemd#23437]
On switching root, a device may have a persistent databse. In that case, Device.enumerated_found may have DEVICE_FOUND_UDEV flag, and it is not necessary to downgrade the Device.deserialized_found and Device.deserialized_state. Otherwise, the state of the device unit may be changed plugged -> dead -> plugged, if the device has not been mounted. Fixes systemd#23429. [mwilck: cherry-picked from systemd#23437]
On switching root, a device may have a persistent databse. In that case, Device.enumerated_found may have DEVICE_FOUND_UDEV flag, and it is not necessary to downgrade the Device.deserialized_found and Device.deserialized_state. Otherwise, the state of the device unit may be changed plugged -> dead -> plugged, if the device has not been mounted. Fixes systemd#23429. [mwilck: cherry-picked from systemd#23437] (cherry picked from commit 4fc69e8)
On switching root, a device may have a persistent databse. In that case, Device.enumerated_found may have DEVICE_FOUND_UDEV flag, and it is not necessary to downgrade the Device.deserialized_found and Device.deserialized_state. Otherwise, the state of the device unit may be changed plugged -> dead -> plugged, if the device has not been mounted. Fixes systemd#23429. [mwilck: cherry-picked from systemd#23437] (cherry picked from commit 4fc69e8) (cherry picked from commit 131206d)
On switching root, a device may have a persistent databse. In that case, Device.enumerated_found may have DEVICE_FOUND_UDEV flag, and it is not necessary to downgrade the Device.deserialized_found and Device.deserialized_state. Otherwise, the state of the device unit may be changed plugged -> dead -> plugged, if the device has not been mounted. Fixes systemd#23429. [mwilck: cherry-picked from systemd#23437] (cherry picked from commit 4fc69e8)
Fixes #23429.