libct/cgroups/systemd: eliminate runc/systemd race #2614
Merged
mrunalp merged 1 commit into opencontainers:master on Sep 30, 2020
AkihiroSuda approved these changes on Sep 30, 2020
mrunalp approved these changes on Sep 30, 2020
If systemd takes more than 1 second to create a unit, startUnit() times
out with a warning and runc proceeds anyway (creating cgroups using the
fs manager, and so on).

Now runc and systemd are racing, and multiple scenarios are possible.
In one such scenario, the unit has not yet been created by the time runc
calls the systemd manager's Apply(), so the
dbusConnection.SetUnitProperties() call fails with "unit xxx.scope not
found", and the whole container start fails as well.

To eliminate the race, return an error when the timeout is hit.

To reduce the chance of failure, increase the timeout from 1 to 30
seconds so as not to error out too early on a busy or slow system
(times of 3-5 seconds are not unrealistic).

While at it, since the timeout is now quite long, make sure not to
leave a stray timer behind.
Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1883640
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>