-
Notifications
You must be signed in to change notification settings - Fork 249
dont ignore failure to create cgroup after timeout #349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c984793 to
8ac0138
Compare
|
Looks good, but can you squash the commits and write the same blurb you have in the PR description in the commit message? Thanks! |
8ac0138 to
d919943
Compare
Done! |
cgroup2/manager.go
Outdated
| log.G(ctx).Warnf("Timed out while waiting for StartTransientUnit(%s) completion signal from dbus. Continuing...", group) | ||
| case <-time.After(systemdStartUnitTimeout): | ||
| attemptFailedUnitReset(conn, group) | ||
| return fmt.Errorf("Timed out while waiting for StartTransientUnit(%s) completion signal from dbus after %v", group, systemdStartUnitTimeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, one nit: Make Timed lowercase to appease the Golang idiom gods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Before this commit, creating a cgroup would silently ignore timeouts and carry on. Concretely, this caused cases where a cgroup failed to create, but the caller doesn't realize and ends up looking for files that should exist (e.g. cgroups.controllers), only to find they don't exist. It's very difficult as a caller to deal with this case, where NewSystemd succeeds but the group doesn't exist. The origins of this code seem to trace back to an initial implementation written 5+ years ago: containerd@5efa14e#diff-3331981e4ac06a8d9b06e91842b7f2759c7af3b65287e489a88385948d311ebdR672 runc added roughly the same logic here to deal with the same issue: opencontainers/runc#3782 Now, containerd will also error if a cgroup cannot be created within the timeout window. Signed-off-by: Josh Chorlton <jchorlton@gmail.com>
d919943 to
2e25118
Compare
|
@dcantah what's the path to getting this merged? |
|
@jchorl Just needs another approval, I'll shop this around to folks |
I noticed that creating a cgroup will silently ignore timeouts and continue on. Concretely, I've hit cases where a cgroup fails to get created, and the caller ends up looking for
cgroups.controllersonly to find it isn't there.It's very difficult as a caller to deal with this case, where
NewSystemdsucceeds but the group doesn't exist.I traced the origins of this code to 5efa14e#diff-3331981e4ac06a8d9b06e91842b7f2759c7af3b65287e489a88385948d311ebdR672 - which was written 5+ years ago.
runcadded roughly the same logic here: opencontainers/runc#3782