-
Notifications
You must be signed in to change notification settings - Fork 18.9k
refactor some CPU RT and CFS code #41016
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@thaJeztah PTAL |
|
I seem to recall there was some "setup" needed for this functionality in #23430 (parent cgroup for this to be created so that child containers got a share of this?) don't recall exactly |
|
Oh, perhaps I'm recalling #29846, where I was looking at that 🤔 |
I think I'm not breaking what was fixed in there, I'm just moving this check earlier in the code. |
|
@kolyshkin Looks like there's a failure: |
|
Perhaps would be good to update |
My bad, fixed: - if hc.Resources.CPURealtimePeriod != 0 || hc.Resources.CPURealtimeRuntime != 0 && !si.CPURealtime {
+ if (hc.Resources.CPURealtimePeriod != 0 || hc.Resources.CPURealtimeRuntime != 0) && !si.CPURealtime { |
|
Ah
This looks too opinionated to me, but fixed. (technically, we should change Rt to RT as well (RealTime) but IMO it will look uglier this way) |
|
CI failure in logging looks unrelated |
daemon/daemon_unix.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated question; if we recursively set this cgroup, does each recursion limit the rc period/runtime further as a part of its parent? (IoW, are they cummulative?) So if we would set the daemon's --cgroup-parent=/one/two/three, then the result would be different than --cgroup-parent=/one ? Because in that case, we should only set the cgroup once (for the docker "root" cgroup)? (or is it not possible to "skip" parent cgroups?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a very good question that I am also curious about (and this is that time that I wish I could have asked it during the review of #23430).
Some docs are available at https://www.kernel.org/doc/html/latest/scheduler/sched-rt-group.html
From what I understand, this is not a limit but rather a hard allocation (meaning the cgroup is given that amount of CPU time exclusively). So, you can think of it as a disk space -- if your want to have 10 GB free in /some/fancy/dir, you have to have it in its parent (/some/fancy) as well.
The idea of making dockerd set this for all parent cgroups seems questionable to me, but now we have what we have, and removing this will surely break backward compatibility.
|
CI failure seem unrelated |
The CPU CFS cgroup-aware scheduler is one single kernel feature, not two, so it does not make sense to have two separate booleans (CPUCfsQuota and CPUCfsPeriod). Merge these into CPUCfs. Same for CPU realtime. For compatibility reasons, /info stays the same for now. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Commit 56f77d5 added code that is doing some very ugly things. In partucular, calling cgroups.FindCgroupMountpointAndRoot() and daemon.SysInfoRaw() inside a recursively-called initCgroupsPath() not not a good thing to do. This commit tries to partially untangle this by moving some expensive checks and calls earlier, in a minimally invasive way (meaning I tried hard to not break any logic, however weird it is). This also removes double call to MkdirAll (not important, but it sticks out) and renames the function to better reflect what it's doing. Finally, this wraps some of the errors returned, and fixes the init function to not ignore the error from itself. This could be reworked more radically, but at least this this commit we are calling expensive functions once, and only if necessary. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
|
@AkihiroSuda if you have time, could you have another look? I'll try to review as well |
cpuguy83
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
A couple of patches touching CPU RT and CFS setting code. Should improve performance (in CPU RT case) and help adopting the runc changes in opencontainers/runc#2411
Untangle CPU RT controller init
Commit 56f77d5 added code that is doing some very ugly things.
In partucular, calling cgroups.FindCgroupMountpointAndRoot() and
daemon.SysInfoRaw() inside a recursively-called initCgroupsPath()
is not a good thing to do.
This commit tries to partially untangle this by moving some expensive
checks and calls up in the stack, so they at least won't be called multiple
times. This is done in a minimally invasive way (meaning I
tried hard to not break any logic, however weird it is).
This also removes double call to MkdirAll (not important, but it sticks
out) and renames the function to better reflect what it's doing.
This could be reworked more radically, but at least this this commit
we are calling expensive functions once, and only if necessary.
pkg/sysinfo: rm duplicates
The CPU CFS cgroup-aware scheduler is one single kernel feature, not
two, so it does not make sense to have two separate booleans
(CPUCfsQuota and CPUCfsPeriod). Merge these into CPUCfs.
Same for CPU realtime.
For compatibility reasons, /info stays the same for now.
Closes: #39704