-
Notifications
You must be signed in to change notification settings - Fork 2.2k
fix a cgroup cpu quota/period order issue. #2044
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
libcontainer/cgroups/fs/cpu.go
Outdated
| } | ||
| if cgroup.Resources.CpuQuota != 0 { | ||
| // if quota is already set, don't set it again. | ||
| if (!qs) && cgroup.Resources.CpuQuota != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just do if !qs && ....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed~
865c0e8 to
cd766b0
Compare
|
Just realize that the code only address the situation for changing leaf node of a cgroup tree. When a parent node is changed, it should make it larger first.. And worse, if one want to change a node in the middle of the cgroup tree, it becoms a mission impossible... Say we have three cgroup nodes a, b, c, a is b's parent, and b is c's parent |
|
My understanding is that runc is mostly used to manage leaf node of cgroup tree, right? If so, this fix should work "most of the time"... |
|
Yes, runc only ever manages the leaf of tree. If the admin decides to misconfigure the cgroup tree to make runc crash there isn't much we can (or should) do about it. |
cd766b0 to
173d424
Compare
|
I have make changes to use a simpler strategy: set period first, if failed than retry if after setting quota. |
|
c44aec9 |
libcontainer/cgroups/fs/cpu.go
Outdated
| } | ||
| // The order of setting cfs_quota_us and cfs_period_us is significant, since cgroup child node | ||
| // should not have a higher quota/period ratio than its parent. | ||
| // Use period->quota-->period sequence to make sure no matther parent or child node is changed, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
matther -> matter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed~!
|
One question, why not use re-order the logic on what is set first instead of this re-order bool? |
|
@crosbymichael Thanks for the comments. I can put the first version of code changes in another PR if necessary. |
|
@crosbymichael @cyphar
Need your opinion on this. |
Signed-off-by: zq-david-wang <00107082@163.com>
|
Please add tests |
When setting/updating child cpu cgroup cfs_quota_us and cfs_period_us, the order is significant because the child node should not have a higher quota/period ratio than its parent. This means if always setting period first would cause kernel error.
For example, say parent node has quota/period set to 10000/10000, and we want to change child to 1000/1000, if set period first, than after the period change and before the quota change, the ration is 10000/1000, and this setting make child node have larger quota ratio than its parent and rejected by kernel.
This patch would get current child value first, and make sure the first setting always make the quota/period ratio lower than current (valid) ratio.
EDIT:
A simpler strategy is to use a period->quota---->period sequence, if setting period failed, retry setting it after seting quota.
And there is an issue reported in kubernetes project
kubernetes/kubernetes#76704
EDIT:
To trigger the issue, we need
And when using systemd cgroup driver with kubernetes/runc, both conditions are met...
kubelet would create a parent node for pod, and set its quota according to resource limit, and when create child node via systemd cgroup driver, systemd would set peirod to 100ms and quota to a calculated value other than -1. Then further change to quota/period would be rejected by liinux kernel if child node have higher quota/period ratio after setting period and before setting quota.