-
-
Notifications
You must be signed in to change notification settings - Fork 4.2k
cgroup: Add DisableControllers= directive to disable controller in subtree #10567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
98d88c3 to
ba089e1
Compare
5ce3495 to
34821a7
Compare
|
CI failures appear unrelated:
This is still fine for review. |
That means that |
keszybz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very nice in general.
|
would love to review this, can you rebase on current git, and address @keszybz' points? |
|
Yeah, I've been working on this but found a small issue which I need to fix first. Probably will have an update ready for review tomorrow. :-) |
28e8fa3 to
993bb3f
Compare
|
@poettering @keszybz CI passes (pending arm64, but nothing specific there). This is ready for review. :) |
707fe2c to
a3d7631
Compare
|
@poettering Ready for next round of reviews. |
We always end up doing these together, so just colocate them and require manager state for unit_create_cgroup.
systemd currently doesn't really expend much effort in disabling controllers. unit_realize_cgroup_now *may* be able to disable a controller in the basic case when using cgroup v2, but generally won't manage as downstream dependents may still use it. This code doesn't add any logic to fix that, but it starts the process of moving to have a breadth-first version of unit_realize_cgroup_now for enabling, and a depth-first version of unit_realize_cgroup_now for disabling.
This adds a depth-first version of unit_realize_cgroup_now which can only do depth-first disabling of controllers, in preparation for the DisableController= directive.
a3d7631 to
ca8137d
Compare
|
@poettering @keszybz This PR is pretty susceptible to merge conflicts and then having to go through the whole rebase/resolve/go through CI process again, any chance you could take a look soon? :-) |
poettering
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, just two nitpicks
…btree Some controllers (like the CPU controller) have a performance cost that is non-trivial on certain workloads. While this can be mitigated and improved to an extent, there will for some controllers always be some overheads associated with the benefits gained from the controller. Inside Facebook, the fix applied has been to disable the CPU controller forcibly with `cgroup_disable=cpu` on the kernel command line. This presents a problem: to disable or reenable the controller, a reboot is required, but this is quite cumbersome and slow to do for many thousands of machines, especially machines where disabling/enabling a stateful service on a machine is a matter of several minutes. Currently systemd provides some configuration knobs for these in the form of `[Default]CPUAccounting`, `[Default]MemoryAccounting`, and the like. The limitation of these is that Default*Accounting is overrideable by individual services, of which any one could decide to reenable a controller within the hierarchy at any point just by using a controller feature implicitly (eg. `CPUWeight`), even if the use of that CPU feature could just be opportunistic. Since many services are provided by the distribution, or by upstream teams at a particular organisation, it's not a sustainable solution to simply try to find and remove offending directives from these units. This commit presents a more direct solution -- a DisableControllers= directive that forcibly disallows a controller from being enabled within a subtree.
ca8137d to
c72703e
Compare
|
lgtm! thanks! |
Summary: Rebase two fixes onto the version merged upstream: systemd/systemd#10507 systemd/systemd#10567 and backport a few more: systemd/systemd#10411 systemd/systemd#10493 systemd/systemd#10757 systemd/systemd#10876 These are almost all cgroup2 related. Reviewed By: cdown Differential Revision: D13351498 fbshipit-source-id: 87c8428d48dbb0eb2ae7d34f7381fff88f83872f
| in its subtree, the controller will be removed from the subtree. This can be used to avoid child units being | ||
| able to implicitly or explicitly enable a controller. Defaults to not disabling any controllers.</para> | ||
|
|
||
| <para>It may not be possible to successfully disable a controller if the unit or any child of the unit in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this child-over-parents policy chosen? I can see that disabling and delegating a controller on the same unit is non-sensical config, however, I fail to see why children can override with their Delegate= requests.
(Posting to the commit as it's closest to the origin, not sure if this sends notifications, so pasting a handle @cdown. Let me know if this dicussion should be redirected elsewhere.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I realize. We'd need to know the subtree of delegated children in order to do the "bottom-up" disabling. (Which we in theory know (with races though) but we've commited not to interfere with Delegate=.)
See #7624 and #10473.
Some controllers (like the CPU controller) have a performance cost that
is non-trivial on certain workloads. While this can be mitigated and
improved to an extent, there will for some controllers always be some
overheads associated with the benefits gained from the controller.
Inside Facebook, the fix applied has been to disable the CPU controller
forcibly with
cgroup_disable=cpuon the kernel command line.This presents a problem: to disable or reenable the controller, a reboot
is required, but this is quite cumbersome and slow to do for many
thousands of machines, especially machines where disabling/enabling a
stateful service on a machine is a matter of several minutes.
Currently systemd provides some configuration knobs for these in the
form of
[Default]CPUAccounting,[Default]MemoryAccounting, and theirilk. The limitation of these is that Default*Accounting is overrideable
by individual services, of which any one could decide to reenable a
controller within the hierarchy at any point just by using a controller
feature implicitly (eg.
CPUWeight), even if the use of that CPUfeature could just be opportunistic. Since many services are provided by
the distribution, or by upstream teams at a particular organisation,
it's not a sustainable solution to simply try to find and remove
offending directives from these units.
This commit presents a more direct solution -- a DisableControllers=
directive that forcibly disallows a controller from being enabled within
a subtree.
Test Plan
Overall test doing
grep -v memorywith MemoryMax=5G in nomem.slice and DisableController=memory in nomemleaf.slice:I've also tested this on -.slice, where it successfully removes the entry from cgroup.subtree_control.
DisableControllers also correctly shows up in
systemctl show, showingDisableControllers=memoryfor nomem.slice, and nothing in nomem.service.