Description
Submission type
- Request for enhancement (RFE)
Details
In February this year, I, along with @htejun, identified a performance/latency regression in the cgroup CPU controller. While this issue has subsequently been fixed, for machines not running a new enough kernel, the workaround applied inside Facebook has been to forcibly disable the CPU controller with cgroup_disable=cpu on the kernel command line.
This presents a problem: disabling or re-enabling the controller requires a reboot, which is cumbersome and slow across many thousands of machines, especially machines where disabling/enabling a stateful service takes several minutes.
Currently, systemd provides some configuration knobs for this in the form of [Default]CPUAccounting, [Default]MemoryAccounting, and their ilk. The limitation is that Default*Accounting can be overridden by individual services: any one of them could re-enable a controller within the hierarchy at any point just by using a controller feature implicitly (e.g. CPUWeight), even if its use of that feature is merely opportunistic. Since many services are provided by the distribution, or by upstream teams at a particular organisation, it's not a sustainable solution to simply try to find and remove offending directives from these units.
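To illustrate the limitation, here is a minimal sketch of how a single unit can undo the manager-wide default. The unit name example.service, its ExecStart path, and the weight value are hypothetical and chosen only for illustration; the directive names are the existing systemd ones mentioned above.

```ini
# /etc/systemd/system.conf
[Manager]
# Manager-wide default: do not turn on CPU accounting for units.
DefaultCPUAccounting=no

# /etc/systemd/system/example.service (hypothetical unit)
[Service]
ExecStart=/usr/bin/example-daemon
# Using a CPU controller feature implicitly requests the cpu controller
# for this unit, regardless of the manager-wide default above.
CPUWeight=200
```

Any unit shipped by a distribution or another team could carry a line like the last one, which is why auditing units by hand does not scale.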
As such, I propose the creation of two new controls:
- Some mechanism to, within systemd, refuse to use a particular controller at all. This could perhaps be in the form of something like Ignore{CPU,Memory,...}Accounting, where we disable it per-controller in a manner similar to how *Accounting directives do it right now. Since these are directly related to a controller, however, it might also just make sense to not be cgroup-agnostic about it and write IgnoreCgroupController={cpu,memory,...}.
- The Assert{CPU,Memory,...}Accounting or AssertCgroupController={cpu,memory,...} directive, which fails loading a unit if it really needs some particular controller to be available.
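As a rough sketch of how the two proposed controls might look in practice: neither directive exists in systemd today. The names follow the cgroup-specific spelling suggested above, and their placement (the ignore directive in the [Manager] section of system.conf, the assert directive in a unit's [Unit] section next to the existing Assert* conditions) is an assumption made purely for illustration. The unit name and ExecStart path are likewise hypothetical.

```ini
# /etc/systemd/system.conf
[Manager]
# PROPOSED, does not exist yet: never use the cpu controller, even if a
# unit sets CPUWeight= or similar. Placement in [Manager] is an assumption.
IgnoreCgroupController=cpu

# /etc/systemd/system/needs-cpu.service (hypothetical unit)
[Unit]
# PROPOSED, does not exist yet: fail to load this unit if the cpu
# controller is unavailable. Placement in [Unit] is an assumption.
AssertCgroupController=cpu

[Service]
ExecStart=/usr/bin/needs-cpu-daemon
CPUWeight=200
```

Under these assumptions, units that only use CPUWeight opportunistically would keep running without the controller, while a unit that asserts it, like the one above, would fail to load instead of silently running without the guarantees it depends on.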
This would allow systemd users to more effectively work around potential issues encountered during cgroup operation without having to reboot a machine to add cgroup_disable.