cgroup: Add DisableControllers= directive to disable controller in subtree #10567

cdown · 2018-10-29T16:37:59Z

Some controllers (like the CPU controller) have a performance cost that
is non-trivial on certain workloads. While this can be mitigated and
improved to an extent, for some controllers there will always be some
overhead associated with the benefits gained from the controller.
Inside Facebook, the fix applied has been to forcibly disable the CPU
controller with cgroup_disable=cpu on the kernel command line.

This presents a problem: disabling or re-enabling the controller requires
a reboot, which is cumbersome and slow to do across many thousands of
machines, especially machines where disabling/enabling a stateful service
takes several minutes.

Currently systemd provides some configuration knobs for this in the
form of [Default]CPUAccounting, [Default]MemoryAccounting, and their
ilk. The limitation of these is that Default*Accounting can be overridden
by individual services, any one of which could re-enable a controller
within the hierarchy at any point just by implicitly using a controller
feature (e.g. CPUWeight), even if the use of that CPU feature is merely
opportunistic. Since many services are provided by the distribution, or
by upstream teams at a particular organisation, it's not a sustainable
solution to simply try to find and remove offending directives from
these units.

This commit presents a more direct solution -- a DisableControllers=
directive that forcibly disallows a controller from being enabled within
a subtree.
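To make the subtree semantics concrete, here is a minimal C sketch of how a unit's usable controllers could be computed once DisableControllers= exists: the controllers implied by the unit's own settings (e.g. CPUWeight= implying cpu) are masked by the union of DisableControllers= on the unit and on every ancestor slice. The Unit struct, the CTRL_* bits and both functions are hypothetical simplifications for illustration, not the code in src/core/cgroup.c.

```c
#include <stddef.h>

/* Hypothetical controller bits, named after cgroup v2 controllers. */
enum {
        CTRL_CPU    = 1 << 0,
        CTRL_IO     = 1 << 1,
        CTRL_MEMORY = 1 << 2,
        CTRL_PIDS   = 1 << 3,
};

/* Hypothetical, heavily simplified unit: a parent link (its slice), the
 * controllers implied by its own settings (e.g. CPUWeight= implies cpu,
 * MemoryMax= implies memory) and its DisableControllers= mask. */
typedef struct Unit {
        const struct Unit *parent;
        unsigned wanted;
        unsigned disable;
} Unit;

/* Union of DisableControllers= on the unit itself and on every ancestor. */
static unsigned ancestor_disable_mask(const Unit *u) {
        unsigned mask = 0;

        for (; u; u = u->parent)
                mask |= u->disable;
        return mask;
}

/* Controllers the unit may actually use: whatever its settings ask for,
 * minus anything disabled anywhere on the path up to the root. Setting
 * CPUWeight= or similar in a child cannot override this. */
static unsigned effective_mask(const Unit *u) {
        return u->wanted & ~ancestor_disable_mask(u);
}
```

In this model, a service below a slice with DisableControllers=memory that sets both CPUWeight= and MemoryMax= ends up with only the cpu controller.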

Test Plan

Overall test running grep -v memory, with MemoryMax=5G in nomem.slice and DisableControllers=memory in nomemleaf.slice:

# systemctl status nomemleaf
* nomemleaf.service - Nomem Leaf Service
Loaded: loaded (/etc/systemd/system/nomemleaf.service; static; vendor preset: disabled)
Active: inactive (dead)

Nov 26 12:53:48 systemdtest systemd[1]: Starting Nomem Leaf Service...
Nov 26 12:53:48 systemdtest grep[516]: pids
Nov 26 12:53:48 systemdtest systemd[1]: nomemleaf.service: Succeeded.
Nov 26 12:53:48 systemdtest systemd[1]: Started Nomem Leaf Service.
Nov 26 12:53:48 systemdtest systemd[1]: nomemleaf.service: Consumed 1ms CPU time.

I've also tested this on -.slice, where it successfully removes the entry from cgroup.subtree_control.
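On cgroup v2, removing a controller from a subtree means writing "-name" to the parent cgroup's cgroup.subtree_control file (and "+name" to add one). The sketch below computes that line from the currently enabled and the desired controller sets; the bit layout and format_subtree_control() are hypothetical, and the caller is assumed to write the result to the file itself.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical bit layout: bit i corresponds to controller_names[i]. */
static const char *const controller_names[] = { "cpu", "io", "memory", "pids" };
#define N_CONTROLLERS (sizeof(controller_names) / sizeof(controller_names[0]))

/* Build the line to write to a cgroup's cgroup.subtree_control so that its
 * enabled set goes from `current` to `target`: "+name" for each controller
 * to enable, "-name" for each one to disable, space-separated.
 * Returns the string length, or -1 if buf is too small. */
static int format_subtree_control(unsigned current, unsigned target,
                                  char *buf, size_t size) {
        size_t len = 0;

        if (size == 0)
                return -1;
        buf[0] = '\0';

        for (size_t i = 0; i < N_CONTROLLERS; i++) {
                unsigned bit = 1u << i;
                char sign;
                int n;

                if ((target & bit) && !(current & bit))
                        sign = '+';
                else if (!(target & bit) && (current & bit))
                        sign = '-';
                else
                        continue; /* unchanged, nothing to write */

                n = snprintf(buf + len, size - len, "%s%c%s",
                             len > 0 ? " " : "", sign, controller_names[i]);
                if (n < 0 || (size_t) n >= size - len)
                        return -1;
                len += (size_t) n;
        }
        return (int) len;
}
```

For example, going from {cpu, memory, pids} to {cpu, pids} produces the single token -memory.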

DisableControllers also correctly shows up in systemctl show, showing DisableControllers=memory for nomem.slice, and nothing in nomem.service.

cdown · 2018-11-01T09:55:29Z

CI failures appear unrelated:

Rawhide test failure is caused by Fedora CI failing all builds due to lack of filesystem space #10592
i386 failure in startup ("A start job is running for…4816d1386c.device") looks transient/unrelated

This is still fine for review.

evverx · 2018-11-01T11:17:59Z

> i386 failure in startup

That means that qemu, turned on yesterday, seems to be working :-) Anyway, it seems to be #10093, which is going to annoy everybody now that the test is actually run.

keszybz

Looks very nice in general.

src/core/cgroup.c

poettering · 2018-11-20T20:04:02Z

would love to review this, can you rebase on current git, and address @keszybz's points?

cdown · 2018-11-21T00:35:26Z

Yeah, I've been working on this but found a small issue which I need to fix first. Probably will have an update ready for review tomorrow. :-)

cdown · 2018-11-26T17:12:01Z

@poettering @keszybz CI passes (pending arm64, but nothing specific there). This is ready for review. :)

src/core/dbus-cgroup.c

src/core/cgroup.c

cdown · 2018-11-27T17:24:38Z

@poettering Ready for next round of reviews.

We always end up doing these together, so just colocate them and require manager state for unit_create_cgroup.

systemd currently doesn't really expend much effort in disabling controllers. unit_realize_cgroup_now *may* be able to disable a controller in the basic case when using cgroup v2, but generally won't manage as downstream dependents may still use it. This code doesn't add any logic to fix that, but it starts the process of moving to have a breadth-first version of unit_realize_cgroup_now for enabling, and a depth-first version of unit_realize_cgroup_now for disabling.

This adds a depth-first version of unit_realize_cgroup_now which can only do depth-first disabling of controllers, in preparation for the DisableController= directive.
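The need for two traversal orders comes from cgroup v2 itself: a child can only enable controllers that its parent has enabled in cgroup.subtree_control, and the kernel refuses to remove a controller from a parent's cgroup.subtree_control while a child still has it enabled. Enabling therefore has to visit parents before children, and disabling has to visit children before parents. A minimal sketch of both orders, using a hypothetical Node tree and callback rather than systemd's Unit objects or unit_realize_cgroup_now:

```c
#include <stddef.h>

/* Hypothetical cgroup tree node; only what is needed to show visit order. */
typedef struct Node {
        const char *name;
        struct Node **children; /* NULL-terminated array, or NULL for a leaf */
} Node;

typedef void (*visit_fn)(const Node *n, void *userdata);

/* Enabling: visit the parent first, since a child may only enable
 * controllers already present in its parent's cgroup.subtree_control. */
static void walk_enable(const Node *n, visit_fn visit, void *userdata) {
        visit(n, userdata);
        for (Node **c = n->children; c && *c; c++)
                walk_enable(*c, visit, userdata);
}

/* Disabling: visit the children first, since a controller cannot be
 * dropped from a parent while any child still has it enabled. */
static void walk_disable(const Node *n, visit_fn visit, void *userdata) {
        for (Node **c = n->children; c && *c; c++)
                walk_disable(*c, visit, userdata);
        visit(n, userdata);
}
```

For a tree -.slice → foo.slice → {a.service, b.service}, walk_enable visits -.slice first and walk_disable visits it last.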

cdown · 2018-12-03T14:39:38Z

@poettering @keszybz This PR is pretty susceptible to merge conflicts, and each one means going through the whole rebase/resolve/CI process again. Any chance you could take a look soon? :-)

poettering

looks good, just two nitpicks

src/core/cgroup.c


poettering · 2018-12-03T16:22:19Z

lgtm! thanks!

Summary: Rebase two fixes onto the version merged upstream: systemd/systemd#10507, systemd/systemd#10567, and backport a few more: systemd/systemd#10411, systemd/systemd#10493, systemd/systemd#10757, systemd/systemd#10876. These are almost all cgroup2 related.
Reviewed By: cdown
Differential Revision: D13351498
fbshipit-source-id: 87c8428d48dbb0eb2ae7d34f7381fff88f83872f

Werkov · 2019-08-09T16:42:42Z

man/systemd.resource-control.xml

+          in its subtree, the controller will be removed from the subtree. This can be used to avoid child units being
+          able to implicitly or explicitly enable a controller. Defaults to not disabling any controllers.</para>
+
+          <para>It may not be possible to successfully disable a controller if the unit or any child of the unit in


Why was this child-over-parents policy chosen? I can see that disabling and delegating a controller on the same unit is a nonsensical configuration; however, I fail to see why children can override it with their Delegate= requests.

(Posting to the commit as it's closest to the origin; I'm not sure if this sends notifications, so pinging @cdown. Let me know if this discussion should be redirected elsewhere.)

Oh, I realize now. We'd need to know the subtree of delegated children in order to do the "bottom-up" disabling. (In theory we do know it, albeit with races, but we've committed to not interfering with Delegate=.)

cdown mentioned this pull request Oct 29, 2018

parse: Add CategoricalBool and parser #10496 (closed)

yuwata added the pid1 label Oct 29, 2018

cdown force-pushed the disable_controller branch 5 times, most recently from 98d88c3 to ba089e1 on October 31, 2018 11:44

cdown changed the title from "cgroup: Add DisableController= directive to disable controller in subtree" to "cgroup: Add DisableControllers= directive to disable controller in subtree" on Oct 31, 2018

cdown force-pushed the disable_controller branch 3 times, most recently from 5ce3495 to 34821a7 on October 31, 2018 16:11

keszybz reviewed Nov 7, 2018

src/core/cgroup.c (outdated)

src/core/cgroup.c (outdated)

src/core/cgroup.c (outdated)

src/core/cgroup.c (outdated)

poettering added the cgroups label Nov 20, 2018

cdown mentioned this pull request Nov 21, 2018

core: skip cgroup_subtree_mask_valid update if UNIT_STUB #10876 (closed)

cdown force-pushed the disable_controller branch 3 times, most recently from 28e8fa3 to 993bb3f on November 26, 2018 13:53

poettering requested changes Nov 26, 2018

poettering added the reviewed/needs-rework label Nov 26, 2018

cdown force-pushed the disable_controller branch 3 times, most recently from 707fe2c to a3d7631 on November 27, 2018 15:50

cdown added 2 commits December 3, 2018 14:37

cgroup: Move attribute application into unit_create_cgroup

0d2d6fb

cgroup: Traverse leaves to realised cgroup to release controllers

4f6f62e

cdown force-pushed the disable_controller branch from a3d7631 to ca8137d on December 3, 2018 14:38

poettering reviewed Dec 3, 2018

src/core/cgroup.c (outdated)

src/core/cgroup.c

cdown force-pushed the disable_controller branch from ca8137d to c72703e on December 3, 2018 15:59

poettering added the good-to-merge/waiting-for-ci label and removed the reviewed/needs-rework label Dec 3, 2018

poettering approved these changes Dec 3, 2018

poettering merged commit a365325 into systemd:master Dec 3, 2018

cdown mentioned this pull request Dec 4, 2018

Support disabling cgroup controller within systemd #7624 (closed)

cdown deleted the disable_controller branch December 4, 2018 13:31

cdown restored the disable_controller branch December 4, 2018 13:32

Werkov reviewed Aug 9, 2019

6 participants
