-
Notifications
You must be signed in to change notification settings - Fork 18.9k
Implementing support for --cpu-rt-period and --cpu-rt-runtime #23430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
bbbc519 to
8612a12
Compare
|
Thanks for working on this! |
runconfig/hostconfig_unix.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I would check this, there are potentially other ways in which we could get capabilities, so I would just check the return codes when you actually try to change stuff and try to give a helpful message if it fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove just the capabilities check? or all the param sanity checking?
Also you wont see any errors until you make the syscall to change your priority, which would likely just be a permissions error and not overly helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The capabilities check, it is a bit ugly and encodes a lot of presumed information about how capabilities are assigned which could change in future, and we don't do it anywhere else. EPERM in the syscall is pretty clear and you can provide a more helpful message there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got overly ambition trying to save people from themselves. I'll get this removed.
|
No problem, sorry for the crazy delay, been insanely caught up on other tasks. Let me know if you have any questions/problems trying to test this. I admit it's a lot of steps to get setup to test if you don't already have this kernel param enabled, then the Docker inception thing, having to make sure the container you're testing inside of has rt_runtime. You almost need this feature to test this feature :) |
8612a12 to
8b0b462
Compare
|
I've removed the capabilities check and associated unit tests, PTAL. |
|
CI Failures are from the manual changes within Once docker-archive-public/docker.engine-api#262 is merged, this PR can be updated to pull in upstream and that failure will disappear. |
e936503 to
4aa30da
Compare
|
@erikstmartin yes, don't worry about the "failing-ci" label for now, it's just that we know it's failing (in this case due to the "vendor" CI). Should not be an issue while in design-review 👍 |
4aa30da to
7f24a14
Compare
|
I've updated this PR to include a --cpu-rt-runtime flag to the daemon. The daemon needs to be able to set the cgroup value otherwise a sysadmin will need to set the value after the first container is started (after every reboot) |
3a39af7 to
fe2f46e
Compare
|
I haven't had a chance to test this, but I believe the daemon flag won't actually work correctly with newer systemd versions. Until opencontainers/runc#931 is fixed, the cpu cgroup path ends up being |
|
@jalaziz I'll double check, but i'm pretty sure I was developing this against 230. |
|
@erikstmartin For reference, we're running CoreOS with systemd v229. However, I believe the problem exists on other OSes with newer systemd versions if you use I'm not sure if the |
|
@jalaziz You're completely correct. I dont think I tried it using the systemd cgroup driver on that machine. |
269ee59 to
1b749b4
Compare
|
Latest push should handle climbing the cgroup tree and also account for init.scope in newer versions of systemd |
|
🎉 |
Pull request moby#27745 added support for the client to talk to older versions of the daemon. Various flags were added to docker 1.13 that are not compatible with older daemons. This PR adds annotations to those flags, so that they are automatically hidden if the daemon is older than docker 1.13 (API 1.25). Not all new flags affect the API (some are client-side only). The following PR's added new flags to docker 1.13 that affect the API; - moby#23430 added `--cpu-rt-period`and `--cpu-rt-runtime` - moby#27800 / moby#25317 added `--group` / `--group-add` / `--group-rm` - moby#27702 added `--network` to `docker build` - moby#25962 added `--attachable` to `docker network create` - moby#27998 added `--compose-file` to `docker stack deploy` - moby#22566 added `--stop-timeout` to `docker run` and `docker create` - moby#26061 added `--init` to `docker run` and `docker create` - moby#26941 added `--init-path` to `docker run` and `docker create` - moby#27958 added `--cpus` on `docker run` / `docker create` - moby#27567 added `--dns`, `--dns-opt`, and `--dns-search` to `docker service create` - moby#27596 added `--force` to `docker service update` - moby#27857 added `--hostname` to `docker service create` - moby#28031 added `--hosts`, `--host-add` / `--host-rm` to `docker service create` and `docker service update` - moby#28076 added `--tty` on `docker service create` / `docker service update` - moby#26421 added `--update-max-failure-ratio`, `--update-monitor` and `--rollback` on `docker service update` - moby#27369 added `--health-cmd`, `--health-interval`, `--health-retries`, `--health-timeout` and `--no-healthcheck` options to `docker service create` and `docker service update` Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
|
@erikstmartin thank you for reading my question. |
|
@IT-huang This is going to be dependent on what the parent cgroup of docker is, is it |
cpu.rt_runtime PR moby#23430 introduced a couple more flags including `--cpu-rt-runtime` to the docker daemon. It appears recent changes or merge issues may have broken this. It currently does not take the cgroup mount point into account when determining the cgroup files to write values to. This breaks docker setting its own `cpu.rt_runtime` for the daemon. This also means containers aren't able to set theirs. Also, the cgroups.FindCgroupMountpointAndRoot returns back a mount point that includes the cgroup of the currently running container when docker is run inside a docker container. this breaks the `--cpu-rt-runtime` flag when running docker in docker. A fix has been placed here, but potentially could be pulled up into libcontainer if this is a better place for it. Signed-off-by: Erik St. Martin <alakriti@gmail.com>
Pull request moby/moby#27745 added support for the client to talk to older versions of the daemon. Various flags were added to docker 1.13 that are not compatible with older daemons. This PR adds annotations to those flags, so that they are automatically hidden if the daemon is older than docker 1.13 (API 1.25). Not all new flags affect the API (some are client-side only). The following PR's added new flags to docker 1.13 that affect the API; - moby/moby#23430 added `--cpu-rt-period`and `--cpu-rt-runtime` - moby/moby#27800 / moby/moby#25317 added `--group` / `--group-add` / `--group-rm` - moby/moby#27702 added `--network` to `docker build` - moby/moby#25962 added `--attachable` to `docker network create` - moby/moby#27998 added `--compose-file` to `docker stack deploy` - moby/moby#22566 added `--stop-timeout` to `docker run` and `docker create` - moby/moby#26061 added `--init` to `docker run` and `docker create` - moby/moby#26941 added `--init-path` to `docker run` and `docker create` - moby/moby#27958 added `--cpus` on `docker run` / `docker create` - moby/moby#27567 added `--dns`, `--dns-opt`, and `--dns-search` to `docker service create` - moby/moby#27596 added `--force` to `docker service update` - moby/moby#27857 added `--hostname` to `docker service create` - moby/moby#28031 added `--hosts`, `--host-add` / `--host-rm` to `docker service create` and `docker service update` - moby/moby#28076 added `--tty` on `docker service create` / `docker service update` - moby/moby#26421 added `--update-max-failure-ratio`, `--update-monitor` and `--rollback` on `docker service update` - moby/moby#27369 added `--health-cmd`, `--health-interval`, `--health-retries`, `--health-timeout` and `--no-healthcheck` options to `docker service create` and `docker service update` Signed-off-by: Sebastiaan van Stijn <github@gone.nl> Upstream-commit: 5d2722f83db9e301c6dcbe1c562c2051a52905db Component: cli
This adds bash completion for moby/moby#23430. Signed-off-by: Harald Albers <github@albersweb.de>
This adds bash completion for moby/moby#23430. Signed-off-by: Harald Albers <github@albersweb.de> Upstream-commit: 74a5d1af86a3d0898dda79bb13282f1b039b015c Component: cli
This adds bash completion for moby/moby#23430. Signed-off-by: Harald Albers <github@albersweb.de>
These options configure the parent cgroup, not the default for containers, nor the daemon itself, so adding that information to the flag description to make this slightly more clear. relates to 56f77d5 (moby#23430) which implemented these flags. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
These options configure the parent cgroup, not the default for containers, nor the daemon itself, so adding that information to the flag description to make this slightly more clear. relates to 56f77d5 (moby#23430) which implemented these flags. Signed-off-by: Sebastiaan van Stijn <github@gone.nl> Signed-off-by: zach <Zachary.Joyner@linux.com>
| } | ||
|
|
||
| cpuRealtimeRuntime := cgroupEnabled(mountPoint, "cpu.rt_runtime_us") | ||
| if !quiet && !cpuRealtimeRuntime { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not make sense to check these two files separately -- this is a single kernel feature, so these two are either both present or not (and if one present and the other is not this is some kind of a gross kernel error).
I am trying to fix this in #41014
| s.Linux.Resources.OOMScoreAdj = &c.HostConfig.OomScoreAdj | ||
| s.Linux.Sysctl = c.HostConfig.Sysctls | ||
|
|
||
| p := *s.Linux.CgroupsPath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no need to do all of the below in case CPU RT parameters are not set, or the kernel does not support CPU RT sched.
|
|
||
| daemon.initCgroupsPath(filepath.Dir(path)) | ||
|
|
||
| _, root, err := cgroups.FindCgroupMountpointAndRoot("cpu") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this call is very expensive and ideally should only be done once.
| } | ||
|
|
||
| path = filepath.Join(root, path) | ||
| sysinfo := sysinfo.New(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very expensive way to check whether CPU RT scheduler is supported, and it's also performed way too late in the code, and also multiple times :(
- What I did
Added support for two new flags
--cpu-rt-periodand--cpu-rt-runtimecombined with either the--privilegedflag or--cap-add=sys_nicethis allows processes within a container to prioritize themselves as real-time whenCONFIG_RT_GROUP_SCHEDis enabled in the kernel. See #22380. When this option is enabled each docker container's cgroup has to havecpu.rt_runtime_usset in order for the container to be able to prioritize its threads.- How I did it
Support for changing the cgroup value already existed in runc, I updated all the necessary container configs to pass these values down to runc. I also implemented some sanity checks in the daemon.
- How to verify it
Ensure
CONFIG_RT_GROUP_SCHEDis enabled in the kernel (standard CentOS 7 install has this enabled). You can verify it's enabled by runningzcat /proc/config.gz | grep CONFIG_RT_GROUP_SCHEDor by checking for the existence of/sys/fs/cgroup/cpu.rt_runtime_usYou will also need to ensure that the system and all parent cgroups have values set for the period and runtime as this limits what the children can be set to. If you're familiar with these values you can set them appropriately, sane defaults would be
rt_period_us = 1000000andrt_runtime_us = 950000, they only need to be set once on the system.Update: You can now set the values for the parent cgroups with a new set of daemon flags. The system value will still need to be set using sysctl.
docker daemon --cpu-rt-runtime=950000Now you have a system you can test on. To validate this works start a container with the new flags (we'll add a ulimit just to test that works as expected too)
docker run --it --cpu-rt-runtime=95000 --ulimit rtprio=99 --cap-add=sys_nice debian:jessieIf you receive an error like this, it's because your parent cgroups dont have
rt_runtime_usset, or it's a lower value than you're trying to set for your container. Also ensure they are set for your container/sys/fs/cgroup/docker/<container id>/cpu.rt_runtime_us(after the container has been started). if you're testing this inside a container.Once inside the container you can attempt to change the priority of your shell
chrt -r -p 99 1Then validate by calling
chrt -p 1You can also test the following error conditions.
sys_nicecapability or beingprivilegedresults in error (no point setting your runtime if you dont have permission to change your priority)- Description for the changelog
Support for --cpu-rt-period and --cpu-rt-runtime flags, allowing containers to run real-time threads when CONFIG_RT_GROUP_SCHED is enabled in the kernel.
- A picture of a cute animal (not mandatory but encouraged)