Skip to content

Conversation

@kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented May 22, 2020

A couple of patches touching CPU RT and CFS setting code. Should improve performance (in CPU RT case) and help adopting the runc changes in opencontainers/runc#2411

Untangle CPU RT controller init

Commit 56f77d5 added code that is doing some very ugly things.
In partucular, calling cgroups.FindCgroupMountpointAndRoot() and
daemon.SysInfoRaw() inside a recursively-called initCgroupsPath()
is not a good thing to do.

This commit tries to partially untangle this by moving some expensive
checks and calls up in the stack, so they at least won't be called multiple
times. This is done in a minimally invasive way (meaning I
tried hard to not break any logic, however weird it is).

This also removes double call to MkdirAll (not important, but it sticks
out) and renames the function to better reflect what it's doing.

This could be reworked more radically, but at least this this commit
we are calling expensive functions once, and only if necessary.

pkg/sysinfo: rm duplicates

The CPU CFS cgroup-aware scheduler is one single kernel feature, not
two, so it does not make sense to have two separate booleans
(CPUCfsQuota and CPUCfsPeriod). Merge these into CPUCfs.

Same for CPU realtime.

For compatibility reasons, /info stays the same for now.

Closes: #39704

@kolyshkin
Copy link
Contributor Author

@thaJeztah PTAL

@thaJeztah thaJeztah added kind/refactor PR's that refactor, or clean-up code status/2-code-review labels May 25, 2020
@thaJeztah
Copy link
Member

I seem to recall there was some "setup" needed for this functionality in #23430 (parent cgroup for this to be created so that child containers got a share of this?) don't recall exactly

@thaJeztah
Copy link
Member

Oh, perhaps I'm recalling #29846, where I was looking at that 🤔

@kolyshkin
Copy link
Contributor Author

Oh, perhaps I'm recalling #29846, where I was looking at that 

I think I'm not breaking what was fixed in there, I'm just moving this check earlier in the code.

@thaJeztah
Copy link
Member

@kolyshkin Looks like there's a failure:

[2020-05-26T02:04:17.430Z] === Failed
[2020-05-26T02:04:17.430Z] === FAIL: runconfig TestValidateResources (0.00s)
[2020-05-26T02:04:17.430Z]     hostconfig_test.go:290: Expected valid configuration Your kernel does not support CPU real-time scheduler
[2020-05-26T02:04:17.430Z] 

@thaJeztah
Copy link
Member

Perhaps would be good to update TestValidateResources to use sub-tests while were changing it (so that if one test-case fails, it's easier to find which one

@kolyshkin
Copy link
Contributor Author

hostconfig_test.go:290: Expected valid configuration Your kernel does not support CPU real-time scheduler

My bad, fixed:

-       if hc.Resources.CPURealtimePeriod != 0 || hc.Resources.CPURealtimeRuntime != 0 && !si.CPURealtime {
+       if (hc.Resources.CPURealtimePeriod != 0 || hc.Resources.CPURealtimeRuntime != 0) && !si.CPURealtime {

@kolyshkin
Copy link
Contributor Author

Ah

daemon/daemon_unix.go:1718:23: method initCpuRtController should be initCPURtController (golint)

This looks too opinionated to me, but fixed.

(technically, we should change Rt to RT as well (RealTime) but IMO it will look uglier this way)

@kolyshkin
Copy link
Contributor Author

CI failure in logging looks unrelated

Comment on lines 1707 to 1703
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated question; if we recursively set this cgroup, does each recursion limit the rc period/runtime further as a part of its parent? (IoW, are they cummulative?) So if we would set the daemon's --cgroup-parent=/one/two/three, then the result would be different than --cgroup-parent=/one ? Because in that case, we should only set the cgroup once (for the docker "root" cgroup)? (or is it not possible to "skip" parent cgroups?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very good question that I am also curious about (and this is that time that I wish I could have asked it during the review of #23430).

Some docs are available at https://www.kernel.org/doc/html/latest/scheduler/sched-rt-group.html

From what I understand, this is not a limit but rather a hard allocation (meaning the cgroup is given that amount of CPU time exclusively). So, you can think of it as a disk space -- if your want to have 10 GB free in /some/fancy/dir, you have to have it in its parent (/some/fancy) as well.

The idea of making dockerd set this for all parent cgroups seems questionable to me, but now we have what we have, and removing this will surely break backward compatibility.

@thaJeztah thaJeztah added this to the 20.03.0 milestone Jun 25, 2020
@kolyshkin
Copy link
Contributor Author

CI failure seem unrelated

[2020-06-24T15:11:54.201Z] #25 [frozen-images 3/3] RUN /download-frozen-image-v2.sh /build         buil...
[2020-06-24T15:11:54.201Z] #25 21.56 curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to docker-images-prod.s3.amazonaws.com:443 
[2020-06-24T15:11:54.201Z] #25 ERROR: executor failed running [/bin/sh -c /download-frozen-image-v2.sh /build         buildpack-deps:jessie@sha256:dd86dced7c9cd2a724e779730f0a53f93b7ef42228d4344b25ce9a42a1486251         busybox:latest@sha256:bbc3a03235220b170ba48a157dd097dd1379299370e1ed99ce976df0355d24f0         busybox:glibc@sha256:0b55a30394294ab23b9afd58fab94e61a923f5834fba7ddbae7f8e0c11ba85e6         debian:jessie@sha256:287a20c5f73087ab406e6b364833e3fb7b3ae63ca0eb3486555dc27ed32c6e60         hello-world:latest@sha256:be0cd392e45be79ffeffa6b05338b98ebb16c87b255f48e297ec7f98e123905c]: runc did not terminate sucessfully

The CPU CFS cgroup-aware scheduler is one single kernel feature, not
two, so it does not make sense to have two separate booleans
(CPUCfsQuota and CPUCfsPeriod). Merge these into CPUCfs.

Same for CPU realtime.

For compatibility reasons, /info stays the same for now.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Commit 56f77d5 added code that is doing some very ugly things.
In partucular, calling cgroups.FindCgroupMountpointAndRoot() and
daemon.SysInfoRaw() inside a recursively-called initCgroupsPath()
not not a good thing to do.

This commit tries to partially untangle this by moving some expensive
checks and calls earlier, in a minimally invasive way (meaning I
tried hard to not break any logic, however weird it is).

This also removes double call to MkdirAll (not important, but it sticks
out) and renames the function to better reflect what it's doing.

Finally, this wraps some of the errors returned, and fixes the init
function to not ignore the error from itself.

This could be reworked more radically, but at least this this commit
we are calling expensive functions once, and only if necessary.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin
Copy link
Contributor Author

Another CI failure, also seems unrelated

[2020-06-27T00:17:28.419Z] /usr/local/cli/docker: error pulling image configuration: Get https://docker-images-prod.s3.amazonaws.com/registry-v2/docker/registry/v2/blobs/sha256/7a/7a2331af229207e4ebade5cdaf5c86d971aea59adbb37e126ec8da602021f0bb/data?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIA2KUBRXV6BQ6YTP6Q%2F20200627%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200627T000129Z&X-Amz-Expires=1200&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEDsaCXVzLWVhc3QtMSJHMEUCIG46%2F3u09p5heILsyt8zGwDXH4nvnxFZ7B68vYThh%2BFzAiEAvzdER3%2Bwzvt8zACAF2%2F6Y4QHjsqrlNK8%2BVw1CRo9%2BA0qvQMIxP%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FARAAGgw3MTAwMTUwNDA4OTIiDBwauq4xP6ISmagH7CqRAzLSe5TT8gMiH8cwK3wDv5bvGQx0SeiEocxZEaToUMexpu95S0zFtLofCotem74fzlzUP5eVsQhsCZw%2B4Cx9MkhhifMJKkEt42%2BTLRffmkUxSDpQZSd%2F7juk4pSNUvvpugkVREI1W52QmQ85IRiHjn0L8SXgdJL1lz4uCxsUFsOUYbvJRiRZYP77jL93mGu5QoDAIwnO%2FMtDnJv%2BulJYEoHHS2wACTMKVohd2rNhbd4NXmEc8pwc0o6h1TVQvNopTC1af6hPFsWKbSYEWHvz4N0lBbq5gWWILc29T7S%2FAt%2B%2BZWGT6D9VaOApZh6uNxdVXl9n6xgFzCvnX27xBNOBaqAFNbu%2FlWIVDSN%2Bvqyp02kN6wteSGkwE7dHEoEPeGrY3jjvO3kRSQ%2BB0vGz1ge%2BrigHrmlESHayRZfE6I%2BDik0yc7grPvjGAKEVj8O82tLSNo%2FJ%2BqS%2F2GriajsKVsmPmhbdRc01PoBtFPStnxlJpknHx6HyTcN5xNeAVRgskTd7afZHM4Zfy3%2FGuLlOHsJV4y%2BuMNeA2fcFOusBGBLi6rA4f6LzFKoaWhpJw6MxAVcGlQ8w7Jbri1aWiMQn%2FS38P2ZEilbcZkPv%2BTgW4Pi5bW0L5AyUE491APKyPP%2BsR8tG1T5R%2B8z6v8829reS1wpHeH%2Fnz%2BljiJyM%2BInswPTw3nAW96ZS35XJYD%2FFeiR2d7TLWngL0EdZQC8Xtfu0ta3rcSnjALW7Bdnsec2KRjmdwPjwhzhubxRoaM%2BYnE4dnswjciESwNPEtbK5it5J%2FvHayao2IDSvNFN7Y1WYRu7fJ%2FB5Em5RFd4%2B%2FZpIoa4cUaVA7swLg6t3RqlKCAZSQKva3n0t%2Beckyw%3D%3D&X-Amz-SignedHeaders=host&X-Amz-Signature=211f17cd2a787d5d57cbb6f15cf26becf169eea4c104b65a8aeba57a41fbcc4f: net/http: TLS handshake timeout.

@tonistiigi tonistiigi requested a review from AkihiroSuda July 2, 2020 18:29
@thaJeztah
Copy link
Member

@AkihiroSuda if you have time, could you have another look? I'll try to review as well

Copy link
Member

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cpuguy83 cpuguy83 merged commit 260c26b into moby:master Jul 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/refactor PR's that refactor, or clean-up code status/2-code-review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants