libct/cg/sd: fix SkipDevices for systemd #2958
Merged
cyphar merged 2 commits into opencontainers:master on May 28, 2021
cyphar
reviewed
May 23, 2021
Commit 108ee85 adds the SkipDevices flag, which Kubernetes uses to create cgroups for pods.

Unfortunately, that commit falls short: the systemd DevicePolicy and DeviceAllow properties are still set, which forces Kubernetes to set an "allow everything" rule.

This commit fixes that: if the SkipDevices flag is set, we return Device* properties that allow all devices.

Fixes: 108ee85
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
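To make the described behavior concrete, here is a minimal, self-contained sketch of the decision, not the actual runc code. `Property`, `Resources`, `allowAll`, and `deviceProperties` are hypothetical stand-ins introduced for illustration. The sketch assumes that "allow all devices" is expressed as `DevicePolicy=auto` with an empty `DeviceAllow` list, which matches systemd's documented semantics for `auto`; the non-skip branch is only a placeholder.

```go
package main

import "fmt"

// Property is a hypothetical stand-in for a systemd unit property
// (a name plus a value); it is not the go-systemd type.
type Property struct {
	Name  string
	Value interface{}
}

// Resources is a hypothetical, trimmed-down resources struct that keeps
// only the flag discussed in the commit message.
type Resources struct {
	SkipDevices bool
}

// allowAll returns properties that leave device access unrestricted:
// DevicePolicy=auto combined with an empty DeviceAllow list.
func allowAll() []Property {
	return []Property{
		{Name: "DevicePolicy", Value: "auto"},
		{Name: "DeviceAllow", Value: []string{}},
	}
}

// deviceProperties sketches the decision described in the commit message.
func deviceProperties(r Resources) []Property {
	if r.SkipDevices {
		return allowAll()
	}
	// Assumed placeholder for the regular path: start from a strict policy
	// and an empty allow list; real code would append one entry per rule.
	return []Property{
		{Name: "DevicePolicy", Value: "strict"},
		{Name: "DeviceAllow", Value: []string{}},
	}
}

func main() {
	for _, p := range deviceProperties(Resources{SkipDevices: true}) {
		fmt.Printf("%s=%v\n", p.Name, p.Value)
	}
}
```

The point of the sketch is that the caller (kubelet, in the PR's scenario) no longer needs to pass an explicit "allow everything" rule when the flag is set.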
Member
Would it be possible to have some tests?
Contributor
Author
I started working on something that mimics the kubelet behavior...
Member
CentOS failure seems legit -- SELinux?
Member
Once this is merged I will send out the v1.0.0 vote (it would be nice to do the release -- as @h-vetinari suggested -- on the 3rd of June since that's the 5-year anniversary of 1.0.0-rc1).
The idea is to mimic what kubelet is doing, with a minimum amount of code.

First, create a slice with SkipDevices=true. It should have access to all devices.

Next, create a scope within the above slice, allowing access to /dev/full only.

Check that within that scope we can only access /dev/full and not other devices (such as /dev/null).

Repeat the test with SkipDevices=false and make sure we cannot access any devices (as they are disallowed by a parent cgroup). This is done only to assess the test correctness.

NOTE that cgroup v1 and v2 behave differently in the SkipDevices=false case, and thus the check is different. Cgroup v1 returns EPERM on writing to devices.allow, so the cgroup manager's Set() fails, and we check for a particular error from m.Set(). Cgroup v2 allows creating a child cgroup, but denies access to any device (despite access being enabled) -- so we check the error from the shell script running in that cgroup.

Again, this is only about the SkipDevices=false case.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
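The access check at the heart of this test can be illustrated with a small probe. This is only a sketch: the commit message says the real test runs a shell script inside the scope, whereas the hypothetical `canOpen` helper below does the equivalent in Go. It assumes that a device cgroup denial typically surfaces as a permission error when the device node is opened.

```go
package main

import (
	"fmt"
	"os"
)

// canOpen reports whether path can be opened read-only. For a device node,
// a denial by the device cgroup typically shows up as a permission error
// returned by the open call.
func canOpen(path string) (bool, error) {
	f, err := os.Open(path)
	if err != nil {
		return false, err
	}
	f.Close()
	return true, nil
}

func main() {
	// Inside the scope described above, /dev/full is expected to be
	// accessible and /dev/null is expected to be denied.
	for _, dev := range []string{"/dev/full", "/dev/null"} {
		ok, err := canOpen(dev)
		fmt.Printf("%s accessible=%v err=%v\n", dev, ok, err)
	}
}
```

Run outside any restricting cgroup, both devices would normally be reported as accessible; the interesting result comes from running the probe as a process placed in the restricted scope.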
Contributor
Author
Test case fixed, PR description updated.
mrunalp
approved these changes
May 26, 2021
AkihiroSuda
approved these changes
May 27, 2021
cyphar
approved these changes
May 28, 2021
Member
cyphar
left a comment
LGTM. Will send the v1.0.0 vote out today.
Maks1mS
pushed a commit
to stplr-dev/stplr
that referenced
this pull request
Mar 14, 2026
This PR contains the following updates:

| Package | Type | Update | Change | OpenSSF |
|---|---|---|---|---|
| [github.com/opencontainers/runc](https://github.com/opencontainers/runc) | require | patch | `v1.4.0` → `v1.4.1` | https://securityscorecards.dev/viewer/?uri=github.com/opencontainers/runc |

---

> ⚠️ **Warning**
>
> Some dependencies could not be looked up. Check the [Dependency Dashboard](issues/23) for more information.

---

### Release Notes

<details>
<summary>opencontainers/runc (github.com/opencontainers/runc)</summary>

### [`v1.4.1`](https://github.com/opencontainers/runc/blob/HEAD/CHANGELOG.md#100---2021-06-22)

[Compare Source](opencontainers/runc@v1.4.0...v1.4.1)

> A wizard is never late, nor is he early, he arrives precisely when he means
> to.

As runc follows Semantic Versioning, we will endeavour to not make any breaking changes without bumping the major version number of runc. However, it should be noted that Go API usage of runc's internal implementation (libcontainer) is *not* covered by this policy.

##### Removed

- Removed libcontainer/configs.Device\* identifiers (deprecated since rc94, use libcontainer/devices). ([#2999](opencontainers/runc#2999))
- Removed libcontainer/system.RunningInUserNS function (deprecated since rc94, use libcontainer/userns). ([#2999](opencontainers/runc#2999))

##### Deprecated

- The usage of relative paths for mountpoints will now produce a warning (such configurations are outside of the spec, and in future runc will produce an error when given such configurations). ([#2917](opencontainers/runc#2917), [#3004](opencontainers/runc#3004))

##### Fixed

- cgroupv2: devices: rework the filter generation to produce consistent results with cgroupv1, and always clobber any existing eBPF program(s) to fix `runc update` and avoid leaking eBPF programs (resulting in errors when managing containers). ([#2951](opencontainers/runc#2951))
- cgroupv2: correctly convert "number of IOs" statistics in a cgroupv1-compatible way. ([#2965](opencontainers/runc#2965), [#2967](opencontainers/runc#2967), [#2968](opencontainers/runc#2968), [#2964](opencontainers/runc#2964))
- cgroupv2: support larger than 32-bit IO statistics on 32-bit architectures.
- cgroupv2: wait for freeze to finish before returning from the freezing code, optimize the method for checking whether a cgroup is frozen. ([#2955](opencontainers/runc#2955))
- cgroups/systemd: fixed "retry on dbus disconnect" logic introduced in rc94
- cgroups/systemd: fixed returning "unit already exists" error from a systemd cgroup manager (regression in rc94). ([#2997](opencontainers/runc#2997), [#2996](opencontainers/runc#2996))

##### Added

- cgroupv2: support SkipDevices with systemd driver. ([#2958](opencontainers/runc#2958), [#3019](opencontainers/runc#3019))
- cgroup1: blkio: support BFQ weights. ([#3010](opencontainers/runc#3010))
- cgroupv2: set per-device io weights if BFQ IO scheduler is available. ([#3022](opencontainers/runc#3022))

##### Changed

- cgroup/systemd: return, not ignore, stop unit error from Destroy. ([#2946](opencontainers/runc#2946))
- Fix all golangci-lint failures. ([#2781](opencontainers/runc#2781), [#2962](opencontainers/runc#2962))
- Make `runc --version` output sane even when built with `go get` or otherwise outside of our build scripts. ([#2962](opencontainers/runc#2962))
- cgroups: set SkipDevices during runc update (so we don't modify cgroups at all during `runc update`). ([#2994](opencontainers/runc#2994))

</details>

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://altlinux.space/stapler/stplr/pulls/361
Co-authored-by: Renovate Bot <stapler-helper-bot@noreply.altlinux.space>
Co-committed-by: Renovate Bot <stapler-helper-bot@noreply.altlinux.space>
libct/cg/sd: fix SkipDevices for systemd

Commit 108ee85 (PR "libct/cgroups: add SkipDevices to Resources", #2490) adds the `SkipDevices` flag, which Kubernetes uses to create cgroups for pods.

Unfortunately, that commit falls short: the systemd `DevicePolicy` and `DeviceAllow` properties are still set, which forces Kubernetes to set an "allow everything" rule.

This commit fixes that: if the `SkipDevices` flag is set, we return `Device*` properties that allow all devices.

libct/cg/sd: add SkipDevices unit test

The idea is to mimic what kubelet is doing, with a minimum amount of code.

First, create a slice with `SkipDevices=true`. It should have access to all devices.

Next, create a scope within the above slice, allowing access to `/dev/full` only.

Check that within that scope we can only access `/dev/full` and not other devices (such as `/dev/null`).

Repeat the test with `SkipDevices=false` and make sure we cannot access any devices (as they are disallowed by a parent cgroup). This is done only to assess the test correctness.

NOTE that cgroup v1 and v2 behave differently in the `SkipDevices=false` case, and thus the check is different. Cgroup v1 returns `EPERM` on writing to `devices.allow`, so the cgroup manager's `Set()` fails, and we check for a particular error from `m.Set()`. Cgroup v2 allows creating a child cgroup, but denies access to any device (despite access being enabled) -- so we check the error from the shell script running in that cgroup.

Again, this is only about the `SkipDevices=false` case.

Previous discussions on topic: #2490 (comment), kubernetes/kubernetes#92862 (comment)

Fixes: 108ee85
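The v1/v2 case analysis in the test description can be summarized as a small decision function. This is a hedged sketch only: `failurePoint` and `expectedFailure` are hypothetical names made up here, and the sketch merely encodes where the description says a failure is expected to surface, not how the real test is written.

```go
package main

import "fmt"

// failurePoint names where the test expects a failure to show up.
type failurePoint string

const (
	noFailure   failurePoint = "none"
	setFails    failurePoint = "Set() returns an error"
	scriptFails failurePoint = "shell script reports an error"
)

// expectedFailure encodes the case analysis from the description:
// with SkipDevices=true nothing is expected to fail; with SkipDevices=false,
// cgroup v1 fails in Set() (EPERM on devices.allow), while cgroup v2 fails
// later, in the script that runs inside the child cgroup.
func expectedFailure(skipDevices, cgroupV2 bool) failurePoint {
	switch {
	case skipDevices:
		return noFailure
	case cgroupV2:
		return scriptFails
	default:
		return setFails
	}
}

func main() {
	for _, skip := range []bool{true, false} {
		for _, v2 := range []bool{false, true} {
			fmt.Printf("SkipDevices=%v cgroupV2=%v -> %s\n",
				skip, v2, expectedFailure(skip, v2))
		}
	}
}
```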