-
Notifications
You must be signed in to change notification settings - Fork 18.9k
Add default sysctls to allow ping sockets and privileged ports with no capabilities #41030
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Seems to be erroring in user namespace, need to exclude this case apparently. |
daemon/oci_linux.go
Outdated
| s.Linux.Sysctl["net.ipv4.ping_group_range"] = "0 2147483647" | ||
| // allow opening any port less than 1024 without CAP_NET_BIND_SERVICE | ||
| s.Linux.Sysctl["net.ipv4.ip_unprivileged_port_start"] = "0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are custom values already set at this point? If so, should we skip these if a custom value was set? (not sure if someone would have any reason to)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yosifkit looking over my shoulder while I'm cheering for this glorious PR also asked if one or both of these should be specifically excluded when using --network=host? (it's not clear from the code if this is already the case 🙈)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! 👍 (not sure what it does when manually specifying --sysctl 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it "works" just fine? 😅
$ docker run --init -it --rm --network host --sysctl net.ipv4.ip_unprivileged_port_start=0 --user 1000:1000 tianon/network-toolbox nc -vvv -l -p 123
Listening on [0.0.0.0] (family 2, port 123)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and persistently adjusts the value for my host: 🥳
$ cat /proc/sys/net/ipv4/ip_unprivileged_port_start
0There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did it change your host's configuration? 😬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOL; race condition in comments.
Looks like that's a .... "feature"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, better exclude net=host too...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a follow-up on this thread, @kolyshkin separately fixed the "--sysctl=net.* + --net=host changes the host's settings" part of this discussion in runc itself via opencontainers/runc#2706 (issue opencontainers/runc#2693). 😄 🎉
$ docker run --init -it --rm --network host --sysctl net.ipv4.ip_unprivileged_port_start=0 --user 1000:1000 tianon/network-toolbox nc -vvv -l -p 123
docker: Error response from daemon: OCI runtime create failed: sysctl "net.ipv4.ip_unprivileged_port_start" not allowed in host network namespace: unknown.d733554 to
241fbd8
Compare
|
Looks like userns is still failing; Could that be because of the order in which options are applied (both Lines 983 to 999 in 07e6b84
Or is it something in how those tests are setup? |
|
Also wondering if we should set the defaults inside |
241fbd8 to
2e2d3e8
Compare
|
I think |
Yes, agreed. Should we make |
2e2d3e8 to
fc7b8d7
Compare
|
For host networking, it would be all sysctls starting with "net.*" that should be blocked. Thats independent of this PR but I think it does make sense. Maybe it should happen in runc though, which is where non namespaced sysctls are blocked now? |
d3e1917 to
bd4bd48
Compare
|
Looks like there's a panic on ppc64le; |
|
That error seems to come from runc; @kolyshkin @AkihiroSuda |
|
I am just adding checks for whether the sysctl node exists, which should work around this, although it was mainly designed around old kernels. However it shouldn't panic. |
65bcfa9 to
ed291d9
Compare
…o capabilities Currently default capability CAP_NET_RAW allows users to open ICMP echo sockets, and CAP_NET_BIND_SERVICE allows binding to ports under 1024. Both of these are safe operations, and Linux now provides ways that these can be set, per container, to be allowed without any capabilties for non root users. Enable these by default. Users can revert to the previous behaviour by overriding the sysctl values explicitly. Signed-off-by: Justin Cormack <justin.cormack@docker.com>
ed291d9 to
dae652e
Compare
|
The panic is not from runc, but from the test case itself, here: moby/integration/build/build_test.go Lines 379 to 383 in 0a82354
It panics right there, line 383. The panic fix is diff --git a/integration/build/build_test.go b/integration/build/build_test.go
index edbfb37935..cd682a5e58 100644
--- a/integration/build/build_test.go
+++ b/integration/build/build_test.go
@@ -378,7 +378,7 @@ RUN cat somefile`
imageIDs, err := getImageIDsFromBuild(out.Bytes())
assert.NilError(t, err)
- assert.Check(t, is.Equal(3, len(imageIDs)))
+ assert.Assert(t, is.Equal(3, len(imageIDs)))
image, _, err := apiclient.ImageInspectWithRaw(context.Background(), imageIDs[2])
assert.NilError(t, err) |
|
oh, thanks! I must've read the output wrong 😅 |
kolyshkin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
thaJeztah
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
(asked @kolyshkin to double-check the approach for sysctlExists() 🤗)
As @dfangl pointed out: Binding port 53 might require root permissions for binding port < 1024 depending on the Docker version moby/moby#41030
As @dfangl pointed out: Binding port 53 might require root permissions for binding port < 1024 depending on the Docker version moby/moby#41030
In this PR grafana/agent#6817 the setcap was added for the alloy/agent binary for some reason unknown to me. Since the docker engine allows binding to unpriviliged ports already for a long time via moby/moby#41030, which was implemented in the Docker engine v20.10.x and also other container runtimes. This should solve the following issues. Fixes grafana#117 Fixes grafana#303
In this PR grafana/agent#6817 the setcap was added for the alloy/agent binary for some reason unknown to me. Since the docker engine allows binding to unpriviliged ports already for a long time via moby/moby#41030, which was implemented in the Docker engine v20.10.x and also other container runtimes. This should solve the following issues. Fixes grafana#117 Fixes grafana#303
In this PR grafana/agent#6817 the setcap was added for the alloy/agent binary for some reason unknown to me. Since the docker engine allows binding to unpriviliged ports already for a long time via moby/moby#41030, which was implemented in the Docker engine v20.10.x and also other container runtimes. This should solve the following issues. Fixes #177 Fixes #303
In this PR grafana/agent#6817 the setcap was added for the alloy/agent binary for some reason unknown to me. Since the docker engine allows binding to unpriviliged ports already for a long time via moby/moby#41030, which was implemented in the Docker engine v20.10.x and also other container runtimes. This should solve the following issues. Fixes grafana#177 Fixes grafana#303
|
For docker compose file use this instead ...
sysctls:
net.ipv4.ip_unprivileged_port_start: 0
user: "${UID}:${GID}"
... |
|
@lukanoluwatobi this option is set by default on current versions of the engine (20.10 and up), so I don't think there should be a need to manually pass that option. Was there a reason you had to set it manually? |
fixes #8460
fixes #38664
Currently default capability CAP_NET_RAW allows users to open ICMP echo
sockets, and CAP_NET_BIND_SERVICE allows binding to ports under 1024.
Both of these are safe operations, and Linux now provides ways that
these can be set, per container, to be allowed without any capabilties
for non root users. Enable these by default. Users can revert to the
previous behaviour by overriding the sysctl values explicitly.
Signed-off-by: Justin Cormack justin.cormack@docker.com
(picture by https://twitter.com/CountrysideBen)