seccomp: Block AF_ALG in default socket policy#13327
Pull request overview
Updates the default Linux seccomp profile to reduce kernel attack surface by denying socket(AF_ALG, ...) (Linux crypto API) in addition to the already-denied AF_VSOCK, implemented via range-based argument filters on the socket syscall.
Changes:
- Replaces the single `socket(arg0 != AF_VSOCK)` rule with three allow rules to exclude `AF_ALG` and `AF_VSOCK`.
- Introduces `< AF_ALG`, `== AF_ALG+1`, and `> AF_VSOCK` comparisons to cover all other socket domains.
Add a comment explaining the purpose of the socket rules and noting that on 32-bit x86, socket() goes through socketcall(2) which is allowed unconditionally, so these arg filters only apply to the direct socket syscall. Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
samuelkarp
left a comment
I tested this and the new seccomp rule appears to work.
When testing with ctr run:
- The default is to use the image-provided UID/GID, so to force a non-root user use `--user 1000:1000` (or similar)
- `"noNewPrivileges": true` is default in the generated OCI config and this appears to prevent the PoC, so `--allow-new-privs` should be used
- seccomp is disabled by default, but can be enabled with `--seccomp`
- The `docker.io/library/python:3` image is convenient for running the published PoC
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.
I also got moby/profiles#21 ready tonight.
Separate PR is good; I see you already opened #13330
	_ [38]byte = [unix.AF_ALG]byte{}
	_ [40]byte = [unix.AF_VSOCK]byte{}
	_ [1]byte = [unix.AF_VSOCK - unix.AF_ALG - 1]byte{}
)
Is this assertion needed?
These constants are not platform-dependent
https://cs.opensource.google/go/x/sys/+/refs/tags/v0.44.0:unix/zerrors_linux.go
No, I meant it as "code as documentation" but reading it back I agree that's probably not that readable.
Removed it and turned it into a proper comment near the actual rules.
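For readers unfamiliar with the idiom discussed in this thread, the removed lines were compile-time assertions. A minimal standalone sketch follows, assuming local stand-in constants `afALG` and `afVSOCK` in place of `unix.AF_ALG` and `unix.AF_VSOCK` so that it builds without `golang.org/x/sys/unix`:

```go
package main

import "fmt"

// Stand-ins for unix.AF_ALG and unix.AF_VSOCK (assumed values for this sketch).
const (
	afALG   = 38
	afVSOCK = 40
)

// An array's length is part of its type, so each assignment compiles only if
// the constant on the right equals the literal length on the left.
var (
	_ [38]byte = [afALG]byte{}
	_ [40]byte = [afVSOCK]byte{}
	_ [1]byte  = [afVSOCK - afALG - 1]byte{} // exactly one family sits between them
)

func main() {
	fmt.Println("families between AF_ALG and AF_VSOCK:", afVSOCK-afALG-1)
}
```

Changing either constant makes the build fail with an array type mismatch, which is the "code as documentation" effect the author describes; a plain comment carries the same information with less ceremony.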
AF_ALG (address family 38) exposes the Linux kernel crypto API to userspace via socket(2). Containers have no legitimate need for this interface under the default profile, and leaving it accessible widens the kernel attack surface unnecessarily (see https://copy.fail/). Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
/cherry-pick release/2.3
@AkihiroSuda: new pull request created: #13406
@AkihiroSuda: new pull request created: #13407
@AkihiroSuda: new pull request created: #13408
@AkihiroSuda: new pull request created: #13409
@dmcgowan @AkihiroSuda @vvoland Have you seen this revert: moby/profiles@3c28324?
The revert was for that part: #13330 - not this PR.
Addresses CVE-2026-31431.
Note: This doesn't block usage via `socketcall`.
AF_ALG (address family 38) exposes the Linux kernel crypto API to userspace via socket(2). Containers have no legitimate need for this interface under the default profile, and leaving it accessible widens the kernel attack surface unnecessarily (see https://copy.fail/).
The previous socket rule used a single "arg0 != AF_VSOCK" condition. Adding a second OpNotEqual for AF_ALG does not work because seccomp evaluates multiple argument conditions within a single rule as a logical AND against the same argument index.
Instead, restructure the socket allowlist into three range-based rules that cover every domain except AF_ALG (38) and AF_VSOCK (40): arg0 < AF_ALG, arg0 == AF_ALG+1, and arg0 > AF_VSOCK.
Domains 38 and 40 match none of the three rules and fall through to the default SCMP_ACT_ERRNO action.
Port of moby/profiles#20