cgroups: ebpf: use link.Anchor to check for BPF_F_REPLACE support#4548
cgroups: ebpf: use link.Anchor to check for BPF_F_REPLACE support#4548thaJeztah merged 3 commits intoopencontainers:mainfrom
Conversation
In v0.13.0, cilium/ebpf stopped supporting setting BPF_F_REPLACE as an explicit flag and instead requires us to use link.Anchor to specify where the program should be attached. Commit 216175a ("Upgrade Cilium's eBPF library version to 0.16") did update this correctly for the actual attaching logic, but when checking for kernel support we still passed BPF_F_REPLACE. This would result in a generic error being returned, which our feature-support checking logic would treat as being an error the indicates that BPF_F_REPLACE *is* supported, resulting in a regression on pre-5.6 kernels. It turns out that our debug logging saying that this unexpected error was happening was being output as a result of this change, but nobody noticed... Fixes: 216175a ("Upgrade Cilium's eBPF library version to 0.16") Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
It is possible for LinkAttachProgram to return ErrNotSupported if program attachment is not supported at all (which doesn't matter in this case), but it seems possible that upstream will start returning ErrNotSupported for BPF_F_REPLACE at some point so it's best to make sure we don't cause additional regressions here. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
If we get an unexpected error here, it is probably because of a library or kernel change that could cause our detection logic to be invalid. As a result, these warnings should be louder so users have a chance to tell us about them sooner (or so we might notice them before doing a release, as happened with the 1.2.0 regression). Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
|
It turns out we've had this debug logging message since #4397 was merged (which indicated that we had a regression on #3008), but nobody noticed: |
|
Nice catch, thanks for spotting this. If there's anything in particular I can do to help let me know. |
kolyshkin
left a comment
There was a problem hiding this comment.
LGTM except a minor nit
| } | ||
| // attach_flags test succeeded. | ||
| if !errors.Is(err, unix.EBADF) { | ||
| logrus.Debugf("checking for BPF_F_REPLACE: got unexpected (not EBADF or EINVAL) error: %v", err) |
There was a problem hiding this comment.
nit: the message is not entirely correct now (I guess we can just drop the (not EBADF or EINVAL) part).
|
Looks like in order to catch this in CI we'd need a combination of kernel < v5.6 and cgroup v2? This is why it went unnoticed -- we don't have that (and it's hard to get in CI -- something like Ubuntu 20.04 with an explicit switch to cgroup v2 and kernel not upgraded to HWE one). |
|
Yeah, though we would've caught that something was going wrong if the debug messages had been warnings (because all systems were returning the wrong error since the update patch). |
In v0.13.0, cilium/ebpf stopped supporting setting BPF_F_REPLACE as an
explicit flag and instead requires us to use link.Anchor to specify
where the program should be attached.
Commit 216175a ("Upgrade Cilium's eBPF library version to 0.16")
did update this correctly for the actual attaching logic, but when
checking for kernel support we still passed BPF_F_REPLACE. This would
result in a generic error being returned, which our feature-support
checking logic would treat as being an error the indicates that
BPF_F_REPLACE is supported, resulting in a regression on pre-5.6
kernels.
It turns out that our debug logging saying that this unexpected error
was happening was being output as a result of this change, but nobody
noticed...
Fixes: 216175a ("Upgrade Cilium's eBPF library version to 0.16")
Fixes #3008
Signed-off-by: Aleksa Sarai cyphar@cyphar.com