[v1.18] loader: XDP attach type fallback logic #44499
[ upstream commit 62f4856 ]

This commit adds retry logic for the case where loading the XDP program fails with an `invalid argument` error. The error may indicate that the network interface is configured with a jumbo MTU, so we retry loading with the `BPF_F_XDP_HAS_FRAGS` flag set, in the hope that the NIC driver is XDP fragment aware.

Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
Signed-off-by: Timo Beckers <timo@isovalent.com>
[ upstream commit 7c633a7 ]

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
Signed-off-by: Timo Beckers <timo@isovalent.com>
[ upstream commit c861775 ]

Due to changes in newer kernels and the cilium/ebpf library, XDP programs will in the future be loaded with the ebpf.AttachXDP attach type. However, older Cilium versions will already have created links with ebpf.AttachNone programs. The kernel does not allow us to change the program of a link if its attach type does not match. This means that we can only use the new XDP attach type when a link is newly created.

This commit adds logic that detects errors on link update and attempts to load and attach with the other attach type instead. So when upgrading from an older version to a newer version, new links are created with the XDP attach type, but existing links keep using AttachNone. On downgrade, all new links will be created with AttachNone, and existing links will continue to use AttachXDP.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
Co-authored-by: Timo Beckers <timo@isovalent.com>
Signed-off-by: Timo Beckers <timo@isovalent.com>
This commit is unique to release branches.

ebpf-go will now return AttachXDP as the attach type of XDP programs by default. Cilium versions that upgrade to or from versions using the new ebpf-go release need to account for this.

This commit restores the old behaviour of the library on top of the retry loop added in the previous commit, making sure we don't use the new attach type unless strictly necessary.

Signed-off-by: Timo Beckers <timo@isovalent.com>
/test
@ti-mo @dylandreimerink @viktor-kurchenko Looks like this PR broke GKE across the board for v1.18 - at least reverting it allows Cilium to start up again. Given that
I won't be able to look today, so I don't mind reverting it and investigating afterwards.
It seems this isn't limited to GKE: updating to Cilium 1.18.8 also broke Talos v1.12.4. As a result, pods are failing to start with the error: |
Cilium 1.18.8 breaks Talos due to XDP attach fallback logic (cilium/cilium#44499) causing BPF dead code elimination probe failures and hive startup timeout panics on both nodes.

Upgrade to 1.19.x and apply required migration changes:
- CiliumLoadBalancerIPPool apiVersion v2alpha1 -> v2
- Explicitly enable mesh authentication (default changed in 1.19)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Once this PR is merged, a GitHub action will update the labels of these PRs: