CI: Cilium E2E Upgrade: no-interrupted-connections #37520
Description
That workflow is failing very often on main: https://github.com/cilium/cilium/actions/workflows/tests-e2e-upgrade.yaml?query=event%3Aschedule
[-] Scenario [no-interrupted-connections/no-interrupted-connections]
🟥 Pod test-conn-disrupt-client-74c554bc56-6fjrc flow was interrupted (restart count does not match 0 != 1)
This happens during the upgrade from v1.17.
I checked one of those failures and didn't see any packet drops from Cilium or from XFRM (which is disabled anyway, but I'm a bit paranoid). So it seems like the drops are coming from somewhere else in the kernel.
The first occurrence was on Feb. 2nd at 2:05pm, which is a Sunday. The pull requests merged on the preceding Thursday & Friday are: https://github.com/cilium/cilium/pulls?page=1&q=is%3Apr+merged%3A2025-01-30..2025-02-02+is%3Aclosed+-label%3Akind%2Fbackports.
The affected configs are: 7, 8, 12, 15, 19, 20, 21, 22, 23. I went through the list and compared them. They all have BPF NodePort enabled (but not necessarily the rest of KPR's options) and they all run on bpf or bpf-next kernels. That makes #37406 a good suspect among the candidates. I sent a revert in #37485 and confirmed, by running the workflow 10 times, that it fixes the flake.
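
For anyone who wants to repeat the config comparison on a future flake, here is a minimal sketch of the idea in Python: intersect the settings of the failing configs, then check whether a candidate condition matches exactly the failing set. The config names, keys and values below are hypothetical placeholders, not the real workflow matrix, and the helper names (`common_settings`, `separates`) are made up for illustration.

```python
# Hypothetical config matrix. Names, keys and values are illustrative only;
# they are NOT copied from the real tests-e2e-upgrade matrix.
configs = {
    "cfg-a": {"kernel": "bpf", "nodeport": True, "encryption": "none"},
    "cfg-b": {"kernel": "bpf-next", "nodeport": True, "encryption": "wireguard"},
    "cfg-c": {"kernel": "bpf-next", "nodeport": False, "encryption": "none"},
    "cfg-d": {"kernel": "6.1", "nodeport": True, "encryption": "none"},
}
failing = {"cfg-a", "cfg-b"}  # hypothetical set of configs hitting the flake


def common_settings(configs, names):
    """Return the key/value pairs shared by every config in `names`."""
    item_sets = [set(configs[name].items()) for name in names]
    return dict(set.intersection(*item_sets))


def separates(configs, failing, predicate):
    """True if `predicate` holds for the failing configs and for no others."""
    matching = {name for name, cfg in configs.items() if predicate(cfg)}
    return matching == set(failing)


def nodeport_on_bpf_kernel(cfg):
    """Candidate condition: BPF NodePort enabled on a bpf or bpf-next kernel."""
    return cfg["nodeport"] and cfg["kernel"] in {"bpf", "bpf-next"}


if __name__ == "__main__":
    print("shared by all failing configs:", common_settings(configs, failing))
    print("candidate matches exactly the failing set:",
          separates(configs, failing, nodeport_on_bpf_kernel))
```

A single shared key is rarely enough on its own, which is why the sketch also checks a combined condition (here, NodePort plus kernel flavor) against the passing configs.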