
CI: Cilium E2E Upgrade: no-interrupted-connections #37520

@pchaigno

That workflow is failing very often on main: https://github.com/cilium/cilium/actions/workflows/tests-e2e-upgrade.yaml?query=event%3Aschedule

  [-] Scenario [no-interrupted-connections/no-interrupted-connections]
  🟥 Pod test-conn-disrupt-client-74c554bc56-6fjrc flow was interrupted (restart count does not match 0 != 1)

This happens during the upgrade from v1.17.
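
When triaging many scheduled runs, it can help to pull the interrupted pods out of the raw logs mechanically. The following is a minimal sketch, assuming the failure line always has the wording quoted above; the names `INTERRUPTED` and `find_interruptions` are hypothetical, and the two numbers are returned in the order they appear in the log without assuming which one is the expected count.

```python
import re

# Pattern modeled on the single failure line quoted in this issue.
# Other wordings of the message are not handled (assumption).
INTERRUPTED = re.compile(
    r"Pod (?P<pod>\S+) flow was interrupted "
    r"\(restart count does not match (?P<first>\d+) != (?P<second>\d+)\)"
)


def find_interruptions(log_text):
    """Return (pod, first_count, second_count) for every interrupted-flow line."""
    return [
        (m["pod"], int(m["first"]), int(m["second"]))
        for m in INTERRUPTED.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "[-] Scenario [no-interrupted-connections/no-interrupted-connections]\n"
        "Pod test-conn-disrupt-client-74c554bc56-6fjrc flow was interrupted "
        "(restart count does not match 0 != 1)\n"
    )
    for pod, first, second in find_interruptions(sample):
        print(pod, first, second)
```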

I checked one of those failures and didn't see any packet drops from Cilium or from XFRM (which is disabled anyway, but I'm a bit paranoid). So it seems the drops are coming from somewhere else in the kernel.

The first occurrence was on Sunday, Feb. 2nd at 2:05pm. Pull requests merged on the preceding Thursday and Friday are listed here: https://github.com/cilium/cilium/pulls?page=1&q=is%3Apr+merged%3A2025-01-30..2025-02-02+is%3Aclosed+-label%3Akind%2Fbackports

The affected configs are: 7, 8, 12, 15, 19, 20, 21, 22, 23. I went through the list and compared them. They all have BPF NodePort enabled (but not necessarily the rest of KPR's options) and they all run on bpf or bpf-next kernels. That makes #37406 a good suspect among the candidates. I sent a revert at #37485 and confirmed, by running the workflow 10 times, that it fixes the flake.
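
The config comparison described above can be sketched as a set intersection: collect the settings shared by every affected config, then drop those that every unaffected config shares too. This is only an illustration. The config numbers other than 7 and 8, the setting names (`bpf_tree_kernel`, `nodeport`, `kpr`, `encryption`), and their values are all hypothetical placeholders, not the real workflow matrix.

```python
def shared_settings(configs, affected):
    """Return the key/value settings common to every affected config."""
    common = None
    for cid in affected:
        items = set(configs[cid].items())
        common = items if common is None else common & items
    return dict(common or ())


def distinguishing_settings(configs, affected):
    """Keep shared settings that are not also present in every unaffected config."""
    shared = shared_settings(configs, affected)
    unaffected = [c for cid, c in configs.items() if cid not in affected]
    return {
        key: value
        for key, value in shared.items()
        if not all(c.get(key) == value for c in unaffected)
    }


if __name__ == "__main__":
    # Hypothetical matrix excerpt; the real values live in the workflow definition.
    configs = {
        1: {"bpf_tree_kernel": False, "nodeport": False, "kpr": False, "encryption": "none"},
        7: {"bpf_tree_kernel": True, "nodeport": True, "kpr": True, "encryption": "none"},
        8: {"bpf_tree_kernel": True, "nodeport": True, "kpr": False, "encryption": "none"},
    }
    print(distinguishing_settings(configs, affected={7, 8}))
```

Settings that survive this filter are the ones worth checking against the pull requests merged in the suspect window.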

Labels: area/CI, ci/flake, dependencies, stale