cilium, bigtcp: Rework initialization flow #43891

Merged
joestringer merged 5 commits into main from pr/alice/bigtcp
Jan 26, 2026

@gentoo-root
Contributor

@gentoo-root gentoo-root commented Jan 20, 2026

Make the initialization flow of BIG TCP more robust by exposing all the logic explicitly in startBIGTCP(). The robustness changes include:

  1. On errors, revert changes to the original values, rather than defaults. There are devices for which gso_max_size=65536 is too big.

  2. In error handling flow, modify IPv6 values first, similarly to how it's done during the configuration. Modifying gso_max_size also affects gso_ipv4_max_size when setting values below 64k, so it should be done before IPv4.

  3. Return errors from startBIGTCP(). Previously, it would always return nil.

  4. In error handling flow, go over the devices in the reverse order, because there might be weird dependencies between them, e.g., tso_max_size of one device depends on gso_max_size of another.

  5. Fall back to older kernels' defaults when probing for potentially unsupported parameters.

  6. Revert the change from commit fcdbf6d ("cilium, bigtcp: Allow raising GRO/GSO size without BIG TCP"), which would set gso_max_size=64k regardless of tso_max_size; tso_max_size might be smaller, in which case the operation fails. Restore the old logic (in non-BIG TCP mode, keep values lower than 64k as is), but make it more robust: instead of hiding the check inside SetGROGSOIPv6MaxSize() and pretending that it set 64k, let startBIGTCP() check explicitly whether lowering to 64k is needed. At the same time, store the lowest value among all netdevs to be used by the Cilium tunnel netdev.

Fixes: #43737

Also fix another bug: block BIG TCP with dsrDispatch=geneve (when no kernel support is present) and dsrDispatch=ipip (as there is no pending kernel support yet).

Fixes: #43938

Make BIG TCP initialization flow more robust and fix bugs.

@gentoo-root gentoo-root added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Jan 20, 2026
@gentoo-root
Contributor Author

/test

@gentoo-root
Contributor Author

/ci-kubespray

@gentoo-root gentoo-root marked this pull request as ready for review January 21, 2026 10:52
@gentoo-root gentoo-root requested a review from a team as a code owner January 21, 2026 10:52
@gentoo-root gentoo-root requested a review from ldelossa January 21, 2026 10:52
@aanm aanm added the needs-backport/1.19 This PR / issue needs backporting to the v1.19 branch label Jan 21, 2026
@aanm aanm enabled auto-merge January 21, 2026 15:49
@gentoo-root gentoo-root force-pushed the pr/alice/bigtcp branch 2 times, most recently from b2c14f0 to c2cc375 Compare January 22, 2026 20:06
@gentoo-root
Contributor Author

/test

@gentoo-root
Contributor Author

/ci-kubespray

@joestringer joestringer added the release-blocker/1.19 This issue will prevent the release of the next version of Cilium. label Jan 26, 2026
@joestringer joestringer moved this from Proposed to Active in Release blockers Jan 26, 2026
While we're at it, add references to the upstream kernel commits
required for the feature checks to pass.

Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
1. On errors, revert changes to the original values, rather than
defaults. There are devices for which gso_max_size=65536 is too big.

2. In error handling flow, modify IPv6 values first, similarly to how
it's done during the configuration. Modifying gso_max_size also affects
gso_ipv4_max_size when setting values below 64k, so it should be done
before IPv4.

3. In error handling flow, go over the devices in the reverse order,
because there might be weird dependencies between them, e.g.,
tso_max_size of one device depends on gso_max_size of another.

Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
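
The rollback order described in the commit message above can be sketched roughly as follows. This is only an illustration, not Cilium's actual code: the `limits` struct, the `setter` interface, `revertAll`, and the `recorder` fake are all hypothetical names, and the fake stands in for the real netlink calls.

```go
package main

import "fmt"

// limits holds the sizes recorded for one device before any change was made.
// Hypothetical type for illustration only.
type limits struct {
	ipv6Max uint32 // stands for gso_max_size / gro_max_size
	ipv4Max uint32 // stands for gso_ipv4_max_size / gro_ipv4_max_size
}

// setter abstracts whatever applies the values to a device (assumed interface).
type setter interface {
	SetIPv6Max(dev string, size uint32) error
	SetIPv4Max(dev string, size uint32) error
}

// revertAll restores the recorded original values on every modified device.
// Devices are visited in reverse order of modification, and on each device
// the IPv6 value is written before the IPv4 value, mirroring the forward path.
// It keeps going on errors and returns all of them.
func revertAll(s setter, modified []string, original map[string]limits) []error {
	var errs []error
	for i := len(modified) - 1; i >= 0; i-- {
		dev := modified[i]
		orig, ok := original[dev]
		if !ok {
			continue // nothing was recorded, so nothing to restore
		}
		if err := s.SetIPv6Max(dev, orig.ipv6Max); err != nil {
			errs = append(errs, fmt.Errorf("revert IPv6 limit on %s: %w", dev, err))
		}
		if err := s.SetIPv4Max(dev, orig.ipv4Max); err != nil {
			errs = append(errs, fmt.Errorf("revert IPv4 limit on %s: %w", dev, err))
		}
	}
	return errs
}

// recorder is a fake setter that only logs the calls it receives.
type recorder struct{ calls []string }

func (r *recorder) SetIPv6Max(dev string, size uint32) error {
	r.calls = append(r.calls, fmt.Sprintf("%s ipv6 %d", dev, size))
	return nil
}

func (r *recorder) SetIPv4Max(dev string, size uint32) error {
	r.calls = append(r.calls, fmt.Sprintf("%s ipv4 %d", dev, size))
	return nil
}
```

Restoring the recorded values rather than a fixed 65536 is the point of item 1: a device whose original limit was lower gets exactly that limit back.
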
Adjust the configuration flow for the startBIGTCP() to use a more typical
detect, modify, update pattern.

1. Move loop for device GSO limit detection into startBIGTCP()

2. Modify the configuration at the end upon successful config.

3. Change the configuration to the default if no devices are selected.

Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
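
As a rough sketch of the detect, modify, update pattern named in the commit message above, assuming a single size value for brevity: `configure`, its `read` and `write` callbacks, and `legacyMaxSize` are hypothetical names, and the real startBIGTCP() handles more parameters than this.

```go
package main

import "fmt"

// legacyMaxSize is the 64k value used here as the default configuration.
const legacyMaxSize uint32 = 65536

// configure returns the value that should be published as the effective
// configuration. The read and write callbacks are assumed stand-ins for
// the per-device netlink operations.
func configure(
	devs []string,
	target uint32,
	read func(dev string) (uint32, error),
	write func(dev string, size uint32) error,
) (uint32, error) {
	// With no devices selected, the configuration stays at the default.
	if len(devs) == 0 {
		return legacyMaxSize, nil
	}

	// Detect: read every device's current limit before touching anything.
	current := make(map[string]uint32, len(devs))
	for _, dev := range devs {
		v, err := read(dev)
		if err != nil {
			return legacyMaxSize, fmt.Errorf("read limit of %s: %w", dev, err)
		}
		current[dev] = v
	}

	// Modify: write only the devices whose value actually differs.
	for _, dev := range devs {
		if current[dev] == target {
			continue
		}
		if err := write(dev, target); err != nil {
			return legacyMaxSize, fmt.Errorf("set limit of %s: %w", dev, err)
		}
	}

	// Update: publish the new value only after every device succeeded.
	return target, nil
}
```

Publishing the value last means a failure partway through never leaves the configuration claiming a size that some device did not accept.
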
Make the initialization flow of BIG TCP more robust by exposing all the
logic explicitly in startBIGTCP(). The robustness changes include:

1. Return errors from startBIGTCP(). Previously, it would always return
nil.

2. Fall back to older kernels' defaults when probing for potentially
unsupported parameters.

3. Revert the change from commit fcdbf6d ("cilium, bigtcp: Allow
raising GRO/GSO size without BIG TCP"), that would set gso_max_size=64k
regardless of tso_max_size, which might be smaller, failing the
operation in that case. Restore the old logic (in non-BIG TCP, keep
values lower than 64k as is), but make it more robust: instead of hiding
the check inside SetGROGSOIPv6MaxSize() and pretending that it set 64k,
let startBIGTCP() check explicitly whether lowering to 64k is needed.
At the same time, store the lowest value among all netdevs to be used
by the Cilium tunnel netdev.

Fixes: #43737
Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
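
Items 2 and 3 of the commit message above could look roughly like this. It is a minimal sketch under stated assumptions: `probeOrDefault`, `planNonBIGTCP`, `errNotSupported`, and `legacyMaxSize` are hypothetical names, and using 65536 as the older kernels' default is an assumption made for the example.

```go
package main

import (
	"errors"
	"sort"
)

// legacyMaxSize is the 64k limit assumed to be the older kernels' default.
const legacyMaxSize uint32 = 65536

// errNotSupported is a hypothetical sentinel meaning that the kernel does
// not report the probed parameter.
var errNotSupported = errors.New("parameter not supported by kernel")

// probeOrDefault returns the probed value, or the assumed older-kernel
// default when the parameter is not supported. Other errors are passed on.
func probeOrDefault(probe func() (uint32, error)) (uint32, error) {
	v, err := probe()
	if errors.Is(err, errNotSupported) {
		return legacyMaxSize, nil
	}
	return v, err
}

// planNonBIGTCP covers the non-BIG TCP case. Values above 64k must be
// lowered to 64k; values at or below 64k are kept as they are. It returns
// the devices that need lowering (sorted for stable output) and the lowest
// resulting value among all devices, for use by the tunnel netdev.
func planNonBIGTCP(current map[string]uint32) (lower []string, tunnelMax uint32) {
	tunnelMax = legacyMaxSize
	for dev, size := range current {
		if size > legacyMaxSize {
			lower = append(lower, dev)
			size = legacyMaxSize
		}
		if size < tunnelMax {
			tunnelMax = size
		}
	}
	sort.Strings(lower)
	return lower, tunnelMax
}
```

Making the caller decide which devices to lower keeps the setter honest: it never has to report a value it did not actually write.
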
BIG TCP initialization code refuses to proceed with enabling the feature
if Cilium is set to tunneling mode, but the admin doesn't declare kernel
support for BIG TCP for VXLAN and GENEVE tunnels. However, tunneling
mode isn't the only case when a GENEVE tunnel can be created. Another
case is dsrDispatch=geneve.

Currently, BIG TCP proceeds to increase gso_max_size and gro_max_size,
but the following creation of the GENEVE tunnel fails. Detect this
configuration in advance and block BIG TCP.

Also block BIG TCP in dsrDispatch=ipip, because IPIP tunnels don't
support gso_max_size > 64k either.

Fixes: #43938
Reported-by: Chris Bannister <c.bannister@gmail.com>
Signed-off-by: Alice Mikityanska <alice@isovalent.com>
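
The pre-flight check described in the commit message above might be sketched like this. The `settings` struct and its field names are hypothetical and do not reflect Cilium's real option types; only the dsrDispatch values `geneve` and `ipip` come from the commit message.

```go
package main

import "fmt"

// settings is a hypothetical, simplified view of the relevant options.
type settings struct {
	tunneling             bool   // Cilium runs in tunneling mode
	dsrDispatch           string // e.g. "geneve" or "ipip"; empty if unused
	tunnelBIGTCPSupported bool   // admin declared kernel BIG TCP support for VXLAN/GENEVE
}

// checkBIGTCPAllowed reports why BIG TCP must not be enabled, or nil if
// nothing blocks it. It runs before any gso_max_size/gro_max_size change,
// so a later tunnel creation cannot fail because of raised limits.
func checkBIGTCPAllowed(s settings) error {
	// IPIP tunnels do not support gso_max_size > 64k.
	if s.dsrDispatch == "ipip" {
		return fmt.Errorf("BIG TCP is not supported with dsrDispatch=%s", s.dsrDispatch)
	}
	// A GENEVE tunnel is created both in tunneling mode and with
	// dsrDispatch=geneve, so both cases need declared kernel support.
	needsTunnel := s.tunneling || s.dsrDispatch == "geneve"
	if needsTunnel && !s.tunnelBIGTCPSupported {
		return fmt.Errorf("BIG TCP requires kernel support for tunnels (tunneling=%t, dsrDispatch=%q)",
			s.tunneling, s.dsrDispatch)
	}
	return nil
}
```
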
Member

@joestringer joestringer left a comment

Thanks, LGTM. As part of review I split the first commit into a few smaller pieces, so I'll push those shortly and merge. There's no diff.

@aanm aanm added this pull request to the merge queue Jan 26, 2026
@joestringer joestringer removed the request for review from ldelossa January 26, 2026 23:55
@joestringer joestringer removed this pull request from the merge queue due to a manual request Jan 26, 2026
@joestringer joestringer merged commit c0c752b into main Jan 26, 2026
37 checks passed
@joestringer joestringer deleted the pr/alice/bigtcp branch January 26, 2026 23:56
@github-project-automation github-project-automation bot moved this from Active to Done in Release blockers Jan 26, 2026
@joestringer joestringer added backport-pending/1.19 The backport for Cilium 1.19.x for this PR is in progress. and removed needs-backport/1.19 This PR / issue needs backporting to the v1.19 branch labels Jan 27, 2026
@github-actions github-actions bot added backport-done/1.19 The backport for Cilium 1.19.x for this PR is done. and removed backport-pending/1.19 The backport for Cilium 1.19.x for this PR is in progress. labels Jan 27, 2026
@cilium-release-bot cilium-release-bot bot moved this to Released in cilium v1.19.0 Feb 3, 2026

Labels

backport-done/1.19 The backport for Cilium 1.19.x for this PR is done. release-blocker/1.19 This issue will prevent the release of the next version of Cilium. release-note/bug This PR fixes an issue in a previous release of Cilium.

Projects

Archived in project
Status: Released

4 participants