Backport for v1.19 release blockers 2026-01-26 #44025
Merged
aanm merged 17 commits into cilium:v1.19 on Jan 27, 2026
Conversation
[ upstream commit 93eb490 ] The datapath doesn't need write access for this map. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit e382a33 ] This adds back a check to skip some endpoints that should not receive any traffic in the ClusterMesh context. Before the fixed commit this filtering was done through the resource / ParseEndpointSliceV1, but that was removed so that everything can be kept in the backend maps, with backends not supposed to serve new connections excluded directly by the datapath. However, as ClusterMesh doesn't retain any info on the Endpoint conditions, those are essentially lost and we would unfortunately propagate such backends like any fully ready backends. Fixes: 6f41c98 ("loadbalancer: Keep non-serving terminating backends") Signed-off-by: Arthur Outhenin-Chalandre <git@mrfreezeex.fr> Signed-off-by: Joe Stringer <joe@cilium.io>
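The exclusion described in the commit above can be sketched in Go. This is an illustrative toy, not Cilium's actual types: the `backend` struct and `skipNonServing` are hypothetical names standing in for the datapath-side filtering of terminating, non-serving backends.

```go
package main

import "fmt"

// backend models the Endpoint conditions that ClusterMesh does not retain;
// names and fields here are illustrative, not Cilium's real types.
type backend struct {
	addr        string
	terminating bool
	serving     bool
}

// skipNonServing mirrors the idea of the fix: a terminating backend that is
// no longer serving must not be propagated as if it were fully ready.
func skipNonServing(bes []backend) []backend {
	out := make([]backend, 0, len(bes))
	for _, be := range bes {
		if be.terminating && !be.serving {
			continue // excluded: should receive no new connections
		}
		out = append(out, be)
	}
	return out
}

func main() {
	bes := []backend{
		{addr: "10.0.0.1", serving: true},
		{addr: "10.0.0.2", terminating: true, serving: false},
		{addr: "10.0.0.3", terminating: true, serving: true},
	}
	for _, be := range skipNonServing(bes) {
		fmt.Println(be.addr) // 10.0.0.1 and 10.0.0.3 survive the filter
	}
}
```

Note the third backend is kept: terminating but still serving, matching the "keep non-serving terminating backends" behaviour that commit 6f41c98 introduced.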
[ upstream commit 9b05119 ] Adds entry to the upgrade notes for Cilium 1.19 relating to the introduction of wildcard service entries and the underlying behaviour change this brings to the datapath. Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 161a51b ] This creates two unique selector caches inside the policy repository. We do this because changes in v1.16 caused the selector cache to become a lot slower, since more data is injected into it.

This split takes advantage of the fact that in a cluster where pods, identities and unique policies are churned consistently at a given rate r, not all aspects are equally expensive. If we assume these items have cardinality c, we can do some calculations. Identities and policies are cluster-level, churn at the same rate r, and have the same cardinality c. However, on a given node, the churn of identities used by local endpoints is pretty small, most likely much smaller than r. The same goes for the cardinality of those identities, which can be at most the number of endpoints on the node. Based on the assumption that not all policies match all pods, not all l3/l4 {to,from}Endpoints selectors are used by all endpoints, so on a given node not all c l3/l4 policies are in use, and the churn of l3/l4 policies in use at a given time on the node is much smaller than r. This also includes policies like FQDN and toCIDR policies, and other types of policies that end up in the selector cache today - the concept is exactly the same.

Based on this, we can split the selector cache in two.

One is used to index:
- policies from their endpointSelector (high churn rate, high cardinality)
- identities used by local endpoints (low churn rate, low cardinality)

And the other one to index:
- internal l3/l4 policies used by local endpoints (low churn rate, low cardinality)
- all identities (high churn rate, high cardinality)

All this together means that we go from a single selector cache indexing many different high-churn, high-cardinality items to two distinct selector caches, each pairing one high-cardinality, high-churn aspect with one that is low-cardinality and low-churn.

Fixes: 2e2c6c5 ("policy: determine subject identities via SelectorCache") Signed-off-by: Odin Ugedal <odin@uged.al> Signed-off-by: Odin Ugedal <ougedal@palantir.com> Signed-off-by: Joe Stringer <joe@cilium.io>
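A toy Go model of the split described in the commit above. The `selectorCache` type and its methods are illustrative stand-ins, not Cilium's real implementation; the point is only that each cache pairs one high-churn axis with one low-churn axis, so a single update never walks both expensive sets.

```go
package main

import "fmt"

// selectorCache is a toy: it maps a selector label to the set of identity
// IDs it currently selects. Cilium's real SelectorCache is far richer.
type selectorCache struct {
	selectors map[string]map[int]struct{}
}

func newSelectorCache() *selectorCache {
	return &selectorCache{selectors: map[string]map[int]struct{}{}}
}

func (sc *selectorCache) addSelector(label string) {
	if _, ok := sc.selectors[label]; !ok {
		sc.selectors[label] = map[int]struct{}{}
	}
}

// addIdentity walks only this cache's selectors. With the split, a
// cluster-wide identity update only consults the cache holding the small,
// node-local set of l3/l4 selectors, not every policy endpointSelector.
func (sc *selectorCache) addIdentity(id int, labels []string) {
	for _, l := range labels {
		if sel, ok := sc.selectors[l]; ok {
			sel[id] = struct{}{}
		}
	}
}

func main() {
	// Cache 1: policy endpointSelectors (high cardinality) vs. identities
	// of local endpoints (low churn).
	subjectCache := newSelectorCache()
	subjectCache.addSelector("app=frontend")

	// Cache 2: l3/l4 peer selectors used locally (low cardinality) vs.
	// all cluster identities (high churn).
	peerCache := newSelectorCache()
	peerCache.addSelector("role=backend")

	// A cluster-wide identity event only needs to touch peerCache.
	peerCache.addIdentity(1001, []string{"role=backend"})
	fmt.Println(len(peerCache.selectors["role=backend"])) // 1
}
```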
[ upstream commit 2326ca1 ] Signed-off-by: Odin Ugedal <odin@ugedal.com> Signed-off-by: Odin Ugedal <ougedal@palantir.com> Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 683f3ff ] Signed-off-by: Odin Ugedal <ougedal@palantir.com> Signed-off-by: Odin Ugedal <odin@ugedal.com> Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 015d5f9 ] Signed-off-by: phuhung273 <phuhung.tranpham@gmail.com> Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 74c1bde ] This commit adds a test verifying that two EndpointSlices in different namespaces with the same name do not cause any collisions or problems. Specifically, we test the scenario where two EndpointSlices are set up, then one is modified by removing an address. This should be correctly picked up and the backend removed from the BPF maps. This test is a regression test for cilium#43999. Signed-off-by: Emily Shepherd <emily@redcoat.dev> Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 5c146c3 ] We previously did not include the Namespace in EndpointSliceNames, relying on the fact that EndpointSlices that use generateName are unlikely to have name collisions, even across namespaces. While this is usually the case, there is no requirement for EndpointSlice managers to use generateName, and there are examples of controllers that do not (for example, the master kubernetes service's EndpointSlice is always called "kubernetes"). Including the namespace in EndpointSliceNames guarantees collisions cannot occur. See cilium#43999 for further discussion of this bug. Signed-off-by: Emily Shepherd <emily@redcoat.dev> Signed-off-by: Joe Stringer <joe@cilium.io>
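The keying change above can be sketched in a few lines of Go. `sliceKey` is a hypothetical stand-in for the fixed lookup key, not Cilium's actual function: including the namespace makes same-named slices in different namespaces distinct by construction.

```go
package main

import "fmt"

// sliceKey is an illustrative stand-in for the fixed tracking key:
// prefixing the namespace guarantees two EndpointSlices with the same
// name in different namespaces can never collide.
func sliceKey(namespace, name string) string {
	return namespace + "/" + name
}

func main() {
	tracked := map[string][]string{}
	// Both slices are literally named "kubernetes" - allowed, since no
	// controller is required to use generateName.
	tracked[sliceKey("default", "kubernetes")] = []string{"10.0.0.1"}
	tracked[sliceKey("team-a", "kubernetes")] = []string{"10.0.0.2"}
	fmt.Println(len(tracked)) // 2: distinct entries, no collision
}
```

With the old name-only key, the second insert would have silently overwritten the first, which is exactly the collision cilium#43999 describes.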
[ upstream commit 236126f ] While we're at it, add references to the upstream kernel commits required for the feature checks to pass. Signed-off-by: Alice Mikityanska <alice@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit fb3e790 ]
1. On errors, revert changes to the original values, rather than defaults. There are devices for which gso_max_size=65536 is too big.
2. In the error handling flow, modify IPv6 values first, similarly to how it's done during configuration. Modifying gso_max_size also affects gso_ipv4_max_size when setting values below 64k, so it should be done before IPv4.
3. In the error handling flow, go over the devices in reverse order, because there might be weird dependencies between them, e.g., tso_max_size of one device depends on gso_max_size of another.
Signed-off-by: Alice Mikityanska <alice@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io>
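The revert pattern from points 1 and 3 above can be sketched generically in Go. All names here (`device`, `setGSO`, `configureAll`) are hypothetical, and the "fragile" device is a contrived stand-in for hardware that rejects large GSO sizes: remember each device's original value, and on failure restore the originals, last device first.

```go
package main

import (
	"errors"
	"fmt"
)

type device struct {
	name       string
	gsoMaxSize int
}

// setGSO simulates a per-device sysctl write; the contrived "fragile0"
// device rejects values above 64k, like real hardware sometimes does.
func setGSO(d *device, v int) error {
	if v > 65536 && d.name == "fragile0" {
		return errors.New("gso_max_size too big for " + d.name)
	}
	d.gsoMaxSize = v
	return nil
}

// configureAll applies the target to every device; on error it restores
// each modified device's ORIGINAL value (not a default), in reverse
// order of modification.
func configureAll(devs []*device, target int) error {
	type saved struct {
		dev  *device
		orig int
	}
	var applied []saved
	for _, d := range devs {
		orig := d.gsoMaxSize
		if err := setGSO(d, target); err != nil {
			for i := len(applied) - 1; i >= 0; i-- {
				_ = setGSO(applied[i].dev, applied[i].orig)
			}
			return err
		}
		applied = append(applied, saved{d, orig})
	}
	return nil
}

func main() {
	devs := []*device{{"eth0", 65536}, {"fragile0", 65536}}
	err := configureAll(devs, 196608)
	fmt.Println(err != nil, devs[0].gsoMaxSize) // true 65536 (reverted)
}
```

Reverting to originals rather than defaults matters precisely because of point 1: for some devices even the "default" 65536 may be wrong, so the only safe rollback target is whatever the device had before we touched it.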
[ upstream commit 5138f56 ] Adjust the configuration flow of startBIGTCP() to use a more typical detect, modify, update pattern.
1. Move the loop for device GSO limit detection into startBIGTCP().
2. Modify the configuration at the end, upon successful config.
3. Change the configuration to the default if no devices are selected.
Signed-off-by: Alice Mikityanska <alice@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit c5bd3bd ] Make the initialization flow of BIG TCP more robust by exposing all the logic explicitly in startBIGTCP(). The robustness changes include:
1. Return errors from startBIGTCP(). Previously, it would always return nil.
2. Fall back to older kernels' defaults when probing for potentially unsupported parameters.
3. Revert the change from commit fcdbf6d ("cilium, bigtcp: Allow raising GRO/GSO size without BIG TCP") that would set gso_max_size=64k regardless of tso_max_size, which might be smaller, failing the operation in that case. Restore the old logic (in non-BIG TCP mode, keep values lower than 64k as is), but make it more robust: instead of hiding the check inside SetGROGSOIPv6MaxSize() and pretending that it set 64k, let startBIGTCP() check explicitly whether lowering to 64k is needed. At the same time, store the lowest value among all netdevs to be used by the Cilium tunnel netdev.
Fixes: cilium#43737 Signed-off-by: Alice Mikityanska <alice@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit c0c752b ] BIG TCP initialization code refuses to proceed with enabling the feature if Cilium is set to tunneling mode, but the admin doesn't declare kernel support for BIG TCP for VXLAN and GENEVE tunnels. However, tunneling mode isn't the only case when a GENEVE tunnel can be created. Another case is dsrDispatch=geneve. Currently, BIG TCP proceeds to increase gso_max_size and gro_max_size, but the following creation of the GENEVE tunnel fails. Detect this configuration in advance and block BIG TCP. Also block BIG TCP in dsrDispatch=ipip, because IPIP tunnels don't support gso_max_size > 64k either. Fixes: cilium#43938 Reported-by: Chris Bannister <c.bannister@gmail.com> Signed-off-by: Alice Mikityanska <alice@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io>
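The extra pre-check from the commit above can be sketched as a simple guard. The function and parameter names here are illustrative, not Cilium's actual option fields: the point is that a GENEVE or IPIP device can come into existence via dsrDispatch even when tunneling mode is off, and BIG TCP must be blocked in those cases too.

```go
package main

import "fmt"

// bigTCPBlocked sketches the expanded check: BIG TCP must not proceed
// whenever a GENEVE or IPIP tunnel device will exist, not only in plain
// tunneling mode. Names are illustrative, not Cilium's real config fields.
func bigTCPBlocked(tunnelingMode bool, dsrDispatch string) bool {
	if tunnelingMode {
		// Already handled case: requires the admin to declare kernel
		// support for BIG TCP over VXLAN/GENEVE tunnels.
		return true
	}
	switch dsrDispatch {
	case "geneve":
		// Raising gso_max_size first and creating the GENEVE device
		// afterwards would fail; detect this in advance instead.
		return true
	case "ipip":
		// IPIP tunnels don't support gso_max_size > 64k either.
		return true
	}
	return false
}

func main() {
	fmt.Println(bigTCPBlocked(false, "geneve")) // true
	fmt.Println(bigTCPBlocked(false, "opt"))    // false
}
```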
[ upstream commit 54157c0 ] [ Backporter's notes: Resolved conflict in cmdref caused by commit 37103a7 ("bpf: switch POLICY_ACCOUNTING to load time config") being merged upstream but not backported to v1.19 ] Setting this to 2 by default might be too aggressive, as it has been brought up that this may lead to issues in instances with high amounts of ephemeral connections (i.e. multiplying the overhead). Therefore we rework this prior to v1.19 such that Cilium can be configured to use any valid value for endpoints, or leave it empty to simply inherit from host settings. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io>
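The "any valid value, or empty to inherit" semantics above can be sketched as a small resolver. `resolveSetting` and its parameters are hypothetical names, not Cilium's actual option handling.

```go
package main

import (
	"fmt"
	"strconv"
)

// resolveSetting sketches the reworked option: an empty configured value
// means "inherit from host settings"; anything else must parse as a valid
// integer to apply to endpoints.
func resolveSetting(configured string, hostValue int) (int, error) {
	if configured == "" {
		return hostValue, nil // inherit host settings
	}
	v, err := strconv.Atoi(configured)
	if err != nil {
		return 0, fmt.Errorf("invalid value %q: %w", configured, err)
	}
	return v, nil
}

func main() {
	v, _ := resolveSetting("", 1)
	fmt.Println(v) // 1: inherited from the host
	v, _ = resolveSetting("2", 1)
	fmt.Println(v) // 2: explicit per-endpoint override
}
```

Leaving the option unset thus avoids the multiplied overhead the commit describes, since endpoints simply keep whatever the host already uses.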
[ upstream commit 462b18e ] This commit adds a new tutorial documenting how to configure and use the Generic IP Options packet tracing feature. The tutorial covers:
- Configuring `bpf.monitorTraceIPOption` via Helm in a Kind environment.
- Manually verifying the feature using `nping` to inject valid 2-byte Trace IDs.
- Observing extracted Trace IDs using `cilium monitor`.
- Filtering flows by Trace ID using the `hubble observe --ip-trace-id` command.
- Documentation of current BPF limitations regarding strict payload lengths (2, 4, or 8 bytes).
Signed-off-by: Ben Bigdelle <bigdelle@google.com> Signed-off-by: Joe Stringer <joe@cilium.io>
/test
LGTM for the patches from #43999. As I wrote them, I checked and tested that they work as-is if applied to v1.19.
julianwiedmann approved these changes Jan 27, 2026
aanm approved these changes Jan 27, 2026
MrFreezeex approved these changes Jan 27, 2026
ajmmm approved these changes Jan 27, 2026
gentoo-root approved these changes Jan 27, 2026
This is all of the outstanding backports for v1.19 as of today, but the last few are release blockers for v1.19.0-rc.1.
One conflict in cmdref, introduced by commit 37103a7 ("bpf: switch POLICY_ACCOUNTING to load time config"), resolved by re-generating the cmdref docs.
Once this PR is merged, a GitHub action will update the labels of these PRs: