Merged
[ upstream commit 3221c72 ] This is a preparation for the upcoming BGPv2-native CLI and metrics. In BGPv2, a peer is identified by an explicit name instead of its IP address. The motivation is that names make it easier for users to identify dynamically discovered peers in the CLI and metrics. However, the CLI and metrics output are built from information retrieved from the RouterManager, which has no peer name => peer IP address mapping. Maintaining this mapping in the RouterManager is tricky because peer discovery is done by the reconcilers. The solution here is to keep the peer name in GoBGP's Peer object. While GoBGP still identifies peers by IP address, its Description field accepts an arbitrary string, so we can store the peer name there. For future extensibility, the name is encoded as a JSON string; whenever we want to encode additional metadata, we can introduce a new JSON field. The agent API is extended with a name field:

```
$ cilium-dbg bgp peer -o json
[
  {
    "configured-hold-time-seconds": 90,
    "configured-keep-alive-time-seconds": 30,
    "connect-retry-time-seconds": 120,
    "ebgp-multihop-ttl": 1,
    "families": [
      {
        "afi": "ipv6",
        "safi": "unicast"
      },
      {
        "afi": "ipv4",
        "safi": "unicast"
      }
    ],
    ...
    "name": "peer0",
    "peer-address": "10.0.0.1",
    "peer-port": 179,
    "remote-capabilities": [],
    "session-state": "active"
  }
]
```

Note that cilium-cli is not yet updated with this change because we will introduce a completely new CLI output shortly, so it doesn't make sense to augment the current output with the name. The TestGetPeerStatus test case used to identify peers by IP + ASN. Change it to use this API. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
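The Description-field encoding described above can be sketched as follows. This is a minimal illustration, not the actual Cilium code: the `peerDescription` type and both helper functions are hypothetical names, and only the `name` key comes from the commit message. It also shows why JSON helps extensibility: `encoding/json` ignores unknown keys when decoding, so new metadata fields can be added later without breaking readers that only look at `name`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// peerDescription is a hypothetical metadata envelope stored in GoBGP's
// free-form Peer Description field. Only the name is defined here; more
// metadata could be added later as additional JSON fields.
type peerDescription struct {
	Name string `json:"name"`
}

// encodePeerDescription serializes the peer name into a JSON string.
func encodePeerDescription(name string) (string, error) {
	b, err := json.Marshal(peerDescription{Name: name})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// decodePeerDescription recovers the peer name from a Description value.
// encoding/json ignores unknown fields by default, so a reader that only
// knows about "name" keeps working if new fields are added later.
func decodePeerDescription(desc string) (string, error) {
	var d peerDescription
	if err := json.Unmarshal([]byte(desc), &d); err != nil {
		return "", err
	}
	return d.Name, nil
}

func main() {
	desc, err := encodePeerDescription("peer0")
	if err != nil {
		panic(err)
	}
	name, err := decodePeerDescription(desc)
	if err != nil {
		panic(err)
	}
	fmt.Println(desc, name) // {"name":"peer0"} peer0
}
```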
[ upstream commit 4966dd7 ] The current RouterManager.GetPeer and Router.GetPeerState APIs return the API model directly. In the new CLI, we will rely on the new internal model introduced in the previous commit. Since we still need to keep the legacy output around for a while, mark these APIs as legacy and keep them as is. We'll introduce a new implementation in subsequent commits. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
[ upstream commit ad0ae86 ] Introduce an equivalent of models.BgpPeer in the types package. This is a preparation for getting rid of the agent API and moving to the Hive-Script-based implementation. Currently, the RouterManager's GetPeer call returns the models.BgpPeer format directly, but the Hive-Script-based implementation won't rely on the OpenAPI-generated types. For now, we only define the fields required to implement the CLI summary output. The fields needed for the detailed output and metrics will be added later. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
[ upstream commit babbb8a ] Implement a new RouterManager.GetPeers and Router.GetPeers. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
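To illustrate what an internal, OpenAPI-independent peer model and an aggregating GetPeers might look like, here is a hedged sketch. The `Peer`, `PeerFamily`, `Router`, and `InstancePeers` types and the `getPeers` helper are hypothetical; they only loosely mirror the summary fields (session state, uptime, per-family counters) that appear in the CLI outputs of this PR, and the real types in Cilium may differ.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// PeerFamily holds hypothetical per-address-family counters.
type PeerFamily struct {
	Afi, Safi  string
	Received   int
	Accepted   int
	Advertised int
}

// Peer is a hypothetical internal peer model that does not depend on
// OpenAPI-generated types; it carries only summary-table fields.
type Peer struct {
	Name         string
	SessionState string
	Uptime       time.Duration
	Families     []PeerFamily
}

// Router abstracts the router of a single BGP instance.
type Router interface {
	GetPeers() ([]Peer, error)
}

// InstancePeers groups the peers of one BGP instance.
type InstancePeers struct {
	Instance string
	Peers    []Peer
}

// getPeers queries every instance's router and returns the results
// sorted by instance name so that callers get deterministic output.
func getPeers(routers map[string]Router) ([]InstancePeers, error) {
	names := make([]string, 0, len(routers))
	for name := range routers {
		names = append(names, name)
	}
	sort.Strings(names)

	out := make([]InstancePeers, 0, len(names))
	for _, name := range names {
		peers, err := routers[name].GetPeers()
		if err != nil {
			return nil, fmt.Errorf("instance %q: %w", name, err)
		}
		out = append(out, InstancePeers{Instance: name, Peers: peers})
	}
	return out, nil
}

// staticRouter is a test double that returns a fixed peer list.
type staticRouter []Peer

func (s staticRouter) GetPeers() ([]Peer, error) { return s, nil }

func main() {
	res, err := getPeers(map[string]Router{
		"instance1": staticRouter{{Name: "peer0", SessionState: "active"}},
		"instance0": staticRouter{
			{Name: "peer0", SessionState: "established"},
			{Name: "peer1", SessionState: "active"},
		},
	})
	if err != nil {
		panic(err)
	}
	for _, ip := range res {
		for _, p := range ip.Peers {
			fmt.Println(ip.Instance, p.Name, p.SessionState)
		}
	}
}
```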
[ upstream commit 8d68afd ] Introduce a new bgp/peers Hive Script command that will become the backend of the new `cilium-dbg bgp peers` command. Note that the name collides with an existing command that is only used in script tests; we will change the test side to use the new command in subsequent commits. The new command reflects the BGPv2 structure natively: it identifies instances and peers by the names configured by the user, and ASNs and IP addresses are not shown in the default output format.

Old CLI output:

```
Local AS   Peer AS   Peer Address       Session       Uptime   Family         Received   Advertised
65000      65000     172.19.0.1:179     established   22m25s   ipv6/unicast   0          0
                                                               ipv4/unicast   4          0
65000      65000     172.19.0.200:179   active        0s       ipv6/unicast   0          0
                                                               ipv4/unicast   0          0
65001      65001     172.19.0.100:179   active        0s       ipv6/unicast   0          0
                                                               ipv4/unicast   0          0
```

New CLI output:

```
Instance    Peer    Session State   Uptime   Family         Received   Accepted   Advertised
instance0   peer0   established     22m35s   ipv4-unicast   4          4          0
                                             ipv6-unicast   0          0          0
            peer1   active          -        ipv4-unicast   -          -          -
                                             ipv6-unicast   -          -          -
instance1   peer0   active          -        ipv4-unicast   -          -          -
                                             ipv6-unicast   -          -          -
```

The optional --no-uptime flag hides the uptime value. This eliminates non-deterministic output and makes script test assertions easier. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
[ upstream commit 26d721f ] [ backporter's notes: Adjust output of peering-multi-instance.txtar ] Use the new bgp/peers command in tests instead of the test-only implementation. Adjust the expected output and use the --no-uptime flag. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
[ upstream commit c0d2d72 ] Some e2e configs specify a different encryption algorithm (cbc-aes-sha256). Have the e2e-upgrade workflow respect this. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
[ upstream commit ed982b6 ] We need this to query adj-rib in the new GetRoutes API. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
[ upstream commit 6f6c4bc ] The current RouterManager.GetPeer API returns the API model directly. In the new CLI, we will rely on the new internal model. Since we still need to keep the legacy output around for a while, mark it as legacy and keep it as is. We'll introduce a new implementation in subsequent commits. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
[ upstream commit 9c4e60a ]

- Remove the unnecessary extra P from the function name
- Fix the bug in the Instance and Peer deduplication logic
- Don't sort the slice including the header

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
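The last item is a common pitfall when a rendered table keeps its header in the same slice as its rows. A minimal sketch, assuming a hypothetical `[][]string` table whose first row is the header, sorts only `table[1:]`; because the subslice shares the backing array, the sort happens in place and the header stays on top.

```go
package main

import (
	"fmt"
	"sort"
)

// sortBody sorts table rows column by column while leaving the header
// (row 0) in place. Sorting the whole slice could move the header into
// the body whenever a data cell sorts before the corresponding header cell.
func sortBody(table [][]string) {
	if len(table) < 2 {
		return
	}
	body := table[1:] // shares the backing array, so sorting is in place
	sort.SliceStable(body, func(i, j int) bool {
		a, b := body[i], body[j]
		for k := 0; k < len(a) && k < len(b); k++ {
			if a[k] != b[k] {
				return a[k] < b[k]
			}
		}
		return len(a) < len(b)
	})
}

func main() {
	table := [][]string{
		{"Instance", "Peer"},
		{"instance1", "peer0"},
		{"instance0", "peer1"},
		{"instance0", "peer0"},
	}
	sortBody(table)
	for _, r := range table {
		fmt.Println(r)
	}
}
```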
[ upstream commit ecd377f ] Introduce a new GetRoutes API on BGPRouterManager that returns BGPv2-native results. Each result contains the name of the Instance the route was retrieved from and, for adj-rib-in and adj-rib-out, the Neighbor name. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
[ upstream commit 436d64a ] Introduce bgp/routes to query BGPRouterManager.GetRoutes and render a structured table for the loc-rib and adj-rib tables. Example output:

loc-rib

```
Instance    Prefix        NextHop   Best   Age
instance0   10.0.0.0/32   0.0.0.0   true   10s
            10.0.0.1/32   0.0.0.0   true   10s
instance1   10.0.0.2/32   0.0.0.0   true   10s
```

adj-rib

```
Instance    Peer    Prefix            NextHop       Age
instance0   peer0   10.244.0.0/24     10.99.0.110   10s
                    10.96.50.104/32   10.99.0.110   10s
            peer1   10.244.0.0/24     10.99.0.110   10s
                    10.96.50.104/32   10.99.0.110   10s
instance1   peer2   10.244.0.0/24     10.99.0.110   10s
                    10.96.50.104/32   10.99.0.110   10s
```

It has optional flags: -o (output to file) and --no-age (hide the Age column to make the command output predictable). Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
[ upstream commit 195de19 ] The adj-rib-in test requires the peer GoBGP instance to advertise a route. Add a simple command to advertise a route from the test GoBGP instance. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
[ upstream commit bcc36f9 ] Add a script test scenario that tests the output of the bgp/peers and bgp/routes commands. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
[ upstream commit e7a22f0 ] The snippet contains a |CHART_VERSION| directive that is not substituted when generating the docs, because it's under a "code-block" directive instead of a "parsed-literal". Fix the directive, adjust backslashes accordingly, and remove the redundant "--version" argument (already generated when expanding |CHART_VERSION|). Trim trailing whitespace in the file. Fixes: 63bfe7d ("Added GKE-to-GKE Clustermesh Preparation guide") Signed-off-by: Quentin Monnet <qmo@qmon.net> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
[ upstream commit 6547a37 ] The snippet contains a |CHART_VERSION| directive that is not substituted when generating the docs, because it's under a "code-block" directive instead of a "parsed-literal". Fix the directive, adjust backslashes accordingly, and remove the redundant "--version" argument (already generated when expanding |CHART_VERSION|). Trim trailing whitespace in the file. Fixes: b76f928 ("docs: add Helm configuration instructions for metrics") Signed-off-by: Quentin Monnet <qmo@qmon.net> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
[ upstream commit 920027d ] For "parsed-literal" blocks, we need a double backslash at the end of the line to make a single "\" appear in the generated HTML docs. With a single one in the source, Sphinx removes the line breaks and leaves multiple spaces (from the indentation on the next line) instead. Go through multiple locations (all that I could find) in the docs where we have parsed-literal blocks with single backslashes marking a command split, and replace them with double backslashes. For .../sbom.rst, also add the missing indentation marking the continuation of the command line. Trim trailing whitespace, if any, in all edited files. Signed-off-by: Quentin Monnet <qmo@qmon.net> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
[ upstream commit 87fac93 ] Signed-off-by: Antony Reynaud <antony.reynaud@isovalent.com> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
…nges [ upstream commit 17953f5 ] When only the bandwidth.Priority annotation changes (without EgressBandwidth/IngressBandwidth changing), UpdateBandwidthPolicy was not being called, preventing the priority update from taking effect. Fixes the condition to also trigger on annoChangedPriority. Signed-off-by: zbb88888 <jmdxjsjgcxy@gmail.com> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
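The fix boils down to a boolean trigger condition. The sketch below is hypothetical: `bandwidthAnnotationChanges` and both functions are illustrative names (only `annoChangedPriority` and `UpdateBandwidthPolicy` appear in the commit message), and it simply contrasts the old and new trigger.

```go
package main

import "fmt"

// bandwidthAnnotationChanges is a hypothetical summary of which bandwidth
// annotations differ between the old and the new Pod object.
type bandwidthAnnotationChanges struct {
	egress, ingress, priority bool
}

// shouldUpdateBandwidthPolicyOld mirrors the condition described before
// the fix: a priority-only change never triggers an update.
func shouldUpdateBandwidthPolicyOld(c bandwidthAnnotationChanges) bool {
	return c.egress || c.ingress
}

// shouldUpdateBandwidthPolicy also triggers on a priority-only change.
func shouldUpdateBandwidthPolicy(c bandwidthAnnotationChanges) bool {
	return c.egress || c.ingress || c.priority
}

func main() {
	onlyPriority := bandwidthAnnotationChanges{priority: true}
	fmt.Println(shouldUpdateBandwidthPolicyOld(onlyPriority), shouldUpdateBandwidthPolicy(onlyPriority)) // false true
}
```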
[ upstream commit 038d3ee ] The newly introduced loadbalancer healthserver script test, which checks the health of services with proxy redirection, sometimes fails with the following error:

```
scripttest.go:72: (command "* cmp healthserver-proxy.expected healthserver.after" failed, retrying in 500ms...)
diff healthserver-proxy.expected healthserver.after
--- healthserver-proxy.expected
+++ healthserver.after
Content-Type=application/json
Date=<omitted>
X-Content-Type-Options=nosniff
-X-Load-Balancing-Endpoint-Weight=1
+X-Load-Balancing-Endpoint-Weight=3
---
-{"service":{"namespace":"test","name":"echo"},"localEndpoints":1}
+{"service":{"namespace":"test","name":"echo"},"localEndpoints":3}
```

See https://github.com/cilium/cilium/actions/runs/21952595101/job/63407305531. The failure was also reproduced locally by running the test in a loop. It seems that asynchronous (k8s) event processing can lead to situations where the proxy redirection port (set with the `svc/set-proxy-redirect` test helper) is reset. Retrying the fetch and compare then keeps seeing the same outdated data. Therefore, it seems safer to move the call to `svc/set-proxy-redirect test/echo 1000` into the same retried hive script test section (marked with the `* ` prefix), so that it is retried too in case of a failure. See https://docs.cilium.io/en/stable/contributing/development/hive/#command-reference for more info. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
[ upstream commit bc0db61 ] The list in FURTHER_READINGS.rst hasn't been updated in nearly four years and doesn't reflect the project's publication activity well. It makes no sense to keep it at the root of the directory. Following a discussion with Joe, we agreed that updating this list would take time and would lead to the same issue a few years from now, so it's probably best to point to external, more relevant resources or, more precisely, lists of resources. So we remove this document, and in the documentation we instead point to the list of links on ebpf.io and to the blog and case studies on cilium.io, where we hope readers will find what they're looking for; at the very least, they should find more recent reading material. Signed-off-by: Quentin Monnet <qmo@qmon.net>
[ upstream commit 6292f7d ] This commit adds the new TLSRoute hostname intersection test introduced in Gateway API v1.5, which tests behavior that Cilium has gotten subtly wrong for some time. It adds a custom sorter for hostnames, which allows the hostname intersection to be calculated correctly. This also required changing some details of how the model lists backends in `model.TLSBackends()`. Signed-off-by: Nick Young <nick@isovalent.com>
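For reference, the Gateway API wildcard rules for intersecting a listener hostname with a route hostname can be sketched as below: the hostnames intersect if they are equal, or if one is a wildcard (`*.example.com`) that covers the other with at least one extra label, in which case the more specific hostname is the result. The `intersect` and `sortHostnames` functions are hypothetical, and the ordering used by `sortHostnames` (exact before wildcard, more labels first, then alphabetical) is an illustrative assumption rather than the sorter this commit adds.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// intersect returns the hostname matched by both a listener hostname and
// a route hostname. An empty listener hostname matches any route hostname.
func intersect(listener, route string) (string, bool) {
	switch {
	case listener == "" || listener == route:
		return route, true
	case strings.HasPrefix(listener, "*") && wildcardCovers(listener, route):
		return route, true
	case strings.HasPrefix(route, "*") && wildcardCovers(route, listener):
		return listener, true
	}
	return "", false
}

// wildcardCovers reports whether host (exact or wildcard) falls under the
// pattern "*.suffix" with at least one extra label, so "*.example.com"
// covers "a.example.com" and "*.a.example.com" but not "example.com".
func wildcardCovers(pattern, host string) bool {
	suffix := strings.TrimPrefix(pattern, "*") // e.g. ".example.com"
	h := strings.TrimPrefix(host, "*")
	return strings.HasSuffix(h, suffix) && len(h) > len(suffix)
}

// sortHostnames gives a deterministic, most-specific-first order
// (an assumed ordering, for illustration only).
func sortHostnames(hosts []string) {
	sort.SliceStable(hosts, func(i, j int) bool {
		a, b := hosts[i], hosts[j]
		aw, bw := strings.HasPrefix(a, "*"), strings.HasPrefix(b, "*")
		if aw != bw {
			return !aw // exact hostnames first
		}
		if la, lb := strings.Count(a, "."), strings.Count(b, "."); la != lb {
			return la > lb // more labels first
		}
		return a < b
	})
}

func main() {
	for _, p := range [][2]string{
		{"*.example.com", "foo.example.com"},
		{"foo.example.com", "*.example.com"},
		{"*.example.com", "example.com"},
	} {
		h, ok := intersect(p[0], p[1])
		fmt.Printf("%s + %s -> %q (%v)\n", p[0], p[1], h, ok)
	}
	hosts := []string{"*.example.com", "b.example.com", "a.foo.example.com"}
	sortHostnames(hosts)
	fmt.Println(hosts)
}
```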
[ upstream commit 4aed766 ] This commit fixes a Gateway API reconciliation bug where TLSRoutes were allowed to attach to HTTPS listeners. Per the Gateway API spec, they may only attach to TLS listeners. Similarly, this updates HTTPRoute and GRPCRoute processing so that they cannot attach to TLS listeners, which the spec also disallows. Signed-off-by: Nick Young <nick@isovalent.com>
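The attachment rule described above can be expressed as a small compatibility check. This sketch is hypothetical: `listenerAllowsRoute` uses plain strings instead of the Gateway API typed constants and ignores `allowedRoutes.kinds` and the TLS listener mode, which a real implementation also has to take into account.

```go
package main

import "fmt"

// listenerAllowsRoute reports whether a route of the given kind may attach
// to a listener with the given protocol: TLSRoute only to TLS listeners,
// HTTPRoute and GRPCRoute only to HTTP or HTTPS listeners.
func listenerAllowsRoute(protocol, kind string) bool {
	switch kind {
	case "TLSRoute":
		return protocol == "TLS"
	case "HTTPRoute", "GRPCRoute":
		return protocol == "HTTP" || protocol == "HTTPS"
	default:
		return false
	}
}

func main() {
	for _, tc := range []struct{ proto, kind string }{
		{"TLS", "TLSRoute"},
		{"HTTPS", "TLSRoute"},
		{"HTTPS", "HTTPRoute"},
		{"TLS", "GRPCRoute"},
	} {
		fmt.Printf("%s on %s listener: %v\n", tc.kind, tc.proto, listenerAllowsRoute(tc.proto, tc.kind))
	}
}
```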
[ upstream commit af3d557 ] The receive function could stall on the send to the unbuffered result channel, preventing the goroutine from terminating. This commit fixes the leak by following the same approach as StreamWatcher, from which the meshEndpointSliceWatcher is derived. Reported-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Arthur Outhenin-Chalandre <git@mrfreezeex.fr> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
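The leak and its fix follow a familiar Go pattern: a goroutine that sends on an unbuffered channel should also select on a cancellation channel, so it can exit when nobody is receiving anymore. The sketch below is a simplified, hypothetical watcher (`watcher`, `receive`, and `Stop` are illustrative names); it does not reproduce the actual meshEndpointSliceWatcher or StreamWatcher code.

```go
package main

import (
	"fmt"
	"sync"
)

// watcher is a hypothetical, simplified watcher: a receive goroutine
// forwards events from src into an unbuffered result channel until Stop.
type watcher struct {
	result chan int
	done   chan struct{}
	once   sync.Once
	wg     sync.WaitGroup
}

func newWatcher(src <-chan int) *watcher {
	w := &watcher{result: make(chan int), done: make(chan struct{})}
	w.wg.Add(1)
	go w.receive(src)
	return w
}

// receive forwards events. Every blocking operation also selects on done,
// so the goroutine exits even if nobody reads result anymore; a bare
// `w.result <- ev` would block forever in that case and leak the goroutine.
func (w *watcher) receive(src <-chan int) {
	defer w.wg.Done()
	defer close(w.result)
	for {
		select {
		case ev, ok := <-src:
			if !ok {
				return
			}
			select {
			case w.result <- ev:
			case <-w.done:
				return
			}
		case <-w.done:
			return
		}
	}
}

// Stop signals the goroutine to exit and waits for it. It is safe to call
// more than once.
func (w *watcher) Stop() {
	w.once.Do(func() { close(w.done) })
	w.wg.Wait()
}

func main() {
	src := make(chan int, 1)
	src <- 1
	w := newWatcher(src)
	// Nobody reads w.result, yet Stop still returns because receive
	// observes done instead of blocking on the send forever.
	w.Stop()
	_, open := <-w.result
	fmt.Println("result channel open:", open) // result channel open: false
}
```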
[ upstream commit 4d5b227 ] This is required when the OwnerReferencesPermissionEnforcement admission plugin is enabled, because the operator creates derived Service resources that are owned by the corresponding Ingress and have the blockOwnerDeletion flag set. Signed-off-by: Arthur Outhenin-Chalandre <git@mrfreezeex.fr> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
[ upstream commit d259c76 ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
[ upstream commit 308d172 ] This commit removes the !changes.Empty() condition to fix a bug where the BPF map has no changes but the Envoy network policy still needs to be updated. When an SNI network policy is combined with an FQDN network policy, all egress traffic is redirected to Envoy. With a wildcard FQDN policy, the identity can change while the BPF map stays the same, which leaves the Envoy network policy stale. For example, we could start with the following identities:

- 1677721 fqdn:sts.*.amazonaws.com reserved:world
- 16777220 fqdn:*.amazonaws.com reserved:world

When DNS resolves an IP for sts.*.amazonaws.com, a new identity is generated:

- 16777223 fqdn:*.*.amazonaws.com fqdn:sts.*.amazonaws.com reserved:world

If the pod has an SNI network policy, the BPF map looks like the following (`root@kind-worker8:/home/cilium# cilium bpf policy get 2782`):

| POLICY | DIRECTION | LABELS (source:key[=value]) | PORT/PROTO | PROXY PORT | AUTH TYPE | BYTES | PACKETS | PREFIX | LEVEL |
|---|---|---|---|---|---|---|---|---|---|
| Allow | Ingress | ANY | ANY | NONE | disabled | 0 | 0 | 0 | 0 |
| Allow | Ingress | reserved:host | ANY | NONE | disabled | 0 | 0 | 0 | 0 |
| Allow | Egress | ANY | 443/TCP | 13379 | disabled | 5904 | 33 | 24 | 0 |

With the current check logic, there is no change to the map, so we skip updating the Envoy network policy; Envoy then keeps the stale identity and blocks the traffic. Signed-off-by: Liyi Huang <liyi.huang@isovalent.com> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
[ upstream commit 9087bd9 ] In certain environments where StatefulSets are used, a veth pair name can be reused. This can cause concurrency issues in which the setDown function is executed for the "new" veth pair, which has the same name as the "older" one. To prevent this, we also check that the ifindex matches that of the veth pair fetched by netlink. Fixes: 6633ca8 ("datapath,endpoint: explicitly remove TC filters during endpoint teardown") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Author: /test
- christarazi approved these changes (Feb 25, 2026)
- youngnick approved these changes (Feb 25, 2026)
- julianwiedmann approved these changes (Feb 25, 2026)
- Artyop approved these changes (Feb 25, 2026)
- MrFreezeex approved these changes (Feb 25, 2026)
- xtineskim approved these changes (Feb 25, 2026)
- liyihuang approved these changes (Feb 26, 2026)
- aanm approved these changes (Mar 2, 2026)
giorio94 reviewed on Mar 3, 2026, commenting on these lines of the diff: `preallocatedIPsPerPool preAllocatePerPool`, `pendingIPsPerPool *pendingAllocationsPerPool`, `poolsMutex lock.Mutex`
@YutaroHayakawa @christarazi I don't think that the conflict resolution is correct here, because it introduces a new mutex that isn't used anywhere else, hence providing no actual synchronization. AFAIU, the `multiPoolManager.capacity` method below should lock the already existing `m.mutex` instead. Could you please take a look?
Sorry, I missed this somehow. Here's the fix: #44777.
Thanks for catching this.
christarazi added a commit that referenced this pull request on Mar 13, 2026:
The backport conflict resolution incorrectly introduced a new poolsMutex that isn't used anywhere else, hence providing no actual synchronization. Use the existing m.mutex instead, matching the upstream fix. Fixes: db32834 ("ipam: Fix concurrent map access to multipool map") Fixes: #44517 Reported-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request on Mar 16, 2026
ip get -l reserved:host #44443 (@aanm)

Once this PR is merged, a GitHub action will update the labels of these PRs: