Skip to content

Fix connectivity issue if nodes share the same name across the clustermesh and wireguard is enabled#24785

Merged
ldelossa merged 1 commit intocilium:mainfrom
giorio94:mio/clustermesh-wireguard-same-node-name
Apr 20, 2023
Merged

Fix connectivity issue if nodes share the same name across the clustermesh and wireguard is enabled#24785
ldelossa merged 1 commit intocilium:mainfrom
giorio94:mio/clustermesh-wireguard-same-node-name

Conversation

@giorio94
Copy link
Copy Markdown
Member

@giorio94 giorio94 commented Apr 6, 2023

Currently, the wireguard subsystem in the cilium agent caches information about the known peers by node name only. This can lead to conflicts in case of clustermesh, if nodes in different clusters have the same name, causing in turn connectivity issues. Hence, let's switch to identify peers by full name (i.e., cluster-name/node-name) to ensure uniqueness.

Fixes: #24227
Reported-by: @oulinbao

Fix connectivity issue if nodes share the same name across the clustermesh and wireguard is enabled

@giorio94 giorio94 added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. area/clustermesh Relates to multi-cluster routing functionality in Cilium. area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. needs-backport/1.11 labels Apr 6, 2023
@giorio94 giorio94 force-pushed the mio/clustermesh-wireguard-same-node-name branch 2 times, most recently from 3fea3e9 to e03cff2 Compare April 7, 2023 09:08
@giorio94
Copy link
Copy Markdown
Member Author

giorio94 commented Apr 7, 2023

/test

@giorio94 giorio94 marked this pull request as ready for review April 7, 2023 09:08
@giorio94 giorio94 requested review from a team as code owners April 7, 2023 09:08
@giorio94 giorio94 requested review from a team, NikAleksandrov and brb and removed request for a team April 7, 2023 09:08
@giorio94
Copy link
Copy Markdown
Member Author

giorio94 commented Apr 7, 2023

/ci-eks

Hit unrelated flake #24774

@giorio94
Copy link
Copy Markdown
Member Author

giorio94 commented Apr 7, 2023

The ConformanceEKS test hit again the same unrelated flake. I'm not rerunning it, since it wouldn't test anything wireguard-related in any case.

@pchaigno pchaigno requested review from YutaroHayakawa and removed request for NikAleksandrov April 18, 2023 10:25
Currently, the wireguard subsystem in the cilium agent caches
information about the known peers by node name only. This can lead to
conflicts in case of clustermesh, if nodes in different clusters have
the same name, causing in turn connectivity issues. Hence, let's switch
to identify peers by full name (i.e., cluster-name/node-name) to ensure
uniqueness. This modification does not introduce issues during upgrades,
since the node ID is not propagated to the datapath.

Fixes: cilium#24227
Reported-by: @oulinbao <oulinbao@163.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 force-pushed the mio/clustermesh-wireguard-same-node-name branch from e03cff2 to 296fc49 Compare April 19, 2023 10:06
@giorio94
Copy link
Copy Markdown
Member Author

giorio94 commented Apr 19, 2023

/test

Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed:

Click to show.

Test Name

K8sPolicyTestExtended Validate toEntities KubeAPIServer Allows connection to KubeAPIServer

Failure Output

FAIL: Found 1 io.cilium/app=operator logs matching list of errors that must be investigated:

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/1895/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.26-kernel-net-next so I can create one.

@giorio94
Copy link
Copy Markdown
Member Author

k8s-1.26-kernel-net-next hit unrelated flake: #22578 (comment)

@giorio94
Copy link
Copy Markdown
Member Author

/test-1.25-5.4

@giorio94
Copy link
Copy Markdown
Member Author

/test-1.26-4.19

@giorio94
Copy link
Copy Markdown
Member Author

/test-1.27-net-next

Copy link
Copy Markdown
Member

@brb brb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@giorio94
Copy link
Copy Markdown
Member Author

test-1.27-net-next failures:

@giorio94
Copy link
Copy Markdown
Member Author

Marking as ready to merge, since reviews are in, and test failures are due to known flakes (and unrelated, as wireguard is not enabled).

@giorio94 giorio94 added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 20, 2023
@ldelossa ldelossa merged commit 7398de6 into cilium:main Apr 20, 2023
@nbusseneau nbusseneau mentioned this pull request Apr 20, 2023
4 tasks
@nbusseneau nbusseneau mentioned this pull request Apr 20, 2023
7 tasks
@nbusseneau nbusseneau mentioned this pull request Apr 20, 2023
15 tasks
@sayboras sayboras added backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. and removed backport-pending/1.13 labels Apr 26, 2023
@giorio94 giorio94 deleted the mio/clustermesh-wireguard-same-node-name branch June 14, 2024 12:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/clustermesh Relates to multi-cluster routing functionality in Cilium. area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.

Projects

No open projects
Status: Released

Development

Successfully merging this pull request may close these issues.

cilium agent can not find all wireguard peers in clustermesh context

10 participants