wireguard: connectivity issues with ipv6-only clusters #23917
Description
Is there an existing issue for this?
- I have searched the existing issues
What happened?
While running the connectivity tests on an IPv6-only cluster with WireGuard encryption enabled, I noticed several failures affecting pod-to-pod communication. For instance:
❌ 1/1 tests failed (2/4 actions), 30 tests skipped, 9 scenarios skipped:
Test [no-policies]:
❌ no-policies/pod-to-pod/curl-1: cilium-test/client-7b78db77d5-2cpmk (fd00:10:242:1::a9b2) -> cilium-test/echo-other-node-78f77b57f8-qnpq7 (fd00:10:242::786a:8080)
❌ no-policies/pod-to-pod/curl-3: cilium-test/client2-78f748dd67-w6dhk (fd00:10:242:1::69e4) -> cilium-test/echo-other-node-78f77b57f8-qnpq7 (fd00:10:242::786a:8080)
The issue is related to fragmentation, as confirmed by pinging with increasing packet sizes: ping -s 1360 works, while ping -s 1361 fails and triggers fragmentation, although it shouldn't. Pod interfaces have their MTU set (through the default route) to 1420, cilium_wg0 has MTU 1420, and the host interface has MTU 1500.
More specifically, the problem is triggered by the padding that WireGuard adds to align the packet to a multiple of 16 bytes [1]. This padding should be capped by the MTU to prevent fragmentation. However, that does not happen, because the MTU is determined here [2] from the wrong device (eth0 rather than cilium_wg0). The cause is that dev is correctly set to cilium_wg0 [3] after Cilium performs bpf_redirect, whereas skb_dst(skb)->dev does not seem to be updated.
[1]: https://lxr.missinglinkelectronics.com/linux+v5.19/drivers/net/wireguard/send.c#L141
[2]: https://lxr.missinglinkelectronics.com/linux+v5.19/drivers/net/wireguard/device.c#L171
[3]: https://lxr.missinglinkelectronics.com/linux+v5.19/net/core/filter.c#L2110
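The arithmetic behind the 1360/1361 threshold can be sketched in a few lines of Python. This is only an illustration modelled on the padding logic referenced in [1], not the kernel code itself. The function names are made up for this example, and the 80-byte encapsulation overhead (outer IPv6 header, UDP header, WireGuard data header and authentication tag) is an assumption about the tunnel setup.

```python
# Sketch of WireGuard's padding arithmetic, modelled on calculate_skb_padding()
# in drivers/net/wireguard/send.c (Linux v5.19). Names here are illustrative.

MESSAGE_PADDING_MULTIPLE = 16  # plaintext is padded to a multiple of 16 bytes

# Assumed encapsulation overhead with an IPv6 outer header:
# outer IPv6 (40) + UDP (8) + WireGuard data header (16) + auth tag (16)
OUTER_OVERHEAD_IPV6 = 40 + 8 + 16 + 16

# Inner IPv6 header (40) + ICMPv6 echo header (8), added on top of `ping -s N`
INNER_HEADERS = 40 + 8


def align(n: int, multiple: int) -> int:
    """Round n up to the next multiple of `multiple`."""
    return (n + multiple - 1) // multiple * multiple


def padding(inner_len: int, mtu: int) -> int:
    """Number of padding bytes added for a given inner packet and MTU."""
    last_unit = inner_len
    if mtu == 0:
        return align(last_unit, MESSAGE_PADDING_MULTIPLE) - last_unit
    if last_unit > mtu:
        last_unit %= mtu
    # The padded size is capped by whatever MTU was detected.
    padded = min(mtu, align(last_unit, MESSAGE_PADDING_MULTIPLE))
    return padded - last_unit


def outer_len(ping_size: int, mtu: int) -> int:
    """Size of the encrypted packet leaving the host interface."""
    inner = INNER_HEADERS + ping_size
    return inner + padding(inner, mtu) + OUTER_OVERHEAD_IPV6


if __name__ == "__main__":
    host_mtu = 1500
    for size in (1360, 1361):
        for detected_mtu in (1420, 1500):
            total = outer_len(size, detected_mtu)
            verdict = "fits" if total <= host_mtu else "needs fragmentation"
            print(f"ping -s {size}, detected MTU {detected_mtu}: {total} bytes ({verdict})")
```

Under these assumptions, ping -s 1361 yields a 1409-byte inner packet. With the cilium_wg0 MTU (1420) the padded size is capped at 1420, so the outer packet is exactly 1500 bytes. With the eth0 MTU (1500) it is padded to 1424, so the outer packet is 1504 bytes and exceeds the host MTU. This matches the observed behaviour.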
Cilium Version
Recent version on master
Kernel Version
Linux 6.1.0-3-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.8-1 (2023-01-29) x86_64 GNU/Linux
Kubernetes Version
Client Version: v1.26.1
Kustomize Version: v4.5.7
Server Version: v1.25.3
Sysdump
No response
Relevant log output
No response
Anything else?
No response
Code of Conduct
- I agree to follow this project's Code of Conduct