BPF masquerading does not masquerade traffic to remote node's ExternalIP #17177
Description
Bug report
We have a Cilium cluster with native routing (no encapsulation). A pod started on any node can reach the node it runs on (e.g. via ping), but it cannot reach the external IP of any other node. So if pod1 runs on node1, ping external_ip_of_node1 works, but ping external_ip_of_node2 does not.
Each node has an internal interface (dummy0) and an external interface (external). All nodes are connected internally using a WireGuard interface (wg0) at the OS level (not managed by Cilium) so that all cluster traffic is encrypted. All external interfaces are reachable from each node.
General Information
- Cilium version (run cilium version)
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), wait-for-node-init (init), clean-cilium-state (init)
Client: 1.10.3 4145278 2021-07-15T16:11:03+02:00 go version go1.16.6 linux/amd64
Daemon: 1.10.3 4145278 2021-07-15T16:11:03+02:00 go version go1.16.6 linux/amd64
- Kernel version (run uname -a)
Linux dev-0001 5.8.0-63-generic #71-Ubuntu SMP Tue Jul 13 15:59:12 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- Orchestration system version in use (e.g. kubectl version, ...)
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3+k3s1", GitCommit:"1d1f220fbee9cdeb5416b76b707dde8c231121f2", GitTreeState:"clean", BuildDate:"2021-07-22T20:52:14Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3+k3s1", GitCommit:"1d1f220fbee9cdeb5416b76b707dde8c231121f2", GitTreeState:"clean", BuildDate:"2021-07-22T20:52:14Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
- Generate and upload a system zip:
curl -sLO https://git.io/cilium-sysdump-latest.zip && python cilium-sysdump-latest.zip
How to reproduce the issue
- Start a network multitool pod on any node (https://github.com/Praqma/Network-MultiTool)
- Exec into a shell of the pod
- Ping an external address (ping 1.1.1.1): works
- Ping own node's external address: works
- Ping other node's external address: does not work
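The three ping checks above could be scripted from inside the pod roughly as follows. This is a minimal sketch, not part of the original report: the function names are hypothetical, the 192.0.2.x addresses are documentation placeholders standing in for the real node IPs, and it assumes a `ping` binary that accepts `-c` (count) and `-W` (timeout in seconds).

```python
import subprocess


def ping_argv(target, count=1, timeout_s=2):
    """Build the ping command line for one reachability probe."""
    return ["ping", "-c", str(count), "-W", str(timeout_s), target]


def reachable(target, run=subprocess.run):
    """Return True if ping exits with status 0 for the target."""
    result = run(
        ping_argv(target),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check(targets, run=subprocess.run):
    """Probe every named target and return {name: reachable?}."""
    return {name: reachable(addr, run) for name, addr in targets.items()}


# Example call (placeholder addresses, replace with the real ones):
# check({
#     "external": "1.1.1.1",
#     "own node": "192.0.2.1",    # hypothetical external IP of node1
#     "other node": "192.0.2.2",  # hypothetical external IP of node2
# })
```

With the behaviour described in this report, one would expect the "other node" entry to come back as unreachable while the other two succeed.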
pinging external system
tcpdump
17:42:40.538395 IP 10.100.2.123 > 1.1.1.1: ICMP echo request, id 51, seq 1, length 64
17:42:40.538453 IP NODE1 > 1.1.1.1: ICMP echo request, id 61439, seq 1, length 64
17:42:40.543845 IP 1.1.1.1 > NODE1: ICMP echo reply, id 61439, seq 1, length 64
17:42:40.543892 IP 1.1.1.1 > 10.100.2.123: ICMP echo reply, id 51, seq 1, length 64
cilium monitor
Policy verdict log: flow 0x0 local EP ID 333, remote ID world, proto 1, egress, action allow, match all, 10.100.2.123 -> 1.1.1.1 EchoRequest
-> stack flow 0x0 identity 237465->world state new ifindex 0 orig-ip 0.0.0.0: 10.100.2.123 -> 1.1.1.1 EchoRequest
NAT is happening here: the request leaves with NODE1 as its source instead of the pod IP.
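Whether a capture shows source NAT can also be checked mechanically. The sketch below is an illustration only: it parses tcpdump lines in the format shown in this report and looks for an echo request to the same destination with the same sequence number but a source other than the pod IP. Matching on destination and sequence number is a heuristic chosen for this example, and the function names are hypothetical.

```python
import re

# Matches ICMP echo lines in the tcpdump format used in this report, e.g.
# "17:42:40.538395 IP 10.100.2.123 > 1.1.1.1: ICMP echo request, id 51, seq 1, length 64"
LINE_RE = re.compile(
    r"^\S+ IP (?P<src>\S+) > (?P<dst>[^:\s]+): "
    r"ICMP echo (?P<kind>request|reply), id (?P<id>\d+), seq (?P<seq>\d+)"
)


def parse(lines):
    """Return one dict per recognised ICMP echo line, skipping everything else."""
    packets = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            packets.append(match.groupdict())
    return packets


def snat_sources(lines, pod_ip):
    """Return the rewritten source addresses seen for the pod's echo requests.

    A request counts as a translated copy when it has the same destination and
    sequence number as a request sent from pod_ip but a different source.
    """
    requests = [p for p in parse(lines) if p["kind"] == "request"]
    from_pod = {(p["dst"], p["seq"]) for p in requests if p["src"] == pod_ip}
    return {
        p["src"]
        for p in requests
        if p["src"] != pod_ip and (p["dst"], p["seq"]) in from_pod
    }
```

Applied to the capture above with pod IP 10.100.2.123, this yields NODE1 as the translated source. An empty result means no translated copy of the request appears in the capture.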
pinging own node
tcpdump
17:38:29.191152 IP 10.100.2.123 > NODE1: ICMP echo request, id 47, seq 1, length 64
17:38:29.191219 IP NODE1 > 10.100.2.123: ICMP echo reply, id 47, seq 1, length 64
cilium monitor
-> stack flow 0x0 identity 237465->host state new ifindex 0 orig-ip 0.0.0.0: 10.100.2.123 -> NODE1 EchoRequest
-> endpoint 333 flow 0x0 identity host->237465 state reply ifindex 0 orig-ip NODE1: NODE1 -> 10.100.2.123 EchoReply
works
pinging other node
tcpdump
17:39:02.437350 IP 10.100.2.123 > $NODE2: ICMP echo request, id 48, seq 1, length 64
cilium monitor
Policy verdict log: flow 0x0 local EP ID 333, remote ID remote-node, proto 1, egress, action allow, match all, 10.100.2.123 -> NODE2 EchoRequest
-> stack flow 0x0 identity 237465->remote-node state new ifindex 0 orig-ip 0.0.0.0: 10.100.2.123 -> NODE2 EchoRequest
no response (and no NAT: the request leaves with the pod IP as its source)
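In the cilium monitor output, the three cases differ in the destination identity assigned to the flow (world, host, remote-node). The following sketch pulls that out of the trace lines; it only handles the "-> stack" and "-> endpoint" line format quoted in this report, and the function names are hypothetical.

```python
import re

# Matches trace lines as quoted in this report, e.g.
# "-> stack flow 0x0 identity 237465->world state new ifindex 0 orig-ip 0.0.0.0: 10.100.2.123 -> 1.1.1.1 EchoRequest"
TRACE_RE = re.compile(
    r"^-> (?P<point>stack|endpoint \d+) flow \S+ "
    r"identity (?P<src_id>\S+)->(?P<dst_id>\S+) state (?P<state>\S+) "
    r".*: (?P<src>\S+) -> (?P<dst>\S+) (?P<msg>\S+)$"
)


def parse_trace(line):
    """Return the fields of one trace line, or None if it is not a trace line."""
    match = TRACE_RE.match(line.strip())
    return match.groupdict() if match else None


def new_flow_destinations(lines):
    """List (destination address, destination identity) for every new flow."""
    result = []
    for line in lines:
        trace = parse_trace(line)
        if trace and trace["state"] == "new":
            result.append((trace["dst"], trace["dst_id"]))
    return result
```

Run over the monitor lines from the three tests, this gives world for 1.1.1.1, host for NODE1, and remote-node for NODE2, which lines up with the observation that only the remote-node case goes out without NAT.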