--exclude-local-address incompatible with eBPF Host Routing #41241
Is there an existing issue for this?
- I have searched the existing issues
Version
equal or higher than v1.17.0 and lower than v1.18.0
What happened?
Description
We enabled eBPF Host Routing and noticed that, on nodes where the Cilium agent runs with --exclude-local-address="169.254.20.10/31" (for NodeLocal DNSCache), DNS requests from all pods started failing.
Disabling eBPF Host Routing resolves the issue. Keeping eBPF Host Routing enabled but removing --exclude-local-address resolves it as well.
What we see in the pod while the issue is occurring:
# dig google.com
;; communications error to 169.254.20.10#53: timed out
;; communications error to 169.254.20.10#53: timed out
[...]
cilium monitor is showing:
[...] xx drop (FIB lookup failed, 4) flow 0x13a0ac9f to endpoint 0, ifindex 30, file bpf_lxc.c:1364, , identity 560589->16777241: 10.132.84.141:38658 -> 169.254.20.10:53 udp [...]
Looking further at the code: when an address falls within an --exclude-local-address range, it is (as expected) not added to the lxcmap (ENDPOINT_MAP in the BPF code) as a local host entry:
cilium/daemon/cmd/hostips-sync.go (lines 150 to 153 in 5e4e4bf):
if option.Config.IsExcludedLocalAddress(addr.Addr) {
	continue
}
addIdentity(addr.Addr.AsSlice(), nil, identity.ReservedIdentityHost, labels.LabelHost)
cilium/pkg/maps/lxcmap/lxcmap.go (lines 212 to 224 in 492bfe8):
// SyncHostEntry checks if a host entry exists in the lxcmap and adds one if needed.
// Returns boolean indicating if a new entry was added and an error.
func SyncHostEntry(ip net.IP) (bool, error) {
	key := NewEndpointKey(ip)
	value, err := LXCMap(nil).Lookup(key)
	if err != nil || value.(*EndpointInfo).Flags&EndpointFlagHost == 0 {
		err = AddHostEntry(ip)
		if err == nil {
			return true, nil
		}
	}
	return false, err
}
The problem seems to be that the eBPF Host Routing code does not handle a local address that has no host entry. This endpoint lookup fails:
Line 1203 in 7a6622f:
ep = __lookup_ip4_endpoint(daddr);
and so the code continues down to the fib_redirect section:
Line 1312 in 7a6622f:
ret = fib_redirect_v4(ctx, ETH_HLEN, ip4, false, false, ext_err, &oif);
This fails with the "FIB lookup failed, 4" drop shown above. Error code 4 corresponds to BPF_FIB_LKUP_RET_NOT_FWDED:
cilium/bpf/include/linux/bpf.h (line 6965 in 7a6622f):
BPF_FIB_LKUP_RET_NOT_FWDED,    /* packet is not forwarded */
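To decode the number in the drop message: the bpf_fib_lookup() return codes are numbered in the order they are declared in the kernel UAPI header (Cilium's copy is the bpf/include/linux/bpf.h quoted above). The Go lookup table below is a sketch that lists only the first nine values (newer kernels may define more), and fibResultName is a hypothetical helper. It assumes, as this report does, that the 4 in "FIB lookup failed, 4" is the raw return value of the helper.

```go
package main

import "fmt"

// bpf_fib_lookup() return codes, in declaration order from the kernel's
// UAPI bpf.h, so the slice index equals the numeric code.
// Only the first nine values are listed; newer kernels may append more.
var fibLookupRet = []string{
	"BPF_FIB_LKUP_RET_SUCCESS",
	"BPF_FIB_LKUP_RET_BLACKHOLE",
	"BPF_FIB_LKUP_RET_UNREACHABLE",
	"BPF_FIB_LKUP_RET_PROHIBIT",
	"BPF_FIB_LKUP_RET_NOT_FWDED", // packet is not forwarded
	"BPF_FIB_LKUP_RET_FWD_DISABLED",
	"BPF_FIB_LKUP_RET_UNSUPP_LWT",
	"BPF_FIB_LKUP_RET_NO_NEIGH",
	"BPF_FIB_LKUP_RET_FRAG_NEEDED",
}

// fibResultName maps a numeric return code to its constant name.
func fibResultName(code int) string {
	if code >= 0 && code < len(fibLookupRet) {
		return fibLookupRet[code]
	}
	return fmt.Sprintf("unknown (%d)", code)
}

func main() {
	// The extended error from the cilium monitor drop above.
	fmt.Println(fibResultName(4)) // prints BPF_FIB_LKUP_RET_NOT_FWDED
}
```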
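I believe this BPF_FIB_LKUP_RET_NOT_FWDED error is returned because the route to 169.254.20.10 is local: the kernel's FIB lookup helper only forwards unicast routes and returns BPF_FIB_LKUP_RET_NOT_FWDED for other route types, such as this entry in the local table: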
ip route get 169.254.20.10 from 10.132.84.141 iif lxc04d54872a51f
local 169.254.20.10 from 10.132.84.141 dev lo table local
cache <local> iif lxc04d54872a51f
As a result, the packet is dropped instead of being delivered, and DNS queries time out.
Possible remediation
I have a proof of concept that seems to solve the issue: DataDog@844f1c8
The idea is to track excluded local addresses in a new BPF map that the eBPF Host Routing code can consult. When a packet's destination IP is in that map, we skip the FIB lookup path and simply pass the packet to the kernel stack. Identity handling is unchanged; only the routing decision differs.
Let me know if this approach is on the right track; if so, I can open a PR.
How can we reproduce the issue?
- Enable eBPF Host Routing
- Set --exclude-local-address="169.254.20.10/31"
- Try to reach 169.254.20.10 from a pod and confirm that the request times out
Cilium Version
1.17.6
Kernel Version
6.8.0
Kubernetes Version
v1.33.1
Regression
No response
Sysdump
No response
Relevant log output
Anything else?
No response
Cilium Users Document
- Are you a user of Cilium? Please add yourself to the Users doc
Code of Conduct
- I agree to follow this project's Code of Conduct