When Cilium (v1.8) is configured with global.devices set, global.autoDirectNodeRoutes=true, and global.masquerade=true, multi-node connectivity seems to fail. In particular, attempts to resolve DNS names fail with the following errors:
$ dig kubernetes.default.svc.cluster.local @10.0.1.157
;; reply from unexpected source: 192.168.36.12#53, expected 10.0.1.157#53
;; reply from unexpected source: 192.168.36.12#53, expected 10.0.1.157#53
As expected, the query reaches the second node, which runs kube-dns. The answer, however, leaves that node with source IP address 192.168.36.12 instead of 10.0.1.157, the address the client expects. So masquerading seems to be at fault (which is why setting global.masquerade=false works around the bug).
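When reproducing this in a script, it can help to detect the symptom automatically. Below is a minimal sketch that is not part of Cilium or dig. The function name parse_unexpected_source is hypothetical, and the function assumes dig's warning keeps the exact format shown above (IPv4 address#port). For each reply that dig rejected, it prints the actual source followed by the expected source.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cilium or dig).
# Reads dig output on stdin and, for every ";; reply from unexpected source"
# warning, prints "<actual-source> <expected-source>" without the port numbers.
# Assumes the warning format shown in this report (IPv4 address#port).
parse_unexpected_source() {
  sed -n -E 's/^;; reply from unexpected source: ([^#]+)#[0-9]+, expected ([^#]+)#[0-9]+$/\1 \2/p'
}
```

For example, you can pipe the output of the dig command above into parse_unexpected_source, adding 2>&1 in case the warnings go to stderr. While the bug is present, this should print "192.168.36.12 10.0.1.157" once per rejected reply. Once replies come back from the queried address, it should print nothing.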
@brb added a few more details:
One more data point: when we do NOT set --device(s), a reply from bpf_lxc does not enter the nat/POSTROUTING chain (where the MASQ rule is installed).
tc filter del dev enp0s8 ingress fixes the issue.
Related: #11969