Is there an existing issue for this?
What happened?
We use Cilium in VXLAN mode and don't enable the NodePort/externalIPs features, so we depend on kube-proxy to handle service external IPs.
I found that when a pod accesses a service external IP whose backend is a pod on another node, the source IP is changed to the node IP by kube-proxy's masquerade rule.
Because the cilium_host address has scope link, the kernel's masquerade source selection won't choose cilium_host even when the destination is another pod, so the packet is sent to the server pod through cilium_vxlan with the node IP as its source.
The server pod then sends the reply packet back to the client pod's node directly, not through cilium_vxlan; if a network device along that path (like a router) doesn't recognize the server pod's IP, it drops the packet. The packet flow looks like this (the source-selection check after the list shows why the node IP is picked):
- client pod ip -> service external ip
- client pod ip -> server pod ip (DNAT)
- client node ip -> server pod ip (Masquerade) through cilium_vxlan
- server pod ip -> client node ip (direct route, not through cilium_vxlan)
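This matches how the kernel picks a masquerade source: as far as I can tell, the MASQUERADE target selects an address on the outgoing device with inet_select_addr(..., RT_SCOPE_UNIVERSE), which skips scope link addresses, so it falls back to an address on another interface, i.e. the node IP. A quick way to check which source the kernel would pick (a sketch with placeholder IPs; the expected output in the comments is my reading, not captured output):

$ ip route get <server-pod-ip>
# with cilium_host's /32 at scope link, expect "... src <node-ip>"
# with the /32 at scope global, expect "... src <cilium_host-ip>"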
I tried changing the cilium_host address scope to global, and it works.
So I think we should remove scope link in init.sh to fix this issue.
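For reference, a minimal sketch of the change I mean, assuming init.sh adds the address roughly like this (CILIUM_HOST_IP is a placeholder, not the actual variable name in init.sh):

# before: explicit link scope keeps the address out of masquerade source selection
ip addr add "${CILIUM_HOST_IP}/32" dev cilium_host scope link
# after: no explicit scope, so the address defaults to scope global
ip addr add "${CILIUM_HOST_IP}/32" dev cilium_host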
Cilium Version
v1.10.15
Kernel Version
5.4.119
Kubernetes Version
v1.22.5
Sysdump
KubeProxyReplacement Details:
Status: Partial
Session Affinity: Enabled
Services:
- ClusterIP: Enabled
- NodePort: Disabled
- LoadBalancer: Disabled
- externalIPs: Disabled
- HostPort: Disabled
Relevant log output
No response
Anything else?
- client pod: 172.17.0.3
- client node: 192.168.2.219
- client cilium_host: 172.17.0.28
- external ip: 119.28.229.32
- server pod: 172.17.0.177
$ ip a show cilium_host
4: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether da:5d:4c:d9:e4:7e brd ff:ff:ff:ff:ff:ff
inet 172.17.0.28/32 scope link cilium_host
valid_lft forever preferred_lft forever
inet6 fe80::d85d:4cff:fed9:e47e/64 scope link
valid_lft forever preferred_lft forever
iptables
# Warning: iptables-legacy tables present, use iptables-legacy-save to see them
-A KUBE-SERVICES -d 119.28.229.32/32 -p tcp -m comment --comment "default/kubernetes-extranet:https loadbalancer IP" -m tcp --dport 443 -j KUBE-FW-JRLK3S5QDR5VE4WN
-A KUBE-FW-JRLK3S5QDR5VE4WN -m comment --comment "default/kubernetes-extranet:https loadbalancer IP" -j KUBE-MARK-MASQ
-A KUBE-FW-JRLK3S5QDR5VE4WN -m comment --comment "default/kubernetes-extranet:https loadbalancer IP" -j KUBE-SVC-JRLK3S5QDR5VE4WN
-A KUBE-FW-JRLK3S5QDR5VE4WN -m comment --comment "default/kubernetes-extranet:https loadbalancer IP" -j KUBE-MARK-DROP
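For context, the KUBE-MARK-MASQ mark only leads to SNAT later, in kube-proxy's POSTROUTING chain, which looks roughly like this (exact rules vary by kube-proxy version; newer versions split this into several rules):

-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE

This MASQUERADE is where the kernel performs the source selection described above.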
$ conntrack -E -d 119.28.229.32
[NEW] tcp 6 120 SYN_SENT src=172.17.0.3 dst=119.28.229.32 sport=42614 dport=443 [UNREPLIED] src=172.17.0.177 dst=192.168.2.219 sport=443 dport=34294
After removing scope link (deleting and re-adding the address without an explicit scope defaults it to scope global):
$ ip a del 172.17.0.28/32 dev cilium_host
$ ip a add 172.17.0.28/32 dev cilium_host
$ ip a show cilium_host
4: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether da:5d:4c:d9:e4:7e brd ff:ff:ff:ff:ff:ff
inet 172.17.0.28/32 scope global cilium_host
valid_lft forever preferred_lft forever
inet6 fe80::d85d:4cff:fed9:e47e/64 scope link
valid_lft forever preferred_lft forever
$ conntrack -E -d 119.28.229.32
[NEW] tcp 6 120 SYN_SENT src=172.17.0.3 dst=119.28.229.32 sport=43612 dport=443 [UNREPLIED] src=172.17.0.177 dst=172.17.0.28 sport=443 dport=27012
- client pod ip -> service external ip
- client pod ip -> server pod ip (DNAT)
- client's cilium_host ip -> server pod ip (Masquerade) through cilium_vxlan
- server pod ip -> client's cilium_host ip through cilium_vxlan
We can see the kernel masquerade now chooses cilium_host's IP 172.17.0.28 instead of the node IP 192.168.2.219, and the reply packet also goes through cilium_vxlan.
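To double-check the new reply path, one can watch the overlay interface on the client node while repeating the request (a verification sketch, not captured output):

# reply packets from the server pod should now arrive on cilium_vxlan,
# addressed to cilium_host's IP instead of the node IP
$ tcpdump -ni cilium_vxlan host 172.17.0.177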
Code of Conduct