Connection to Nodeport service failing on a dual-stack GKE cluster with Ubuntu 24.04 (picks link-local address for IPV6_DIRECT_ROUTING) #36752
Description
Is there an existing issue for this?
- I have searched the existing issues
Version
equal or higher than v1.15.11 and lower than v1.16.0
What happened?
We're experiencing NodePort service issues on dual-stack GKE clusters running Ubuntu 24.04. Connections to a NodePort service fail in some cases when the service is reached via a node's IPv6 address. The eBPF program SNATs the source IPv6 address to the node's link-local address instead of the global node IP, so the packet is dropped whenever the backend for the NodePort service is not running on the node that received the request. The issue disappears when cilium-agent is restarted (sometimes after a few restarts) and does not occur on nodes running older Ubuntu versions (22.04).
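For context on why this SNAT choice is fatal off-node: fe80::/10 link-local addresses are valid only on the local link, so a forwarded packet carrying one as its source cannot survive the FIB lookup toward another node. A minimal sketch of the distinction (plain POSIX shell, using the two addresses from this report):

```shell
#!/bin/sh
# Rough classifier: link-local IPv6 addresses fall in fe80::/10
# (fe80:: through febf::); in practice they are always fe80::/64.
is_link_local() {
  case "$1" in
    [Ff][Ee]8?:*|[Ff][Ee]9?:*|[Ff][Ee][AaBb]?:*) return 0 ;;
    *) return 1 ;;
  esac
}

for addr in fe80::4001:aff:fed0:7803 2600:1900:4001:83a::; do
  if is_link_local "$addr"; then
    echo "$addr is link-local: usable only on one link, never across nodes"
  else
    echo "$addr is global scope: routable between nodes"
  fi
done
```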
How can we reproduce the issue?
1. Create a GKE dual-stack cluster with a minimum of 3 nodes.
2. Create a NodePort service:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: nodeport-svc
spec:
  type: NodePort
  ipFamilyPolicy: RequireDualStack
  ipFamilies:
    - IPv4
    - IPv6
  ports:
    - port: 8090
      name: port-8090
      targetPort: 8090
      protocol: TCP
      nodePort: 30007
  selector:
    app: porter
```
3. Create a backend for the service:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: svc-backend
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: porter
  template:
    metadata:
      labels:
        app: porter
    spec:
      containers:
        - image: gcr.io/kubernetes-e2e-test-images/agnhost:2.2
          imagePullPolicy: Always
          name: porter
          env:
            - name: SERVE_PORT_8090
              value: SERVE_PORT_8090
          args:
            - porter
          ports:
            - name: serve-8090
              containerPort: 8090
```
4. Create a client pod, ensuring it is scheduled on a different node than the backend pod:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: clientpod
  namespace: default
  labels:
    app: synth-test
spec:
  restartPolicy: Never
  nodeName: <NODE_NAME>
  containers:
    - image: praqma/network-multitool
      imagePullPolicy: Always
      name: multitool
      securityContext:
        capabilities:
          add:
            - NET_RAW
```
5. Get the node IPs (IPv4 and IPv6), exec into the client pod, and use curl to connect to the NodePort service via the node IPs:
```shell
kubectl describe nodes | grep -i "internalip\|externalip\|addresses" -A5
kubectl exec -it clientpod -- /bin/bash

# Inside the pod (GREEN/RED/NC are optional ANSI color variables)
# Node running the client pod
CLIENT_NODE_IPv4=10.208.120.5
CLIENT_NODE_IPv6=2600:1900:4001:83a:0:2:0:0
# Node running the backend pod
BACKEND_NODE_IPv4=10.208.120.4
BACKEND_NODE_IPv6=2600:1900:4001:83a:0:1:0:0
# Node not running any pods
OTHER_NODE_IPv4=10.208.120.3
OTHER_NODE_IPv6=2600:1900:4001:83a:0:0:0:0

# Send curl requests to the NodePort service using the node IPs
date && curl --connect-timeout 3 $CLIENT_NODE_IPv4:30007 && echo -e "${GREEN} curl from CLIENT_NODE_IPv4 to node port 30007 success${NC}" || echo -e "${RED} curl from CLIENT_NODE_IPv4 to node port 30007 fail${NC}" && date
date && curl --connect-timeout 3 "[$CLIENT_NODE_IPv6]:30007" && echo -e "${GREEN} curl from CLIENT_NODE_IPv6 to node port 30007 success${NC}" || echo -e "${RED} curl from CLIENT_NODE_IPv6 to node port 30007 fail${NC}" && date
date && curl --connect-timeout 3 $BACKEND_NODE_IPv4:30007 && echo -e "${GREEN} curl from BACKEND_NODE_IPv4 to node port 30007 success${NC}" || echo -e "${RED} curl from BACKEND_NODE_IPv4 to node port 30007 fail${NC}" && date
date && curl --connect-timeout 3 "[$BACKEND_NODE_IPv6]:30007" && echo -e "${GREEN} curl from BACKEND_NODE_IPv6 to node port 30007 success${NC}" || echo -e "${RED} curl from BACKEND_NODE_IPv6 to node port 30007 fail${NC}" && date
date && curl --connect-timeout 3 $OTHER_NODE_IPv4:30007 && echo -e "${GREEN} curl from OTHER_NODE_IPv4 to node port 30007 success${NC}" || echo -e "${RED} curl from OTHER_NODE_IPv4 to node port 30007 fail${NC}" && date
date && curl --connect-timeout 3 "[$OTHER_NODE_IPv6]:30007" && echo -e "${GREEN} curl from OTHER_NODE_IPv6 to node port 30007 success${NC}" || echo -e "${RED} curl from OTHER_NODE_IPv6 to node port 30007 fail${NC}" && date
```
The curl request to OTHER_NODE_IPv6 times out consistently, while the rest of the requests succeed:
```
SERVE_PORT_8090 curl from CLIENT_NODE_IPv4 to node port 30007 success
Sat Dec 21 01:48:23 UTC 2024
Sat Dec 21 01:48:23 UTC 2024
SERVE_PORT_8090 curl from CLIENT_NODE_IPv6 to node port 30007 success
Sat Dec 21 01:48:23 UTC 2024
Sat Dec 21 01:48:23 UTC 2024
SERVE_PORT_8090 curl from BACKEND_NODE_IPv4 to node port 30007 success
Sat Dec 21 01:48:23 UTC 2024
Sat Dec 21 01:48:23 UTC 2024
SERVE_PORT_8090 curl from BACKEND_NODE_IPv6 to node port 30007 success
Sat Dec 21 01:48:23 UTC 2024
Sat Dec 21 01:48:23 UTC 2024
SERVE_PORT_8090 curl from OTHER_NODE_IPv4 to node port 30007 success
Sat Dec 21 01:48:23 UTC 2024
Sat Dec 21 01:48:23 UTC 2024
curl: (28) Connection timeout after 3000 ms
curl from OTHER_NODE_IPv6 to node port 30007 fail
Sat Dec 21 01:48:26 UTC 2024
```
6. Exec into the cilium-agent running on the node dropping the packet (the node with OTHER_NODE_IPv6 from step 5) and check cilium monitor:
```shell
cilium monitor --type drop
Listening for events on 2 CPUs with 64x4096 of shared memory
Press Ctrl-C to quit
time="2024-12-21T01:53:04Z" level=info msg="Initializing dissection cache..." subsys=monitor
xx drop (FIB lookup failed, 4) flow 0x0 to endpoint 0, ifindex 2, file nodeport.h:1292, , identity unknown->unknown: [fe80::4001:aff:fed0:7803]:59806 -> [2600:1900:4001:83a:0:1:0:4]:8090 tcp SYN
xx drop (FIB lookup failed, 4) flow 0x0 to endpoint 0, ifindex 2, file nodeport.h:1292, , identity unknown->unknown: [fe80::4001:aff:fed0:7803]:59806 -> [2600:1900:4001:83a:0:1:0:4]:8090 tcp SYN
xx drop (FIB lookup failed, 4) flow 0x0 to endpoint 0, ifindex 2, file nodeport.h:1292, , identity unknown->unknown: [fe80::4001:aff:fed0:7803]:59806 -> [2600:1900:4001:83a:0:1:0:4]:8090 tcp SYN
```
Cilium monitor reports a drop due to a FIB lookup failure. The log comes from `tail_nodeport_nat_egress_ipv6()`.
7. Deploy a pwru pod on the node dropping the packet (the node with OTHER_NODE_IPv6 from step 5):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pwru
spec:
  nodeName: gke-ds-nodeport-testing--default-pool-aeb0b5f2-njq2
  containers:
    - image: docker.io/cilium/pwru:latest
      name: pwru
      volumeMounts:
        - mountPath: /sys/kernel/debug
          name: sys-kernel-debug
      securityContext:
        privileged: true
      command: ["/bin/sh"]
      args: ["-c", "pwru 'tcp and (dst port 30007 or dst port 8090)'"]
  volumes:
    - name: sys-kernel-debug
      hostPath:
        path: /sys/kernel/debug
        type: DirectoryOrCreate
  hostNetwork: true
  hostPID: true
```
The packet is SNAT'd to the link-local address (fe80::4001:aff:fed0:7803) instead of the global node IP (2600:1900:4001:83a::/128), and the request is then dropped because the backend is not on this node:
```shell
kubectl logs -f pwru
2024/12/21 00:34:04 Attaching kprobes (via kprobe-multi)...
1754 / 1754 [------------------------------------------------------------------------] 100.00% ? p/s
2024/12/21 00:34:04 Attached (ignored 0)
2024/12/21 00:34:04 Listening for events..
SKB                CPU PROCESS    NETNS      MARK/x IFACE PROTO  MTU  LEN  TUPLE FUNC
0xffff8c88c45a4700 0 <empty>:0 4026531840 0 ens4:2 0x86dd 1460 80 [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a::]:30007(tcp) ipv6_gro_receive
0xffff8c88c45a4700 0 <empty>:0 4026531840 0 ens4:2 0x86dd 1460 80 [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a::]:30007(tcp) tcp6_gro_receive
0xffff8c88c45a4700 0 <empty>:0 4026531840 0 ens4:2 0x86dd 1460 80 [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a::]:30007(tcp) tcp_gro_receive
0xffff8c88c45a4700 0 <empty>:0 4026531840 0 ens4:2 0x86dd 1460 80 [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a::]:30007(tcp) skb_defer_rx_timestamp
0xffff8c88c45a4700 0 <empty>:0 4026531840 0 ens4:2 0x86dd 1460 80 [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a::]:30007(tcp) tc_run
0xffff8c88c45a4700 0 <empty>:0 4026531840 0 ens4:2 0x86dd 1460 80 [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a::]:30007(tcp) tcf_classify
0xffff8c88c45a4700 0 <empty>:0 4026531840 0 ens4:2 0x86dd 1460 94 [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a::]:30007(tcp) skb_ensure_writable
0xffff8c88c45a4700 0 <empty>:0 4026531840 0 ens4:2 0x86dd 1460 94 [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a::]:30007(tcp) skb_ensure_writable
0xffff8c88c45a4700 0 <empty>:0 4026531840 0 ens4:2 0x86dd 1460 94 [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a:0:1:0:4]:30007(tcp) skb_ensure_writable
0xffff8c88c45a4700 0 <empty>:0 4026531840 0 ens4:2 0x86dd 1460 94 [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a:0:1:0:4]:30007(tcp) inet_proto_csum_replace_by_diff
0xffff8c88c45a4700 0 <empty>:0 4026531840 0 ens4:2 0x86dd 1460 94 [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a:0:1:0:4]:30007(tcp) skb_ensure_writable
0xffff8c88c45a4700 0 <empty>:0 4026531840 0 ens4:2 0x86dd 1460 94 [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a:0:1:0:4]:30007(tcp) inet_proto_csum_replace4
0xffff8c88c45a4700 0 <empty>:0 4026531840 0 ens4:2 0x86dd 1460 94 [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a:0:1:0:4]:30007(tcp) skb_ensure_writable
0xffff8c88c45a4700 0 <empty>:0 4026531840 0 ens4:2 0x86dd 1460 94 [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a:0:1:0:4]:8090(tcp) skb_ensure_writable
0xffff8c88c45a4700 0 <empty>:0 4026531840 0 ens4:2 0x86dd 1460 94 [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a:0:1:0:4]:8090(tcp) skb_ensure_writable
0xffff8c88c45a4700 0 <empty>:0 4026531840 0 ens4:2 0x86dd 1460 94 [fe80::4001:aff:fed0:7803]:58228->[2600:1900:4001:83a:0:1:0:4]:8090(tcp) skb_ensure_writable
0xffff8c88c45a4700 0 <empty>:0 4026531840 0 ens4:2 0x86dd 1460 94 [fe80::4001:aff:fed0:7803]:58228->[2600:1900:4001:83a:0:1:0:4]:8090(tcp) inet_proto_csum_replace_by_diff
0xffff8c88c45a4700 0 <empty>:0 4026531840 300 ens4:2 0x86dd 1460 80 [fe80::4001:aff:fed0:7803]:58228->[2600:1900:4001:83a:0:1:0:4]:8090(tcp) kfree_skb_reason(SKB_DROP_REASON_TC_INGRESS)
0xffff8c88c45a4700 0 <empty>:0 4026531840 300 ens4:2 0x86dd 1460 80 [fe80::4001:aff:fed0:7803]:58228->[2600:1900:4001:83a:0:1:0:4]:8090(tcp) skb_release_head_state
0xffff8c88c45a4700 0 <empty>:0 4026531840 300 ens4:2 0x86dd 1460 80 [fe80::4001:aff:fed0:7803]:58228->[2600:1900:4001:83a:0:1:0:4]:8090(tcp) skb_release_data
0xffff8c88c45a4700 0 <empty>:0 4026531840 300 ens4:2 0x86dd 1460 80 [fe80::4001:aff:fed0:7803]:58228->[2600:1900:4001:83a:0:1:0:4]:8090(tcp) skb_free_head
0xffff8c88c45a4700 0 <empty>:0 4026531840 300 ens4:2 0x86dd 1460 80 [fe80::4001:aff:fed0:7803]:58228->[2600:1900:4001:83a:0:1:0:4]:8090(tcp) kfree_skbmem
```
IP address configuration on the ens4 interface:
```shell
ip addr show dev ens4
2: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc mq state UP group default qlen 1000
    link/ether 42:01:0a:d0:78:03 brd ff:ff:ff:ff:ff:ff
    altname enp0s4
    inet 10.208.120.3/32 metric 100 scope global dynamic ens4
       valid_lft 3335sec preferred_lft 3335sec
    inet6 2600:1900:4001:83a::/128 scope global dynamic noprefixroute
       valid_lft 2975sec preferred_lft 2975sec
    inet6 fe80::4001:aff:fed0:7803/64 scope link
       valid_lft forever preferred_lft forever
```
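To cross-check which source address the kernel's own routing stack would select toward the backend node, `ip -6 route get` prints its choice in the `src` field (run this on the affected node; the destination below is the backend node IP from this report). A healthy selection shows the global 2600:1900:4001:83a:: address, not fe80::...:

```shell
# On the affected node: show the source address the routing stack would
# pick toward the backend node; "src fe80::..." would corroborate a bad
# selection, "src 2600:..." the expected one. The grep keeps just that field.
ip -6 route get 2600:1900:4001:83a:0:1:0:4 2>/dev/null | grep -o 'src [^ ]*' || true
```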
Cilium Version
```
Client: 1.15.6 940a32ec47 2024-11-05T01:45:09+00:00 go version go1.22.8 linux/amd64
Daemon: 1.15.6 940a32ec47 2024-11-05T01:45:09+00:00 go version go1.22.8 linux/amd64
```
Kernel Version
```
6.8.0-1014-gke #18-Ubuntu SMP Thu Nov 14 02:31:14 UTC 2024 x86_64 GNU/Linux
```
Kubernetes Version
```
Client Version: v1.32.0
Kustomize Version: v5.5.0
Server Version: v1.32.0-gke.1066000
```
Regression
No response
Sysdump
No response
Relevant log output
Anything else?
No response
Cilium Users Document
- Are you a user of Cilium? Please add yourself to the Users doc
Code of Conduct
- I agree to follow this project's Code of Conduct