
Connection to Nodeport service failing on a dual-stack GKE cluster with Ubuntu 24.04 (picks link-local address for IPV6_DIRECT_ROUTING) #36752

@pravk03

Description


Is there an existing issue for this?

  • I have searched the existing issues

Version

equal or higher than v1.15.11 and lower than v1.16.0

What happened?

We're experiencing NodePort service issues on dual-stack GKE clusters running Ubuntu 24.04. In some cases, connections to a NodePort service fail when the service is reached via a node's IPv6 address. The issue appears to be that the eBPF program SNATs the source IPv6 address to the node's link-local address instead of its global address, and the packet is dropped when the backend for the NodePort service is not running on the node that received the request. The issue disappears after restarting cilium-agent (sometimes a few restarts are needed) and does not occur on nodes running older Ubuntu versions (22.04).
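
A quick way to spot the bad SNAT source is to check whether an address falls in the link-local range (fe80::/10, i.e. first hextet fe80-febf). A minimal shell sketch; the helper name is illustrative and not part of Cilium:

```shell
# Returns 0 (true) if the given IPv6 address is link-local (fe80::/10),
# i.e. its first hextet is in the range fe80-febf.
is_link_local() {
  case "$1" in
    [Ff][Ee][89abAB][0-9a-fA-F]:*) return 0 ;;
    *) return 1 ;;
  esac
}

# The SNAT'd source observed in the drops is link-local:
is_link_local "fe80::4001:aff:fed0:7803" && echo "link-local"
# The node's global address, which should have been used instead:
is_link_local "2600:1900:4001:83a::" || echo "global"
```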

How can we reproduce the issue?

  1. Create a GKE dual stack cluster with a minimum of 3 nodes.

  2. Create a NodePort service

apiVersion: v1
kind: Service
metadata:
  name: nodeport-svc
spec:
  type: NodePort
  ipFamilyPolicy: RequireDualStack
  ipFamilies:
  - IPv4
  - IPv6
  ports:
  - port: 8090
    name: port-8090
    targetPort: 8090
    protocol: TCP
    nodePort: 30007
  selector:
    app: porter

  3. Create a backend for the service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: svc-backend
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: porter
  template:
    metadata:
      labels:
        app: porter
    spec:
      containers:
        - image: gcr.io/kubernetes-e2e-test-images/agnhost:2.2
          imagePullPolicy: Always
          name: porter
          env:
          - name: SERVE_PORT_8090
            value: SERVE_PORT_8090
          args:
          - porter
          ports:
          - name: serve-8090
            containerPort: 8090
  4. Create a client pod. Ensure that it is scheduled on a different node than the backend pod.
apiVersion: v1
kind: Pod
metadata:
  name: clientpod
  namespace: default
  labels:
    app: synth-test
spec:
  restartPolicy: Never
  nodeName: <NODE_NAME>
  containers:
  - image: praqma/network-multitool
    imagePullPolicy: Always
    name: multitool
    securityContext:
      capabilities:
        add:
        - NET_RAW
  5. Get the node IPs (v4 and v6). Exec into the client pod and use curl to connect to the NodePort service via each node IP.
kubectl describe nodes | grep -i "internalip\\|externalip\\|addresses" -A5
kubectl exec -it clientpod -- /bin/bash

# Inside the pod

 # Node running client pod 
CLIENT_NODE_IPv4=10.208.120.5
CLIENT_NODE_IPv6=2600:1900:4001:83a:0:2:0:0
 # Node running backend pod 
BACKEND_NODE_IPv4=10.208.120.4
BACKEND_NODE_IPv6=2600:1900:4001:83a:0:1:0:0
 # Node not running any pods
OTHER_NODE_IPv4=10.208.120.3
OTHER_NODE_IPv6=2600:1900:4001:83a:0:0:0:0

# Send curl request to connect to the NodePort service using Node IP's
GREEN='\033[0;32m'; RED='\033[0;31m'; NC='\033[0m'  # color codes used in the messages below
date && curl --connect-timeout 3 $CLIENT_NODE_IPv4:30007 && echo -e "${GREEN} curl from CLIENT_NODE_IPv4 to node port 30007 success${NC}" || echo -e "${RED} curl from CLIENT_NODE_IPv4 to node port 30007 fail${NC}" && date
date && curl --connect-timeout 3 [$CLIENT_NODE_IPv6]:30007 && echo -e "${GREEN} curl from CLIENT_NODE_IPv6 to node port 30007 success${NC}" || echo -e "${RED} curl from CLIENT_NODE_IPv6 to node port 30007 fail${NC}" && date
date && curl --connect-timeout 3 $BACKEND_NODE_IPv4:30007 && echo -e "${GREEN} curl from BACKEND_NODE_IPv4 to node port 30007 success${NC}" || echo -e "${RED} curl from BACKEND_NODE_IPv4 to node port 30007 fail${NC}" && date
date && curl --connect-timeout 3 [$BACKEND_NODE_IPv6]:30007 && echo -e "${GREEN} curl from BACKEND_NODE_IPv6 to node port 30007 success${NC}" || echo -e "${RED} curl from BACKEND_NODE_IPv6 to node port 30007 fail${NC}" && date
date && curl --connect-timeout 3 $OTHER_NODE_IPv4:30007 && echo -e "${GREEN} curl from OTHER_NODE_IPv4 to node port 30007 success${NC}" || echo -e "${RED} curl from OTHER_NODE_IPv4 to node port 30007 fail${NC}" && date
date && curl --connect-timeout 3 [$OTHER_NODE_IPv6]:30007 && echo -e "${GREEN} curl from OTHER_NODE_IPv6 to node port 30007 success${NC}" || echo -e "${RED} curl from OTHER_NODE_IPv6 to node port 30007 fail${NC}" && date


# The curl request to OTHER_NODE_IPv6 times out consistently, while the rest of the requests succeed
SERVE_PORT_8090 curl from CLIENT_NODE_IPv4 to node port 30007 success
Sat Dec 21 01:48:23 UTC 2024
Sat Dec 21 01:48:23 UTC 2024
SERVE_PORT_8090 curl from CLIENT_NODE_IPv6 to node port 30007 success
Sat Dec 21 01:48:23 UTC 2024
Sat Dec 21 01:48:23 UTC 2024
SERVE_PORT_8090 curl from BACKEND_NODE_IPv4 to node port 30007 success
Sat Dec 21 01:48:23 UTC 2024
Sat Dec 21 01:48:23 UTC 2024
SERVE_PORT_8090 curl from BACKEND_NODE_IPv6 to node port 30007 success
Sat Dec 21 01:48:23 UTC 2024
Sat Dec 21 01:48:23 UTC 2024
SERVE_PORT_8090 curl from OTHER_NODE_IPv4 to node port 30007 success
Sat Dec 21 01:48:23 UTC 2024
Sat Dec 21 01:48:23 UTC 2024
curl: (28) Connection timeout after 3000 ms
 curl from OTHER_NODE_IPv6 to node port 30007 fail
Sat Dec 21 01:48:26 UTC 2024
  6. Exec into the cilium-agent running on the node dropping the packet (OTHER_NODE_IPv6 from step 5) and check cilium monitor.
cilium monitor --type drop
Listening for events on 2 CPUs with 64x4096 of shared memory
Press Ctrl-C to quit
time="2024-12-21T01:53:04Z" level=info msg="Initializing dissection cache..." subsys=monitor
xx drop (FIB lookup failed, 4) flow 0x0 to endpoint 0, ifindex 2, file nodeport.h:1292, , identity unknown->unknown: [fe80::4001:aff:fed0:7803]:59806 -> [2600:1900:4001:83a:0:1:0:4]:8090 tcp SYN
xx drop (FIB lookup failed, 4) flow 0x0 to endpoint 0, ifindex 2, file nodeport.h:1292, , identity unknown->unknown: [fe80::4001:aff:fed0:7803]:59806 -> [2600:1900:4001:83a:0:1:0:4]:8090 tcp SYN
xx drop (FIB lookup failed, 4) flow 0x0 to endpoint 0, ifindex 2, file nodeport.h:1292, , identity unknown->unknown: [fe80::4001:aff:fed0:7803]:59806 -> [2600:1900:4001:83a:0:1:0:4]:8090 tcp SYN

Cilium monitor reports a drop due to a FIB lookup failure. The log comes from tail_nodeport_nat_egress_ipv6().
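
The affected drops can be isolated from the monitor output by matching a link-local source address. A small grep sketch over a sample line copied from the output above:

```shell
# One of the drop lines captured by `cilium monitor --type drop` above.
log='xx drop (FIB lookup failed, 4) flow 0x0 to endpoint 0, ifindex 2, file nodeport.h:1292, , identity unknown->unknown: [fe80::4001:aff:fed0:7803]:59806 -> [2600:1900:4001:83a:0:1:0:4]:8090 tcp SYN'

# Match drops whose (post-SNAT) source address is link-local (fe80::/10).
printf '%s\n' "$log" | grep -qE '\[fe[89ab][0-9a-f]+:' \
  && echo "drop with link-local SNAT source"
```

On a live node the same filter can be piped directly from `cilium monitor --type drop`.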

  7. Deploy a pwru pod on the node dropping the packet (OTHER_NODE_IPv6 from step 5)
apiVersion: v1
kind: Pod
metadata:
  name: pwru
spec:
  nodeName: gke-ds-nodeport-testing--default-pool-aeb0b5f2-njq2
  containers:
  - image: docker.io/cilium/pwru:latest
    name: pwru
    volumeMounts:
    - mountPath: /sys/kernel/debug
      name: sys-kernel-debug
    securityContext:
      privileged: true
    command: ["/bin/sh"]
    args: ["-c", "pwru 'tcp and (dst port 30007 or dst port 8090)'"]
  volumes:
  - name: sys-kernel-debug
    hostPath:
      path: /sys/kernel/debug
      type: DirectoryOrCreate
  hostNetwork: true
  hostPID: true

Noticed that the packet is SNAT'd to the link-local IP (fe80::4001:aff:fed0:7803) instead of the global node IP (2600:1900:4001:83a::/128), and the request is dropped because the backend is not on the same node.

kubectl logs -f pwru
2024/12/21 00:34:04 Attaching kprobes (via kprobe-multi)...
1754 / 1754 [------------------------------------------------------------------------] 100.00%
2024/12/21 00:34:04 Attached (ignored 0)
2024/12/21 00:34:04 Listening for events..
SKB                CPU PROCESS          NETNS      MARK/x        IFACE       PROTO  MTU   LEN   TUPLE FUNC
0xffff8c88c45a4700 0   <empty>:0        4026531840 0             ens4:2      0x86dd 1460  80    [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a::]:30007(tcp) ipv6_gro_receive
0xffff8c88c45a4700 0   <empty>:0        4026531840 0             ens4:2      0x86dd 1460  80    [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a::]:30007(tcp) tcp6_gro_receive
0xffff8c88c45a4700 0   <empty>:0        4026531840 0             ens4:2      0x86dd 1460  80    [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a::]:30007(tcp) tcp_gro_receive
0xffff8c88c45a4700 0   <empty>:0        4026531840 0             ens4:2      0x86dd 1460  80    [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a::]:30007(tcp) skb_defer_rx_timestamp
0xffff8c88c45a4700 0   <empty>:0        4026531840 0             ens4:2      0x86dd 1460  80    [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a::]:30007(tcp) tc_run
0xffff8c88c45a4700 0   <empty>:0        4026531840 0             ens4:2      0x86dd 1460  80    [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a::]:30007(tcp) tcf_classify
0xffff8c88c45a4700 0   <empty>:0        4026531840 0             ens4:2      0x86dd 1460  94    [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a::]:30007(tcp) skb_ensure_writable
0xffff8c88c45a4700 0   <empty>:0        4026531840 0             ens4:2      0x86dd 1460  94    [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a::]:30007(tcp) skb_ensure_writable
0xffff8c88c45a4700 0   <empty>:0        4026531840 0             ens4:2      0x86dd 1460  94    [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a:0:1:0:4]:30007(tcp) skb_ensure_writable
0xffff8c88c45a4700 0   <empty>:0        4026531840 0             ens4:2      0x86dd 1460  94    [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a:0:1:0:4]:30007(tcp) inet_proto_csum_replace_by_diff
0xffff8c88c45a4700 0   <empty>:0        4026531840 0             ens4:2      0x86dd 1460  94    [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a:0:1:0:4]:30007(tcp) skb_ensure_writable
0xffff8c88c45a4700 0   <empty>:0        4026531840 0             ens4:2      0x86dd 1460  94    [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a:0:1:0:4]:30007(tcp) inet_proto_csum_replace4
0xffff8c88c45a4700 0   <empty>:0        4026531840 0             ens4:2      0x86dd 1460  94    [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a:0:1:0:4]:30007(tcp) skb_ensure_writable
0xffff8c88c45a4700 0   <empty>:0        4026531840 0             ens4:2      0x86dd 1460  94    [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a:0:1:0:4]:8090(tcp)  skb_ensure_writable
0xffff8c88c45a4700 0   <empty>:0        4026531840 0             ens4:2      0x86dd 1460  94    [2600:1900:4001:83a:0:2:0:5]:58228->[2600:1900:4001:83a:0:1:0:4]:8090(tcp)  skb_ensure_writable
0xffff8c88c45a4700 0   <empty>:0        4026531840 0             ens4:2      0x86dd 1460  94    [fe80::4001:aff:fed0:7803]:58228->[2600:1900:4001:83a:0:1:0:4]:8090(tcp)    skb_ensure_writable
0xffff8c88c45a4700 0   <empty>:0        4026531840 0             ens4:2      0x86dd 1460  94    [fe80::4001:aff:fed0:7803]:58228->[2600:1900:4001:83a:0:1:0:4]:8090(tcp)    inet_proto_csum_replace_by_diff
0xffff8c88c45a4700 0   <empty>:0        4026531840 300           ens4:2      0x86dd 1460  80    [fe80::4001:aff:fed0:7803]:58228->[2600:1900:4001:83a:0:1:0:4]:8090(tcp)    kfree_skb_reason(SKB_DROP_REASON_TC_INGRESS)
0xffff8c88c45a4700 0   <empty>:0        4026531840 300           ens4:2      0x86dd 1460  80    [fe80::4001:aff:fed0:7803]:58228->[2600:1900:4001:83a:0:1:0:4]:8090(tcp)    skb_release_head_state
0xffff8c88c45a4700 0   <empty>:0        4026531840 300           ens4:2      0x86dd 1460  80    [fe80::4001:aff:fed0:7803]:58228->[2600:1900:4001:83a:0:1:0:4]:8090(tcp)    skb_release_data
0xffff8c88c45a4700 0   <empty>:0        4026531840 300           ens4:2      0x86dd 1460  80    [fe80::4001:aff:fed0:7803]:58228->[2600:1900:4001:83a:0:1:0:4]:8090(tcp)    skb_free_head
0xffff8c88c45a4700 0   <empty>:0        4026531840 300           ens4:2      0x86dd 1460  80    [fe80::4001:aff:fed0:7803]:58228->[2600:1900:4001:83a:0:1:0:4]:8090(tcp)    kfree_skbmem

IP address config on the ens4 interface

ip addr show dev ens4
2: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc mq state UP group default qlen 1000
    link/ether 42:01:0a:d0:78:03 brd ff:ff:ff:ff:ff:ff
    altname enp0s4
    inet 10.208.120.3/32 metric 100 scope global dynamic ens4
       valid_lft 3335sec preferred_lft 3335sec
    inet6 2600:1900:4001:83a::/128 scope global dynamic noprefixroute 
       valid_lft 2975sec preferred_lft 2975sec
    inet6 fe80::4001:aff:fed0:7803/64 scope link 
       valid_lft forever preferred_lft forever
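
When scripting around this, the global (non-link-local) IPv6 address can be pulled out of the `ip addr` output by filtering on `scope global`. A minimal sketch using the sample output above; on a live node, replace the variable with the output of `ip -6 addr show dev ens4`:

```shell
# Sample `ip -6 addr show dev ens4` lines from the affected node.
addrs='    inet6 2600:1900:4001:83a::/128 scope global dynamic noprefixroute
    inet6 fe80::4001:aff:fed0:7803/64 scope link'

# Keep only the global-scope address and strip the prefix length.
global_v6=$(printf '%s\n' "$addrs" \
  | awk '/inet6/ && /scope global/ { split($2, a, "/"); print a[1] }')
echo "$global_v6"    # 2600:1900:4001:83a::
```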

Cilium Version

Client: 1.15.6 940a32ec47 2024-11-05T01:45:09+00:00 go version go1.22.8 linux/amd64
Daemon: 1.15.6 940a32ec47 2024-11-05T01:45:09+00:00 go version go1.22.8 linux/amd64

Kernel Version

6.8.0-1014-gke #18-Ubuntu SMP Thu Nov 14 02:31:14 UTC 2024 x86_64 GNU/Linux

Kubernetes Version

Client Version: v1.32.0
Kustomize Version: v5.5.0
Server Version: v1.32.0-gke.1066000

Regression

No response

Sysdump

No response

Relevant log output

Anything else?

No response

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees

No one assigned

Labels

  • area/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
  • area/loadbalancing: Impacts load-balancing and Kubernetes service implementations.
  • feature/ipv6: Relates to IPv6 protocol support.
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, eg via Slack.
  • needs/triage: This issue requires triaging to establish severity and next steps.
