Cilium pod in CrashLoopBackOff on IPv6 only clusters (v1.7.5+) #12201

@rolinh

Description

Bug report

General Information

  • Cilium version (run cilium version): v1.7.5 and v1.8.0-rc3
  • Kernel version (run uname -a): 5.4.47-1-lts
  • Orchestration system version in use (e.g. kubectl version, Mesos, ...): Kubernetes v1.18.2 deployed via kind
Cilium pod logs in v1.7.5:

level=info msg="Skipped reading configuration file" reason="Config File \"ciliumd\" Not Found in \"[/root]\"" subsys=daemon
level=info msg="  --access-log=''" subsys=daemon
level=info msg="  --agent-labels=''" subsys=daemon
level=info msg="  --allow-icmp-frag-needed='true'" subsys=daemon
level=info msg="  --allow-localhost='auto'" subsys=daemon
level=info msg="  --annotate-k8s-node='true'" subsys=daemon
level=info msg="  --auto-create-cilium-node-resource='true'" subsys=daemon
level=info msg="  --auto-direct-node-routes='false'" subsys=daemon
level=info msg="  --blacklist-conflicting-routes='true'" subsys=daemon
level=info msg="  --bpf-compile-debug='false'" subsys=daemon
level=info msg="  --bpf-ct-global-any-max='262144'" subsys=daemon
level=info msg="  --bpf-ct-global-tcp-max='524288'" subsys=daemon
level=info msg="  --bpf-ct-timeout-regular-any='1m0s'" subsys=daemon
level=info msg="  --bpf-ct-timeout-regular-tcp='6h0m0s'" subsys=daemon
level=info msg="  --bpf-ct-timeout-regular-tcp-fin='10s'" subsys=daemon
level=info msg="  --bpf-ct-timeout-regular-tcp-syn='1m0s'" subsys=daemon
level=info msg="  --bpf-ct-timeout-service-any='1m0s'" subsys=daemon
level=info msg="  --bpf-ct-timeout-service-tcp='6h0m0s'" subsys=daemon
level=info msg="  --bpf-nat-global-max='841429'" subsys=daemon
level=info msg="  --bpf-policy-map-max='16384'" subsys=daemon
level=info msg="  --bpf-root=''" subsys=daemon
level=info msg="  --certificates-directory='/var/run/cilium/certs'" subsys=daemon
level=info msg="  --cgroup-root=''" subsys=daemon
level=info msg="  --cluster-id='0'" subsys=daemon
level=info msg="  --cluster-name='default'" subsys=daemon
level=info msg="  --clustermesh-config='/var/lib/cilium/clustermesh/'" subsys=daemon
level=info msg="  --cmdref=''" subsys=daemon
level=info msg="  --config=''" subsys=daemon
level=info msg="  --config-dir='/tmp/cilium/config-map'" subsys=daemon
level=info msg="  --conntrack-garbage-collector-interval='0'" subsys=daemon
level=info msg="  --conntrack-gc-interval='0s'" subsys=daemon
level=info msg="  --container-runtime=''" subsys=daemon
level=info msg="  --container-runtime-endpoint='map[]'" subsys=daemon
level=info msg="  --datapath-mode='veth'" subsys=daemon
level=info msg="  --debug='false'" subsys=daemon
level=info msg="  --debug-verbose=''" subsys=daemon
level=info msg="  --device='undefined'" subsys=daemon
level=info msg="  --disable-cnp-status-updates='false'" subsys=daemon
level=info msg="  --disable-conntrack='false'" subsys=daemon
level=info msg="  --disable-endpoint-crd='false'" subsys=daemon
level=info msg="  --disable-envoy-version-check='false'" subsys=daemon
level=info msg="  --disable-ipv4='false'" subsys=daemon
level=info msg="  --disable-k8s-services='false'" subsys=daemon
level=info msg="  --egress-masquerade-interfaces=''" subsys=daemon
level=info msg="  --enable-auto-protect-node-port-range='true'" subsys=daemon
level=info msg="  --enable-endpoint-health-checking='true'" subsys=daemon
level=info msg="  --enable-endpoint-routes='false'" subsys=daemon
level=info msg="  --enable-external-ips='true'" subsys=daemon
level=info msg="  --enable-health-checking='true'" subsys=daemon
level=info msg="  --enable-host-reachable-services='false'" subsys=daemon
level=info msg="  --enable-ipsec='false'" subsys=daemon
level=info msg="  --enable-ipv4='false'" subsys=daemon
level=info msg="  --enable-ipv6='true'" subsys=daemon
level=info msg="  --enable-k8s-api-discovery='false'" subsys=daemon
level=info msg="  --enable-k8s-endpoint-slice='true'" subsys=daemon
level=info msg="  --enable-k8s-event-handover='false'" subsys=daemon
level=info msg="  --enable-l7-proxy='true'" subsys=daemon
level=info msg="  --enable-local-node-route='true'" subsys=daemon
level=info msg="  --enable-node-port='true'" subsys=daemon
level=info msg="  --enable-policy='default'" subsys=daemon
level=info msg="  --enable-remote-node-identity='true'" subsys=daemon
level=info msg="  --enable-selective-regeneration='true'" subsys=daemon
level=info msg="  --enable-tracing='false'" subsys=daemon
level=info msg="  --enable-well-known-identities='false'" subsys=daemon
level=info msg="  --enable-xt-socket-fallback='true'" subsys=daemon
level=info msg="  --encrypt-interface=''" subsys=daemon
level=info msg="  --encrypt-node='false'" subsys=daemon
level=info msg="  --endpoint-interface-name-prefix='lxc+'" subsys=daemon
level=info msg="  --endpoint-queue-size='25'" subsys=daemon
level=info msg="  --endpoint-status=''" subsys=daemon
level=info msg="  --envoy-log=''" subsys=daemon
level=info msg="  --exclude-local-address=''" subsys=daemon
level=info msg="  --fixed-identity-mapping='map[]'" subsys=daemon
level=info msg="  --flannel-manage-existing-containers='false'" subsys=daemon
level=info msg="  --flannel-master-device=''" subsys=daemon
level=info msg="  --flannel-uninstall-on-exit='false'" subsys=daemon
level=info msg="  --force-local-policy-eval-at-source='true'" subsys=daemon
level=info msg="  --host-reachable-services-protos=''" subsys=daemon
level=info msg="  --http-403-msg=''" subsys=daemon
level=info msg="  --http-idle-timeout='0'" subsys=daemon
level=info msg="  --http-max-grpc-timeout='0'" subsys=daemon
level=info msg="  --http-request-timeout='3600'" subsys=daemon
level=info msg="  --http-retry-count='3'" subsys=daemon
level=info msg="  --http-retry-timeout='0'" subsys=daemon
level=info msg="  --identity-allocation-mode='crd'" subsys=daemon
level=info msg="  --identity-change-grace-period='5s'" subsys=daemon
level=info msg="  --install-iptables-rules='true'" subsys=daemon
level=info msg="  --ip-allocation-timeout='2m0s'" subsys=daemon
level=info msg="  --ipam=''" subsys=daemon
level=info msg="  --ipsec-key-file=''" subsys=daemon
level=info msg="  --iptables-lock-timeout='5s'" subsys=daemon
level=info msg="  --ipv4-cluster-cidr-mask-size='8'" subsys=daemon
level=info msg="  --ipv4-node='auto'" subsys=daemon
level=info msg="  --ipv4-pod-subnets=''" subsys=daemon
level=info msg="  --ipv4-range='auto'" subsys=daemon
level=info msg="  --ipv4-service-loopback-address='169.254.42.1'" subsys=daemon
level=info msg="  --ipv4-service-range='auto'" subsys=daemon
level=info msg="  --ipv6-cluster-alloc-cidr='f00d::/64'" subsys=daemon
level=info msg="  --ipv6-node='auto'" subsys=daemon
level=info msg="  --ipv6-pod-subnets=''" subsys=daemon
level=info msg="  --ipv6-range='auto'" subsys=daemon
level=info msg="  --ipv6-service-range='auto'" subsys=daemon
level=info msg="  --ipvlan-master-device='undefined'" subsys=daemon
level=info msg="  --k8s-api-server=''" subsys=daemon
level=info msg="  --k8s-force-json-patch='false'" subsys=daemon
level=info msg="  --k8s-heartbeat-timeout='30s'" subsys=daemon
level=info msg="  --k8s-kubeconfig-path=''" subsys=daemon
level=info msg="  --k8s-namespace='kube-system'" subsys=daemon
level=info msg="  --k8s-require-ipv4-pod-cidr='false'" subsys=daemon
level=info msg="  --k8s-require-ipv6-pod-cidr='false'" subsys=daemon
level=info msg="  --k8s-service-cache-size='128'" subsys=daemon
level=info msg="  --k8s-watcher-endpoint-selector='metadata.name!=kube-scheduler,metadata.name!=kube-controller-manager,metadata.name!=etcd-operator,metadata.name!=gcp-controller-manager'" subsys=daemon
level=info msg="  --k8s-watcher-queue-size='1024'" subsys=daemon
level=info msg="  --keep-bpf-templates='false'" subsys=daemon
level=info msg="  --keep-config='false'" subsys=daemon
level=info msg="  --kube-proxy-replacement='partial'" subsys=daemon
level=info msg="  --kvstore=''" subsys=daemon
level=info msg="  --kvstore-connectivity-timeout='2m0s'" subsys=daemon
level=info msg="  --kvstore-lease-ttl='15m0s'" subsys=daemon
level=info msg="  --kvstore-opt='map[]'" subsys=daemon
level=info msg="  --kvstore-periodic-sync='5m0s'" subsys=daemon
level=info msg="  --label-prefix-file=''" subsys=daemon
level=info msg="  --labels=''" subsys=daemon
level=info msg="  --lib-dir='/var/lib/cilium'" subsys=daemon
level=info msg="  --log-driver=''" subsys=daemon
level=info msg="  --log-opt='map[level:info]'" subsys=daemon
level=info msg="  --log-system-load='false'" subsys=daemon
level=info msg="  --masquerade='true'" subsys=daemon
level=info msg="  --max-controller-interval='0'" subsys=daemon
level=info msg="  --metrics=''" subsys=daemon
level=info msg="  --monitor-aggregation='medium'" subsys=daemon
level=info msg="  --monitor-aggregation-flags='all'" subsys=daemon
level=info msg="  --monitor-aggregation-interval='5s'" subsys=daemon
level=info msg="  --monitor-queue-size='0'" subsys=daemon
level=info msg="  --mtu='0'" subsys=daemon
level=info msg="  --nat46-range='0:0:0:0:0:FFFF::/96'" subsys=daemon
level=info msg="  --node-port-bind-protection='true'" subsys=daemon
level=info msg="  --node-port-mode='snat'" subsys=daemon
level=info msg="  --node-port-range=''" subsys=daemon
level=info msg="  --policy-queue-size='100'" subsys=daemon
level=info msg="  --policy-trigger-interval='1s'" subsys=daemon
level=info msg="  --pprof='false'" subsys=daemon
level=info msg="  --preallocate-bpf-maps='false'" subsys=daemon
level=info msg="  --prefilter-device='undefined'" subsys=daemon
level=info msg="  --prefilter-mode='native'" subsys=daemon
level=info msg="  --prepend-iptables-chains='true'" subsys=daemon
level=info msg="  --prometheus-serve-addr=''" subsys=daemon
level=info msg="  --proxy-connect-timeout='1'" subsys=daemon
level=info msg="  --read-cni-conf=''" subsys=daemon
level=info msg="  --restore='true'" subsys=daemon
level=info msg="  --sidecar-http-proxy='false'" subsys=daemon
level=info msg="  --sidecar-istio-proxy-image='cilium/istio_proxy'" subsys=daemon
level=info msg="  --single-cluster-route='false'" subsys=daemon
level=info msg="  --skip-crd-creation='false'" subsys=daemon
level=info msg="  --socket-path='/var/run/cilium/cilium.sock'" subsys=daemon
level=info msg="  --sockops-enable='false'" subsys=daemon
level=info msg="  --state-dir='/var/run/cilium'" subsys=daemon
level=info msg="  --tofqdns-dns-reject-response-code='refused'" subsys=daemon
level=info msg="  --tofqdns-enable-dns-compression='true'" subsys=daemon
level=info msg="  --tofqdns-enable-poller='false'" subsys=daemon
level=info msg="  --tofqdns-enable-poller-events='true'" subsys=daemon
level=info msg="  --tofqdns-endpoint-max-ip-per-hostname='50'" subsys=daemon
level=info msg="  --tofqdns-max-deferred-connection-deletes='10000'" subsys=daemon
level=info msg="  --tofqdns-min-ttl='0'" subsys=daemon
level=info msg="  --tofqdns-pre-cache=''" subsys=daemon
level=info msg="  --tofqdns-proxy-port='0'" subsys=daemon
level=info msg="  --tofqdns-proxy-response-max-delay='100ms'" subsys=daemon
level=info msg="  --trace-payloadlen='128'" subsys=daemon
level=info msg="  --tunnel='vxlan'" subsys=daemon
level=info msg="  --version='false'" subsys=daemon
level=info msg="  --write-cni-conf-when-ready=''" subsys=daemon
level=info msg="     _ _ _" subsys=daemon
level=info msg=" ___|_| |_|_ _ _____" subsys=daemon
level=info msg="|  _| | | | | |     |" subsys=daemon
level=info msg="|___|_|_|_|___|_|_|_|" subsys=daemon
level=info msg="Cilium 1.7.5 f524ca028 2020-06-12T14:10:36+02:00 go version go1.13.12 linux/amd64" subsys=daemon
level=info msg="cilium-envoy  version: a8f292139e923b205525feb2c8a4377005904776/1.13.2/Modified/RELEASE/BoringSSL" subsys=daemon
level=info msg="clang (7.0.0) and kernel (5.4.47) versions: OK!" subsys=linux-datapath
level=info msg="linking environment: OK!" subsys=linux-datapath
level=info msg="bpf_requirements check: OK!" subsys=linux-datapath
level=info msg="bpf_features check: OK!" subsys=linux-datapath
level=info msg="Detected mounted BPF filesystem at /sys/fs/bpf" subsys=bpf
level=info msg="Valid label prefix configuration:" subsys=labels-filter
level=info msg=" - :io.kubernetes.pod.namespace" subsys=labels-filter
level=info msg=" - :io.cilium.k8s.namespace.labels" subsys=labels-filter
level=info msg=" - :app.kubernetes.io" subsys=labels-filter
level=info msg=" - !:io.kubernetes" subsys=labels-filter
level=info msg=" - !:kubernetes.io" subsys=labels-filter
level=info msg=" - !:.*beta.kubernetes.io" subsys=labels-filter
level=info msg=" - !:k8s.io" subsys=labels-filter
level=info msg=" - !:pod-template-generation" subsys=labels-filter
level=info msg=" - !:pod-template-hash" subsys=labels-filter
level=info msg=" - !:controller-revision-hash" subsys=labels-filter
level=info msg=" - !:annotation.*" subsys=labels-filter
level=info msg=" - !:etcd_node" subsys=labels-filter
level=info msg="Using auto-derived device for BPF node port" interface=eth0 subsys=daemon
level=info msg="Initializing daemon" subsys=daemon
level=info msg="Detected MTU 1500" subsys=mtu
level=info msg="Restored services from maps" failed=0 restored=0 subsys=service
level=info msg="Removing stale endpoint interfaces" subsys=daemon
level=info msg="Establishing connection to apiserver" host="https://[fd00:10:96::1]:443" subsys=k8s
level=info msg="Connected to apiserver" subsys=k8s
level=info msg="Retrieved node information from kubernetes" nodeName=kind-control-plane subsys=k8s
level=info msg="Received own node information from API server" ipAddr.ipv4="<nil>" ipAddr.ipv6="fc00:f853:ccd:e793::3" nodeName=kind-control-plane subsys=k8s v4Prefix="<nil>" v6Prefix="fd00:10:244::/80"
level=info msg="CRD (CustomResourceDefinition) is installed and up-to-date" name=CiliumNetworkPolicy/v2 subsys=k8s
level=info msg="CRD (CustomResourceDefinition) is installed and up-to-date" name=CiliumClusterwideNetworkPolicy/v2 subsys=k8s
level=info msg="Updating CRD (CustomResourceDefinition)..." name=v2.CiliumEndpoint subsys=k8s
level=info msg="CRD (CustomResourceDefinition) is installed and up-to-date" name=v2.CiliumEndpoint subsys=k8s
level=info msg="Updating CRD (CustomResourceDefinition)..." name=v2.CiliumNode subsys=k8s
level=info msg="CRD (CustomResourceDefinition) is installed and up-to-date" name=v2.CiliumNode subsys=k8s
level=info msg="Updating CRD (CustomResourceDefinition)..." name=v2.CiliumIdentity subsys=k8s
level=info msg="CRD (CustomResourceDefinition) is installed and up-to-date" name=v2.CiliumIdentity subsys=k8s
level=info msg="k8s mode: Allowing localhost to reach local endpoints" subsys=daemon
level=info msg="Initializing node addressing" subsys=daemon
level=info msg="Restored router address from node_config" file=/var/run/cilium/state/globals/node_config.h ipv4="<nil>" ipv6="fd00:10:244::4533" subsys=node
level=info msg="Initializing hostscope IPAM" subsys=ipam v4Prefix="<nil>" v6Prefix="fd00:10:244::/80"
level=info msg="Restoring endpoints..." subsys=daemon
level=info msg="Envoy: Starting xDS gRPC server listening on /var/run/cilium/xds.sock" subsys=envoy-manager
level=info msg="No old endpoints found." subsys=daemon
level=info msg="Addressing information:" subsys=daemon
level=info msg="  Cluster-Name: default" subsys=daemon
level=info msg="  Cluster-ID: 0" subsys=daemon
level=info msg="  Local node-name: kind-control-plane" subsys=daemon
level=info msg="  Node-IPv6: fc00:f853:ccd:e793::3" subsys=daemon
level=info msg="  IPv6 allocation prefix: fd00:10:244::/80" subsys=daemon
level=info msg="  IPv6 router address: fd00:10:244::4533" subsys=daemon
level=info msg="  Local IPv6 addresses:" subsys=daemon
level=info msg="  - fc00:f853:ccd:e793::3" subsys=daemon
level=info msg="  - fc00:f853:ccd:e793::3" subsys=daemon
level=info msg="  - fe80::50d8:edff:fe54:b72e" subsys=daemon
level=info msg="  External-Node IPv4: <nil>" subsys=daemon
level=info msg="  Internal-Node IPv4: <nil>" subsys=daemon
level=info msg="Annotating k8s node" subsys=daemon v4CiliumHostIP.IPv4="<nil>" v4Prefix="<nil>" v4healthIP.IPv4="<nil>" v6CiliumHostIP.IPv6="fd00:10:244::4533" v6Prefix="fd00:10:244::/80" v6healthIP.IPv6="fd00:10:244::21cc"
level=info msg="Adding local node to cluster" subsys=nodediscovery
level=info msg="Initializing identity allocator" subsys=identity-cache
level=info msg="Cluster-ID is not specified, skipping ClusterMesh initialization" subsys=daemon
level=info msg="Setting up base BPF datapath" subsys=datapath-loader
level=info msg="Setting sysctl net.core.bpf_jit_enable=1" subsys=datapath-loader
level=warning msg="Failed to sysctl -w" error="could not open the sysctl file /proc/sys/net/core/bpf_jit_enable: open /proc/sys/net/core/bpf_jit_enable: no such file or directory" subsys=datapath-loader sysParamName=net.core.bpf_jit_enable sysParamValue=1
level=info msg="Setting sysctl net.ipv4.conf.all.rp_filter=0" subsys=datapath-loader
level=info msg="Setting sysctl kernel.unprivileged_bpf_disabled=1" subsys=datapath-loader
level=info msg="Setting sysctl net.ipv6.conf.all.disable_ipv6=0" subsys=datapath-loader
level=info msg="Serving cilium node monitor v1.2 API at unix:///var/run/cilium/monitor1_2.sock" subsys=monitor-agent
level=info msg="Starting IP identity watcher" subsys=ipcache
level=info msg="Adding new proxy port rules for cilium-dns-egress:40707" proxy port name=cilium-dns-egress subsys=proxy
level=info msg="Validating configured node address ranges" subsys=daemon
level=fatal msg="postinit failed" error="external IPv4 node address could not be derived, please configure via --ipv4-node" subsys=daemon

How to reproduce the issue

  1. Create a cluster using kind by following this documentation, using this configuration for kind:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
networking:
  disableDefaultCNI: true
  ipFamily: ipv6
  2. Deploy cilium via Helm with global.ipv4.enabled=false and global.ipv6.enabled=true:
 helm install cilium cilium/cilium \
   --namespace kube-system \
   --set global.tag=v1.7.5 \
   --set global.nodeinit.enabled=true \
   --set global.kubeProxyReplacement=partial \
   --set global.hostServices.enabled=false \
   --set global.externalIPs.enabled=true \
   --set global.nodePort.enabled=true \
   --set global.hostPort.enabled=true \
   --set global.pullPolicy=IfNotPresent \
   --set global.ipv6.enabled=true \
   --set global.ipv4.enabled=false
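The two steps above can be run end to end roughly as follows. Note that `kind-ipv6.yaml` is an assumed filename for the configuration shown in step 1, and `k8s-app=cilium` is the label selector the chart applies to the agent pods; adjust both if your setup differs:

```shell
# Create the IPv6-only kind cluster from the configuration in step 1
# (assumes it was saved as kind-ipv6.yaml; the filename is arbitrary)
kind create cluster --config kind-ipv6.yaml

# After installing the chart, watch the agent pods enter CrashLoopBackOff
kubectl -n kube-system get pods -l k8s-app=cilium

# Retrieve the logs of the previously crashed container instance to see
# the fatal "external IPv4 node address could not be derived" message
kubectl -n kube-system logs -l k8s-app=cilium --previous --tail=20
```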

Note that I also reproduced this issue with latest master 62e4558bf81826b950ec2a75ca468553f67784ca plus the patches from #12198 and #12197 (without these patches, Cilium panics as described here).

Metadata

Labels

kind/bug: This is a bug in the Cilium logic.
