Hit in PR #10724, but it only affected the "kernel" ginkgo test job.
Jenkins link
test_results_Cilium-PR-Ginkgo-Tests-Kernel_97_BDD-Test-PR.zip
Saw "unable to allocate memory" while invoking tc to load a BPF program. Did we just run out of memory on the node? What's different about the "Cilium-tests-with-kernel" target?
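One thing worth ruling out before blaming node RAM: on kernels before 5.11, BPF map memory is charged against RLIMIT_MEMLOCK, so an ENOMEM from map creation often means the locked-memory limit rather than exhausted memory. A minimal standalone check (hypothetical helper, not part of the Cilium tree) of the limit the loader would run under:

```c
/* Hypothetical helper, not from the Cilium tree: print the memlock limit
 * this process runs under. On pre-5.11 kernels, BPF map memory is charged
 * against RLIMIT_MEMLOCK. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        perror("getrlimit(RLIMIT_MEMLOCK)");
        return 1;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("RLIMIT_MEMLOCK: unlimited\n");
    else
        printf("RLIMIT_MEMLOCK: cur=%llu bytes, max=%llu bytes\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);
    return 0;
}
```

Running `ulimit -l` inside the cilium-agent container would answer the same question without compiling anything.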
Observed in the following tests:
K8sPolicyTest Basic Test checks all kind of Kubernetes policies
K8sDatapathConfig Encapsulation Check connectivity with VXLAN encapsulation
Stacktrace
/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-Kernel/k8s-1.17-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:357
Found a "JoinEP: " in Cilium Logs
/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-Kernel/k8s-1.17-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:546
Standard Output
⚠️ Found a "JoinEP: " in logs
Number of "context deadline exceeded" in logs: 0
Number of "level=error" in logs: 3
⚠️ Number of "level=warning" in logs: 82
Number of "Cilium API handler panicked" in logs: 0
Number of "Goroutine took lock for more than" in logs: 0
Top 5 errors/warnings:
Error while rewriting endpoint BPF program
Command execution failed
endpoint regeneration failed
Unable to flush Cilium iptables chain
Unable to delete Cilium iptables chain
Cilium pods: [cilium-426dw cilium-cdh9w]
Netpols loaded:
CiliumNetworkPolicies loaded: 202004012306k8spolicytestbasictestchecksallkindofkubernetespoli::l7-policy
Endpoint Policy Enforcement:
Pod Ingress Egress
app1-68cb4f68c5-xqkfv
app2-776cf4f9c6-6vjq2
app3-65f9dc989c-nhxzd
coredns-767d4c6dd7-f8js6
app1-68cb4f68c5-frv2x
Cilium agent 'cilium-426dw': Status: Ok Health: Ok Nodes "" ContinerRuntime: Kubernetes: Ok KVstore: Ok Controllers: Total 34 Failed 0
Cilium agent 'cilium-cdh9w': Status: Ok Health: Ok Nodes "" ContinerRuntime: Kubernetes: Ok KVstore: Ok Controllers: Total 14 Failed 0
Standard Error
STEP: Installing Cilium
STEP: Installing DNS Deployment
STEP: Restarting DNS Pods
STEP: Performing Cilium preflight check
STEP: Performing Cilium status preflight check
STEP: Performing Cilium controllers preflight check
STEP: Performing Cilium health check
STEP: Performing Cilium service preflight check
STEP: Performing K8s service preflight check
STEP: Waiting for cilium-operator to be ready
STEP: Waiting for kube-dns to be ready
STEP: Running kube-dns preflight check
STEP: Performing K8s service preflight check
STEP: Deleting namespace 202004012306k8spolicytestbasictestchecksallkindofkubernetespoli
STEP: Creating namespace 202004012306k8spolicytestbasictestchecksallkindofkubernetespoli
STEP: Testing L3/L4 rules
STEP: Testing L7 Policy
=== Test Finished at 2020-04-01T23:07:11Z====
===================== TEST FAILED =====================
cmd: kubectl get pods -o wide --all-namespaces
Exitcode: 0
Stdout:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
202004012306k8spolicytestbasictestchecksallkindofkubernetespoli app1-68cb4f68c5-frv2x 2/2 Running 0 46s 10.10.0.158 k8s1 <none> <none>
202004012306k8spolicytestbasictestchecksallkindofkubernetespoli app1-68cb4f68c5-xqkfv 2/2 Running 0 46s 10.10.0.146 k8s1 <none> <none>
202004012306k8spolicytestbasictestchecksallkindofkubernetespoli app2-776cf4f9c6-6vjq2 1/1 Running 0 46s 10.10.0.199 k8s1 <none> <none>
202004012306k8spolicytestbasictestchecksallkindofkubernetespoli app3-65f9dc989c-nhxzd 1/1 Running 0 46s 10.10.0.24 k8s1 <none> <none>
kube-system cilium-426dw 1/1 Running 0 98s 192.168.36.11 k8s1 <none> <none>
kube-system cilium-cdh9w 1/1 Running 0 98s 192.168.36.12 k8s2 <none> <none>
kube-system cilium-operator-56b6658d98-k6ktd 1/1 Running 0 98s 192.168.36.12 k8s2 <none> <none>
kube-system coredns-767d4c6dd7-f8js6 1/1 Running 0 83s 10.10.0.192 k8s1 <none> <none>
kube-system etcd-k8s1 1/1 Running 0 3m52s 192.168.36.11 k8s1 <none> <none>
kube-system kube-apiserver-k8s1 1/1 Running 0 3m52s 192.168.36.11 k8s1 <none> <none>
kube-system kube-controller-manager-k8s1 1/1 Running 0 3m52s 192.168.36.11 k8s1 <none> <none>
kube-system kube-proxy-gzklg 1/1 Running 0 2m19s 192.168.36.12 k8s2 <none> <none>
kube-system kube-proxy-vx8dp 1/1 Running 0 3m56s 192.168.36.11 k8s1 <none> <none>
kube-system kube-scheduler-k8s1 1/1 Running 0 3m52s 192.168.36.11 k8s1 <none> <none>
kube-system log-gatherer-bw68z 1/1 Running 0 104s 192.168.36.12 k8s2 <none> <none>
kube-system log-gatherer-tl4vm 1/1 Running 0 104s 192.168.36.11 k8s1 <none> <none>
kube-system registry-adder-7tjtv 1/1 Running 0 2m13s 192.168.36.12 k8s2 <none> <none>
kube-system registry-adder-kg7dm 1/1 Running 0 2m13s 192.168.36.11 k8s1 <none> <none>
Stderr:
Fetching command output from pods [cilium-426dw cilium-cdh9w]
cmd: kubectl exec -n kube-system cilium-426dw -- cilium service list
Exitcode: 0
Stdout:
ID Frontend Service Type Backend
1 10.96.0.1:443 ClusterIP 1 => 192.168.36.11:6443
2 10.96.0.10:9153 ClusterIP 1 => 10.10.0.192:9153
3 10.96.0.10:53 ClusterIP 1 => 10.10.0.192:53
4 10.103.25.228:69 ClusterIP 1 => 10.10.0.158:69
2 => 10.10.0.146:69
5 10.103.25.228:80 ClusterIP 1 => 10.10.0.158:80
2 => 10.10.0.146:80
Stderr:
cmd: kubectl exec -n kube-system cilium-426dw -- cilium endpoint list
Exitcode: 0
Stdout:
ENDPOINT POLICY (ingress) POLICY (egress) IDENTITY LABELS (source:key[=value]) IPv6 IPv4 STATUS
ENFORCEMENT ENFORCEMENT
232 Disabled Disabled 17917 k8s:id=app3 f00d::a0a:0:0:df56 10.10.0.24 ready
k8s:io.cilium.k8s.policy.cluster=default
k8s:io.cilium.k8s.policy.serviceaccount=default
k8s:io.kubernetes.pod.namespace=202004012306k8spolicytestbasictestchecksallkindofkubernetespoli
k8s:zgroup=testapp
341 Disabled Disabled 4 reserved:health f00d::a0a:0:0:ce46 10.10.0.40 ready
359 Disabled Disabled 27845 k8s:appSecond=true f00d::a0a:0:0:c256 10.10.0.199 ready
k8s:id=app2
k8s:io.cilium.k8s.policy.cluster=default
k8s:io.cilium.k8s.policy.serviceaccount=app2-account
k8s:io.kubernetes.pod.namespace=202004012306k8spolicytestbasictestchecksallkindofkubernetespoli
k8s:zgroup=testapp
1046 Enabled Disabled 5731 k8s:id=app1 f00d::a0a:0:0:7ba9 10.10.0.158 ready
k8s:io.cilium.k8s.policy.cluster=default
k8s:io.cilium.k8s.policy.serviceaccount=app1-account
k8s:io.kubernetes.pod.namespace=202004012306k8spolicytestbasictestchecksallkindofkubernetespoli
k8s:zgroup=testapp
1512 Enabled Disabled 5731 k8s:id=app1 f00d::a0a:0:0:2cce 10.10.0.146 ready
k8s:io.cilium.k8s.policy.cluster=default
k8s:io.cilium.k8s.policy.serviceaccount=app1-account
k8s:io.kubernetes.pod.namespace=202004012306k8spolicytestbasictestchecksallkindofkubernetespoli
k8s:zgroup=testapp
1964 Disabled Disabled 21580 k8s:io.cilium.k8s.policy.cluster=default f00d::a0a:0:0:1302 10.10.0.192 ready
k8s:io.cilium.k8s.policy.serviceaccount=coredns
k8s:io.kubernetes.pod.namespace=kube-system
k8s:k8s-app=kube-dns
Stderr:
cmd: kubectl exec -n kube-system cilium-cdh9w -- cilium service list
Exitcode: 0
Stdout:
ID Frontend Service Type Backend
1 10.96.0.10:53 ClusterIP 1 => 10.10.0.192:53
2 10.96.0.10:9153 ClusterIP 1 => 10.10.0.192:9153
3 10.96.0.1:443 ClusterIP 1 => 192.168.36.11:6443
4 10.103.25.228:69 ClusterIP 1 => 10.10.0.158:69
2 => 10.10.0.146:69
5 10.103.25.228:80 ClusterIP 1 => 10.10.0.158:80
2 => 10.10.0.146:80
Stderr:
cmd: kubectl exec -n kube-system cilium-cdh9w -- cilium endpoint list
Exitcode: 0
Stdout:
ENDPOINT POLICY (ingress) POLICY (egress) IDENTITY LABELS (source:key[=value]) IPv6 IPv4 STATUS
ENFORCEMENT ENFORCEMENT
856 Disabled Disabled 4 reserved:health f00d::a0a:100:0:ed42 10.10.1.67 ready
Stderr:
===================== Exiting AfterFailed =====================
Deeper dive
In the cilium-426dw pod logs, it seems we were unable to allocate memory:
2020-04-01T23:05:55.590103474Z level=error msg="Command execution failed" cmd="[tc filter replace dev lxc7589f7ffe1d2 ingress prio 1 handle 1 bpf da obj 1964_next/bpf_lxc.o sec from-container]" error="exit status 1" subsys=datapath-loader
2020-04-01T23:05:55.591054639Z level=warning msg="Map object 'cilium_ct_any4_global' rejected: Cannot allocate memory (12)!" subsys=datapath-loader
2020-04-01T23:05:55.591102703Z level=warning msg=" - Type: 9" subsys=datapath-loader
2020-04-01T23:05:55.591141993Z level=warning msg=" - Identifier: 0" subsys=datapath-loader
2020-04-01T23:05:55.591178808Z level=warning msg=" - Pinning: 2" subsys=datapath-loader
2020-04-01T23:05:55.591215415Z level=warning msg=" - Size key: 14" subsys=datapath-loader
2020-04-01T23:05:55.59125154Z level=warning msg=" - Size value: 56" subsys=datapath-loader
2020-04-01T23:05:55.591289115Z level=warning msg=" - Max elems: 262144" subsys=datapath-loader
2020-04-01T23:05:55.59132558Z level=warning msg=" - Flags: 0" subsys=datapath-loader
2020-04-01T23:05:55.591360971Z level=warning subsys=datapath-loader
2020-04-01T23:05:55.591398127Z level=warning msg="Error loading maps into kernel!" subsys=datapath-loader
2020-04-01T23:05:55.591435091Z level=warning msg="Error fetching ELF ancillary data!" subsys=datapath-loader
2020-04-01T23:05:55.591472201Z level=warning msg="Unable to load program" subsys=datapath-loader
2020-04-01T23:05:55.59484751Z level=warning msg="JoinEP: Failed to load program" containerID=d1eb4eebc4 datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1964 error="Failed to load tc filter: exit status 1" file-path=1964_next/bpf_lxc.o identity=21580 ipv4=10.10.0.192 ipv6="f00d::a0a:0:0:1302" k8sPodName=kube-system/coredns-767d4c6dd7-f8js6 subsys=datapath-loader veth=lxc7589f7ffe1d2
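The rejected map's parameters put a rough lower bound on the allocation: 262144 entries × (14 B key + 56 B value) ≈ 17.5 MiB, before per-entry kernel overhead, and on pre-5.11 kernels all of it is charged against RLIMIT_MEMLOCK. A minimal sketch, assuming "Type: 9" is BPF_MAP_TYPE_LRU_HASH per the kernel UAPI, that retries just this map creation outside of tc; if it also fails with errno 12 on the node, the failure is reproducible independent of the ELF loader:

```c
/* Hypothetical standalone repro, not from the Cilium tree; run as root. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

int main(void)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.map_type    = BPF_MAP_TYPE_LRU_HASH; /* "Type: 9" in the log */
    attr.key_size    = 14;                    /* "Size key: 14" */
    attr.value_size  = 56;                    /* "Size value: 56" */
    attr.max_entries = 262144;                /* "Max elems: 262144" */

    int fd = (int)syscall(SYS_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
    if (fd < 0) {
        /* ENOMEM (12) here matches "Cannot allocate memory (12)" above */
        fprintf(stderr, "BPF_MAP_CREATE failed: %s (%d)\n",
                strerror(errno), errno);
        return 1;
    }
    printf("map created, fd=%d\n", fd);
    close(fd);
    return 0;
}
```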