
CI: K8sPolicyTest Basic Test checks all kind of Kubernetes policies: Found a JoinEP: Unable to allocate memory #10821

@joestringer

Description

Observed on PR #10724, but it only affected the "kernel" Ginkgo test job.
Jenkins link
test_results_Cilium-PR-Ginkgo-Tests-Kernel_97_BDD-Test-PR.zip

Saw "unable to allocate memory" while invoking tc to load a BPF program. Did we just run out of memory on the node? What's different about the "Cilium-tests-with-kernel" target?
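
If node memory pressure is the suspicion, a quick manual check from the affected agent would look something like the sketch below (the pod name cilium-426dw is taken from this run; these commands are for hand-debugging, not something the CI job runs):

# Overall memory on the node, as seen from inside the agent pod
kubectl -n kube-system exec cilium-426dw -- cat /proc/meminfo | head -n 3

# Memlock limit for the agent; on older kernels BPF map creation is charged
# against RLIMIT_MEMLOCK and fails with ENOMEM once that limit is exhausted
kubectl -n kube-system exec cilium-426dw -- sh -c 'ulimit -l'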

Observed in the following tests:

Stacktrace

/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-Kernel/k8s-1.17-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:357
Found a "JoinEP: " in Cilium Logs
/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-Kernel/k8s-1.17-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:546

Standard Output

⚠️  Found a "JoinEP: " in logs
Number of "context deadline exceeded" in logs: 0
Number of "level=error" in logs: 3
⚠️  Number of "level=warning" in logs: 82
Number of "Cilium API handler panicked" in logs: 0
Number of "Goroutine took lock for more than" in logs: 0
Top 5 errors/warnings:
Error while rewriting endpoint BPF program
Command execution failed
endpoint regeneration failed
Unable to flush Cilium iptables chain
Unable to delete Cilium iptables chain
Cilium pods: [cilium-426dw cilium-cdh9w]
Netpols loaded: 
CiliumNetworkPolicies loaded: 202004012306k8spolicytestbasictestchecksallkindofkubernetespoli::l7-policy 
Endpoint Policy Enforcement:
Pod                        Ingress   Egress
app1-68cb4f68c5-xqkfv                
app2-776cf4f9c6-6vjq2                
app3-65f9dc989c-nhxzd                
coredns-767d4c6dd7-f8js6             
app1-68cb4f68c5-frv2x                
Cilium agent 'cilium-426dw': Status: Ok  Health: Ok Nodes "" ContinerRuntime:  Kubernetes: Ok KVstore: Ok Controllers: Total 34 Failed 0
Cilium agent 'cilium-cdh9w': Status: Ok  Health: Ok Nodes "" ContinerRuntime:  Kubernetes: Ok KVstore: Ok Controllers: Total 14 Failed 0

Standard Error

STEP: Installing Cilium
STEP: Installing DNS Deployment
STEP: Restarting DNS Pods
STEP: Performing Cilium preflight check
STEP: Performing Cilium status preflight check
STEP: Performing Cilium controllers preflight check
STEP: Performing Cilium health check
STEP: Performing Cilium service preflight check
STEP: Performing K8s service preflight check
STEP: Waiting for cilium-operator to be ready
STEP: Waiting for kube-dns to be ready
STEP: Running kube-dns preflight check
STEP: Performing K8s service preflight check
STEP: Deleting namespace 202004012306k8spolicytestbasictestchecksallkindofkubernetespoli
STEP: Creating namespace 202004012306k8spolicytestbasictestchecksallkindofkubernetespoli
STEP: Testing L3/L4 rules
STEP: Testing L7 Policy
=== Test Finished at 2020-04-01T23:07:11Z====
===================== TEST FAILED =====================
cmd: kubectl get pods -o wide --all-namespaces
Exitcode: 0 
Stdout:
 	 NAMESPACE                                                         NAME                               READY   STATUS    RESTARTS   AGE     IP              NODE   NOMINATED NODE   READINESS GATES
	 202004012306k8spolicytestbasictestchecksallkindofkubernetespoli   app1-68cb4f68c5-frv2x              2/2     Running   0          46s     10.10.0.158     k8s1   <none>           <none>
	 202004012306k8spolicytestbasictestchecksallkindofkubernetespoli   app1-68cb4f68c5-xqkfv              2/2     Running   0          46s     10.10.0.146     k8s1   <none>           <none>
	 202004012306k8spolicytestbasictestchecksallkindofkubernetespoli   app2-776cf4f9c6-6vjq2              1/1     Running   0          46s     10.10.0.199     k8s1   <none>           <none>
	 202004012306k8spolicytestbasictestchecksallkindofkubernetespoli   app3-65f9dc989c-nhxzd              1/1     Running   0          46s     10.10.0.24      k8s1   <none>           <none>
	 kube-system                                                       cilium-426dw                       1/1     Running   0          98s     192.168.36.11   k8s1   <none>           <none>
	 kube-system                                                       cilium-cdh9w                       1/1     Running   0          98s     192.168.36.12   k8s2   <none>           <none>
	 kube-system                                                       cilium-operator-56b6658d98-k6ktd   1/1     Running   0          98s     192.168.36.12   k8s2   <none>           <none>
	 kube-system                                                       coredns-767d4c6dd7-f8js6           1/1     Running   0          83s     10.10.0.192     k8s1   <none>           <none>
	 kube-system                                                       etcd-k8s1                          1/1     Running   0          3m52s   192.168.36.11   k8s1   <none>           <none>
	 kube-system                                                       kube-apiserver-k8s1                1/1     Running   0          3m52s   192.168.36.11   k8s1   <none>           <none>
	 kube-system                                                       kube-controller-manager-k8s1       1/1     Running   0          3m52s   192.168.36.11   k8s1   <none>           <none>
	 kube-system                                                       kube-proxy-gzklg                   1/1     Running   0          2m19s   192.168.36.12   k8s2   <none>           <none>
	 kube-system                                                       kube-proxy-vx8dp                   1/1     Running   0          3m56s   192.168.36.11   k8s1   <none>           <none>
	 kube-system                                                       kube-scheduler-k8s1                1/1     Running   0          3m52s   192.168.36.11   k8s1   <none>           <none>
	 kube-system                                                       log-gatherer-bw68z                 1/1     Running   0          104s    192.168.36.12   k8s2   <none>           <none>
	 kube-system                                                       log-gatherer-tl4vm                 1/1     Running   0          104s    192.168.36.11   k8s1   <none>           <none>
	 kube-system                                                       registry-adder-7tjtv               1/1     Running   0          2m13s   192.168.36.12   k8s2   <none>           <none>
	 kube-system                                                       registry-adder-kg7dm               1/1     Running   0          2m13s   192.168.36.11   k8s1   <none>           <none>
	 
Stderr:
 	 

Fetching command output from pods [cilium-426dw cilium-cdh9w]
cmd: kubectl exec -n kube-system cilium-426dw -- cilium service list
Exitcode: 0 
Stdout:
 	 ID   Frontend           Service Type   Backend                   
	 1    10.96.0.1:443      ClusterIP      1 => 192.168.36.11:6443   
	 2    10.96.0.10:9153    ClusterIP      1 => 10.10.0.192:9153     
	 3    10.96.0.10:53      ClusterIP      1 => 10.10.0.192:53       
	 4    10.103.25.228:69   ClusterIP      1 => 10.10.0.158:69       
	                                        2 => 10.10.0.146:69       
	 5    10.103.25.228:80   ClusterIP      1 => 10.10.0.158:80       
	                                        2 => 10.10.0.146:80       
	 
Stderr:
 	 

cmd: kubectl exec -n kube-system cilium-426dw -- cilium endpoint list
Exitcode: 0 
Stdout:
 	 ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                                                                       IPv6                 IPv4          STATUS   
	            ENFORCEMENT        ENFORCEMENT                                                                                                                                                       
	 232        Disabled           Disabled          17917      k8s:id=app3                                                                                       f00d::a0a:0:0:df56   10.10.0.24    ready   
	                                                            k8s:io.cilium.k8s.policy.cluster=default                                                                                                     
	                                                            k8s:io.cilium.k8s.policy.serviceaccount=default                                                                                              
	                                                            k8s:io.kubernetes.pod.namespace=202004012306k8spolicytestbasictestchecksallkindofkubernetespoli                                              
	                                                            k8s:zgroup=testapp                                                                                                                           
	 341        Disabled           Disabled          4          reserved:health                                                                                   f00d::a0a:0:0:ce46   10.10.0.40    ready   
	 359        Disabled           Disabled          27845      k8s:appSecond=true                                                                                f00d::a0a:0:0:c256   10.10.0.199   ready   
	                                                            k8s:id=app2                                                                                                                                  
	                                                            k8s:io.cilium.k8s.policy.cluster=default                                                                                                     
	                                                            k8s:io.cilium.k8s.policy.serviceaccount=app2-account                                                                                         
	                                                            k8s:io.kubernetes.pod.namespace=202004012306k8spolicytestbasictestchecksallkindofkubernetespoli                                              
	                                                            k8s:zgroup=testapp                                                                                                                           
	 1046       Enabled            Disabled          5731       k8s:id=app1                                                                                       f00d::a0a:0:0:7ba9   10.10.0.158   ready   
	                                                            k8s:io.cilium.k8s.policy.cluster=default                                                                                                     
	                                                            k8s:io.cilium.k8s.policy.serviceaccount=app1-account                                                                                         
	                                                            k8s:io.kubernetes.pod.namespace=202004012306k8spolicytestbasictestchecksallkindofkubernetespoli                                              
	                                                            k8s:zgroup=testapp                                                                                                                           
	 1512       Enabled            Disabled          5731       k8s:id=app1                                                                                       f00d::a0a:0:0:2cce   10.10.0.146   ready   
	                                                            k8s:io.cilium.k8s.policy.cluster=default                                                                                                     
	                                                            k8s:io.cilium.k8s.policy.serviceaccount=app1-account                                                                                         
	                                                            k8s:io.kubernetes.pod.namespace=202004012306k8spolicytestbasictestchecksallkindofkubernetespoli                                              
	                                                            k8s:zgroup=testapp                                                                                                                           
	 1964       Disabled           Disabled          21580      k8s:io.cilium.k8s.policy.cluster=default                                                          f00d::a0a:0:0:1302   10.10.0.192   ready   
	                                                            k8s:io.cilium.k8s.policy.serviceaccount=coredns                                                                                              
	                                                            k8s:io.kubernetes.pod.namespace=kube-system                                                                                                  
	                                                            k8s:k8s-app=kube-dns                                                                                                                         
	 
Stderr:
 	 

cmd: kubectl exec -n kube-system cilium-cdh9w -- cilium service list
Exitcode: 0 
Stdout:
 	 ID   Frontend           Service Type   Backend                   
	 1    10.96.0.10:53      ClusterIP      1 => 10.10.0.192:53       
	 2    10.96.0.10:9153    ClusterIP      1 => 10.10.0.192:9153     
	 3    10.96.0.1:443      ClusterIP      1 => 192.168.36.11:6443   
	 4    10.103.25.228:69   ClusterIP      1 => 10.10.0.158:69       
	                                        2 => 10.10.0.146:69       
	 5    10.103.25.228:80   ClusterIP      1 => 10.10.0.158:80       
	                                        2 => 10.10.0.146:80       
	 
Stderr:
 	 

cmd: kubectl exec -n kube-system cilium-cdh9w -- cilium endpoint list
Exitcode: 0 
Stdout:
 	 ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])   IPv6                   IPv4         STATUS   
	            ENFORCEMENT        ENFORCEMENT                                                                                    
	 856        Disabled           Disabled          4          reserved:health               f00d::a0a:100:0:ed42   10.10.1.67   ready   
	 
Stderr:
 	 

===================== Exiting AfterFailed =====================

Deeper dive

In the cilium-426dw pod logs, it looks like the agent failed to allocate memory while loading the BPF program for endpoint 1964 (the coredns pod):

2020-04-01T23:05:55.590103474Z level=error msg="Command execution failed" cmd="[tc filter replace dev lxc7589f7ffe1d2 ingress prio 1 handle 1 bpf da obj 1964_next/bpf_lxc.o sec from-container]" error="exit status 1" subsys=datapath-loader
2020-04-01T23:05:55.591054639Z level=warning msg="Map object 'cilium_ct_any4_global' rejected: Cannot allocate memory (12)!" subsys=datapath-loader
2020-04-01T23:05:55.591102703Z level=warning msg=" - Type:         9" subsys=datapath-loader
2020-04-01T23:05:55.591141993Z level=warning msg=" - Identifier:   0" subsys=datapath-loader
2020-04-01T23:05:55.591178808Z level=warning msg=" - Pinning:      2" subsys=datapath-loader
2020-04-01T23:05:55.591215415Z level=warning msg=" - Size key:     14" subsys=datapath-loader
2020-04-01T23:05:55.59125154Z level=warning msg=" - Size value:   56" subsys=datapath-loader
2020-04-01T23:05:55.591289115Z level=warning msg=" - Max elems:    262144" subsys=datapath-loader
2020-04-01T23:05:55.59132558Z level=warning msg=" - Flags:        0" subsys=datapath-loader
2020-04-01T23:05:55.591360971Z level=warning subsys=datapath-loader
2020-04-01T23:05:55.591398127Z level=warning msg="Error loading maps into kernel!" subsys=datapath-loader
2020-04-01T23:05:55.591435091Z level=warning msg="Error fetching ELF ancillary data!" subsys=datapath-loader
2020-04-01T23:05:55.591472201Z level=warning msg="Unable to load program" subsys=datapath-loader
2020-04-01T23:05:55.59484751Z level=warning msg="JoinEP: Failed to load program" containerID=d1eb4eebc4 datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1964 error="Failed to load tc filter: exit status 1" file-path=1964_next/bpf_lxc.o identity=21580 ipv4=10.10.0.192 ipv6="f00d::a0a:0:0:1302" k8sPodName=kube-system/coredns-767d4c6dd7-f8js6 subsys=datapath-loader veth=lxc7589f7ffe1d2
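
The rejected map itself is not huge: 262144 entries with 14-byte keys and 56-byte values comes to roughly 17.5 MiB of data before per-entry kernel overhead, so ENOMEM here points more at memlock accounting or overall memory on the VM than at this one map. A rough sanity check from the node could look like the following (bpftool being available on the CI VM and the tc/globals pin path are assumptions):

# Lower bound on the memory cilium_ct_any4_global needs
echo $((262144 * (14 + 56)))   # 18350080 bytes (~17.5 MiB), before per-entry overhead

# What the agent has already created/pinned on this node
bpftool map show pinned /sys/fs/bpf/tc/globals/cilium_ct_any4_global
bpftool map list | grep -ci cilium   # rough count of Cilium-owned maps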

Metadata

Labels: area/CI (Continuous Integration testing issue or flake)