Bug report
General Information
- Cilium version (run cilium version) - v1.6.8
- Kernel version (run uname -a) - Linux ip-10-0-2-17.ec2.internal 4.15.0-1065-aws #69-Ubuntu SMP Thu Mar 26 02:17:29 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
- Orchestration system version in use (e.g. kubectl version, Mesos, ...)
  Client Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.901-1+193562b7b2eb63", GitCommit:"193562b7b2eb631c16dd7c92fecbb59c19588590", GitTreeState:"clean", BuildDate:"2020-04-15T23:22:31Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
  Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.901", GitCommit:"773e693428e4ce02fbf24f55b538d56062934205", GitTreeState:"clean", BuildDate:"2020-01-28T23:49:04Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
  -- 1.15.9 (1.15.901 is our internal fork)
- Link to relevant artifacts (policies, deployments scripts, ...)
- Upload a system dump (run curl -sLO https://github.com/cilium/cilium-sysdump/releases/latest/download/cilium-sysdump.zip && python cilium-sysdump.zip and then attach the generated zip file)
How to reproduce the issue
- Deploy Cilium in AWS ENI IPAM mode.
- Record the output of kubectl get cn -o yaml $CILIUMNODE and observe that the .Status.ENI field is filled with info about the ENI.
- Re-deploy the cilium-operator on a machine that does not have permissions to describe ENIs.
- Record the output for the same CiliumNode and observe that .Status.ENI is empty.
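The before/after comparison in the steps above can be automated. The following is a minimal sketch, not part of Cilium: it assumes the CiliumNode was fetched as JSON (for example with kubectl get cn -o json $CILIUMNODE) and that the ENI map lives under status.eni.enis. The function names and the ENI ID in the demo are made up for illustration.

```python
def eni_entries(cilium_node: dict) -> dict:
    """Return the ENI map of a parsed CiliumNode, or {} if it is absent.

    Assumption: the map is stored under status.eni.enis.
    """
    status = cilium_node.get("status") or {}
    eni = status.get("eni") or {}
    return eni.get("enis") or {}


def eni_status_wiped(before: dict, after: dict) -> bool:
    """True if ENI info was present in `before` but is gone in `after`."""
    return bool(eni_entries(before)) and not eni_entries(after)


if __name__ == "__main__":
    # Hypothetical snapshots taken before and after re-deploying the operator.
    before = {"status": {"eni": {"enis": {"eni-0123": {"id": "eni-0123"}}}}}
    after = {"status": {"eni": {}}}
    print(eni_status_wiped(before, after))
```

In practice the two dictionaries would come from json.loads applied to the recorded kubectl output of each step.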
I think this boils down to the following:
I am not sure what we can do within the current architecture of the code to catch this condition. Related to this, I've filed #11036.
The impact in production was that when the operator restarted, it wiped out the Status.ENI field of the existing CiliumNodes. We also upgraded the cilium-agent daemonset, which caused the agents to restart. We run them with the CLI option --write-cni-conf-when-ready=/host/etc/cni/net.d/cni.conflist, but the agents never became ready because the required IPs were not available on the CiliumNode resource.
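One possible way to avoid wiping the status is to treat a failed ENI describe call as "unknown" rather than "no ENIs". The sketch below only illustrates that idea and does not reflect how the cilium-operator is actually structured. next_eni_status, describe_enis and the sample data are hypothetical names introduced here.

```python
from typing import Callable, Dict, Optional, Tuple

EniMap = Dict[str, dict]


def next_eni_status(
    previous: EniMap,
    describe_enis: Callable[[], EniMap],
) -> Tuple[EniMap, Optional[str]]:
    """Return the ENI status to publish, plus an error message if any.

    If the (hypothetical) describe call fails, the last known status is
    kept instead of being replaced with an empty one.
    """
    try:
        fresh = describe_enis()
    except Exception as err:  # e.g. missing permissions to describe ENIs
        return previous, f"describe ENIs failed, keeping last known status: {err}"
    return fresh, None


if __name__ == "__main__":
    def denied() -> EniMap:
        # Stand-in for an operator running without describe permissions.
        raise PermissionError("not authorized to describe ENIs")

    last_known = {"eni-0123": {"id": "eni-0123"}}
    status, error = next_eni_status(last_known, denied)
    print(status == last_known, error)
```

The trade-off is that a stale status may be served for a while, so the returned error should be surfaced (logged or exposed as a metric) rather than dropped.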