cilium-operator overrides ENI status with empty data when deploying with invalid AWS creds #11052

@ungureanuvladvictor

Bug report

General Information

  • Cilium version (run cilium version) - v1.6.8
  • Kernel version (run uname -a) - Linux ip-10-0-2-17.ec2.internal 4.15.0-1065-aws #69-Ubuntu SMP Thu Mar 26 02:17:29 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Orchestration system version in use (e.g. kubectl version, Mesos, ...) - Kubernetes 1.15.9 (1.15.901 is our internal fork)
      Client Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.901-1+193562b7b2eb63", GitCommit:"193562b7b2eb631c16dd7c92fecbb59c19588590", GitTreeState:"clean", BuildDate:"2020-04-15T23:22:31Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
      Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.901", GitCommit:"773e693428e4ce02fbf24f55b538d56062934205", GitTreeState:"clean", BuildDate:"2020-01-28T23:49:04Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
  • Link to relevant artifacts (policies, deployments scripts, ...)
  • Upload a system dump (run curl -sLO https://github.com/cilium/cilium-sysdump/releases/latest/download/cilium-sysdump.zip && python cilium-sysdump.zip and then attach the generated zip file)

How to reproduce the issue

  • Deploy Cilium in AWS ENI IPAM mode.
  • Record the output of kubectl get cn -o yaml $CILIUMNODE and observe that the .Status.ENI field is populated with information about the node's ENIs.
  • Re-deploy the cilium-operator on a machine that does not have permission to describe ENIs.
  • Record the output for the same CiliumNode again and observe that .Status.ENI is now empty.

I think this boils down to the following:

I am not sure what we can do within the current architecture of the code to catch this condition. I've filed a related issue, #11036.

The impact in production was as follows. When the operator restarted, it wiped the Status.ENI field of the existing CiliumNodes. We also upgraded the cilium-agent daemonset, which caused the agents to restart. We run the agents with the CLI option --write-cni-conf-when-ready=/host/etc/cni/net.d/cni.conflist, but they never became ready because the required IPs were not available on the CiliumNode resource.

Metadata

Labels

  • area/operator: Impacts the cilium-operator component
  • integration/cloud: Related to integration with cloud environments such as AKS, EKS, GKE, etc.
  • kind/bug: This is a bug in the Cilium logic.
  • priority/high: This is considered vital to an upcoming release.