cilium-cni gets deleted by delayed cni-uninstall.sh invocation #11828

@nebril

Bug report

How to reproduce the issue

  1. Edit quick-install.yaml:
  • increase terminationGracePeriodSeconds to 120
  • change the preStop command to command: ["sh", "-c", "sleep 60 && /cni-uninstall.sh"]
  2. Apply quick-install.yaml: kubectl apply -f install/kubernetes/quick-install.yaml
  3. Wait for the Cilium pods to be running.
  4. Delete and recreate the daemonset: kubectl delete ds cilium -n kube-system && kubectl apply -f install/kubernetes/quick-install.yaml

This causes the Cilium daemonset to be deleted and recreated. As a result, two Cilium pods can coexist on the same node: one terminating, from the deleted daemonset, and one initializing, from the new daemonset. The terminating pod's cni-uninstall.sh can then be called after the new pod's cni-install.sh, which removes the freshly installed cilium-cni and causes breakage.
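The ordering problem can be illustrated without a cluster. The following is only a minimal sketch of the race, not the actual Cilium scripts: the temporary directory, the `cni_install` and `cni_uninstall_delayed` functions, and the 1-second delay are all hypothetical stand-ins for cni-install.sh, the delayed preStop hook, and the 60-second sleep.

```shell
#!/bin/sh
# Simulation only: no Kubernetes involved. The directory and the function
# names below are hypothetical stand-ins for the real install/uninstall scripts.
CNI_DIR=$(mktemp -d)
CNI_BIN="$CNI_DIR/cilium-cni"

# Stand-in for cni-install.sh: places the plugin on the node.
cni_install() { : > "$CNI_BIN"; }

# Stand-in for the preStop hook "sleep N && /cni-uninstall.sh".
cni_uninstall_delayed() { sleep "$1"; rm -f "$CNI_BIN"; }

cni_install                  # old pod installed the plugin when it started
cni_uninstall_delayed 1 &    # old pod is deleted; its preStop hook sleeps first
cni_install                  # new pod starts meanwhile and installs the plugin
wait                         # the delayed uninstall runs after the new install

if [ -e "$CNI_BIN" ]; then
  RESULT="cilium-cni present"
else
  RESULT="cilium-cni missing"
fi
echo "$RESULT"               # prints "cilium-cni missing"
rm -rf "$CNI_DIR"
```

The plugin ends up missing even though the new pod installed it, because the old pod's uninstall ran last.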

The timing of this race is really tight, so it does not manifest often, but it does happen in our CI, where we delete and recreate daemonsets pretty often.

Increasing terminationGracePeriodSeconds and adding the sleep, as in the steps above, ensures that it is reproduced every time.

Labels

kind/bug: This is a bug in the Cilium logic.
