AKS: Clean up azure-vnet state on Cilium agent start #14452
gandro merged 3 commits into cilium:master
Conversation
christarazi
left a comment
Very clean commits with readable explanations, well done! The changes look good to me, but there's clearly a lot of context here that I'm not fully on top of, so I'll defer to others for closer inspection.
twpayne
left a comment
I currently don't have enough understanding of Azure's networking to fully understand this PR, but the code and commits as-is look excellent.
qmonnet
left a comment
Let me join the club of the reviewers unfamiliar with Azure. Code looks good!
3d8083e to c4deff1
Addressed comments and rebased. 👍 Also removed |
8528be8 to 7c6a967
Looks like the docs job is not happy with the changes.
7c6a967 to 08a3e5e
This should be ready, @aanm PTAL 🙏
@ti-mo please add the labels for the branches this PR needs to be backported to, as well as the proper release label.
Commit 0b70117 ("AKS: Fix dynamic reconfiguration of bridge mode") and preceding commits would modify an internal state file of the azure-vnet CNI plugin. This is potentially dangerous, because:

- azure-vnet uses a lockfile to coordinate access to this state file between independent invocations.
- The format of the state file is not controlled by us and is subject to change.
- How this state file is used can change between versions of azure-vnet.

After investigating some reports of connectivity issues, it turns out that:

- Modifying the state file directly isn't necessary. azure-vnet obeys the mode it's given in its CNI configuration, regardless of the mode recorded in the state file.
- Changing the mode directly _in the state file_ from `bridge` to `transparent` causes azure-vnet to skip cleaning up static ARP entries for deleted Pods that were created in bridge mode before Cilium was deployed. This can cause addresses of Pods previously scheduled on the node to become unroutable on that node.
- Waiting for the state file to appear causes a deadlock on newly created nodes in a cluster where Cilium is already running, for example when scaling out a node pool. azure-vnet only creates the state file when it receives its first CNI `ADD` event, which never happens with the CNI configurations Cilium installs.

Signed-off-by: Timo Beckers <timo@isovalent.com>
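The point that azure-vnet obeys the mode it's given in its CNI configuration (rather than the mode in its state file) can be illustrated with a chained conflist. This is only a sketch: the field names and values shown here are assumptions based on typical azure-vnet deployments, not taken from this PR.

```json
{
  "cniVersion": "0.3.0",
  "name": "azure",
  "plugins": [
    {
      "type": "azure-vnet",
      "mode": "transparent",
      "ipam": { "type": "azure-vnet-ipam" }
    }
  ]
}
```

With `"mode": "transparent"` set in the configuration, azure-vnet operates in transparent mode on the next invocation regardless of what its internal state file records, which is why rewriting that file directly is unnecessary.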
…ghbors

This aims to address cilium#14233 ("Pods scheduled before Cilium don't get state cleaned up") until a better solution can be implemented for AKS, ideally by gaining support for deploying Cilium directly by Azure. Currently, when Cilium is deployed with Azure IPAM enabled, all workload Pod connectivity is interrupted until the Pod is re-scheduled.

Signed-off-by: Timo Beckers <timo@isovalent.com>
Signed-off-by: Timo Beckers <timo@isovalent.com>
08a3e5e to d91c7f7
Fixes #12113
Fixes #14233
Rationale is documented in the commit messages. We need something to move AKS support forward, as there are multiple breaking issues in 1.9.
@seanmwinn Should a Helm value be added to disable the rm/ebtables/ip neigh path?
@aanm Should we document the fact that deploying with Azure IPAM disconnects all workload Pods in the cluster? (getting rid of this behaviour would be a requirement for declaring Azure IPAM as GA, so this might not be necessary)
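For context, the "rm/ebtables/ip neigh path" mentioned above refers to cleaning up azure-vnet leftovers when the agent starts. The following is a rough dry-run sketch of what such a cleanup involves; the file path, table choice, and the `azure0` interface name are assumptions for illustration, not copied from the PR, and the actual logic lives in the Cilium agent rather than a script.

```shell
#!/bin/sh
# Hedged sketch of the "rm/ebtables/ip neigh" cleanup path, as a dry run.
# Paths and names (azure-vnet.json, azure0) are assumptions, not from this PR.

run() { echo "+ $*"; }   # dry-run helper: print instead of execute

# 1. Remove the azure-vnet state file so stale bridge-mode state
#    cannot influence later CNI invocations.
run rm -f /var/run/azure-vnet.json

# 2. Flush ebtables rules installed for bridge mode.
run ebtables -t nat -F

# 3. Flush static ARP (neighbor) entries left on the bridge for Pods
#    created in bridge mode before Cilium was deployed.
run ip neigh flush dev azure0 nud permanent
```

Dropping the `run` prefix would execute the commands for real (root required); a Helm value gating this path, as suggested above, would let operators opt out entirely.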