Delay in processing HNS LB policies on kube-proxy start on Windows nodes results in unreachable services #109162

@LP0101

What happened?

When starting Windows nodes on a cluster with a large number of HNS LB policies/rules, there is a delay in processing them. Services are unreachable during this delay, which takes about half a minute per policy. The total can be substantial given enough rules.
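To give a feel for the scale, here is a rough back-of-the-envelope estimate. It assumes the policies are processed one after another at the "about half a minute per policy" rate observed above; the function name and the 30-second default are illustrative, not measured constants.

```python
def estimated_startup_delay_seconds(policy_count: int,
                                    seconds_per_policy: float = 30.0) -> float:
    """Estimate how long services stay unreachable after kube-proxy starts.

    Assumption: policies are processed sequentially at a roughly constant
    rate (about 30 s each, as observed in this report).
    """
    if policy_count < 0:
        raise ValueError("policy_count must be non-negative")
    return policy_count * seconds_per_policy


if __name__ == "__main__":
    for n in (10, 100, 500):
        minutes = estimated_startup_delay_seconds(n) / 60
        print(f"{n} policies: ~{minutes:.0f} min")
```

Under these assumptions, 100 policies would already mean roughly 50 minutes of unreachable services.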

This occurs both when restarting kube-proxy and when rebooting the host. Once the system reaches a state where all the policy lists have been processed, incremental updates to the services (i.e. endpoint changes) are handled fine.

What did you expect to happen?

HNS policies should not cause a large delay for Windows nodes.

How can we reproduce it (as minimally and precisely as possible)?

With a large number of HNS policies in place, restart kube-proxy on a Windows node.
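One possible way to get a large number of policies in place is to create many Services. The sketch below only generates the manifests. It assumes that each Service port backed by endpoints on the cluster results in at least one HNS LB policy on the Windows node, and that a workload labelled `app: hns-repro` is already running (Services without endpoints may not produce policies). The `hns-repro` name, the namespace, and the count of 200 are hypothetical.

```python
def service_manifests(count: int,
                      namespace: str = "default",
                      prefix: str = "hns-repro") -> str:
    """Return a multi-document YAML string with `count` ClusterIP Services.

    Assumption: pods labelled `app: <prefix>` already exist, so every
    Service gets endpoints.
    """
    docs = []
    for i in range(count):
        docs.append(
            "apiVersion: v1\n"
            "kind: Service\n"
            "metadata:\n"
            f"  name: {prefix}-{i}\n"
            f"  namespace: {namespace}\n"
            "spec:\n"
            "  selector:\n"
            f"    app: {prefix}\n"
            "  ports:\n"
            "  - port: 80\n"
            "    targetPort: 80\n"
        )
    # Separate the documents with the YAML document marker.
    return "---\n".join(docs)


if __name__ == "__main__":
    # 200 is an arbitrary illustrative count, not a known threshold.
    print(service_manifests(200), end="")
```

If the script is saved under a name such as `gen_services.py` (hypothetical), its output can be piped into `kubectl apply -f -` before restarting kube-proxy on the Windows node.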

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.7", GitCommit:"1f86634ff08f37e54e8bfcd86bc90b61c98f84d4", GitTreeState:"clean", BuildDate:"2021-11-17T14:41:19Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.7", GitCommit:"f74784f1eaf1e02b651778d6ee2df1ae5ee729ae", GitTreeState:"clean", BuildDate:"2022-03-10T07:58:41Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

Azure AKS

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

Labels

kind/bug, sig/network, sig/windows, triage/accepted
