Kube-proxy facing locking timeout in large clusters during load test with services enabled #48107
Closed
Labels
area/kube-proxy · kind/bug (categorizes issue or PR as related to a bug) · lifecycle/frozen (indicates that an issue or PR should not be auto-closed due to staleness) · sig/network (relevant to SIG Network) · sig/scalability (relevant to SIG Scalability)
Description
Follows from discussion in #48052
We noticed this while performing a load test on a 4000-node cluster with services enabled. The iptables-restore step in the proxier fails with:
E0625 09:03:14.873338 5 proxier.go:1574] Failed to execute iptables-restore: failed to acquire old iptables lock: timed out waiting for the condition
The likely cause is the huge size of the iptables ruleset (tens of MBs): we run 30 pods per node, and each pod belongs to exactly one service,
so 30 * 4000 = 120k service endpoints (and these updates happen on all 4000 nodes).
cc @kubernetes/sig-network-misc @kubernetes/sig-scalability-misc @danwinship @wojtek-t