
Kube-proxy facing locking timeout in large clusters during load test with services enabled #48107

@shyamjvs


Follows from discussion in #48052

We noticed this while running a load test on 4000-node clusters with services enabled. The iptables-restore step in the proxier fails with:

E0625 09:03:14.873338       5 proxier.go:1574] Failed to execute iptables-restore: failed to acquire old iptables lock: timed out waiting for the condition

The likely cause is the "huge" size of the iptables ruleset (tens of MBs): we run 30 pods per node, and each pod is part of exactly one service
=> 30 * 4000 = 120k service endpoints (and these updates happen on all 4000 nodes)
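
As a back-of-the-envelope check of the numbers above, here is a small sketch. The node and pod counts come from this report; the rules-per-endpoint and bytes-per-rule figures are assumptions picked for illustration, not measurements, and `estimateRulesetBytes` is a hypothetical helper.

```go
package main

import "fmt"

// estimateRulesetBytes returns the endpoint count and a rough size of the
// ruleset each node has to restore. rulesPerEndpoint and bytesPerRule are
// illustrative assumptions, not measured values.
func estimateRulesetBytes(nodes, podsPerNode, rulesPerEndpoint, bytesPerRule int) (endpoints, bytes int) {
	endpoints = nodes * podsPerNode
	bytes = endpoints * rulesPerEndpoint * bytesPerRule
	return endpoints, bytes
}

func main() {
	// 4000 nodes and 30 pods per node are from the report.
	// 3 rules per endpoint and 100 bytes per rule are assumed.
	endpoints, size := estimateRulesetBytes(4000, 30, 3, 100)
	fmt.Printf("endpoints: %d\n", endpoints)
	fmt.Printf("estimated ruleset: %.1f MB per node\n", float64(size)/1e6)
}
```

Under these assumptions the estimate lands in the tens of MBs per node, which is in line with the size observed.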

cc @kubernetes/sig-network-misc @kubernetes/sig-scalability-misc @danwinship @wojtek-t

Labels: area/kube-proxy, kind/bug, lifecycle/frozen, sig/network, sig/scalability
