operator: add --leader-election-resource-lock-timeout flag#44500
Merged
aanm merged 1 commit into cilium:main on Mar 5, 2026
Conversation
Commit 85ee03c does not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin
force-pushed from 85ee03c to fb2016e
force-pushed from 63d4363 to 6710ef4
Contributor (Author)
Closing as AI policy prohibits it.
force-pushed from 990ceec to 603025c
Member
/test
Add a new configurable flag --leader-election-resource-lock-timeout to the cilium-operator that controls the HTTP client timeout used when making API requests to acquire or renew the leader election resource lock (Lease object in Kubernetes).

Problem:

The HTTP timeout for lease lock API calls is derived from the renew deadline as max(1s, renewDeadline/2) by the upstream k8s client-go resourcelock.NewFromKubeconfig() helper. With the default --leader-election-renew-deadline of 10s, this yields a 5s HTTP timeout. Users with high-latency control planes (e.g., worker nodes in a different region than the remote control plane) frequently hit this timeout, causing the operator to fail leader election with errors like:

error retrieving resource lock kube-system/cilium-operator-resource-lock: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Previously, the only workaround was to increase --leader-election-renew-deadline, which also changes the leader election protocol timing semantics beyond just the HTTP timeout.

Solution:

Introduce --leader-election-resource-lock-timeout (type: duration, default: 0) that directly controls the HTTP client timeout for lease lock API requests, independently of the renew deadline. When set to 0 (default), the existing behavior is preserved exactly: timeout = max(1s, renewDeadline/2). When set to a positive duration, that value is used directly as the HTTP client timeout.

Implementation details:

- Added LeaderElectionResourceLockTimeout constant, struct field, and viper binding following the same pattern as the existing --leader-election-lease-duration, --leader-election-renew-deadline, and --leader-election-retry-period flags.
- Replaced the call to resourcelock.NewFromKubeconfig() with a manual resource lock construction using resourcelock.New(). This gives us control over the rest.Config.Timeout value passed to the Kubernetes client used for leader election, while replicating the exact same logic from the upstream helper (shallow copy of kubeconfig, user agent annotation, NewForConfigOrDie).
- The default behavior (timeout=0) faithfully reproduces the upstream formula: timeout = max(1s, renewDeadline/2).

Usage example:

cilium-operator --leader-election-resource-lock-timeout=15s

Files changed:

- operator/option/config.go: constant, struct field, Populate()
- operator/cmd/flags.go: flag registration with BindEnv
- operator/cmd/root.go: manual resource lock creation with timeout

Claude Opus 4.6 was used to assist in the development of this commit.

Fixes: cilium#38144

Signed-off-by: darox <maderdario@gmail.com>
force-pushed from 603025c to a8cae63
nebril approved these changes on Mar 3, 2026
Member
/test
2 similar comments:
Contributor (Author)
/test
Contributor (Author)
/test