Leader election does not resign mastership on shutdown #3789
Closed
Description
The leader election fails to resign leadership during a graceful shutdown. When the application terminates, it passes a canceled context.Context to election.Close(). The resignation logic sees the canceled context and therefore aborts the call, preventing the leadership lease from being released.
Consequently, other nodes must wait for the lease to time out before electing a new leader, creating an unnecessary failover delay that impacts service availability. This issue arises primarily in the Kubernetes Lease implementation, but errors have also been observed with etcd.
Etcd:
W0611 12:28:50.702696 1 process.go:39] Signal received: terminated
I0611 12:28:50.702969 1 main.go:313] listenKeepAliveRsp canceled: context canceled
E0611 12:28:50.702976 1 runner.go:144] 1836655027542233688: no longer the master!
I0611 12:28:50.702985 1 operation_manager.go:333] Log operation manager shutting down
E0611 12:28:50.702992 1 runner.go:119] 1836655027542233688: context canceled
I0611 12:28:50.702996 1 operation_manager.go:353] wait for termination of election runners...
I0611 12:28:50.702996 1 runner.go:111] 1836655027542233688: shutdown election-monitoring loop
I0611 12:28:50.703005 1 main.go:171] Stopping HTTP server...
I0611 12:28:50.703052 1 election.go:82] 1836655027542233688: canceled mastership context
{"level":"warn","ts":"2025-06-11T12:28:50.703048Z","logger":"etcd-client","caller":"v3@v3.5.19/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000640d20/example-client.default.svc:2379","attempt":0,"error":"rpc error: code = Canceled desc = context canceled"}
E0611 12:28:50.703118 1 election.go:120] 1836655027542233688: Resign(): context canceled
K8s:
W0611 11:30:37.984180 1 process.go:39] Signal received: terminated
E0611 11:30:37.984268 1 runner.go:144] 1836655027542233688: no longer the master!
E0611 11:30:37.984299 1 runner.go:119] 1836655027542233688: context canceled
I0611 11:30:37.984304 1 runner.go:111] 1836655027542233688: shutdown election-monitoring loop
I0611 11:30:37.984337 1 main.go:216] Stopping RPC server...
I0611 11:30:37.984414 1 operation_manager.go:333] Log operation manager shutting down
I0611 11:30:37.984426 1 operation_manager.go:353] wait for termination of election runners...
W0611 11:30:37.984416 1 runner.go:113] 1836655027542233688: election.Close: failed to release lock: client rate limiter Wait returned an error: context canceled