Leader election does not resign mastership on shutdown #3789

@osmman

Description

The leader election fails to resign leadership during a graceful shutdown. When the application terminates, it passes a canceled context.Context to election.Close(). The resignation logic sees the canceled context and therefore aborts the call, preventing the leadership lease from being released.

Consequently, other nodes must wait for the lease to time out before electing a new leader, creating an unnecessary failover delay that impacts service availability. This issue arises primarily with the Kubernetes Lease implementation, but similar errors have also been observed with etcd.

Etcd:

W0611 12:28:50.702696       1 process.go:39] Signal received: terminated
I0611 12:28:50.702969       1 main.go:313] listenKeepAliveRsp canceled: context canceled
E0611 12:28:50.702976       1 runner.go:144] 1836655027542233688: no longer the master!
I0611 12:28:50.702985       1 operation_manager.go:333] Log operation manager shutting down
E0611 12:28:50.702992       1 runner.go:119] 1836655027542233688: context canceled
I0611 12:28:50.702996       1 operation_manager.go:353] wait for termination of election runners...
I0611 12:28:50.702996       1 runner.go:111] 1836655027542233688: shutdown election-monitoring loop
I0611 12:28:50.703005       1 main.go:171] Stopping HTTP server...
I0611 12:28:50.703052       1 election.go:82] 1836655027542233688: canceled mastership context
{"level":"warn","ts":"2025-06-11T12:28:50.703048Z","logger":"etcd-client","caller":"v3@v3.5.19/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000640d20/example-client.default.svc:2379","attempt":0,"error":"rpc error: code = Canceled desc = context canceled"}
E0611 12:28:50.703118       1 election.go:120] 1836655027542233688: Resign(): context canceled

K8s:

W0611 11:30:37.984180       1 process.go:39] Signal received: terminated
E0611 11:30:37.984268       1 runner.go:144] 1836655027542233688: no longer the master!
E0611 11:30:37.984299       1 runner.go:119] 1836655027542233688: context canceled
I0611 11:30:37.984304       1 runner.go:111] 1836655027542233688: shutdown election-monitoring loop
I0611 11:30:37.984337       1 main.go:216] Stopping RPC server...
I0611 11:30:37.984414       1 operation_manager.go:333] Log operation manager shutting down
I0611 11:30:37.984426       1 operation_manager.go:353] wait for termination of election runners...
W0611 11:30:37.984416       1 runner.go:113] 1836655027542233688: election.Close: failed to release lock: client rate limiter Wait returned an error: context canceled
