kvserver: log warning when lease transfer gets visible with a delay #95991

@tbg

Describe the problem

Transferring leases to unavailable or lagging replicas causes outages. In many of the cases we investigate, the new leaseholder is waiting for a raft snapshot and will not apply the log up to its lease until it receives that snapshot. Simply being behind on a long stretch of raft log could have essentially the same effect (the quota pool should prevent this, but probably doesn't always).

When the recipient of a lease applies the lease transfer and the lease's proposed timestamp is more than (say) 500ms behind "now", the recipient should log a warning. This would make it much easier to determine when a lease transfer might have caused a latency spike or a period of unavailability.

I believe we could do all of this in this code:

func (r *Replica) leasePostApplyLocked(
	ctx context.Context,
	prevLease, newLease *roachpb.Lease,
	priorReadSum *rspb.ReadSummary,
	jumpOpt leaseJumpOption,
) {

Jira issue: CRDB-23884

Metadata

Labels

C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
O-support: Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
T-kv: KV Team
