storeliveness: pace liveness heartbeats #148210

@tbg

In large clusters, this code can cause goroutines to pile up:

successes := 0
for _, msg := range heartbeats {
	if sent := sm.sender.SendAsync(ctx, msg); sent {
		successes++
	} else {
		log.Warningf(ctx, "failed to send heartbeat to store %+v", msg.To)
	}
}

This loop is O(stores in the cluster).

See https://docs.google.com/document/d/1akok3TFngDS7IRdJSBKozRXhux01KT1GoLDPEkG4eho/edit?tab=t.izqcnllg524u for an example execution trace investigation.

Jira issue: CRDB-51478

Epic CRDB-52413

Labels

A-leader-leases: Related to the introduction of leader leases
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
E-starter: Might be suitable for a starter project for new employees or team members.
O-25.2.1-scale-testing
P-3: Issues/test failures with no fix SLA
T-kv: KV Team
v26.1.0-prerelease
