
kvserver: lease maintenance scheduler #98433

@erikgrinaker

Description


We need a scheduler that eagerly maintains range leases.

There are two motivations for this:

  1. We don't want lease acquisition to be lazily driven by client requests.

    1. It can prevent ranges from ever acquiring a lease under certain failure modes, e.g. because the client request times out before the lease acquisition succeeds under disk stalls (see kvserver: disk stall prevents lease transfer #81100 and kvserver: persistent outage when liveness leaseholder deadlocks #80713).
    2. It adds unnecessary latency, e.g. in the case where a scan across many ranges has to sequentially acquire leases for each range, especially under failure modes such as network outages or disk stalls where the lease acquisition has to wait for network timeouts.
  2. When only using expiration leases, we want to ensure ranges have a lease even when there is no traffic on the ranges, to avoid lease acquisition latencies on the next request to the range.

A few important aspects to consider:

  • Some ranges are more important than others. In particular, the meta and liveness range leases must get priority.
  • Expiration lease extensions are currently expensive (one Raft write per extension per range). We may want to allow very cold ranges to let their leases expire after some period of inactivity (e.g. minutes).
  • There may not be any point in quiescing ranges where we eagerly extend expiration leases (see kvserver: explore quiescence removal #94592 and kvserver: don't quiesce ranges with expiration-based leases #94454).
  • We already have other scheduler infrastructure that we could piggyback on: the Raft scheduler, and the queue infrastructure (e.g. a new lease queue).
  • We should try to honour lease preferences.

We already have a few similar mechanisms, which should mostly be replaced by this scheduler:

  • The replicate queue acquires leases for ranges that don't have one.

    // TODO(kvoli): This check should fail if not the leaseholder. In the case
    // where we want to use the replicate queue to acquire leases, this should
    // occur before planning or as a result. In order to return this in
    // planning, it is necessary to simulate the prior change having succeeded
    // to then plan this lease transfer.
    if _, pErr := repl.redirectOnOrAcquireLease(ctx); pErr != nil {
        return change, pErr.GoError()
    }

  • Store.startLeaseRenewer() eagerly renews the expiration leases on the meta and liveness ranges to avoid high tail latencies.

    // startLeaseRenewer runs an infinite loop in a goroutine which regularly
    // checks whether the store has any expiration-based leases that should be
    // proactively renewed and attempts to continue renewing them.
    //
    // This reduces user-visible latency when range lookups are needed to serve a
    // request and reduces ping-ponging of r1's lease to different replicas as
    // maybeGossipFirstRange is called on each (e.g. #24753).
    func (s *Store) startLeaseRenewer(ctx context.Context) {
        // Start a goroutine that watches and proactively renews certain
        // expiration-based leases.

  • Replica.maybeExtendLeaseAsyncLocked() will extend an expiration lease when processing a request in the last half of the lease interval.

    func (r *Replica) maybeExtendLeaseAsyncLocked(ctx context.Context, st kvserverpb.LeaseStatus) {
        // Check shouldExtendLeaseRLocked again, because others may have raced to
        // extend the lease and beaten us here after we made the determination
        // (under a shared lock) that the extension was needed.
        if !r.shouldExtendLeaseRLocked(st) {
            return
        }
        if log.ExpensiveLogEnabled(ctx, 2) {
            log.Infof(ctx, "extending lease %s at %s", st.Lease, st.Now)
        }
        // We explicitly ignore the returned handle as we won't block on it.
        //
        // TODO(tbg): this ctx is likely cancelled very soon, which will in turn
        // cancel the lease acquisition (unless joined by another more long-lived
        // ctx). So this possibly isn't working as advertised (which only plays a role
        // for expiration-based leases, at least).
        _ = r.requestLeaseLocked(ctx, st)
    }

Jira issue: CRDB-25265

Epic: CRDB-25207

Labels

  • A-kv: Anything in KV that doesn't belong in a more specific category.
  • C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception).
  • T-kv: KV Team.
