kv: log on excessive latch hold duration #114609

@nvb

Description

We have logging for slow latch acquisitions due to conflicts, but not for when a request itself holds a latch for an excessive amount of time. This kind of logging would help catch bugs and surface unexpected slowness in the system that, under contention, could cascade to other requests.

Some notes:

  • plumb cluster settings into spanlatch.Manager.
  • add new spanlatch/settings.go file.
  • in it, add new public cluster setting called kv.concurrency.long_latch_hold_duration (or something better). Give it a default value of 3s.
  • add an acquireTime time.Time field to spanlatch.Guard.
  • assign in Manager.Acquire and Manager.WaitUntilAcquired after wait succeeds.
  • consult in Manager.Release after releasing latches. If acquireTime is set and it was acquired more than kv.concurrency.long_latch_hold_duration ago, log a warning.
  • put the warning behind a log.Every(1 * time.Second) to avoid log spam.
  • make sure the warning explains some of the possible causes of long latch hold times, and that it includes the request information added in kv: include conflicting request information in latch manager traces/logs #114601.

Jira issue: CRDB-33593

Epic: CRDB-34227

Metadata

Labels

    A-kv-observability, A-kv-transactions (relating to MVCC and the transactional model), C-enhancement (solution expected to add code/behavior and preserve backward compatibility; pg compat issues are the exception), T-kv (KV Team)
