kv: log on excessive latch hold duration #114609

@nvb

Description

We have logging for slow latch acquisitions due to conflicts, but not for when a request itself holds a latch for an excessive amount of time. This kind of logging would help catch bugs and surface unexpected slowness in the system that, under contention, could cascade to other requests.

Some notes:

  • plumb cluster settings into spanlatch.Manager.
  • add new spanlatch/settings.go file.
  • in it, add new public cluster setting called kv.concurrency.long_latch_hold_duration (or something better). Give it a default value of 3s.
  • add an acquireTime time.Time field to spanlatch.Guard.
  • assign in Manager.Acquire and Manager.WaitUntilAcquired after wait succeeds.
  • consult in Manager.Release after releasing latches. If acquireTime is set and it was acquired more than kv.concurrency.long_latch_hold_duration ago, log a warning.
  • put the warning behind a log.Every(1 * time.Second) to avoid log spam.
  • make sure the warning explains some of the possible causes of long latch hold times, and that it includes the request information added in kv: include conflicting request information in latch manager traces/logs #114601.

Jira issue: CRDB-33593

Epic: CRDB-34227

Metadata

Labels

    A-kv-observability, A-kv-transactions (relating to MVCC and the transactional model), C-enhancement (solution expected to add code/behavior and preserve backward compatibility; pg compat issues are the exception), T-kv (KV Team)
