kv: follower reads can fail due to stale cached range descriptors #44053

@ajwerner

Description

Describe the problem

Follower reads rely on having an up-to-date range descriptor. Currently our mechanisms for evicting range descriptors are based primarily on key mismatches or on failure to find the leaseholder. Unfortunately, if our cached descriptor still contains the leaseholder and our leaseholder cache is up to date, we will never evict the entry from the descriptor cache, even when the rest of its replica list is stale.

The code that handles the RangeNotFoundError notes that the failure to find the replica could be due to the replica not yet having been initialized, and that evicting the descriptor in that case could lead to thrashing.

case *roachpb.RangeNotFoundError:
// The store we routed to doesn't have this replica. This can happen when
// our descriptor is outright outdated, but it can also be caused by a
// replica that has just been added but needs a snapshot to be caught up.

The outcome is that follower reads can repeatedly be routed first to a node that does not contain the range, and only then go on to hit the next-closest follower.
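The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the types `nodeID`, `descriptor`, `cluster`, and `toyCache` and the method `followerRead` are hypothetical names invented for illustration, not CockroachDB APIs, and the eviction rule is a deliberate simplification of the behavior described in this issue.

```go
package main

import "fmt"

// nodeID identifies a node in this toy model (hypothetical, not a CockroachDB type).
type nodeID int

// descriptor is a simplified stand-in for a cached range descriptor:
// the replicas of one range, ordered closest-first from the client's view.
type descriptor struct {
	replicas []nodeID
}

// cluster records which nodes actually hold a replica of the range right now.
type cluster struct {
	holders map[nodeID]bool
}

// toyCache models the client-side caches discussed in the issue.
type toyCache struct {
	desc        descriptor
	leaseholder nodeID // still present in desc, so nothing looks wrong
	evictions   int
}

// followerRead tries replicas in closest-first order. It returns the node
// that served the read and how many attempts failed with "range not found".
func (c *toyCache) followerRead(cl *cluster) (served nodeID, misses int, ok bool) {
	for _, n := range c.desc.replicas {
		if cl.holders[n] {
			return n, misses, true
		}
		// Assumption mirroring the issue: a "range not found" response moves
		// on to the next replica WITHOUT evicting the cached descriptor,
		// because the replica might merely be waiting for a snapshot.
		misses++
	}
	// Simplification: only when no cached replica can serve the read does
	// this toy model evict the descriptor.
	c.evictions++
	return 0, misses, false
}

func main() {
	// The cached descriptor lists n1 (closest), n2, n3. The range has since
	// moved off n1 and onto n4, but the leaseholder n3 is unchanged.
	c := &toyCache{desc: descriptor{replicas: []nodeID{1, 2, 3}}, leaseholder: 3}
	cl := &cluster{holders: map[nodeID]bool{2: true, 3: true, 4: true}}

	totalMisses := 0
	for i := 0; i < 5; i++ {
		served, misses, _ := c.followerRead(cl)
		totalMisses += misses
		fmt.Printf("read %d: served by n%d after %d miss(es)\n", i, served, misses)
	}
	fmt.Printf("total misses: %d, evictions: %d\n", totalMisses, c.evictions)
}
```

In this model every read pays one wasted round trip to n1 before being served by n2, and the descriptor is never evicted because a later replica always succeeds.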

To Reproduce

This needs reproduction steps.

Expected behavior

Follower reads should eventually be served by the closest follower that actually holds the range.

Metadata

Labels

A-kv: Anything in KV that doesn't belong in a more specific category.
A-telemetry
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
