
Configurable Replica Read Timeout with Retry #44771

@Tema

Configurable Replica Read Timeout with Retry Feature Request

Is your feature request related to a problem? Please describe:
One of the common problems when running TiDB in the cloud on network-attached disks (Amazon EBS, Google PD, or Azure managed disks) is temporarily elevated disk IO latency. This can happen when a cloud provider's storage node fails and goes through a repair procedure. During the repair phase, a network-attached disk can exhibit latency of 100 ms or even single-digit seconds, versus single-digit milliseconds under normal conditions.

Describe the feature you'd like:
If a TiDB customer uses the Follower Read or Stale Read feature, a request that initially lands on a TiKV node whose network disk is exhibiting elevated latency could be retried on another TiKV replica. While a retry policy already exists in the TiKV go-client, the default network timeout is tens of seconds.

OLTP workloads on TiDB could benefit from a new system variable, tidb_tikv_read_timeout, whose value would be passed as a context timeout for TiKV requests made by the TiDB layer, relying on the existing replica selector logic to retry requests on other replicas. The implementation of this feature also needs to take care of the following:

Describe alternatives you've considered:
TiDB already has the max_execution_time system variable, but it is not used as a context deadline in the go-client for network calls from TiDB to TiKV. Moreover, if a TiKV request takes longer than max_execution_time, the session is marked as killed and no retry happens.

Teachability, Documentation, Adoption, Migration Strategy:
The feature would be fully controlled by the session variable tidb_tikv_read_timeout.

Labels: type/feature-request
