Skip to content

Raft: Increase max retry times to avoid too large remote requests#10301

Merged
ti-chi-bot[bot] merged 1 commit intopingcap:masterfrom
JaySon-Huang:reduce_large_retry
Jul 14, 2025
Merged

Raft: Increase max retry times to avoid too large remote requests#10301
ti-chi-bot[bot] merged 1 commit intopingcap:masterfrom
JaySon-Huang:reduce_large_retry

Conversation

@JaySon-Huang
Copy link
Contributor

@JaySon-Huang JaySon-Huang commented Jul 11, 2025

What problem does this PR solve?

Issue Number: close #10300

Problem Summary:

In DAGStorageInterpreter::buildLocalStreamsForPhysicalTable we need to confirm the regions' keyranges not changed and regions are not removed before the storage snapshot is built. Otherwise the query result is incorrect.

// After getting streams from storage, we need to validate whether Regions have changed or not after learner read.
// (by calling `validateQueryInfo`). In case the key ranges of Regions have changed (Region merge/split), those `streams`
// may contain different data other than expected.
validateQueryInfo(*query_info.mvcc_query_info, learner_read_snapshot, tmt, log);

Building storage snapshot usually takes about a hundreds ms to few seconds. But the max retry time is only 1. If retry reach the max time, then all the Regions will be fetch though "RemoteRead". When querying a hot write large table, it could fail more than 1. And because the table is large, query contains large number of Regions, retrying all Regions in "RemoteRead" way could takes lots of network bandwidth and extra CPU, leading to TiFlash unstable.

What is changed and how it works?

Raft: Increase max retry times to avoid too large remote requests
  * Increase the max retry number between LearnerRead and acquiring snapshot from the storage layer by the number of query regions

Other changes:

  • Avoid printing ann_query_info
  • Enrich the error message when ColumnFilePersistedSet::installCompactionResults meet unexpected exception.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Increase the maximum retry count when acquiring storage snapshots to improve query stability for large tables

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-triage-completed release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. and removed do-not-merge/needs-triage-completed release-note-none Denotes a PR that doesn't merit a release note. labels Jul 11, 2025
Copy link
Member

@CalvinNeo CalvinNeo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ti-chi-bot ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jul 11, 2025
@JaySon-Huang
Copy link
Contributor Author

/retest-required

Signed-off-by: JaySon-Huang <tshent@qq.com>
Copy link
Contributor

@windtalker windtalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Jul 11, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CalvinNeo, windtalker

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [CalvinNeo,windtalker]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jul 11, 2025
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Jul 11, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-07-11 07:19:02.551283121 +0000 UTC m=+2243395.274462123: ☑️ agreed by CalvinNeo.
  • 2025-07-11 11:28:18.161119055 +0000 UTC m=+2258350.884298023: ☑️ agreed by windtalker.

@JaySon-Huang
Copy link
Contributor Author

/test pull-integration-test

@ti-chi-bot ti-chi-bot bot merged commit bc743c9 into pingcap:master Jul 14, 2025
7 checks passed
ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Jul 14, 2025
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-7.5: #10307.

ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Jul 14, 2025
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-8.1: #10308.

@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #10309.
But this PR has conflicts, please resolve them!

ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Jul 14, 2025
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@JaySon-Huang JaySon-Huang deleted the reduce_large_retry branch July 14, 2025 07:16
ti-chi-bot bot pushed a commit that referenced this pull request Jul 15, 2025
…0301) (#10307)

close #10300

Raft: Increase max retry times to avoid too large remote requests
  * Increase the max retry number between LearnerRead and acquiring snapshot from the storage layer by the number of query regions

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: JaySon-Huang <tshent@qq.com>

Co-authored-by: JaySon <tshent@qq.com>
Co-authored-by: JaySon-Huang <tshent@qq.com>
ti-chi-bot bot pushed a commit that referenced this pull request Jul 16, 2025
…0301) (#10308)

close #10300

Raft: Increase max retry times to avoid too large remote requests
  * Increase the max retry number between LearnerRead and acquiring snapshot from the storage layer by the number of query regions

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: JaySon-Huang <tshent@qq.com>

Co-authored-by: JaySon <tshent@qq.com>
Co-authored-by: JaySon-Huang <tshent@qq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved lgtm needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

There could be large amount of remote request when querying on a large and hot-write table

4 participants