
raftstore: enhance the detection to cover I/O jitters on kvdb.#18439

Merged: ti-chi-bot[bot] merged 6 commits into tikv:master from LykxSassinator:opt_kvdb_io_jitters_detect on May 22, 2025

Conversation

@LykxSassinator LykxSassinator commented May 6, 2025

What is changed and how it works?

Issue Number: Close #18463

What's Changed:

Previous work in #17801 introduced a detection mechanism for the kvdb disk that detects I/O hangs.

However, recent customer feedback highlighted the need to extend detection coverage to I/O jitters, so that TiKV can automatically recover from abnormal states caused by kvdb I/O jitters.

This PR therefore tracks the development effort to enhance TiKV's I/O jitter detection and recovery mechanism. The main changes are:

  • Configuration:
    • raftstore.inspect_kvdb_interval: 2s -> 100ms
  • Detection mechanism on kvdb:
    • SlowScore::ratio_thresh: 60% -> 10%

Enhance the detection mechanism to cover I/O jitters on the kvdb disk when kvdb is deployed on a separate mount path.
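
To illustrate why lowering the ratio threshold makes detection more sensitive, here is a minimal sketch. It assumes the threshold is compared against the share of slow inspections within one window, which is a simplification. `RatioDetector`, `slow_latency_ms`, and `window_is_slow` are hypothetical names for this example and are not TiKV's actual `SlowScore` implementation.

```rust
/// Hypothetical, simplified detector; NOT TiKV's actual `SlowScore`.
#[derive(Debug)]
pub struct RatioDetector {
    /// Share of slow inspections in a window above which the window is slow.
    ratio_thresh: f64,
    /// Latency (ms) above which a single inspection counts as slow (assumed knob).
    slow_latency_ms: u64,
    slow: u32,
    total: u32,
}

impl RatioDetector {
    pub fn new(ratio_thresh: f64, slow_latency_ms: u64) -> Self {
        Self { ratio_thresh, slow_latency_ms, slow: 0, total: 0 }
    }

    /// Record the latency of one inspection.
    pub fn record(&mut self, latency_ms: u64) {
        self.total += 1;
        if latency_ms > self.slow_latency_ms {
            self.slow += 1;
        }
    }

    /// Close the current window: report whether it was slow, then reset counters.
    pub fn close_window(&mut self) -> bool {
        let ratio = if self.total == 0 {
            0.0
        } else {
            f64::from(self.slow) / f64::from(self.total)
        };
        self.slow = 0;
        self.total = 0;
        ratio > self.ratio_thresh
    }
}

/// Feed one window of latencies to a fresh detector and return its verdict.
pub fn window_is_slow(ratio_thresh: f64, slow_latency_ms: u64, latencies_ms: &[u64]) -> bool {
    let mut detector = RatioDetector::new(ratio_thresh, slow_latency_ms);
    for &latency in latencies_ms {
        detector.record(latency);
    }
    detector.close_window()
}
```

In this sketch, a window in which 3 of 10 inspections are slow (30%) stays below a 60% threshold but exceeds a 10% threshold, so only the lower threshold flags it.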

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Workloads | v8.5.1 | With this PR
Special workloads | (image) | (image)
tpcc 1k warehouses | (image) | (image)
Sysbench - oltp_read_write | (image) | (image)

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Release note

Enhance the detection mechanism to cover I/O jitters on the kvdb disk when kvdb is deployed on a separate mount path.

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

ti-chi-bot bot commented May 6, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has signed the dco. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 6, 2025
Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
@ti-chi-bot ti-chi-bot bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 13, 2025
Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
@LykxSassinator LykxSassinator changed the title kvdb: able to detect I/O jitters. raftstore: enhance the detection to cover I/O jitters on kvdb. May 19, 2025
@ti-chi-bot ti-chi-bot bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/needs-linked-issue do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels May 19, 2025
Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
@ti-chi-bot ti-chi-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 19, 2025
@LykxSassinator LykxSassinator marked this pull request as ready for review May 20, 2025 05:55
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 20, 2025
@LykxSassinator
Contributor Author

/retest

Member

@hbisheng hbisheng left a comment

LGTM. Looks like the PR is mainly about tuning the params:

  • inspect_kvdb_interval: 2s -> 100ms
  • ratio_thresh: 60% -> 10%

These changes will significantly increase the sensitivity of the KV IO jitter detection. This makes sense if we want to catch jitter as short as 100ms, though the trade-off is a higher risk of false positives.
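
The effect of the shorter interval on short jitters can be sketched with a bit of arithmetic. The function below is a hypothetical helper, not part of TiKV. It assumes probes are issued on a fixed schedule starting at t = 0 and that a probe observes the jitter only if it is issued inside the jitter window, which ignores probes that are already in flight when the jitter begins.

```rust
/// Hypothetical helper: count the probes, issued at t = 0, interval,
/// 2 * interval, ..., that fall inside the half-open jitter window
/// [start_ms, start_ms + len_ms). All values are in milliseconds.
pub fn probes_hitting_jitter(interval_ms: u64, start_ms: u64, len_ms: u64) -> u64 {
    assert!(interval_ms > 0, "interval must be positive");
    let end_ms = start_ms + len_ms;
    // The number of multiples of `interval_ms` in [0, x) is ceil(x / interval_ms).
    let issued_before = |x: u64| (x + interval_ms - 1) / interval_ms;
    issued_before(end_ms) - issued_before(start_ms)
}
```

Under these assumptions, a 500ms jitter starting at t = 2100ms is missed entirely by probes issued every 2s, while probes issued every 100ms land inside it five times. More samples per jitter is what makes detection possible, and it is also why a low ratio threshold can trip on short, harmless latency spikes.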

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label May 20, 2025
Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
@ti-chi-bot ti-chi-bot bot added the lgtm label May 22, 2025
@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label May 22, 2025

ti-chi-bot bot commented May 22, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hbisheng, overvenus

@ti-chi-bot ti-chi-bot bot added the approved label May 22, 2025

ti-chi-bot bot commented May 22, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-05-20 06:41:41.439145219 +0000 UTC m=+71138.540312427: ☑️ agreed by hbisheng.
  • 2025-05-22 15:30:48.403816166 +0000 UTC m=+106352.820875728: ☑️ agreed by overvenus.

@ti-chi-bot ti-chi-bot bot merged commit 6ebcdd9 into tikv:master May 22, 2025
8 checks passed
@ti-chi-bot ti-chi-bot bot added this to the Pool milestone May 22, 2025
@LykxSassinator LykxSassinator added the needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. label May 27, 2025
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #18481.

ti-chi-bot bot pushed a commit that referenced this pull request Jul 9, 2025
… (#18481)

close #18463

Enhances the detection mechanism to cover the I/O jitters on kvdb disk if deploys with separated mount paths.

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Co-authored-by: lucasliang <nkcs_lykx@hotmail.com>
@LykxSassinator LykxSassinator added needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. labels Jul 18, 2025
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.1: #18722.

ti-chi-bot pushed a commit to ti-chi-bot/tikv that referenced this pull request Jul 18, 2025
close tikv#18463

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.5: #18723.
But this PR has conflicts, please resolve them!

ti-chi-bot bot pushed a commit that referenced this pull request Jul 23, 2025
… (#18723)

close #18463

Enhances the detection mechanism to cover the I/O jitters on kvdb disk if deploys with separated mount paths.

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Co-authored-by: lucasliang <nkcs_lykx@hotmail.com>
okJiang pushed a commit to okJiang/tikv that referenced this pull request Oct 13, 2025
…18439)

close tikv#18463

Enhances the detection mechanism to cover the I/O jitters on kvdb disk if deploys with separated mount paths.

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
Signed-off-by: okjiang <819421878@qq.com>
okJiang pushed a commit to okJiang/tikv that referenced this pull request Oct 27, 2025
…18439) (tikv#18723)

close tikv#18463

Enhances the detection mechanism to cover the I/O jitters on kvdb disk if deploys with separated mount paths.

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Co-authored-by: lucasliang <nkcs_lykx@hotmail.com>
okJiang pushed a commit to okJiang/tikv that referenced this pull request Oct 28, 2025
…18439) (tikv#18723)

close tikv#18463

Enhances the detection mechanism to cover the I/O jitters on kvdb disk if deploys with separated mount paths.

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Co-authored-by: lucasliang <nkcs_lykx@hotmail.com>
Signed-off-by: okjiang <819421878@qq.com>

Labels

approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

HA: inspect the I/O jitters on kvdb if using separate mount paths.

4 participants