
raftstore: calculate the slow score by considering individual disk performance factors #17801

Merged
ti-chi-bot[bot] merged 23 commits into tikv:master from
LykxSassinator:detect_kvdb_slow
Nov 29, 2024


@LykxSassinator
Contributor

@LykxSassinator LykxSassinator commented Nov 11, 2024

What is changed and how it works?

Issue Number: Close #17884

What's Changed:

TiKV's current SlowScore mechanism can detect and handle I/O latency issues on the Raft disk. As an improvement, the write latency of the KV disk can also be taken into account to identify abnormal behavior on the KV disk.
[Image: design diagram]

As shown in the design above, this PR introduces an additional, independent inspector, triggered by inspect-kvdb-interval, that inspects the I/O latency of kvdb when kvdb is deployed on a separate mount path. The inspector periodically sends the measured latency to the SlowScore algorithm so that it can detect whether the kvdb disk has I/O hang issues.
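The periodic reporting described above can be sketched in Rust as a background thread that runs a probe once per interval and sends each measured latency over a channel to a consumer. This is a minimal sketch under stated assumptions, not TiKV's implementation: spawn_inspector, LatencySample, and count_slow are hypothetical names, the interval argument only stands in for inspect-kvdb-interval, and the fixed threshold in count_slow is a placeholder for the real SlowScore algorithm, which is not reproduced here.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;
use std::time::Duration;

/// Hypothetical latency sample sent from the kvdb inspector to the consumer.
#[derive(Debug)]
struct LatencySample {
    duration: Duration,
}

/// Hypothetical inspector: runs `probe` every `interval` (a stand-in for
/// inspect-kvdb-interval) for `rounds` rounds and sends each latency over `tx`.
fn spawn_inspector<F>(
    interval: Duration,
    rounds: usize,
    tx: Sender<LatencySample>,
    mut probe: F,
) -> thread::JoinHandle<()>
where
    F: FnMut() -> Duration + Send + 'static,
{
    thread::spawn(move || {
        for _ in 0..rounds {
            let duration = probe();
            // Stop early if the consumer has gone away.
            if tx.send(LatencySample { duration }).is_err() {
                break;
            }
            thread::sleep(interval);
        }
        // `tx` is dropped here, which ends the consumer's `rx.iter()` loop.
    })
}

/// Placeholder consumer (NOT the SlowScore algorithm): counts all samples
/// and those slower than `threshold`.
fn count_slow(rx: &Receiver<LatencySample>, threshold: Duration) -> (usize, usize) {
    let (mut total, mut slow) = (0, 0);
    for sample in rx.iter() {
        total += 1;
        if sample.duration > threshold {
            slow += 1;
        }
    }
    (total, slow)
}

fn main() {
    let (tx, rx) = mpsc::channel();
    // Fake probe that alternates between a fast (1 ms) and a slow (500 ms) result.
    let mut n = 0u64;
    let handle = spawn_inspector(Duration::from_millis(1), 4, tx, move || {
        n += 1;
        if n % 2 == 0 { Duration::from_millis(500) } else { Duration::from_millis(1) }
    });
    let (total, slow) = count_slow(&rx, Duration::from_millis(100));
    handle.join().unwrap();
    println!("{total} samples, {slow} slow"); // prints "4 samples, 2 slow"
}
```

Using a channel keeps the probing thread decoupled from whatever aggregates the samples, which matches the description of an independent inspector feeding an existing scoring algorithm.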

To avoid interference from the complex foreground and background I/O that RocksDB generates, the inspector simply writes a string to a designated file, measures how long the write takes, and records that time as apply_process_duration. Testing showed this approach to be valid and more accurate than directly measuring the duration of applies on RocksDB.
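A timed write probe of the kind described above might look like the following minimal Rust sketch. Assumptions: probe_write_latency is a hypothetical name, the file path is arbitrary, and calling sync_data so the measurement reaches the disk rather than stopping at the page cache is our assumption, not something the PR states.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;
use std::time::{Duration, Instant};

/// Hypothetical probe: overwrite `path` with `payload`, flush it to disk with
/// `sync_data`, and return how long open + write + sync took.
/// Truncating keeps the probe file from growing across repeated probes.
fn probe_write_latency(path: &Path, payload: &[u8]) -> io::Result<Duration> {
    let start = Instant::now();
    let mut file = OpenOptions::new()
        .create(true)
        .write(true)
        .truncate(true)
        .open(path)?;
    file.write_all(payload)?;
    // Assumption: sync so the timing includes the device, not just the page cache.
    file.sync_data()?;
    Ok(start.elapsed())
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("kvdb_inspect_probe.tmp");
    let latency = probe_write_latency(&path, b"inspect\n")?;
    println!("write + sync took {:?}", latency);
    fs::remove_file(&path)?;
    Ok(())
}
```

Because the probe touches only one small file, its latency mostly reflects the disk itself rather than RocksDB's flush and compaction activity, which is the motivation given above.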

If raft-engine and kvdb share the same mount path, the new inspector is not created, and disk health inspection continues to be triggered by inspect-interval as before.
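One plausible way to decide whether the separate inspector is needed is to check whether the kvdb and raft-engine directories sit on the same filesystem device. The sketch below assumes that rule; it is not necessarily how the PR detects a shared mount path. on_same_device and needs_kvdb_inspector are hypothetical names, and the check is Unix-only because it reads st_dev through std::os::unix::fs::MetadataExt.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical check: two paths are treated as sharing a disk when their
/// filesystem device IDs (`st_dev`) are equal. Unix-only.
fn on_same_device(a: &Path, b: &Path) -> io::Result<bool> {
    Ok(fs::metadata(a)?.dev() == fs::metadata(b)?.dev())
}

/// Hypothetical rule: create the separate kvdb inspector only when kvdb and
/// raft-engine are on different devices; otherwise the existing
/// inspect-interval based inspection already covers the shared disk.
fn needs_kvdb_inspector(kvdb_dir: &Path, raft_dir: &Path) -> io::Result<bool> {
    Ok(!on_same_device(kvdb_dir, raft_dir)?)
}

fn main() -> io::Result<()> {
    let tmp = std::env::temp_dir();
    // Same directory for both stores, so no separate inspector is needed.
    println!("{}", needs_kvdb_inspector(&tmp, &tmp)?); // prints "false"
    Ok(())
}
```

Comparing device IDs treats two partitions of one physical disk as different devices, and bind mounts of one filesystem as the same device, so a production check may need to resolve mount points or block devices instead.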

This PR introduces an additional, independent inspector that detects I/O hang issues on the kvdb disk when kvdb is deployed on a separate mount path.

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

The following example, run with a tpcc-1k workload, shows that this mechanism works as intended when I/O delays are injected into the kvdb disk:
[Image: tpcc-1k results with I/O delays injected into the kvdb disk]

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Release note

This PR introduces an additional, independent inspector that detects I/O hang issues on the kvdb disk when kvdb is deployed on a separate mount path.

…isk performance factors.

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
@ti-chi-bot
Contributor

ti-chi-bot bot commented Nov 11, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has signed the dco. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Nov 11, 2024
@LykxSassinator
Contributor Author

/cc @hbisheng ptal, thx

@ti-chi-bot ti-chi-bot bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Nov 11, 2024
Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 12, 2024
Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
@LykxSassinator LykxSassinator changed the title [WIP] raftstore: calculate the slow score by considering individual d… [WIP] raftstore: calculate the slow score by considering individual disk performance factors Nov 25, 2024
Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
@ti-chi-bot ti-chi-bot bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/needs-linked-issue do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Nov 25, 2024
@ti-chi-bot ti-chi-bot bot merged commit 43e63b5 into tikv:master Nov 29, 2024
@ti-chi-bot ti-chi-bot bot added this to the Pool milestone Nov 29, 2024
@LykxSassinator LykxSassinator added the needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. label Dec 2, 2024
ti-chi-bot pushed a commit to ti-chi-bot/tikv that referenced this pull request Dec 2, 2024
close tikv#17884

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #17912.

ti-chi-bot bot pushed a commit that referenced this pull request Dec 2, 2024
…rformance factors (#17801) (#17912)

close #17884

This PR introduces an additional, independent inspector that detects I/O hang issues on the kvdb disk when kvdb is deployed on a separate mount path.

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Co-authored-by: lucasliang <nkcs_lykx@hotmail.com>
@ti-chi-bot ti-chi-bot bot added the needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. label Dec 2, 2024
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.5: #17913.

ti-chi-bot pushed a commit to ti-chi-bot/tikv that referenced this pull request Dec 2, 2024
close tikv#17884

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot bot pushed a commit that referenced this pull request Dec 2, 2024
…individual disk performance factors.(#17801) (#17901)

close #17884

This PR introduces an additional, independent inspector that detects I/O hang issues on the kvdb disk when kvdb is deployed on a separate mount path.

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
@LykxSassinator LykxSassinator added the needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. label Dec 11, 2024
ti-chi-bot pushed a commit to ti-chi-bot/tikv that referenced this pull request Dec 11, 2024
close tikv#17884

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.1: #17980.

ti-chi-bot bot pushed a commit that referenced this pull request Dec 11, 2024
…rformance factors (#17801) (#17980)

close #17884

This PR introduces an additional, independent inspector that detects I/O hang issues on the kvdb disk when kvdb is deployed on a separate mount path.

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Co-authored-by: lucasliang <nkcs_lykx@hotmail.com>
okJiang added a commit to okJiang/tikv that referenced this pull request Oct 13, 2025
…idering individual disk performance factors.(tikv#17801) (tikv#17901)"

This reverts commit 8b006a5.

Signed-off-by: okjiang <819421878@qq.com>
okJiang pushed a commit to okJiang/tikv that referenced this pull request Oct 13, 2025
…rformance factors (tikv#17801)

close tikv#17884

This PR introduces an additional, independent inspector that detects I/O hang issues on the kvdb disk when kvdb is deployed on a separate mount path.

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Co-authored-by: Bisheng Huang <hbisheng@gmail.com>
Signed-off-by: okjiang <819421878@qq.com>

Labels

  • approved
  • dco-signoff: yes (Indicates the PR's author has signed the dco.)
  • lgtm
  • needs-cherry-pick-release-7.5 (Should cherry pick this PR to release-7.5 branch.)
  • needs-cherry-pick-release-8.1 (Should cherry pick this PR to release-8.1 branch.)
  • needs-cherry-pick-release-8.5 (Should cherry pick this PR to release-8.5 branch.)
  • release-note (Denotes a PR that will be considered when it comes time to generate release notes.)
  • size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.)


Development

Successfully merging this pull request may close these issues.

HA: inspect the health of kvdb if using separate mount paths on kvdb and raft-engine.

4 participants