raftstore: optimize entry cache eviction to prevent append rejections under memory pressure #17488
ti-chi-bot[bot] merged 3 commits into tikv:master
8d5060a to d4144c5
|
/cc @glorv @Connor1996 @hicqu @zhangjinpeng87 PTAL. |
|
/pull-unit-test |
|
/retest |
|
|
/ok-to-test |
|
/retest |
…ntries

Non-dangle entries are copied from EntryCache, so their memory usage should also be included in the accounting.

Signed-off-by: hhwyt <hhwyt1@gmail.com>
|
@hicqu @LykxSassinator PTAL. Added a new commit to address #17537. |
|
@hhwyt: Your PR was out of date, I have automatically updated it for you. If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes. |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: glorv, hicqu, LykxSassinator |
|
In response to a cherrypick label: new pull request created to branch |
|
In response to a cherrypick label: new pull request created to branch |
… under memory pressure (tikv#17488) Signed-off-by: hhwyt <hhwyt1@gmail.com>
… under memory pressure (#17488) (#17792)

close #17392

This PR addresses the issue where, under memory pressure, a follower directly rejects append requests instead of first attempting to free up memory by evicting the entry cache.

One potential solution is for the follower, before rejecting append requests, to notify other peers on the same node (including leaders and followers) to evict their entry caches. This would require a global module that coordinates peers. When many followers on the same node receive append requests under memory pressure, they would notify the module, which would then send messages to a sufficient number of peers to trigger entry cache eviction and report the results back to the followers. This module could also be triggered by a leader on the same node when it's receiving write requests under memory pressure. However, this solution is a bit complex and has a low ROI.

A more practical solution is to optimize the conditions for triggering entry cache eviction, so it occurs earlier, avoiding append rejection.

Why doesn't the current version trigger cache eviction earlier?

Currently, entry cache eviction is checked either by the `raft_log_gc_tick_interval` (default 3 seconds) or during `handle_raft_committed_entries`. The trigger conditions for eviction are:

1. Total memory usage has reached the high water mark.
2. Entry cache memory usage has reached the `evict_cache_on_memory_ratio` (default 0.1, not visible to the customer).

On the other hand, append rejection is triggered when:

1. Total memory usage has reached the high water mark.
2. The total memory usage of the entry cache, raft messages, and applying entries has reached the `reject_messages_on_memory_ratio` (default 0.2).

The issue is that when the first condition is met, the second condition for append rejection may be triggered earlier.

The solution proposed in this PR is to modify the first condition for cache eviction. Instead of waiting for memory usage to fully reach the high water mark, eviction will be triggered when memory usage is **near** the high water mark. This change should not introduce significant performance overhead because eviction is only triggered when memory usage is near the high water mark, and it helps prevent the more disruptive append rejections.

Signed-off-by: hhwyt <hhwyt1@gmail.com>
Co-authored-by: hhwyt <hhwyt1@gmail.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
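
The eviction and rejection conditions described in the commit message above can be sketched roughly as follows. This is a minimal, standalone illustration and not the actual raftstore code: the `MemoryUsage` struct, every function name, and the `NEAR_HIGH_WATER_RATIO` value of 0.9 are hypothetical, and the sketch assumes both ratios are measured against total memory usage, which the commit message does not state.

```rust
/// Hypothetical snapshot of a node's memory accounting, in bytes.
/// Field names are illustrative and not taken from TiKV.
#[derive(Clone, Copy, Debug)]
pub struct MemoryUsage {
    pub total: u64,
    pub high_water: u64,
    pub entry_cache: u64,
    pub raft_messages: u64,
    pub applying_entries: u64,
}

/// Defaults quoted in the commit message.
pub const EVICT_CACHE_ON_MEMORY_RATIO: f64 = 0.1;
pub const REJECT_MESSAGES_ON_MEMORY_RATIO: f64 = 0.2;
/// Assumption: "near" the high water mark is modeled as 90% of it.
pub const NEAR_HIGH_WATER_RATIO: f64 = 0.9;

fn reaches_high_water(m: &MemoryUsage) -> bool {
    m.total >= m.high_water
}

fn near_high_water(m: &MemoryUsage) -> bool {
    m.total as f64 >= m.high_water as f64 * NEAR_HIGH_WATER_RATIO
}

/// Assumption: the eviction ratio is relative to total memory usage.
fn entry_cache_over_ratio(m: &MemoryUsage) -> bool {
    m.entry_cache as f64 >= m.total as f64 * EVICT_CACHE_ON_MEMORY_RATIO
}

/// Eviction check before the change: waits for the high water mark.
pub fn should_evict_before(m: &MemoryUsage) -> bool {
    reaches_high_water(m) && entry_cache_over_ratio(m)
}

/// Eviction check after the change: fires when usage is near the mark.
pub fn should_evict_after(m: &MemoryUsage) -> bool {
    near_high_water(m) && entry_cache_over_ratio(m)
}

/// Append rejection: entry cache, raft messages, and applying entries
/// together exceed the rejection ratio once the high water mark is hit.
pub fn should_reject_append(m: &MemoryUsage) -> bool {
    let tracked = m.entry_cache + m.raft_messages + m.applying_entries;
    reaches_high_water(m)
        && tracked as f64 >= m.total as f64 * REJECT_MESSAGES_ON_MEMORY_RATIO
}
```

With a snapshot such as `total: 950`, `high_water: 1000`, `entry_cache: 120`, `raft_messages: 50`, `applying_entries: 40`, the old check does not evict yet, the new check does, and appends are not rejected. That gap is the window in which the earlier eviction can free memory before the rejection condition becomes reachable.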
What is changed and how it works?
Issue Number: Close #17392, #17537
What's Changed:
Related changes
pingcap/docs / pingcap/docs-cn:
Check List
Tests
Side effects
Release note