raftstore: optimize entry cache eviction to prevent append rejections under memory pressure#17488

Merged
ti-chi-bot[bot] merged 3 commits into tikv:master from hhwyt:master
Sep 24, 2024
Conversation

@hhwyt
Contributor

@hhwyt hhwyt commented Sep 5, 2024

What is changed and how it works?

Issue Number: Close #17392, #17537

What's Changed:

This PR addresses the issue where, under memory pressure, a follower  
directly rejects append requests instead of first attempting to free  
up memory by evicting the entry cache.  

One potential solution is for the follower, before rejecting append  
requests, to notify other peers on the same node (including leaders  
and followers) to evict their entry caches. This would require a  
global module that coordinates peers. When many followers on the  
same node receive append requests under memory pressure, they would  
notify the module, which would then send messages to a sufficient  
number of peers to trigger entry cache eviction and report the  
results back to the followers. This module could also be triggered  
by a leader on the same node when it's receiving write requests  
under memory pressure. However, this solution is a bit complex and  
has a low ROI.  

A more practical solution is to optimize the conditions for  
triggering entry cache eviction, so it occurs earlier, avoiding  
append rejection.  

Why doesn't the current version trigger cache eviction earlier?  

Currently, entry cache eviction is checked either by the  
`raft_log_gc_tick_interval` (default 3 seconds) or during  
`handle_raft_committed_entries`. The trigger conditions for eviction  
are:  
1. Total memory usage has reached the high water mark.  
2. Entry cache memory usage has reached the  
`evict_cache_on_memory_ratio` (default 0.1, not exposed in  
user-facing configuration).  

On the other hand, append rejection is triggered when:  
1. Total memory usage has reached the high water mark.  
2. The total memory usage of the entry cache, raft messages, and  
applying entries has reached the `reject_messages_on_memory_ratio`  
(default 0.2).  

The issue is that both checks share the first condition, but their  
second conditions differ: rejection counts the combined usage of the  
entry cache, raft messages, and applying entries against a 0.2 ratio,  
while eviction counts the entry cache alone against 0.1. So once the  
high water mark is reached, the rejection threshold can be crossed  
while the entry cache is still below its eviction ratio, and appends  
are rejected without the cache ever being evicted.  
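The interaction of the two thresholds can be sketched as follows. This is a minimal illustration of the checks described above, not TiKV's real code: the function and constant names are hypothetical, and the accounting is simplified to ratios of the high water mark.

```rust
const EVICT_CACHE_ON_MEMORY_RATIO: f64 = 0.1;
const REJECT_MESSAGES_ON_MEMORY_RATIO: f64 = 0.2;

/// Eviction fires only if total usage has reached the high water mark
/// AND the entry cache alone accounts for at least the eviction ratio.
pub fn should_evict_entry_cache(usage: f64, high_water: f64, entry_cache: f64) -> bool {
    usage >= high_water && entry_cache / high_water >= EVICT_CACHE_ON_MEMORY_RATIO
}

/// Rejection fires if total usage has reached the high water mark AND
/// the combined usage of entry cache, raft messages, and applying
/// entries reaches the rejection ratio.
pub fn should_reject_append(
    usage: f64,
    high_water: f64,
    entry_cache: f64,
    raft_msgs: f64,
    applying: f64,
) -> bool {
    usage >= high_water
        && (entry_cache + raft_msgs + applying) / high_water
            >= REJECT_MESSAGES_ON_MEMORY_RATIO
}
```

For example, with usage at a 10 GiB high water mark, an entry cache of 0.8 GiB stays below the 0.1 eviction ratio, yet cache + raft messages + applying entries of 0.8 + 0.9 + 0.4 = 2.1 GiB crosses the 0.2 rejection ratio, so appends are rejected without the cache ever being evicted.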

The solution proposed in this PR is to modify the first condition  
for cache eviction. Instead of waiting for memory usage to fully  
reach the high water mark, eviction will be triggered when memory  
usage is **near** the high water mark.  

This change should not introduce significant performance overhead  
because eviction is only triggered when memory usage is near the  
high water mark, and it helps prevent the more disruptive append  
rejections.  
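The relaxed trigger can be sketched like this: evict once usage is merely *near* the high water mark instead of exactly at it. The 0.9 margin factor below is an assumption for illustration, not the value TiKV actually uses, and the names are hypothetical.

```rust
const NEAR_HIGH_WATER_FACTOR: f64 = 0.9;
const EVICT_CACHE_ON_MEMORY_RATIO: f64 = 0.1;

/// Relaxed first condition: start evicting once memory usage is within
/// a margin of the high water mark, before appends would be rejected.
pub fn needs_evict_entry_cache(usage: f64, high_water: f64, entry_cache: f64) -> bool {
    // Before this PR the first condition was `usage >= high_water`.
    usage >= NEAR_HIGH_WATER_FACTOR * high_water
        && entry_cache / high_water >= EVICT_CACHE_ON_MEMORY_RATIO
}
```

Because the relaxed condition fires only in the narrow band just below the high water mark, the extra eviction work stays confined to situations where rejection was imminent anyway.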

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Release note

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has signed the dco. contribution This PR is from a community contributor. labels Sep 5, 2024
@ti-chi-bot
Contributor

ti-chi-bot bot commented Sep 5, 2024

Hi @hhwyt. Thanks for your PR.

I'm waiting for a tikv member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ti-chi-bot ti-chi-bot bot added needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed do-not-merge/needs-linked-issue labels Sep 5, 2024
@hhwyt hhwyt changed the title [WIP] Enhance needs_evict_entry_cache to proactively release entry cache and avoid write rejection under memory pressure [WIP] Enhance needs_evict_entry_cache to preemptively release entry cache and avoid write rejection under memory pressure Sep 6, 2024
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 10, 2024
@hhwyt hhwyt changed the title [WIP] Enhance needs_evict_entry_cache to preemptively release entry cache and avoid write rejection under memory pressure Optimize entry cache eviction to prevent append rejections under memory pressure Sep 10, 2024
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 10, 2024
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Sep 10, 2024
@hhwyt hhwyt force-pushed the master branch 4 times, most recently from 8d5060a to d4144c5 Compare September 10, 2024 06:04
@hhwyt
Contributor Author

hhwyt commented Sep 10, 2024

/cc @glorv @Connor1996 @hicqu @zhangjinpeng87 PTAL.

@hhwyt
Contributor Author

hhwyt commented Sep 10, 2024

/pull-unit-test

@glorv glorv requested review from Connor1996, glorv and hicqu September 10, 2024 06:27
@hhwyt
Contributor Author

hhwyt commented Sep 10, 2024

/retest

@ti-chi-bot
Contributor

ti-chi-bot bot commented Sep 10, 2024

@hhwyt: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@glorv
Contributor

glorv commented Sep 10, 2024

/ok-to-test

@ti-chi-bot ti-chi-bot bot added the approved label Sep 13, 2024
@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Sep 24, 2024
@ti-chi-bot
Contributor

ti-chi-bot bot commented Sep 24, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-09-13 02:55:13.642738284 +0000 UTC m=+584183.383162223: ☑️ agreed by LykxSassinator.
  • 2024-09-24 07:56:51.541409434 +0000 UTC m=+1552681.281833373: ☑️ agreed by hicqu.

@hhwyt
Contributor Author

hhwyt commented Sep 24, 2024

/retest

…ntries

Non-dangle entries are copied from EntryCache, so their memory usage should also be included in the accounting.

Signed-off-by: hhwyt <hhwyt1@gmail.com>
@hhwyt
Contributor Author

hhwyt commented Sep 24, 2024

@hicqu @LykxSassinator PTAL. Added a new commit to address #17537.

@ti-chi-bot
Contributor

ti-chi-bot bot commented Sep 24, 2024

@hhwyt: Your PR was out of date, I have automatically updated it for you.

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Contributor

ti-chi-bot bot commented Sep 24, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: glorv, hicqu, LykxSassinator

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:
  • OWNERS [LykxSassinator,glorv,hicqu]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot merged commit 5fa16ee into tikv:master Sep 24, 2024
@ti-chi-bot ti-chi-bot bot added this to the Pool milestone Sep 24, 2024
@glorv glorv added needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. labels Nov 8, 2024
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.5: #17792.

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.1: #17793.

hhwyt added a commit to ti-chi-bot/tikv that referenced this pull request Dec 12, 2024
… under memory pressure (tikv#17488)

Signed-off-by: hhwyt <hhwyt1@gmail.com>
ti-chi-bot bot added a commit that referenced this pull request Dec 12, 2024
… under memory pressure (#17488) (#17792)

close #17392

(The commit message repeats the PR description above verbatim.)

Signed-off-by: hhwyt <hhwyt1@gmail.com>

Co-authored-by: hhwyt <hhwyt1@gmail.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
@ti-chi-bot ti-chi-bot bot removed the needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. label Feb 14, 2025

Labels

approved contribution This PR is from a community contributor. dco-signoff: yes Indicates the PR's author has signed the dco. lgtm needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. ok-to-test Indicates a PR is ready to be tested. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize suspending writes when memory usage is high

5 participants