Yield CPU for concurrent flush and concurrent mergeDelta (#5410) #5424
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review. Reviewers can indicate their review by submitting an approval review.
The merge conflict is caused by #5296 not having been merged.
@breezewish can you resolve the conflicts and get this PR merged?
Force-pushed from fba64c5 to af90192.
Signed-off-by: Wish <breezewish@outlook.com>
Force-pushed from af90192 to f1025db.
/run-all-tests
/merge
@breezewish: It seems you want to merge this PR. I will help you trigger all the tests: /run-all-tests
This pull request has been accepted and is ready to merge. Commit hash: f1025db
@ti-chi-bot: Your PR was out of date, so I have automatically updated it for you. At the same time, I will also trigger all tests for you:
/run-all-tests
If the CI test fails, just re-trigger the test that failed, and the bot will merge the PR for you after CI passes.
/run-unit-test |
/run-all-tests
This is an automated cherry-pick of #5410
What problem does this PR solve?
Issue Number: close #5409
What is changed and how it works?
Add a sleep (backoff) between retries of flushCache and mergeDeltaBySegment:
- When flushCache is retrying, the wait backoff is 5ms ~ 100ms (an existing flushCache usually takes only a short time to finish).
- When mergeDeltaBySegment is retrying, the wait backoff is 50ms ~ 1s (a split-prepare could take several seconds to finish).
Check List
Tests
To test the fix, I introduced a splitEachSegment debug function locally to manually trigger a split. The test case triggers a split of a 1GB segment and performs a mergeDelta at the same time.
Before the fix (using release v6.1):
When a split (takes 10s) and a mergeDelta (takes 20s in total, blocked by the split for 10s) run concurrently, there are 211K retries during the 10s in which the mergeDelta is blocked.
CPU usage is around 200% during the split + mergeDelta.
After the fix:
There are only 14 retry attempts with exponential backoff.
CPU usage stays around 100% (the first 11s for the split, the next 10s for the mergeDelta).
Note: Because the backoff can be as long as 1s, CPU usage dropped for a short while after the split finished and before the mergeDelta started.
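A minimal sketch of the retry backoff described under "What is changed and how it works?", assuming the wait doubles from the lower bound up to the upper bound (the PR only states the bounds and "exp backoff"). ExpBackoff and simulate_blocked_retries are hypothetical names, not TiFlash code. The simulation advances a virtual clock instead of sleeping, so it only counts retries and the idle gap; it does not model CPU usage.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass
class ExpBackoff:
    """Hypothetical capped exponential backoff, not TiFlash's actual class.

    Assumes the wait starts at initial_ms and doubles after every retry,
    never exceeding max_ms.
    """

    initial_ms: int
    max_ms: int

    def delays(self) -> Iterator[int]:
        """Yield successive wait times in milliseconds."""
        delay = self.initial_ms
        while True:
            yield delay
            delay = min(delay * 2, self.max_ms)


def simulate_blocked_retries(backoff: ExpBackoff, blocked_ms: int) -> Tuple[int, int]:
    """Count retries for a task whose attempts fail until blocked_ms has passed.

    Uses a virtual clock instead of sleeping. Returns (retries, idle_gap_ms):
    the retries made after the first failed attempt, and how long the task is
    still asleep after the blocker (e.g. a split) has finished.
    """
    now = 0
    retries = 0
    delays = backoff.delays()
    while now < blocked_ms:  # the attempt made at `now` fails: still blocked
        now += next(delays)  # sleep for the next backoff delay, then retry
        retries += 1
    return retries, now - blocked_ms


# Bounds taken from the PR description; the doubling factor is an assumption.
flush_cache_backoff = ExpBackoff(initial_ms=5, max_ms=100)
merge_delta_backoff = ExpBackoff(initial_ms=50, max_ms=1000)

# mergeDelta blocked by a 10s split; flushCache blocked for an arbitrary 1s.
print(simulate_blocked_retries(merge_delta_backoff, blocked_ms=10_000))  # (14, 550)
print(simulate_blocked_retries(flush_cache_backoff, blocked_ms=1_000))   # (14, 55)
```

Under these assumptions, a mergeDelta blocked for 10s retries 14 times instead of spinning, and its last sleep ends 550ms after the split finishes. Because of the 1s cap, that idle gap can never exceed 1s, which is consistent with the short CPU dip described in the note above. How closely the count matches the measured 14 attempts depends on the real growth factor and blocked time, which the PR does not state.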
Side effects
Documentation
Release note