Disagg: Fix the issue that lifetime of read-snapshots in WN may be longer than expected#9297
Conversation
|
/retest |
3cbfaa7 to
09a2ccd
Compare
36d5d78 to
9dc0288
Compare
…lizing the snapshot data.
7769eb6 to
c1e5333
Compare
| if (request.page_ids_size() == 0 && !needFetchMemTableSet()) | ||
| return; | ||
|
|
||
| // No matter all delta data is cached or not, call FetchDisaggPages to release snapshot in WN. |
There was a problem hiding this comment.
Will this cause optimizations like in https://github.com/tidbcloud/tiflash-cse/pull/236/files get reverted? (The PR skips occupySpace to reduce lock contention)
There was a problem hiding this comment.
@JinheLin @breezewish
What about adding another unary RPC method "CancelDisaggTask". When the MPP request on CN is finished or canceled, CN calls "CancelDisaggTask" to let the WN release the whole snapshot. Instead of sending "fetchPages" segment by segment?
This can
- decouple the optimization of 'fetchPages' and release WN snapshot
- reduce the total RPC calls between CN and RN
- handle the situation when the request is canceled in the CN
There was a problem hiding this comment.
Are you referring to this code: https://github.com/tidbcloud/tiflash-cse/pull/236/files#diff-415598e006988b3cadcac5e3a4eda7fdbed5ac29cc1ce8d11211a5d26a4d8449 ?
If there are no ColumnFileTiny to fetch, skip here: https://github.com/pingcap/tiflash/blob/master/dbms/src/Storages/DeltaMerge/SegmentReadTask.cpp#L332.
Code in https://github.com/pingcap/tiflash/pull/9297/files/c1e5333660e180072ff658e2cd45349b6d2db28b#diff-9fc375427641c2044d5949241ef2f173ee8b4d24e252216a6cba0018ad5c0413L545-L547 means all pages are cached.
There was a problem hiding this comment.
-
RPCs that can be reduced is when all pages are cached and there is no
ColumnFileInMemorysince v8.1, because ColumnFileInMemory is not cached.- If there are no write, segments are stable-only, snapshots are released quickly in WN.
- If there are some writes, the probability that there are no
ColumnFileInMemoryis low, so these RPCs likely cannot be reduced.
-
The coupling problem also looks fine because the logic is simple, and this code does not spread outward.
-
The situation that
CancelDisaggTaskto handle is abnormal requests.CancelDisaggTaskcan only serve as a fallback. If a request takes a long time to execute, the lifecycle of the snapshot will still be long than expected.
There was a problem hiding this comment.
Another optimization is use just one stream RPC to fetch pages of all segments. But this should be quite troublesome.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JaySon-Huang, Lloyd-Pottiger The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
|
@JinheLin: Your PR was out of date, I have automatically updated it for you. At the same time I will also trigger all tests for you: /run-all-tests
If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository. |
|
In response to a cherrypick label: new pull request created to branch |
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
|
In response to a cherrypick label: new pull request created to branch |
…nger than expected (#9297) (#9317) close #9298 1. In WN, release snapshots of SegmentReadTasks that do not have ColumnFileTiny and ColumnFileInMemory. 2. In CN, no matter all ColumnFileTinys and ColumnFileInMemorys are cached or not, call FetchDisaggPages to release snapshot in WN. Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io> Co-authored-by: jinhelin <linjinhe33@gmail.com>
…nger than expected (#9297) (#9316) close #9298 1. In WN, release snapshots of SegmentReadTasks that do not have ColumnFileTiny and ColumnFileInMemory. 2. In CN, no matter all ColumnFileTinys and ColumnFileInMemorys are cached or not, call FetchDisaggPages to release snapshot in WN. Co-authored-by: jinhelin <linjinhe33@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
What problem does this PR solve?
Issue Number: close #9298
Problem summary:
Snapshot of each SegmentReadTask will be released after FetchDisaggPages is called. But for some segments with no ColumnFileTiny or ColumnFileInMemory to fetch, FetchDisaggPages will never be called.
What is changed and how it works?
Notes:
Check List
Tests
Side effects
Documentation
Release note