ddl: make read and write async during adding index #39249
ti-chi-bot merged 14 commits into pingcap:master
Conversation
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review. The full list of commands accepted by this bot can be found here. Reviewers can indicate their review by submitting an approval review.
/hold WIP

/run-check_dev_2

/unhold
```go
rc.mergeWarnings(taskCtx.warnings, taskCtx.warningsCount)
```

```diff
- if num := result.scanCount - lastLogCount; num >= 30000 {
+ if num := result.scanCount - lastLogCount; num >= 90000 {
```
For a 90000-row log threshold, how long would the interval between log lines be?
Normally, throughput of adding index using ingest is about 200k rows per second.
How about printing one log every 200k rows?
Maybe 200k is too large for the old implementation, whose average throughput is about 20k rows per second.
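The interval arithmetic in this thread is easy to check; a minimal sketch (the throughput figures come from the comments above, the function name is mine):

```go
package main

import "fmt"

// logInterval estimates the seconds between progress log lines for a given
// row threshold and throughput (rows/second). Figures are illustrative,
// taken from the discussion above.
func logInterval(thresholdRows, rowsPerSecond float64) float64 {
	return thresholdRows / rowsPerSecond
}

func main() {
	fmt.Printf("ingest path (200k rows/s): %.2fs per log line\n", logInterval(90000, 200000))
	fmt.Printf("legacy path (20k rows/s):  %.2fs per log line\n", logInterval(90000, 20000))
	fmt.Printf("200k threshold on legacy:  %.2fs per log line\n", logInterval(200000, 20000))
}
```

So a 90000-row threshold means roughly one log line every 0.45 s on the ingest path but every 4.5 s on the legacy path, while a 200k threshold would stretch the legacy path to one line every 10 s.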
I think we can create a sliding window to track the throughput, so we can adapt the row threshold to the machine's performance.
Luckily, I have created a sliding window at util/window in this PR.
wjhuang2016
left a comment
Any tests for changing the worker size while adding the index?
```go
if scheduler.copReqSenderPool != nil {
	scheduler.copReqSenderPool.sendTask(task)
}
scheduler.taskCh <- task
```
Should the task be sent to two different channels?
I think so. This is because I want to keep the old backfill implementation and introduce as few changes as possible:
Old backfill:

```
scheduler.taskCh -> backfill worker -> read rows -> write indexes -> resultCh
```

New backfill in this PR:

```
scheduler.taskCh -> backfill worker -> write indexes -> resultCh
                                            ^
                                            |
copReqSenderPool.taskCh -> cop-req sender -> read rows
```

If we only use one channel, then it will be:

```
scheduler.taskCh -> backfill worker -> write indexes -> resultCh
        |                                   ^
        v                                   |
        +--------> cop-req sender -> read rows
```
The backfill worker cannot process the next task until the current task is complete. However, this is not the behavior we want for the cop-request sender pool: to gain better performance, the sender pool should always be filled with tasks.
Besides, with the one-channel solution, it is not easy to set the concurrency of the backfill workers and the cop-request senders separately.
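The two-channel design above can be sketched as a toy pipeline. All identifiers here (`runPipeline`, `task`, `indexKV`) are mine, not TiDB's, and the real workers do far more per task; the point is only that readers and writers are decoupled and can be scaled independently:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// task stands in for a backfill task range; indexKV for a batch of index KVs.
type task struct{ id int }
type indexKV struct{ taskID int }

// runPipeline wires a pool of cop-request senders (readers) and backfill
// workers (writers) through two channels and returns the number of tasks
// backfilled. Hypothetical sketch of the design described above.
func runPipeline(senderConcurrency, backfillConcurrency, numTasks int) int64 {
	taskCh := make(chan task, numTasks)  // like copReqSenderPool.taskCh
	kvCh := make(chan indexKV, numTasks) // index KVs produced by the senders
	var backfilled atomic.Int64

	var senders sync.WaitGroup
	for i := 0; i < senderConcurrency; i++ {
		senders.Add(1)
		go func() {
			defer senders.Done()
			for t := range taskCh {
				// "read rows" for the task range, convert them to index KVs.
				kvCh <- indexKV{taskID: t.id}
			}
		}()
	}

	var workers sync.WaitGroup
	for i := 0; i < backfillConcurrency; i++ {
		workers.Add(1)
		go func() {
			defer workers.Done()
			for range kvCh {
				// "write indexes": ingest the KVs into the local engine.
				backfilled.Add(1)
			}
		}()
	}

	for i := 0; i < numTasks; i++ {
		taskCh <- task{id: i}
	}
	close(taskCh)
	senders.Wait() // all reads finished; no more KVs will arrive
	close(kvCh)
	workers.Wait()
	return backfilled.Load()
}

func main() {
	fmt.Printf("backfilled %d tasks\n", runPipeline(4, 2, 6))
}
```

Because the senders drain `taskCh` independently of the workers, the reader side stays saturated even while a worker is still writing, and the two concurrency knobs can be tuned separately.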
If you mean the test for the backfill worker, you can search
/merge

This pull request has been accepted and is ready to merge. Commit hash: ea47941
TiDB MergeCI notify: 🔴 Bad News! [1] CI still failing after this PR merged.
What problem does this PR solve?
Issue Number: ref #35983
Problem Summary:
This is a further step from #39191.
What is changed and how it works?
This PR introduces the cop-request sender pool for coprocessor reads. Each sender lives in a separate goroutine: it builds a coprocessor request for a task range, holds the index column values from the coprocessor response, converts them to index KVs, and sends the index KVs to a channel in batches.
The backfill workers keep reading index KVs from the channel and writing them to lightning's local engines.
Compared with the synchronous version (before this PR), it has the following advantages:
As a quick verification, here is the add index performance:
The commit before #39191:
After this PR:
By using this patch to record the time spent on each phase of adding index, we have the following log:
The commit before #39191:
After this PR:
`backfill txn read` is reduced from 9.11 seconds to 0.10 seconds; it is almost eliminated.

Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.