Do Not Merge: VReplication Copy Phase: Parallelize bulk insert to check for performance improvements#7728
…cs and flags. Signed-off-by: Rohit Nayak <rohit@planetscale.com>
…metric fails in Prometheus and wondering if it failing Prometheus' naming convention Signed-off-by: Rohit Nayak <rohit@planetscale.com>
This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:
If no action is taken within 7 days, this PR will be closed.
Hey @rohit-nayak-ps, at PlanetScale we're pretty keen to speed up MoveTables for customers with terabyte-scale tables. I came across this PR while seeing how much of a performance improvement we can get from parallelizing bulk inserts. Before I discovered this PR, I created this one, which takes a pretty similar approach to yours. I was wondering: would you be interested in reviving this PR? Or, if you are too busy, would you be open to letting me revive it? I created a branch based on yours that resolves the merge conflicts. I tested it out and got pretty good performance results. If those changes look good to you, I can open a PR against this one, or apply the commits directly.
Replaced by #10828, which is essentially the same set of changes applied over the latest main. I did this in preference to a merge: it has been a year since I started this PR, and starting fresh felt easier and safer than fixing the conflicts here.
POC
Batches multiple bulk inserts in the copy phase to test if concurrent bulk inserts improve copy phase performance.
Description
During the copy phase we stream rows, and for every `PacketSize` of rows (default 250K bytes) we do a bulk insert. This PR batches multiple of these sets.
We add a `VReplicationTableCopyTimings` stat to monitor when a table copy starts and when it ends, for benchmarking purposes. This metric is per vreplication stream.
This feature is behind an experimental vttablet flag. To enable it, use `-vreplication_experimental_flags 2` on the target.
You can specify the number of bulk inserts using `-vreplication_parallel_bulk_inserts 16`. The default is 4.
Approach
Instead of inserting one batch at a time we collect N batches and insert them in parallel. Commits are done in order.
Checklist
Deployment Notes
Impacted Areas in Vitess
Components that this PR will affect: