WIP: VReplication parallel copy #8934
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…t startTransactionWithConsistentSnapshot Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
|
Updates:
|
|
Whoa. All pre-existing tests are passing with the new logic. I'll start crafting specialized tests. |
|
Of course all tests passed. They weren't running the new flow after all. |
|
Most test failures right now seem to originate, once again, from the type of testing we do: our endtoend tests look for a specific sequence of queries, and now parallelism ruined it all... |
|
Thoughts on the design are welcome |
|
Any guesses / predictions as to how this might affect the memory usage on the vttablets? Both source and target. |
|
Great question, it will add |
|
This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:
If no action is taken within 7 days, this PR will be closed. |
|
This PR was closed because it has been stale for 7 days with no activity. |
Followup to #8056
This is an initial attempt at parallelizing VReplication copy; the main applicability is for MoveTables or Migrate, with multiple tables involved.
Recap
As a quick recap from "The general VReplication flow" in #8056:
- VReplication currently only ever copies one table at a time
- Rows are read by `rowstreamer` using a `LOCK TABLES READ` + get GTID + `START TRANSACTION WITH CONSISTENT SNAPSHOT` + `UNLOCK TABLES`
- `vcopier` invokes the above via gRPC, receives the rows and writes them down to target table + updates `copy_state`
- catchup and fast forward steps follow, applying events from the binary log
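The read sequence in the recap can be sketched roughly as follows. This is a minimal illustration, not the actual `rowstreamer` code: `ExecFunc`, `lockTablesQuery` and `consistentSnapshot` are hypothetical names, and the use of two connections is an assumption (in MySQL, starting a transaction releases table locks held by the same session, so the lock and the snapshot cannot share one connection).

```go
package main

import (
	"fmt"
	"strings"
)

// ExecFunc is a hypothetical stand-in for a DB connection: it runs one
// statement and returns a single scalar result (empty when there is none).
type ExecFunc func(query string) (string, error)

// lockTablesQuery builds one LOCK TABLES statement covering every given table.
// It expects at least one table name.
func lockTablesQuery(tables []string) string {
	parts := make([]string, len(tables))
	for i, t := range tables {
		parts[i] = fmt.Sprintf("`%s` READ", t)
	}
	return "LOCK TABLES " + strings.Join(parts, ", ")
}

// consistentSnapshot follows the order given in the recap: lock, read the
// GTID, open a snapshot transaction, unlock. lockConn only holds the lock;
// snapConn is the connection that later reads the rows.
func consistentSnapshot(lockConn, snapConn ExecFunc, tables []string) (gtid string, err error) {
	if _, err = lockConn(lockTablesQuery(tables)); err != nil {
		return "", err
	}
	// Always release the lock, even when a later step fails.
	defer func() {
		if _, unlockErr := lockConn("UNLOCK TABLES"); err == nil {
			err = unlockErr
		}
	}()
	if gtid, err = snapConn("SELECT @@global.gtid_executed"); err != nil {
		return "", err
	}
	if _, err = snapConn("START TRANSACTION WITH CONSISTENT SNAPSHOT"); err != nil {
		return "", err
	}
	return gtid, nil
}
```

The helper takes a list of tables, so the same shape would cover locking more than one table in a single statement; writes are only blocked for the short window between `LOCK TABLES` and `UNLOCK TABLES`.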
Why parallelize and what's the premise
Trivially we want to parallelize the copy to save time; we've seen mass imports of data take as long as weeks.
Parallelization can occur in two places:
- `rowstreamer`
- `vcopier` […] `AUTO_INCREMENT` column. If `AUTO_INCREMENT` exists, basic tests show almost no gain with 2+ concurrent writes. We wish to focus on multi-table concurrency.
We know gRPC is a major source of overhead, and so we want to avoid multiple gRPC calls; we also assume that parallel data transfer across the network is not faster than serial data transfer across the network, assuming we're able to keep the network busy/utilized.
How not to parallelize VReplication copy
We don't take the current behavior and multiply it `n` times in parallel. If we did that, we would have:
- `n` gRPCs
- `n` `LOCK TABLE` statements
- `n` times the same binary log events
- `n` VPlayers processing those duplicate events
Proposed solution
We want a single gRPC call that parallelizes into `n` workers on both ends: on `rowstreamer` and on `vcopier`; we want a single `vplayer` to process all binlog events.
gRPC
We add a `VStreamRowsParallel()` function, with a `VStreamRowsParallelRequest` message. In essence, it's similar to `VStreamRows`, but:
- […] `lastPk` values
Obviously queries and the `lastPk` values correspond to each other.
`VStreamRowsResponse` is extended to include `TableName`. `vcopier` will need this to differentiate between responses of different queries/tables.
rowstreamer
- `VStreamRowsParallel` request with multiple queries
- `sendQuery` for all queries
- `n+1` DB connections. One for each table/plan, plus one global that creates a lock.
- `LOCK TABLES t1 READ, t2 READ, t3 READ, ...` query
- `START TRANSACTION WITH CONSISTENT SNAPSHOT`
- `UNLOCK TABLES`
- `SELECT FROM t(i)` concurrently. Each maintains pktsize
- `send` rows. This is serialized.
vreplication/vcopier
Much of the logic is already implicitly supported, by virtue of the `copy_state` backend table. VCopier supports multiple tables in a workflow (as `MoveTables` supports the `-all` flag), and so catchup/fastforward know how to handle the existence of multiple plans and multiple `lastPk`s.
To simplify things, we will parallelize by running batches of `n` tables at a time. This can, and will, have fragmentation. One or two of the tables will be larger than the others; some tables will complete first, but the batch will only complete when all `n` tables are processed.
We do it that way because this is what allows us to take a single table lock for all tables involved, and to keep our sanity while looking into the GTID value.
The best approach would be to use a greedy algorithm: pick tables by size descending. This will parallelize more tables of the same size at a time, which optimizes for less fragmentation (fragmentation == time wasted not parallelizing when we have an available slot).
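A minimal sketch of the greedy batching idea, assuming table sizes are known up front (for example from table statistics). `Table`, `batchBySize` and `fragmentation` are hypothetical names, and `fragmentation` is only a rough proxy that assumes copy time is proportional to table size.

```go
package main

import "sort"

// Table is a hypothetical descriptor for a table awaiting copy.
type Table struct {
	Name      string
	SizeBytes int64
}

// batchBySize sorts tables by size descending and cuts them into batches of
// up to n tables, so that tables of similar size land in the same batch.
func batchBySize(tables []Table, n int) [][]Table {
	if n < 1 {
		n = 1
	}
	sorted := make([]Table, len(tables))
	copy(sorted, tables) // leave the caller's slice untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].SizeBytes > sorted[j].SizeBytes
	})
	var batches [][]Table
	for start := 0; start < len(sorted); start += n {
		end := start + n
		if end > len(sorted) {
			end = len(sorted)
		}
		batches = append(batches, sorted[start:end])
	}
	return batches
}

// fragmentation sums, per batch, how far each table falls short of the
// largest table in its batch: a stand-in for slot time spent idle while
// the batch waits on its largest table.
func fragmentation(batches [][]Table) int64 {
	var wasted int64
	for _, batch := range batches {
		var longest int64
		for _, t := range batch {
			if t.SizeBytes > longest {
				longest = t.SizeBytes
			}
		}
		for _, t := range batch {
			wasted += longest - t.SizeBytes
		}
	}
	return wasted
}
```

For tables of sizes 100, 10, 90 and 8 with `n = 2`, the sorted batches are (100, 90) and (10, 8), for a fragmentation of 12 under this proxy; batching them in their original order as (100, 10) and (90, 8) gives 172.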
VCopier needs to pick `n` tables at a time, compute plans for all these tables, and invoke `StreamRowsParallel`. Possibly there will already be a `lastPk` for some of those tables; this is trivially read from `copy_state` with no significant changes other than reorganizing the data.
We will create `n` workers. Ideally, each worker will write to a different table (we have `n` tables), but it is possible that `rowstreamer` sends results from one table more frequently. The logic to parallelize the writes on `vcopier` is not trivial, I think. We can allow for periodic parallelization of writes to the same table if that simplifies the code. `vcopier` needs to parallelize the writes (via goroutines), but at the same time be able to respond to the `send` function with an `error` result. As I'm writing this the problem grows more complex in my mind... :(
Then, calls to catchup/fastforward converge again; there's only one thread to run those.
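One possible shape for the write side, sketched under assumptions: one goroutine per table (so writes to a given table stay serialized), with responses routed by table name, and a write failure reported back on the next `Send` call rather than the one that caused it. `Response`, `WriteFunc` and `parallelWriter` are hypothetical names, not existing Vitess types.

```go
package main

import (
	"fmt"
	"sync"
)

// Response is a hypothetical, trimmed-down row batch tagged with its table.
type Response struct {
	TableName string
	Rows      []string
}

// WriteFunc stands in for the code that writes rows and updates copy_state.
type WriteFunc func(resp Response) error

// parallelWriter runs one writer goroutine per table and remembers the
// first write error so it can be handed back to the streaming side.
type parallelWriter struct {
	write  WriteFunc
	queues map[string]chan Response
	wg     sync.WaitGroup
	mu     sync.Mutex
	err    error
}

func newParallelWriter(tables []string, buffer int, write WriteFunc) *parallelWriter {
	pw := &parallelWriter{write: write, queues: make(map[string]chan Response)}
	for _, t := range tables {
		q := make(chan Response, buffer)
		pw.queues[t] = q
		pw.wg.Add(1)
		go func() {
			defer pw.wg.Done()
			for resp := range q {
				if pw.firstErr() != nil {
					continue // after a failure, drain the queue without writing
				}
				if err := pw.write(resp); err != nil {
					pw.setErr(err)
				}
			}
		}()
	}
	return pw
}

func (pw *parallelWriter) firstErr() error {
	pw.mu.Lock()
	defer pw.mu.Unlock()
	return pw.err
}

func (pw *parallelWriter) setErr(err error) {
	pw.mu.Lock()
	defer pw.mu.Unlock()
	if pw.err == nil {
		pw.err = err
	}
}

// Send is what the stream's send callback would call. It blocks when the
// table's queue is full and returns any error a worker has already hit.
func (pw *parallelWriter) Send(resp Response) error {
	if err := pw.firstErr(); err != nil {
		return err
	}
	q, ok := pw.queues[resp.TableName]
	if !ok {
		return fmt.Errorf("unexpected table %q", resp.TableName)
	}
	q <- resp
	return nil
}

// Close must be called once, after the last Send. It waits for all workers
// and returns the first write error, if any.
func (pw *parallelWriter) Close() error {
	for _, q := range pw.queues {
		close(q)
	}
	pw.wg.Wait()
	return pw.firstErr()
}
```

The trade-off in this design is that an error surfaces one call late, or only at `Close`, so the caller has to treat the last acknowledged `lastPk` per table, not the last sent one, as the resume point.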
Initial PR status:
- `parallel_rowstreamer.go` and `single_rowstreamer.go`
- `parallel_vcopier.go`; there is an initial refactor to support multiple concurrent plans; there is no parallelization yet, and vplayer needs to be extracted/encapsulated.
Checklist
cc @rohit-nayak-ps @sougou @deepthi ; no need for code review right now, though you're welcome to, of course.