Online DDL via VReplication by shlomi-noach · Pull Request #7419 · vitessio/vitess

shlomi-noach · 2021-01-31T06:53:47Z

Ready for review

This PR implements a "forward" alter table via VReplication.

General design described in #6926 (comment), which this PR implements.

Notable bullets about the changes in this PR:

The main logic of this PR is in go/vt/vttablet/onlineddl/vrepl.go and go/vt/vttablet/onlineddl/executor.go
go/vt/vttablet/onlineddl/executor.go can now invoke, cancel and terminate VReplication based migrations. Some of this management is symmetric to how we invoke, cancel and terminate gh-ost or pt-osc, but there are also significant differences:
- While gh-ost and pt-osc can only work within the context of this tablet (invoked as an OS exec), VReplication migrations can be started by one tablet and continued on another, even in the face of failover.
- gh-ost and pt-osc have their own logic to run the migration. VReplication needs to be told exactly what it should run. The tablet knows about an ALTER TABLE statement, and needs to create a matching rule/filter query for VReplication.
- gh-ost and pt-osc have their own logic to cut over. VReplication does not. The executor identifies the occasion for cut-over and implements the cut-over logic, which entails stopping writes to original tables, waiting for pos, stopping VReplication, and replacing tables.
Imported a bunch of code from https://github.com/github/gh-ost. These are found under go/vt/vttablet/onlineddl/vrepl. The kind of functionality in those files:
- Parsing an ALTER TABLE statement
- Structures to define columns
- Dealing with datatypes
  We don't strictly need all of the above as-is. We can replace the parsing with Vitess's parsing. We don't need to track datatypes because VReplication does. But I copied and pasted those files because they are mature and stable and get the work done.
  I intend to refactor them later on and remove redundant/unrelated code, or replace with existing Vitess functionality. But I'd like to do so on a separate PR so as to focus on the main overall functionality here.
Split onlineddl/endtoend tests into endtoend_ghost and endtoend_vrepl. They are mostly similar but with some changes (such as how a migration is throttled). The tests are OK-ish but need to be more elaborate. Again, I wish to follow up in a future PR, as I have yet to design a satisfactory endtoend test for schema migrations and failure scenarios.
Due to some earlier attempt to use wrangler inside tabletserver, I created a new interface called vexec.Executor, and refactored ddlExecutor to be vexec.Executor instead of onlineddl.Executor. I ended up not using wrangler inside tabletserver but I liked the refactor and kept it.
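To illustrate the "matching rule/filter query" bullet above, here is a minimal sketch. It assumes the filter is a plain SELECT over the columns shared by the original table and the altered table. The names `sharedColumns` and `buildFilterQuery`, and the column-intersection approach itself, are illustrative assumptions and not the code in vrepl.go.

```go
package main

import (
	"fmt"
	"strings"
)

// sharedColumns returns the columns present in both tables, in source order.
func sharedColumns(source, target []string) []string {
	inTarget := make(map[string]bool, len(target))
	for _, c := range target {
		inTarget[c] = true
	}
	var shared []string
	for _, c := range source {
		if inTarget[c] {
			shared = append(shared, c)
		}
	}
	return shared
}

// buildFilterQuery is a hypothetical helper: it builds a SELECT over the
// shared columns, which is the kind of query a VReplication rule could carry.
func buildFilterQuery(sourceTable string, source, target []string) string {
	cols := sharedColumns(source, target)
	quoted := make([]string, len(cols))
	for i, c := range cols {
		quoted[i] = "`" + c + "`"
	}
	return fmt.Sprintf("select %s from `%s`", strings.Join(quoted, ", "), sourceTable)
}

func main() {
	// Imagined migration: ALTER TABLE customer DROP COLUMN legacy, ADD COLUMN email ...
	q := buildFilterQuery("customer",
		[]string{"id", "name", "legacy"},
		[]string{"id", "name", "email"})
	fmt.Println(q)
}
```

Dropped columns are simply left out of the SELECT, and added columns take their default values on the target table. Renamed or type-converted columns would need more than this sketch handles.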

Initial PR comment, from when this was Work In Progress:

Description

Implementing Online DDL via VReplication.
There's a bunch of code to wrap the VReplication mechanism. I import a lot of logic from gh-ost (analyzing tables, analyzing primary key, sanity checks). Some of this should be replaced with vitess's sqlparser, but for now it's faster to go with gh-ost imported code as I know it is stable.

This is work in progress. The idea is that we run VReplication in a materialize-like fashion, where both source and target are the shard's PRIMARY.

VReplication is kicked off by the online DDL executor, not by vtctl/wrangler. This is very different from the normal VReplication flow; the reasoning is that we want the tablet to validate and schedule the migration in its own good time. So we're using the existing Online DDL Executor logic to start/stop migrations. VReplication is different in that it can survive restarts or failovers. This is something the executor will take into account.

VReplication only works through PRIMARY KEYs. In that respect, gh-ost is more flexible as it can iterate any UNIQUE KEY. Another thing for me to check is which PRIMARY KEY changes are valid: is it OK to add a column to a PRIMARY KEY? To remove a column?

Another difference to normal VReplication flow is how we do the cut-over. We will automate the cut-over, and will then rename the tables. No change to routing rules. Later, we'd need to see if we can also reverse VReplication for renamed tables. I'm not sure how simple/hard that would be.

Anyway, this is really work in progress, with no guarantees. I will comment as progress is made.

Related Issue(s)

tracking: Online DDL: tracking issue #6926

Checklist

Should this PR be backported?
Tests were added or are not required
Documentation was added or is not required

Deployment Notes

Impacted Areas in Vitess

Components that this PR will affect:


rohit-nayak-ps

This is really great :-)

Left some comments/queries: in general the functionality looks good.

go/vt/vttablet/onlineddl/executor.go

rohit-nayak-ps · 2021-02-15T15:54:44Z

go/vt/vttablet/onlineddl/executor.go

-		"strategy": sqltypes.StringBindVariable(string(schema.DDLStrategyPTOSC)),
+// isVReplMigrationReadyToCutOver sees if the vreplication migration has completed the row copy
+// and is up to date with the binlogs.
+func (e *Executor) isVReplMigrationReadyToCutOver(ctx context.Context, s *VReplStream) (isReady bool, err error) {


For low write qps it is possible that there are no binlog events for a table for a long time. If I read this logic correctly, in such cases the Online DDL workflow may not cut over automatically.

Interesting. Isn't there a heartbeat mechanism for vreplication though?

time_updated is updated by the heartbeat. However, transaction_timestamp is only updated when a new binlog event is seen, not on a heartbeat. This is quite rare, since it means there are no writes on the source at all. But it has been reported by some users. Also, if a database is taken "offline" for some reason (in the sense that no app is writing to it) and the user starts a workflow, this can happen.

Practically we may not need to address this now.

Removed the evaluation of transaction_timestamp. I want to be on the safer side.

timeUpdated will always be current because of the heartbeat so the moment the copy state is empty it will say that the migrate workflow is ready for cutover. Am I missing something?

Am I missing something?

Ermm... Not sure why you're asking? Am I missing something? 😅

To elaborate a bit, though I'm still unsure what you meant:

timeUpdated will always be current

Unless there's some lag, or vreplication is stopped

so the moment the copy state is empty

We also validate that pos is non-empty. Were you suggesting the migration might cut over before it starts copying rows?

Having discussed this: meanwhile, checking transaction_timestamp is the safer approach. time_updated gets updated regardless of incoming events.
So the risk with transaction_timestamp is in completely stale servers. However, it's enough that some table is updated and the problem goes away. But if the server is completely stale, cut-over will not happen.

To solve that we can either:

- force injection of some dummy change, or
- compare GTID pos (likely not a good idea, because _vt tables are getting changed, affecting GTID, but scrubbed in the streamer), or
- modify the streamer to include more information that we can use on the target side.

go/vt/vttablet/onlineddl/vrepl.go

rohit-nayak-ps · 2021-02-15T16:06:13Z

go/vt/vttablet/onlineddl/executor.go

+// isVReplMigrationReadyToCutOver sees if the vreplication migration has completed the row copy
+// and is up to date with the binlogs.
+func (e *Executor) isVReplMigrationReadyToCutOver(ctx context.Context, s *VReplStream) (isReady bool, err error) {
+	// Check all the cases where migration is still running:


When an online DDL completes, vttablets and vtgates will need to reload their schema as per the current mechanisms before they see the changes made by the DDL, correct?

Right. I issue a ReloadSchema before running the migration. Should I run ReloadSchema immediately after completing the migration, or should I actually do that during the cut-over (when tables have been switched, and writes are still blocked)?

Doing it when writes are blocked might be better since binlog events after the write will then see the new schema. Otherwise we may have a race: new events for the altered table may be processed before the new schema is reloaded.

But the ReloadSchema is just for this tablet, right?

https://github.com/vitessio/vitess/pull/7419/files#diff-059c9f46e8d270d9c5514ef2b08679035eb0daaa8d95074e34ef43a81d50dc37R637-R639

if err := tmClient.ReloadSchema(ctx, tablet.Tablet, ""); err != nil { return err }

correct

Just clarifying, that this goes through gRPC; so I want to make sure this isn't an expensive event, because meanwhile we're blocking (rather -- rejecting) writes to the table, so it's a critical point in time.

Define expensive...: ReloadSchema can take a while since it parses the entire db schema. It has been sped up recently, but for a large number of tables it will probably take a non-trivial amount of time. I would think it is still on the order of milliseconds, but I am not entirely sure.

non-trivial amount of time.

Milliseconds is just fine. Would it take more than 1 second (I'd define "expensive" as 1 second)? E.g. on a schema with 10,000 tables? Is it possible to ReloadSchema for just one table?

(Having discussed this synchronously with @rohit-nayak-ps:) reloading the schema can take substantial time if many thousands of tables are involved. For now, we issue an async ReloadSchema (in a goroutine). In the future, we will look into reloading a single table.


shlomi-noach · 2021-02-16T08:57:17Z

I have merged #7492 here. This adds a new endtoend/online_ddl_vrepl_stress test, which tests vreplication/online DDL under concurrent load: a mock "app" opens multiple connections, runs randomly spread queries, and keeps track of its changes. A migration takes place during that workload. At the end of the migration, the app looks at the migrated table and expects some metrics in the table to match its own expected metrics. See more in #7492 (comment)


shlomi-noach · 2021-02-16T15:10:04Z

7e2a074 introduces dummy statement injection for when we run vreplication based online DDL, and the server is utterly stale. In this situation, transaction_timestamp is never updated, and we may never know whether we should cut-over. See #7419 (comment)

The solution is hacky: if we notice transaction_timestamp is too old (over 1 minute by default), then we set a timer to inject a futuristic dummy statement, in the form of DROP VIEW IF EXISTS ... (the name of a view which surely does not exist). We inject several of these statements on a schedule timed to when we expect the next online DDL cycle to run, the point where we check for running vreplication migrations. We expect those injections to be intercepted in the binary log (assuming no lag) and thus pave the way for a cut-over.

shlomi-noach · 2021-02-16T17:14:21Z

@rohit-nayak-ps actually, re: discussion about cut-over and stale server -- the tests disagree. The tests are happy to cut-over when the copy is complete, even when there's no traffic on the table. Did some local tests, inserted two rows on a table, then slept, then migrated -- cut-over took place.
Let me take a closer look at what happened.


shlomi-noach · 2021-02-17T14:12:31Z

I notice that it is impossible to ADD UNIQUE KEY if there's a data conflict. This needs to be supported. Gonna look into it on a different PR, will possibly import a few tests from gh-ost.


shlomi-noach · 2021-02-18T05:27:04Z

@sougou confirmed that changes to _vt tables are intercepted on the target, and transaction_timestamp gets updated. There is indeed an infinite loop, but it's rate limited: only one event per second is registered in transaction_timestamp.

I've reverted 7e2a074; there is no need to inject dummy transactions. The existing mechanism works even on stale servers.

shlomi-noach · 2021-02-18T05:29:25Z

Looking at ADD UNIQUE KEY, the changes required to support it are small, but also begin the rolling of the snowball that is reworking the copier+player flow. I will address the idea in an issue and a followup PR.

This PR, meanwhile, is good to review, notwithstanding the fact that vrepl online DDL does not support ADD UNIQUE KEY and a few other UNIQUE KEY/PRIMARY KEY changes. We will work on those in iterations.

shlomi-noach · 2021-02-18T06:45:52Z

I should clarify. Adding a UNIQUE KEY is possible, when unique constraints are met. There is a use case for adding a UNIQUE KEY when the constraint is not met; I'll discuss this elsewhere.

rohit-nayak-ps

lgtm

shlomi-noach added 30 commits January 28, 2021 17:14

Towards VReplication based Online DDL

c67648f

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

testing for DDLStrategyOnline

56cb0d6

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

import from gh-ost and iterating

379d7ec

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

towards vreplication based online DDL

52d3e85

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

cleanup

4f3fdc4

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

call generalized 'analyze'

df0905c

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

kick schema screation as soon as Open()

29895c6

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

generate vreplication insert and start statements

c37f29a

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

communicating to VREngine through gRPC

b8409d6

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

refactor parsing alterOptions; in the future this will go through sql…

56b7a59

…parser Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

simplified queries in onlineddl

da96194

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

init schema in InitDBConfig()

d6d8eed

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

mutex protestion

c6f758d

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

whoops; mutex was already there

54428fe

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

fix chech type. Backport of vitessio#7422

6784448

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

engine's early schema creation did not work as expected. Not improtan…

5a620da

…t on this branch's radar Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

check vreplication liveness, cancel vreplication migration

82a0aa0

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

check if vreplication migration is ready to cut-over

f825e56

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

added index on 'workflow' column in vreplication

1781640

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

refactor schemaMigrationsTableName outside vexec and into go/vt/schema

24b3899

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

new interface for VExec executors on tablet

5693c13

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

use new vexec.Executor interface

5d1cad0

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

VReplStream struct

691096a

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

refactor SchemaMigrationsTableName to go/vt/schema. Generate table sw…

c6487c1

…ap query Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

check if vreplication DDL is ready for cutover; cut-over by stopping …

9c2dd6c

…writes to table, waiting for updated pos, renaming tables, releasing table, releasing locks Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

WithDDL: force apply the schema on first use

3ba8845

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

smaller index

6ce7564

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

reuse exec function

b770b0b

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

cleanup

50e2468

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

with_ddl: fix test

065c677

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

multiple iterations for 'workload without ALTER TABLE'

80c119d

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

rohit-nayak-ps reviewed Feb 15, 2021


shlomi-noach added 9 commits February 15, 2021 19:04

using waitgroup

e5ab8b3

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

typo

e5dabb4

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

remove transactionTimestamp evaluation

114642b

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

fixes error message case

323562b

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

no need for replicas, make test more lightweight

8650ef9

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

wait for runMultipleConnections() to complete

f134469

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

using context.WithCancel, simplify logic

534201e

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

async ReloadScema at cut-over

942b5be

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

Merge branch 'vreplication-online-ddl-mini-stress-test' into vreplica…

61454be

…tion-online-ddl Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

shlomi-noach added 2 commits February 16, 2021 11:21

restore transaction_timestamp test

1428eea

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

inject dummy statements when vreplication transaction_timestamp is stale

7e2a074

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

minor naming change

ed4207e

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

revert 7e2a074: no need for dummy injections

7f6a800

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

rohit-nayak-ps approved these changes Feb 28, 2021


shlomi-noach merged commit 9d3a91d into vitessio:master Mar 1, 2021

shlomi-noach deleted the vreplication-online-ddl branch March 1, 2021 05:38

This was referenced Mar 1, 2021

Restore CI workflow shard 26, accidentally dropped #7569

Merged

Documenting new Online DDL via VReplication, 'online' strategy vitessio/website#718

Merged

askdba added the Component: Query Serving label Mar 4, 2021

askdba added this to the v10.0 milestone Mar 4, 2021

ajm188 mentioned this pull request Jul 14, 2021

slack vitess v10.pre tinyspeck/vitess#228

Merged
