VReplication support for non-PRIMARY KEY (Online DDL context) by shlomi-noach · Pull Request #8364 · vitessio/vitess

shlomi-noach · 2021-06-22T05:52:05Z

Description

Today, VReplication assumes:

All tables have a PRIMARY KEY
source table and target table have same PRIMARY KEY (same columns, same order of columns)

Online DDL requires that this restriction is lifted. A user should be allowed to change a PRIMARY KEY. Online DDL-wise, the requirement is that both source and target tables have at least one shared not-nullable UNIQUE KEY. A valid example:

create table t {
  uuid varchar(64) not null primary key,
  some_other_column int not null
}

with the following change:

alter table t drop primary key add column id bigint unsigned not null auto_increment primary key, add unique key uuid_uidx(uuid)

In the above scenario both before/after tables have a non-nullable unique key on uuid column. In the source table, it is PRIMARY, and in target table, it is uuid_idx. The fact that the name of the index is different is merely a technicality. VReplication should be able to iterate tables by uuid order on both source and target.

Today, the assumption that we only ever use a PRIMARY KEY goes very deep. The reload() function in vreplication pre-evaluates all tables in a schema according to their PK, irrespective of any VReplication stream. In today's assumption, when a new VReplication stream kicks in, it just knows which table columns comprise the PRIMARY KEY, because everything is already evaluated beforehand.

However, we now want to choose a unique key per operation. Maybe the source/target share the exact same PRIMARY KEY, as is normally the case, and there's nothing to do. But maybe they don't, and there's multiple options. So we need to evaluate which unique keys are shared between the two tables, pick the "best" shared key, and iterate by that key. But we will only know this once a online DDL is requested!

Again, the assumption that we iterate on PRIMARY KEY is deeply rooted. I want to avoid rewriting huge sections of code. Instead, I propose an "override". We keep such terms as lastPK or pkColumns, but we understand that they may refer to non-PK unique keys. We introduce unique key hints in VReplication's filter/rule. Online DDL evaluates shared unique key and writes down its name into the rule. VReplication overrides pkColumns by reading the unique key columns in key order (ie how columns are ordered withing the unique key, not how they are ordered in the table). Most other information remains intact.

Importing gh-ost suite tests to validate correctness, and adding more tests.

This is work in progress.

Related Issue(s)

#8179

Checklist

Tests were added or are not required
Documentation was added or is not required

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

…PK' columns from unique key Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

…nique key specific info Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

…d unique key Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

shlomi-noach · 2021-06-22T09:53:46Z

While this is still WIP, I'm happy to get eyes on the proposed solution. For your convenience, this is the outline:

proto

vreplication's filter/rule has 2 new fields:
- source_unique_key (mapped to SourceUniqueKey variable)
- target_unique_key (mapped to TargetUniqueKey variable)
- These are optional, and are only populated by Online DDL. Normally both will have the string value PRIMARY
- But it's possible one or both will have the name of another unique key. The two names don't have to be identical, but Online DDL ensures that both will cover the exact same columns in same order (though a column is also allowed to be renamed, but that's just a nuance)

Online DDL

In vrepl.go, the part of Online DDL that analyzes a VReplication-based migration, we analyze both source and target table for all existing unique keys. We rank unique keys from best to worst. PRIMARY is always "best" because it's the clustered index hence by definition most performant key. Then, we prefer keys that use integers, over keys that use strings, etc.
vrepl.go then intersects unique keys from source table and unique keys from target table and determines which keys are shared. It's possible that a UK on the source table and a UK on the target table have different names. That's fine, as long as they cover same columns in same order.
Based on ranking, and based on shared keys, vrepl.go determines the chosen UK on source table, and chosen UK on target table. This is what gets to aforementioned proto/rule.

Streamer

rowstreamer receives the streaming query. The Unique Key name for source table is encoded as a comment directive /*vt+ ukName="name_of_the_unqiue_key" */
If the unique key is not specificed, or is PRIMARY, we fallback to standard behavior.
Otherwise, rowstreamer investigates the unique key by querying information_schema, and gets the orderd list of columns in that unique key.
rowstreamer from there on assumes those columns to be the PK columns. It still calls them pkColumns (I wanted to minimize code changes)
From there on, everything happens implicitly: SendQuery is calculated by pkColumns which means rowstreamer reads the table by UK order, not PK order, and also sends the UK columns as Fields, and also calculates lastPK by UK columns (again, the name remains lastPK but reall ythis could be any UK).

VReplicator / table plan builder

Detects SourceUniqueKey in Rule, generates a comment directive /*vt+ ukName="name_of_the_unqiue_key" */ which rowstreamer will later consume
Detects TargetUniqueKey in Rule
if TargetUniqueKey is trivial (ie PRIMARY) or unspecified, then nothing happens and we fallback to standard behavior (tables ar ealready analyzed with PK info)
otherwise, for each stream/rule:
- table plan builder uses vreplicator's recalculatePKColsInfo function to:
  - read list of columns in specified unique key
  - re-evaluate colsInfo: make a copy of colsInfo, reorder columns by UK order, reassign IsPK
- using the modified colsInfo, run analyzePK to evaluate pkCols. They're still called pkCols, but they may now identify the columns of any UK.
- pkCols are then used, as usual and without change, to generate queries.

Notes

I believe I have modified the minimal set of code required for supporting non-PK columns
The use of comment directives is temporary and is my easiest way to communicate hints from vreplication to streamer. We can change that later.
I've added some tests from gh-ost test suite, which pass. I've added tests to vrepl_stress endtoend test, which pass. But I'm still unsure if I've generate da strong enough test case.
All thoughts welcome. I certainly need this logic reviewed.

cc @rohit-nayak-ps @sougou @deepthi

…h opposite ordering Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

…_stress_suite test Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

…nges under stress Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

shlomi-noach · 2021-07-06T09:56:33Z

Question: as mentioned above:

vreplication's filter/rule has 2 new fields:
source_unique_key (mapped to SourceUniqueKey variable)
target_unique_key (mapped to TargetUniqueKey variable)

when source_unique_key is non empty and non-PRIMARY, that means rowstreamer needs to read information_schema.columns to see what columns are covered by the unique key and in what order.
similarly, when target_unique_key is non empty and non-PRIMARY, that means vreplicator needs to read information_schema.columns to see what columns are covered by the unique key and in what order.

The module that populates these two values is onlineddl/vrepl, which already knows the column identities and order for both keys. I wonder if instead of source_unique_key (string) and target_unique_key (string) we should instead have source_unique_key_columns (string array) and target_unique_key_columns (string array).

This way the one who own the decision to use a specific unique key can already dictate the columns; the vreplication components don't need to read information_schema. Downside is that vreplication will need to trust that the given columns are, indeed, covered by a unique key. Also, it loses context to the name/identity of the key it actually uses.

Extra information is that I believe I will want to anyway add the names of the columns in the source key (possibly renamed if there's a ALTER TABLE CHANGE col new_name) because reasons, something I'm thinking about (will do writeup elsewhere).

The more I think of it, the more I believe we should just write the names of columns rather than name of index.

Thoughts?

EDIT: this change is implemented in #8420

…target_unique_key_columns, values populated by vrepl and no need for information_schema.columns queries on source and target Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

…e strings instead of string arrays. The value of source_unique_key_columns is comma delimited (and escaped) list of column names; similarly for target_unique_key_columns Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

…vrpel-suite-pk-uk Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

shlomi-noach · 2021-07-12T04:06:26Z

Merged #8420 into this PR. From #8420's main comment:

Description

A take/twist on #8364, per comment

Mostly exactly the same as #8364 , except:

Rule's proto has source_unique_key_columns instead of source_unique_key, and target_unique_key_columns instead of target_unique_key.
rowstreamer doesn't need to inspect information_schema.columns, since it relies on source_unique_key_columns
vreplicator likewise doesn't need to inspect information_schema.columns, since it relies on target_unique_key_columns

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

rohit-nayak-ps

lgtm.

Very nice. I like the fact there are minimal changes to the core workflow due to this!

rohit-nayak-ps · 2021-07-12T14:48:55Z

Just observing here that this feature is only implemented for Online DDL, though internally vreplication supports it via this PR. If we want to extend this to vreplication workflows (movetables/reshard/...) we will need to generate the filter hints that Online DDL generates while creating those workflows.

shlomi-noach · 2021-07-12T15:01:11Z

Just observing here that this feature is only implemented for Online DDL, though internally vreplication supports it via this PR.

Correct. There has to be some logic above VReplication to set those fields.

shlomi-noach added 20 commits June 20, 2021 12:34

refactor: pass colInfos instead of colInfoMap. Add comments as needed

671c8d6

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

Adding UniqueKey to Rule

686a3de

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

adapt to replication_plan_builder refactor

3e2c740

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

convenience method: return comment string, unquote if needed

b0ade0a

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

Table.FindColumn

2877b55

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

rowstreamer: read unique_key hint from comment dircetives, evaluate '…

0a57edd

…PK' columns from unique key Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

unique key analysis query

c6814c4

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

when rule indicates non-PRIMARY UniqueKey, override PK ColInfo with u…

77e593c

…nique key specific info Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

generated code

61362c1

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

onlineddl/vrepl: read source/target unique keys, evaluate best sharee…

9cbfa61

…d unique key Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

proto: souce_unique_key and target_unique_key

dfcd76a

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

proto: SourceUniqueKey and TargetUniqueKey

0490612

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

BaseShowTableUniqueKey query

0fb881c

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

source unique key and target unique key split

a04a5b4

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

sourceUniqueKeys and targetUniqueKeys don't need to be fields in VRepl

9560db7

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

adding unit test

4ceae54

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

more unit test cases

6ee5d48

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

enforce use of mapping if exists

84947cb

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

vreplicator: refactor logic, add unit test

a7339d1

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

code comment

3981575

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

shlomi-noach added Component: VReplication Type: CI/Build Type: Enhancement Logical improvement (somewhere between a bug and feature) labels Jun 22, 2021

shlomi-noach added 6 commits June 22, 2021 12:58

vrepl_mini_stress_test: double the tests, use a non PK unique key wit…

f62c1dc

…h opposite ordering Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

Merge branch 'main' into online-ddl-vrpel-suite-pk-uk

dde35e0

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

Merge branch 'main' into online-ddl-vrpel-suite-pk-uk

53184b7

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

Adding test files

b5f0aa4

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

restore vrepl_stress test to original (main). Will create a new vrepl…

56ed681

…_stress_suite test Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

new endtoend test: vrepl_stress_suite, testing a myriad of schema cha…

f445dee

…nges under stress Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

shlomi-noach requested review from GuptaManan100, harshit-gangal and systay as code owners July 6, 2021 09:16

go 1.16

6f13166

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

shlomi-noach marked this pull request as draft July 6, 2021 09:50

source_unique_key => source_unique_key_columns, target_unique_key => …

01c0abf

…target_unique_key_columns, values populated by vrepl and no need for information_schema.columns queries on source and target Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

shlomi-noach mentioned this pull request Jul 6, 2021

VReplication support for non-PRIMARY KEY (Online DDL context): alternate PR #8420

Closed

2 tasks

shlomi-noach added 4 commits July 7, 2021 09:11

reject unique keys leading with float/double data type column

60b7c15

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

merge main, resolve conflict

3556c02

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

Merge branch 'onlineddl-vrepl-source-target-columns' into online-ddl-…

3414ee0

…vrpel-suite-pk-uk Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

protoc

3849770

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

shlomi-noach marked this pull request as ready for review July 12, 2021 04:55

shlomi-noach requested review from ajm188 and doeg as code owners July 12, 2021 04:55

remove redundant shards var

b312d9a

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

rohit-nayak-ps approved these changes Jul 12, 2021

View reviewed changes

shlomi-noach merged commit 4f89932 into vitessio:main Jul 12, 2021

shlomi-noach deleted the online-ddl-vrpel-suite-pk-uk branch July 12, 2021 15:02

shlomi-noach mentioned this pull request Jul 12, 2021

VReplication: support different keys on source and target tables (different covered columns) #8423

Merged

2 tasks

frouioui added the release notes label Sep 28, 2021

shlomi-noach mentioned this pull request May 8, 2022

VReplication: use PK equivalent (PKE) if no PK found #10192

Merged

3 tasks

mattlord mentioned this pull request May 10, 2022

Unify / De-Duplicate Key Selection and Management between OnlineDDL and other VReplication workflows #10259

Open

shlomi-noach mentioned this pull request Apr 24, 2023

VReplication Workflows: support noblob binlog_row_image format for MoveTables and Reshard #12905

Merged

4 tasks

shlomi-noach mentioned this pull request Dec 19, 2023

VDiff: Support diffing tables without a defined Primary Key #14794

Merged

5 tasks

shlomi-noach mentioned this pull request Aug 14, 2024

Online DDL: do not CONVERT column charset when migration does not modify column's charset or collation #16594

Closed

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

VReplication support for non-PRIMARY KEY (Online DDL context)#8364

VReplication support for non-PRIMARY KEY (Online DDL context)#8364
shlomi-noach merged 58 commits intovitessio:mainfrom
planetscale:online-ddl-vrpel-suite-pk-uk

shlomi-noach commented Jun 22, 2021 •

edited

Loading

Uh oh!

shlomi-noach commented Jun 22, 2021

Uh oh!

shlomi-noach commented Jul 6, 2021 •

edited

Loading

Uh oh!

shlomi-noach commented Jul 12, 2021

Uh oh!

rohit-nayak-ps left a comment

Uh oh!

rohit-nayak-ps commented Jul 12, 2021

Uh oh!

shlomi-noach commented Jul 12, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

shlomi-noach commented Jun 22, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issue(s)

Checklist

Uh oh!

shlomi-noach commented Jun 22, 2021

proto

Online DDL

Streamer

VReplicator / table plan builder

Notes

Uh oh!

shlomi-noach commented Jul 6, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

shlomi-noach commented Jul 12, 2021

Description

Uh oh!

rohit-nayak-ps left a comment

Choose a reason for hiding this comment

Uh oh!

rohit-nayak-ps commented Jul 12, 2021

Uh oh!

shlomi-noach commented Jul 12, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

shlomi-noach commented Jun 22, 2021 •

edited

Loading

shlomi-noach commented Jul 6, 2021 •

edited

Loading