Skip to content

ApplySchema: do not ReloadSchema on ExecuteFetchAsDba#10739

Merged
shlomi-noach merged 9 commits intovitessio:mainfrom
planetscale:apply-schema-no-reload-schema-on-execute
Jul 20, 2022
Merged

ApplySchema: do not ReloadSchema on ExecuteFetchAsDba#10739
shlomi-noach merged 9 commits intovitessio:mainfrom
planetscale:apply-schema-no-reload-schema-on-execute

Conversation

@shlomi-noach
Copy link
Copy Markdown
Contributor

Description

Followup to #10719
There is yet another location where ReloadSchema is called: from within the VTTablet, and this is controlled as a parameter in ExecuteFetchAsDba. We now strictly do not call ReloadSchema anywhere until all SQLs have been applied.

This specific PR applies to direct DDLs, not to Online DDLs.

Related Issue(s)

#10719

Checklist

  • "Backport me!" label has been added if this change should be backported
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach shlomi-noach added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Cluster management Component: Query Serving labels Jul 19, 2022
@shlomi-noach shlomi-noach requested a review from dbussink July 19, 2022 09:22
@vitess-bot
Copy link
Copy Markdown
Contributor

vitess-bot Bot commented Jul 19, 2022

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.
  • If a new flag is being introduced, review whether it is really needed. The flag names should be clear and intuitive (as far as possible), and the flag's help should be descriptive. Additionally, flag names should use dashes (-) as word separators rather than underscores (_).
  • If a workflow is added or modified, each items in Jobs should be named in order to mark it as required. If the workflow should be required, the GitHub Admin should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should either include a link to an issue that describes the bug OR an actual description of the bug and how to reproduce, along with a description of the fix.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.

@vitess-bot
Copy link
Copy Markdown
Contributor

vitess-bot Bot commented Jul 19, 2022

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.
  • If a new flag is being introduced, review whether it is really needed. The flag names should be clear and intuitive (as far as possible), and the flag's help should be descriptive. Additionally, flag names should use dashes (-) as word separators rather than underscores (_).
  • If a workflow is added or modified, each items in Jobs should be named in order to mark it as required. If the workflow should be required, the GitHub Admin should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should either include a link to an issue that describes the bug OR an actual description of the bug and how to reproduce, along with a description of the fix.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.

@shlomi-noach
Copy link
Copy Markdown
Contributor Author

vreplication_cellalias failure:

2022-07-19T10:41:26.7038578Z I0719 10:40:18.698782   23257 cluster_test.go:541] InitializeShard and make 100 primary
2022-07-19T10:41:26.7039618Z I0719 10:40:18.698842   23257 vtctlclient_process.go:209] Executing vtctlclient with command: vtctlclient --server 127.0.0.1:15999 PlannedReparentShard -- --keyspace_shard product/0 --wait_replicas_timeout 31s --new_primary zone1-100 (attempt 1 of 10)
2022-07-19T10:41:26.7042175Z I0719 10:40:18.779511   23257 cluster_test.go:543] Finished creating shard 0
2022-07-19T10:41:26.7042988Z I0719 10:40:18.779572   23257 vtctlclient_process.go:209] Executing vtctlclient with command: vtctlclient --server 127.0.0.1:15999 ApplySchema -- --sql 
2022-07-19T10:41:26.7043974Z create table product(pid int, description varbinary(128), date1 datetime not null default '0000-00-00 00:00:00', date2 datetime not null default '2021-00-01 00:00:00', primary key(pid)) CHARSET=utf8mb4;
2022-07-19T10:41:26.7045149Z create table customer(cid int, name varchar(128) collate utf8mb4_bin, meta json default null, typ enum('individual','soho','enterprise'), sport set('football','cricket','baseball'),
2022-07-19T10:41:26.7046061Z 	ts timestamp not null default current_timestamp, bits bit(2) default b'11', date1 datetime not null default '0000-00-00 00:00:00', 
2022-07-19T10:41:26.7046860Z 	date2 datetime not null default '2021-00-01 00:00:00', primary key(cid,typ)) CHARSET=utf8mb4;
2022-07-19T10:41:26.7047535Z create table customer_seq(id int, next_id bigint, cache bigint, primary key(id)) comment 'vitess_sequence';
2022-07-19T10:41:26.7048114Z create table merchant(mname varchar(128), category varchar(128), primary key(mname)) CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
2022-07-19T10:41:26.7048809Z create table orders(oid int, cid int, pid int, mname varchar(128), price int, qty int, total int as (qty * price), total2 int as (qty * price) stored, primary key(oid)) CHARSET=utf8;
2022-07-19T10:41:26.7049577Z create table order_seq(id int, next_id bigint, cache bigint, primary key(id)) comment 'vitess_sequence';
2022-07-19T10:41:26.7050571Z create table customer2(cid int, name varchar(128), typ enum('individual','soho','enterprise'), sport set('football','cricket','baseball'),ts timestamp not null default current_timestamp, primary key(cid)) CHARSET=utf8;
2022-07-19T10:41:26.7051417Z create table customer_seq2(id int, next_id bigint, cache bigint, primary key(id)) comment 'vitess_sequence';
2022-07-19T10:41:26.7052331Z create table `Lead`(`Lead-id` binary(16), name varbinary(16), date1 datetime not null default '0000-00-00 00:00:00', date2 datetime not null default '2021-00-01 00:00:00', primary key (`Lead-id`));
2022-07-19T10:41:26.7053321Z create table `Lead-1`(`Lead` binary(16), name varbinary(16), date1 datetime not null default '0000-00-00 00:00:00', date2 datetime not null default '2021-00-01 00:00:00', primary key (`Lead`));
2022-07-19T10:41:26.7053953Z create table _vt_PURGE_4f9194b43b2011eb8a0104ed332e05c2_20221210194431(id int, val varbinary(128), primary key(id));
2022-07-19T10:41:26.7055263Z create table db_order_test (c_uuid varchar(64) not null default '', created_at datetime not null, dstuff varchar(128), dtstuff text, dbstuff blob, cstuff char(32), primary key (c_uuid,created_at)) CHARSET=utf8mb4;
2022-07-19T10:41:26.7056255Z create table datze (id int, dt1 datetime not null default current_timestamp, dt2 datetime not null, ts1 timestamp default current_timestamp, primary key (id));
2022-07-19T10:41:26.7057053Z  --ddl_strategy direct -allow-zero-in-date --skip_preflight product (attempt 1 of 10)
2022-07-19T10:41:26.7057883Z I0719 10:40:18.892792   23257 vtctlclient_process.go:209] Executing vtctlclient with command: vtctlclient --server 127.0.0.1:15999 ApplyVSchema -- --vschema 
2022-07-19T10:41:26.7058285Z {
2022-07-19T10:41:26.7058500Z   "tables": {
2022-07-19T10:41:26.7058751Z 	"product": {},
2022-07-19T10:41:26.7059005Z 	"merchant": {},
2022-07-19T10:41:26.7059244Z 	"orders": {},
2022-07-19T10:41:26.7059491Z 	"customer": {},
2022-07-19T10:41:26.7059749Z 	"customer_seq": {
2022-07-19T10:41:26.7060006Z 		"type": "sequence"
2022-07-19T10:41:26.7060245Z 	},
2022-07-19T10:41:26.7060480Z 	"customer2": {},
2022-07-19T10:41:26.7060725Z 	"customer_seq2": {
2022-07-19T10:41:26.7060998Z 		"type": "sequence"
2022-07-19T10:41:26.7061226Z 	},
2022-07-19T10:41:26.7061483Z 	"order_seq": {
2022-07-19T10:41:26.7061729Z 		"type": "sequence"
2022-07-19T10:41:26.7061954Z 	},
2022-07-19T10:41:26.7062174Z 	"Lead": {},
2022-07-19T10:41:26.7062451Z 	"Lead-1": {},
2022-07-19T10:41:26.7062713Z 	"db_order_test": {},
2022-07-19T10:41:26.7062962Z 	"datze": {}
2022-07-19T10:41:26.7063164Z   }
2022-07-19T10:41:26.7063365Z }
2022-07-19T10:41:26.7063618Z  product (attempt 1 of 10)
2022-07-19T10:41:26.7063955Z I0719 10:40:18.908310   23257 cluster_test.go:411] Starting vtgate
2022-07-19T10:41:26.7064540Z I0719 10:40:18.948033   23257 mysqld.go:588] execCmd: /usr/sbin/mysqld /usr/sbin/mysqld [--version]
2022-07-19T10:41:26.7065513Z I0719 10:40:18.956999   23257 mysqld.go:602] execCmd: /usr/sbin/mysqld output: /usr/sbin/mysqld  Ver 5.7.38-0ubuntu0.18.04.1 for Linux on x86_64 ((Ubuntu))
2022-07-19T10:41:26.7068151Z I0719 10:40:18.957125   23257 vtgate_process.go:132] Running vtgate with command: vtgate --topo_implementation etcd2 --topo_global_server_address 127.0.0.1:2379 --topo_global_root /vitess/global --log_dir /tmp/vt_2534044224/vreple2e_828626/tmp --log_queries_to_file /tmp/vt_2534044224/vreple2e_828626/tmp/vtgate_querylog.txt --port 15001 --grpc_port 15991 --mysql_server_port 15306 --mysql_server_socket_path /tmp/vt_2534044224/vreple2e_828626/tmp/mysql.sock --cell zone1 --cells_to_watch zone1,zone2 --tablet_types_to_wait primary --service_map grpc-tabletmanager,grpc-throttler,grpc-queryservice,grpc-updatestream,grpc-vtctl,grpc-vtgateservice --mysql_auth_server_impl none --mysql_server_version 5.7.38-vitess --tablet_refresh_interval 10ms
2022-07-19T10:41:26.7069743Z I0719 10:40:19.563651   23257 cluster_test.go:411] Starting vtgate
2022-07-19T10:41:26.7070316Z I0719 10:40:19.611087   23257 mysqld.go:588] execCmd: /usr/sbin/mysqld /usr/sbin/mysqld [--version]
2022-07-19T10:41:26.7071065Z I0719 10:40:19.620235   23257 mysqld.go:602] execCmd: /usr/sbin/mysqld output: /usr/sbin/mysqld  Ver 5.7.38-0ubuntu0.18.04.1 for Linux on x86_64 ((Ubuntu))
2022-07-19T10:41:26.7073661Z I0719 10:40:19.620355   23257 vtgate_process.go:132] Running vtgate with command: vtgate --topo_implementation etcd2 --topo_global_server_address 127.0.0.1:2379 --topo_global_root /vitess/global --log_dir /tmp/vt_2534044224/vreple2e_828626/tmp --log_queries_to_file /tmp/vt_2534044224/vreple2e_828626/tmp/vtgate_querylog.txt --port 15001 --grpc_port 15991 --mysql_server_port 15306 --mysql_server_socket_path /tmp/vt_2534044224/vreple2e_828626/tmp/mysql.sock --cell zone2 --cells_to_watch zone1,zone2 --tablet_types_to_wait primary --service_map grpc-tabletmanager,grpc-throttler,grpc-queryservice,grpc-updatestream,grpc-vtctl,grpc-vtgateservice --mysql_auth_server_impl none --mysql_server_version 5.7.38-vitess --tablet_refresh_interval 10ms
2022-07-19T10:41:26.7075572Z I0719 10:40:19.621877   23257 vtctlclient_process.go:209] Executing vtctlclient with command: vtctlclient --server 127.0.0.1:15999 RebuildKeyspaceGraph product (attempt 1 of 10)
2022-07-19T10:41:26.7076533Z I0719 10:40:19.650155   23257 vtctlclient_process.go:209] Executing vtctlclient with command: vtctlclient --server 127.0.0.1:15999 AddCellsAlias -- --cells zone2 alias (attempt 1 of 10)
2022-07-19T10:41:26.7077353Z I0719 10:40:19.693908   23257 vtgate_process.go:204] Waiting for healthy status of 1 product.0.primary tablets in cell zone1
2022-07-19T10:41:26.7077905Z I0719 10:40:19.696049   23257 vtgate_process.go:204] Waiting for healthy status of 2 product.0.replica tablets in cell zone1
2022-07-19T10:41:26.7078411Z === RUN   TestCellAliasVreplicationWorkflow/insertInitialData
2022-07-19T10:41:26.7078869Z I0719 10:40:19.702016   23257 vreplication_test.go:425] Inserting initial data
2022-07-19T10:41:26.7079302Z I0719 10:40:21.149313   23257 vreplication_test.go:431] Done inserting initial data
2022-07-19T10:41:26.7079727Z === RUN   TestCellAliasVreplicationWorkflow/VStreamFrom
2022-07-19T10:41:26.7080160Z     vreplication_test.go:376: 
2022-07-19T10:41:26.7080651Z         	Error Trace:	vreplication_test.go:376
2022-07-19T10:41:26.7081174Z         	            				asm_amd64.s:1571
2022-07-19T10:41:26.7081615Z         	Error:      	Received unexpected error:
2022-07-19T10:41:26.7083038Z         	            	unexpected: query ended without no results and no error (errno 1105) (sqlstate HY000) during query: vstream * from product
2022-07-19T10:41:26.7083729Z         	Test:       	TestCellAliasVreplicationWorkflow/VStreamFrom
2022-07-19T10:41:26.7084251Z     vreplication_test.go:419: nothing streamed within timeout

seems like it might be related to changes in this PR, because the failure comes shortly after ApplySchema?

@shlomi-noach
Copy link
Copy Markdown
Contributor Author

Cluster (12) consistent failure.

2022-07-19T10:36:39.1386134Z === RUN   TestSharded
2022-07-19T10:36:39.1563691Z     message_test.go:511: 
2022-07-19T10:36:39.1565147Z         	Error Trace:	message_test.go:511
2022-07-19T10:36:39.1566356Z         	            				message_test.go:368
2022-07-19T10:36:39.1567617Z         	Error:      	Expected nil, but got: Code: INVALID_ARGUMENT
2022-07-19T10:36:39.1570579Z         	            	rpc error: code = InvalidArgument desc = target: user.-80.primary: vttablet: rpc error: code = InvalidArgument desc = table sharded_message not found in schema (CallerID: unsecure_grpc_client)
2022-07-19T10:36:39.1573523Z         	            	target: user.80-.primary: vttablet: rpc error: code = InvalidArgument desc = table sharded_message not found in schema (CallerID: unsecure_grpc_client)
2022-07-19T10:36:39.1575588Z         	Test:       	TestSharded
2022-07-19T10:36:39.1576858Z --- FAIL: TestSharded (0.02s)
2022-07-19T10:36:39.1578162Z === RUN   TestUnsharded
2022-07-19T10:36:52.1909586Z --- PASS: TestUnsharded (13.03s)
2022-07-19T10:36:52.1911359Z === RUN   TestReparenting
2022-07-19T10:36:52.1913145Z I0719 10:36:52.190102   31342 message_test.go:600] Message stream ended: Code: CANCELED
2022-07-19T10:36:52.1915318Z rpc error: code = Canceled desc = grpc: the client connection is closing
2022-07-19T10:36:52.1943090Z     message_test.go:388: 
2022-07-19T10:36:52.1944030Z         	Error Trace:	message_test.go:388
2022-07-19T10:36:52.1945143Z         	Error:      	Expected nil, but got: Code: INVALID_ARGUMENT
2022-07-19T10:36:52.1948260Z         	            	rpc error: code = InvalidArgument desc = target: user.-80.primary: vttablet: rpc error: code = InvalidArgument desc = table sharded_message not found in schema (CallerID: unsecure_grpc_client)
2022-07-19T10:36:52.1951154Z         	            	target: user.80-.primary: vttablet: rpc error: code = InvalidArgument desc = table sharded_message not found in schema (CallerID: unsecure_grpc_client)
2022-07-19T10:36:52.1952601Z         	Test:       	TestReparenting
2022-07-19T10:36:52.1953736Z --- FAIL: TestReparenting (0.00s)
2022-07-19T10:36:52.1954919Z === RUN   TestConnection
2022-07-19T10:36:53.2056838Z     message_test.go:455: 
2022-07-19T10:36:53.2058851Z         	Error Trace:	message_test.go:455
2022-07-19T10:36:53.2060450Z         	Error:      	Expected nil, but got: Code: INVALID_ARGUMENT
2022-07-19T10:36:53.2064123Z         	            	rpc error: code = InvalidArgument desc = target: user.-80.primary: vttablet: rpc error: code = InvalidArgument desc = table sharded_message not found in schema (CallerID: unsecure_grpc_client)
2022-07-19T10:36:53.2068323Z         	            	target: user.80-.primary: vttablet: rpc error: code = InvalidArgument desc = table sharded_message not found in schema (CallerID: unsecure_grpc_client)
2022-07-19T10:36:53.2070517Z         	Test:       	TestConnection
2022-07-19T10:36:53.2072065Z --- FAIL: TestConnection (1.01s)
2022-07-19T10:36:53.2073130Z FAIL
2022-07-19T10:37:05.6304128Z FAIL	vitess.io/vitess/go/test/endtoend/messaging	52.892s
2022-07-19T10:37:05.6330069Z FAIL

seems related.

@shlomi-noach
Copy link
Copy Markdown
Contributor Author

I'm lost with the test failures.

@shlomi-noach
Copy link
Copy Markdown
Contributor Author

vtgate_queries, vtgate_reservedconn keep failing though I dont see how they're related. vstream* tests seem related but I can't make the connection.

@shlomi-noach
Copy link
Copy Markdown
Contributor Author

OK I get it!

				schematools.ReloadShard(
					reloadCtx,
					exec.ts,
					exec.tmc,
					exec.logger,
					exec.keyspace,
					result.Shard,
					result.Position,
					concurrency,
					false, /* includePrimary */

says false, /* includePrimary */; we really do want to reload the schema on the primary though!

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach
Copy link
Copy Markdown
Contributor Author

OK, tests still failing, and they're consistent, and they're persistent, and it has to do with ApplySchema and therefore also have to do with ReloadSchema().

@shlomi-noach
Copy link
Copy Markdown
Contributor Author

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Comment thread go/vt/schemamanager/tablet_executor.go Outdated
}

if syncOperationExecuted {
if true || syncOperationExecuted {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can drop this true now that we found the real issue?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. refactoring and also reusing your defer approach to ensure we ReloadSchema even on early return.

dbussink added a commit to planetscale/vitess that referenced this pull request Jul 20, 2022
Doing this is almost always a bug anyway so we should lint for this to
avoid problems.

This would have caught the bug that was plaguing us in
vitessio#10739 as it's easy to overlook
this type of issue. So we'd rather prefer the linter catches it for next
time.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
@shlomi-noach
Copy link
Copy Markdown
Contributor Author

Thanks @dbussink for helping to find the root cause, which was reusing an iteration argument's address.

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Copy link
Copy Markdown
Member

@dbussink dbussink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes 🎉

GuptaManan100 pushed a commit that referenced this pull request Jul 20, 2022
…nce (#10763)

Doing this is almost always a bug anyway so we should lint for this to
avoid problems.

This would have caught the bug that was plaguing us in
#10739 as it's easy to overlook
this type of issue. So we'd rather prefer the linter catches it for next
time.

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
@shlomi-noach shlomi-noach merged commit 94d155d into vitessio:main Jul 20, 2022
@shlomi-noach shlomi-noach deleted the apply-schema-no-reload-schema-on-execute branch July 20, 2022 10:29
rsajwani pushed a commit to planetscale/vitess that referenced this pull request Aug 1, 2022
vitessio#848)

* ApplySchema: do not ReloadSchema on ExecuteFetchAsDba

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

* include primary in final ReloadSchema

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

* experimental: always reload schema, even in Online DDL

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

* RateLimiter: exit goroutine at Stop()

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

* defer ticker.Stop()

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

* fix shard iteration

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

* ensure to ReloadSchema even on early return

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Component: Cluster management Component: Query Serving Type: Enhancement Logical improvement (somewhere between a bug and feature)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants