Skip to content

roachtest: Add tpcc1k test to identify long running schema change txns.#39096

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
adityamaru27:roachtest-schema-changes
Aug 6, 2019
Merged

roachtest: Add tpcc1k test to identify long running schema change txns.#39096
craig[bot] merged 1 commit intocockroachdb:masterfrom
adityamaru27:roachtest-schema-changes

Conversation

@adityamaru27
Copy link
Copy Markdown
Contributor

@adityamaru27 adityamaru27 commented Jul 25, 2019

This change adds a roachtest which spins up a 5 node, 16 cpu cluster
with a tpcc-1k workload, and runs the schema changes outlined in #37083.
The need for this test arises to have a single source of truth for long
running schema change txns, and the impact of lowering the closed ts
interval.

@adityamaru27 adityamaru27 requested a review from dt July 25, 2019 13:31
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@adityamaru27
Copy link
Copy Markdown
Contributor Author

@dt I am still running the test with tpcc1k and will post the time taken for each schema change once it completes. Just wanted you to have a look if this fulfills what core requires, thanks.

@dt dt requested a review from ajwerner July 25, 2019 13:35
@adityamaru27 adityamaru27 force-pushed the roachtest-schema-changes branch from 3763bc5 to 5f137f5 Compare July 25, 2019 13:39
Copy link
Copy Markdown
Contributor

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru27, @ajwerner, and @dt)


pkg/cmd/roachtest/schemachange.go, line 435 at r1 (raw file):

			})
		},
		MinVersion: "v2.2.0",

I'd go ahead and make this 19.1 -- I think we know some of these take a very long time on 2.2, and I'm not actually sure they all work at all.

Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Are there things we want to assert? Maybe that all of the statements finished at least once? Perhaps about how long things take? Can follow up after we collect some data.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @dt)


pkg/cmd/roachtest/schemachange.go, line 435 at r1 (raw file):

Previously, dt (David Taylor) wrote…

I'd go ahead and make this 19.1 -- I think we know some of these take a very long time on 2.2, and I'm not actually sure they all work at all.

👍

@adityamaru27 adityamaru27 force-pushed the roachtest-schema-changes branch from 5f137f5 to 8b14371 Compare July 29, 2019 13:49
@dt
Copy link
Copy Markdown
Contributor

dt commented Jul 29, 2019

@ajwerner runAndLogStmts asserts that the statement returned and didn't return an error, so I think we're good.

@adityamaru27
Copy link
Copy Markdown
Contributor Author

adityamaru27 commented Jul 29, 2019

Following are timings of the schema change statements:

CREATE INDEX ON tpcc.order (o_carrier_id); took 4m36.580333752s
CREATE TABLE tpcc.customerpks (c_w_id INT, c_d_id INT, c_id INT, FOREIGN KEY (c_w_id, c_d_id, c_id) REFERENCES tpcc.customer (c_w_id, c_d_id, c_id)); took 3m22.794456877s
CREATE TABLE tpcc.orderpks (o_w_id, o_d_id, o_id, PRIMARY KEY(o_w_id, o_d_id, o_id)) AS select o_w_id, o_d_id, o_id from tpcc.order; took 5m3.489214029s
ALTER TABLE tpcc.order ADD COLUMN orderdiscount INT DEFAULT 0; took 2h38m56.06200872s
ALTER TABLE tpcc.order ADD CONSTRAINT nodiscount CHECK (orderdiscount = 0); took 4m54.510359222s
ALTER TABLE tpcc.order VALIDATE CONSTRAINT nodiscount; took 17.937219ms
ALTER TABLE tpcc.orderpks ADD CONSTRAINT warehouse_id FOREIGN KEY (o_w_id) REFERENCES tpcc.warehouse (w_id); took 1m56.803291808s
ALTER TABLE tpcc.orderpks VALIDATE CONSTRAINT warehouse_id; took 13.167725ms
ALTER TABLE tpcc.orderpks RENAME TO tpcc.readytodrop; took 103.749034ms
TRUNCATE TABLE tpcc.readytodrop CASCADE; took 2m14.603070924s
DROP TABLE tpcc.readytodrop CASCADE; took 82.851648ms

Importing fixtures + queries took a total of 3h12m5.065694801s.
@ajwerner @dt are these long enough, or do we need to bump it to 2k?

@adityamaru27 adityamaru27 changed the title [WIP] roachtest: Add tpcc1k test to identify long running schema change txns. roachtest: Add tpcc1k test to identify long running schema change txns. Jul 29, 2019
@dt
Copy link
Copy Markdown
Contributor

dt commented Jul 29, 2019

All these look OK to me (they're longer than 45s) except for ALTER TABLE tpcc.order VALIDATE CONSTRAINT nodiscount; took 17.937219ms... but that one is already so short it looks to me like it didn't actually do anything at all so I doubt 2k would change anything. @lucy-zhang did we make VALIDATE CONSTRAINT a no-op if it is already valid? if so, we might need to make a different constraint with NOT VALID to check here.

@thoszhang
Copy link
Copy Markdown

Yeah, VALIDATE CONSTRAINT is a no-op if the constraint is validated.

@dt
Copy link
Copy Markdown
Contributor

dt commented Jul 31, 2019

Thanks! Yeah, let's make a second constraint with NOT VALID and then validate it separately.

@adityamaru27 adityamaru27 force-pushed the roachtest-schema-changes branch from 8b14371 to a5aa7e4 Compare August 1, 2019 19:50
@adityamaru27
Copy link
Copy Markdown
Contributor Author

adityamaru27 commented Aug 1, 2019

Will update the time for the new VALIDATE command once it is complete, but it is definitely >10s.

Time taken for tpcc.district VALIDATE : 7m27s

@adityamaru27 adityamaru27 force-pushed the roachtest-schema-changes branch 2 times, most recently from bed498f to c9de72d Compare August 1, 2019 20:25
This change adds a roachtest which spins up a 5 node, 16 cpu cluster
with a tpcc-1k workload, and runs the schema changes outlined in cockroachdb#37083.
The need for this test arises to have a single source of truth for long
running schema change txns, and the impact of lowering the closed ts
interval.

Release note: None
@adityamaru27 adityamaru27 force-pushed the roachtest-schema-changes branch from c9de72d to 80cd61a Compare August 5, 2019 22:10
@adityamaru27
Copy link
Copy Markdown
Contributor Author

@dt can i merge this? We can collect some data and make tweaks if needed over the next few days.

@dt
Copy link
Copy Markdown
Contributor

dt commented Aug 6, 2019

oh, yeah, sorry, I thought I LGTM'ed it awhile ago.

@adityamaru27
Copy link
Copy Markdown
Contributor Author

TFTR
bors r+

craig bot pushed a commit that referenced this pull request Aug 6, 2019
39096: roachtest: Add tpcc1k test to identify long running schema change txns. r=adityamaru27 a=adityamaru27

This change adds a roachtest which spins up a 5 node, 16 cpu cluster
with a tpcc-1k workload, and runs the schema changes outlined in #37083.
The need for this test arises to have a single source of truth for long
running schema change txns, and the impact of lowering the closed ts
interval.

Co-authored-by: Aditya Maru <adityamaru@cockroachlabs.com>
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 6, 2019

Build succeeded

@craig craig bot merged commit 80cd61a into cockroachdb:master Aug 6, 2019
@adityamaru27 adityamaru27 deleted the roachtest-schema-changes branch August 6, 2019 03:14
ajwerner pushed a commit to ajwerner/cockroach that referenced this pull request Aug 23, 2019
This PR dramatically shortens the closed timestamp target interval. With this
setting the experimental_follower_read_timestamp() will now be 3.7 seconds in
the past as opposed to the current 48 seconds in the past. This relatively
aggressive setting of 5s is intended to shake out issues.

This value has been verified to work on the `schemachange/mixed/tpcc` roachtest
introduced in cockroachdb#39096 and for vanilla TPC-C runs around where we have
established a baseline in the past using tpccbench. Specifically several runs
at 2300 warehouses on 3x c5d.4xlarge nodes which is right at the passing
boundary without the change were run only a very small difference in
efficiency or tail latency was observed.

It seems reasonable to attempt to live with this value on master for a while
and see what happens.

Fixes cockroachdb#37083.

Release note: None
craig bot pushed a commit that referenced this pull request Aug 23, 2019
39643: closedts: shorten target_duration from 30s to 5s r=ajwerner a=ajwerner

This PR dramatically shortens the closed timestamp target interval. With this
setting the experimental_follower_read_timestamp() will now be 8.5 seconds in
the past as opposed to the current 48 seconds in the past. This relatively
aggressive setting of 5s is intended to shake out issues as we head into the stability
period. 

This value has been verified to work on the `schemachange/mixed/tpcc` roachtest
introduced in #39096 and for vanilla TPC-C runs around where we have
established a baseline in the past using tpccbench. Specifically several runs
at 2300 warehouses on 3x c5d.4xlarge nodes which is right at the passing
boundary without the change were run only a very small difference in
efficiency or tail latency was observed.

It seems reasonable to attempt to live with this value on master for a while
and see what happens.

Fixes #37083.

Release note: None

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
ajwerner pushed a commit to ajwerner/cockroach that referenced this pull request Dec 13, 2019
This PR dramatically shortens the closed timestamp target interval. With this
setting the experimental_follower_read_timestamp() will now be 4.8 seconds in
the past as opposed to the current 48 seconds in the past. This relatively
aggressive setting of 3s is intended to shake out issues.

This value has been verified to work on the `schemachange/mixed/tpcc` roachtest
introduced in cockroachdb#39096 and for vanilla TPC-C runs around where we have
established a baseline in the past using tpccbench. Specifically several runs
at 2300 warehouses on 3x c5d.4xlarge nodes which is right at the passing
boundary without the change were run only a very small difference in
efficiency or tail latency was observed.

It seems reasonable to attempt to live with this value on master for a while
and see what happens.

Fixes cockroachdb#37083.

Release note (enterprise change): Shorten the default closed timestamp target
interval from 30s to 3s leading to permit follower reads at 4.8 seconds in
the past rather than the previous 48 seconds.
ajwerner pushed a commit to ajwerner/cockroach that referenced this pull request Dec 13, 2019
This PR dramatically shortens the closed timestamp target interval. With this
setting the experimental_follower_read_timestamp() will now be 4.8 seconds in
the past as opposed to the current 48 seconds in the past. This relatively
aggressive setting of 3s is intended to shake out issues.

This value has been verified to work on the `schemachange/mixed/tpcc` roachtest
introduced in cockroachdb#39096 and for vanilla TPC-C runs around where we have
established a baseline in the past using tpccbench. Specifically several runs
at 2300 warehouses on 3x c5d.4xlarge nodes which is right at the passing
boundary without the change were run only a very small difference in
efficiency or tail latency was observed.

It seems reasonable to attempt to live with this value on master for a while
and see what happens.

Fixes cockroachdb#37083.

Release note (enterprise change): Shorten the default closed timestamp target
interval from 30s to 3s leading to permit follower reads at 4.8 seconds in
the past rather than the previous 48 seconds.
ajwerner pushed a commit to ajwerner/cockroach that referenced this pull request Dec 16, 2019
This PR dramatically shortens the closed timestamp target interval. With this
setting the experimental_follower_read_timestamp() will now be 4.8 seconds in
the past as opposed to the current 48 seconds in the past. This relatively
aggressive setting of 3s is intended to shake out issues.

This value has been verified to work on the `schemachange/mixed/tpcc` roachtest
introduced in cockroachdb#39096 and for vanilla TPC-C runs around where we have
established a baseline in the past using tpccbench. Specifically several runs
at 2300 warehouses on 3x c5d.4xlarge nodes which is right at the passing
boundary without the change were run only a very small difference in
efficiency or tail latency was observed.

It seems reasonable to attempt to live with this value on master for a while
and see what happens.

Fixes cockroachdb#37083.

Release note (enterprise change): Shorten the default closed timestamp target
interval from 30s to 3s leading to permit follower reads at 4.8 seconds in
the past rather than the previous 48 seconds.
craig bot pushed a commit that referenced this pull request Dec 17, 2019
43147: closedts: shorten target_duration from 30s to 3s r=ajwerner a=ajwerner

This PR dramatically shortens the closed timestamp target interval. With this
setting the experimental_follower_read_timestamp() will now be 4.8 seconds in
the past as opposed to the current 48 seconds in the past. This relatively
aggressive setting of 3s is intended to shake out issues.

This value has been verified to work on the `schemachange/mixed/tpcc` roachtest
introduced in #39096 and for vanilla TPC-C runs around where we have
established a baseline in the past using tpccbench. Specifically several runs
at 2300 warehouses on 3x c5d.4xlarge nodes which is right at the passing
boundary without the change were run only a very small difference in
efficiency or tail latency was observed.

It seems reasonable to attempt to live with this value on master for a while
and see what happens.

Fixes #37083.

Release note (enterprise change): Shorten the default closed timestamp target
interval from 30s to 3s leading to permit follower reads at 4.8 seconds in
the past rather than the previous 48 seconds.

43226: sql: fix namespace name resolution fallback code r=arulajmani a=arulajmani

Previously, the fallback for name resolution was broken. If an entry
is not present in the new `system.namespace`, we should look it up
in the old `system.namespace`. This was not happening, as we would
only fallback to the old `system.namespace` if the entry was a table.
Thus, databases created before the new `system.namespace` table was
created were not being resolved at all. This PR fixes that.

Fixes #43141

Release note: None

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Co-authored-by: arulajmani <arulajmani@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants