
roachtest: add INSERT ... SELECT ... table copy test#23449

Merged
nvb merged 2 commits into cockroachdb:master from nvb:nvanbenschoten/roachtestCopy
Mar 7, 2018

Conversation

@nvb
Contributor

@nvb nvb commented Mar 6, 2018

Closes #15713.

This change adds a test that first imports a fully-populated Bank table. It
then creates a second empty Bank schema. Finally, it performs a series of
paginated INSERT ... SELECT ... statements to copy all data from the first
table into the second table. This stresses both large transactions and the
ability for range splits to keep up with a large amount of data movement.

Release note: None
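The pagination described above can be sketched as follows. This is a minimal illustration, not the code in pkg/cmd/roachtest/copy.go: the schema and table names (`bank.bank`, `bank2.bank`), the integer primary key `id`, and the page size are all assumptions made for the example.

```go
// Sketch of generating paginated INSERT ... SELECT statements to copy a
// table in pages, so no single transaction writes the entire table.
package main

import "fmt"

// copyStmts returns the statements needed to copy rows with integer
// primary keys in [0, rows) from bank.bank into bank2.bank, pageSize
// rows at a time.
func copyStmts(rows, pageSize int) []string {
	var stmts []string
	for lo := 0; lo < rows; lo += pageSize {
		hi := lo + pageSize
		if hi > rows {
			hi = rows
		}
		stmts = append(stmts, fmt.Sprintf(
			`INSERT INTO bank2.bank (SELECT * FROM bank.bank WHERE id >= %d AND id < %d)`,
			lo, hi))
	}
	return stmts
}

func main() {
	for _, s := range copyStmts(10, 4) {
		fmt.Println(s)
	}
}
```

Each statement covers a disjoint key span, so the pages can be issued one after another (or each in its own transaction) without any page growing past the chosen size.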

@nvb nvb requested a review from tbg March 6, 2018 03:46
@cockroach-teamcity
Member

This change is Reviewable

@tbg
Member

tbg commented Mar 6, 2018

:lgtm:


Review status: 0 of 1 files reviewed at latest revision, 4 unresolved discussions, some commit checks pending.


pkg/cmd/roachtest/copy.go, line 43 at r1 (raw file):

		m := newMonitor(ctx, c, c.All())
		m.Go(func(ctx context.Context) error {
			run := func(stmt string) {

Use the c.Conn that you're creating below instead.


pkg/cmd/roachtest/copy.go, line 100 at r1 (raw file):

			// TODO(nvanbenschoten): should we assert that the number of ranges
			// in the new table is above some threshold?

In light of the recent runaway split test, that would be nice.


pkg/cmd/roachtest/copy.go, line 107 at r1 (raw file):

	}

	rows := 10000000

1E7 is a lot easier for me to make sense of.


pkg/cmd/roachtest/copy.go, line 114 at r1 (raw file):

		func(t *test) {
			if local {
				rows = 1000000

1E6


Comments from Reviewable

@tbg
Member

tbg commented Mar 6, 2018

Reviewed 1 of 1 files at r1.
Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks pending.


Comments from Reviewable

@nvb
Contributor Author

nvb commented Mar 6, 2018

Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


pkg/cmd/roachtest/copy.go, line 43 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Use the c.Conn that you're creating below instead.

Done.


pkg/cmd/roachtest/copy.go, line 100 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

In light of the recent runaway split test, that would be nice.

Any opinion on the best way to make this assertion? The way I have it now is that it estimates the lower and upper expected bounds for the number of ranges that should exist given the size of the payload and an estimate for the rest of the overhead for a single row. That seems a little fragile though.


pkg/cmd/roachtest/copy.go, line 107 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

1E7 is a lot easier for me to make sense of.

Done.


pkg/cmd/roachtest/copy.go, line 114 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

1E6

Done.


Comments from Reviewable

@tbg
Member

tbg commented Mar 6, 2018

I'm not seeing the edits you Doned, but I assume you'll push them.


Review status: all files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.


pkg/cmd/roachtest/copy.go, line 100 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Any opinion on the best way to make this assertion? The way I have it now is that it estimates the lower and upper expected bounds for the number of ranges that should exist given the size of the payload and an estimate for the rest of the overhead for a single row. That seems a little fragile though.

I guess ideally you would aggregate the MVCCStats for the table and use that to compute the number of expected ranges. I think it's fine to give what you're doing a try first, though. Your estimate doesn't have to be very exact, we just want to make sure the range count isn't completely off the mark.


Comments from Reviewable

nvb added 2 commits March 6, 2018 19:09
Closes cockroachdb#15713.

This change adds a test that first imports a fully-populated Bank table. It
then creates a second empty Bank schema. Finally, it performs a series of
paginated `INSERT ... SELECT ...` statements to copy all data from the first
table into the second table. This stresses both large transactions and the
ability for range splits to keep up with a large amount of data movement.

Release note: None

This change adds a `Progress` method to `roachtest.test` which records
the progress of the current operation. The progress is reset on the next
call to `Status`.

Release note: None
@nvb nvb force-pushed the nvanbenschoten/roachtestCopy branch from 1b28658 to 69b6f40 Compare March 7, 2018 00:09
@nvb
Contributor Author

nvb commented Mar 7, 2018

Just pushed the update. PTAL.


Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion.


pkg/cmd/roachtest/copy.go, line 100 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

I guess ideally you would aggregate the MVCCStats for the table and use that to compute the number of expected ranges. I think it's fine to give what you're doing a try first, though. Your estimate doesn't have to be very exact, we just want to make sure the range count isn't completely off the mark.

Yeah, you're right that it doesn't need to be exact. Done.


Comments from Reviewable

@tbg
Member

tbg commented Mar 7, 2018

:lgtm_strong:


Reviewed 1 of 1 files at r2.
Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


pkg/cmd/roachtest/copy.go, line 103 at r2 (raw file):

			rc := rangeCount()
			c.l.printf("range count after copy = %d\n", rc)
			highExp := (rows * rowEstimate) / (32 << 20 /* 32MB */)

That range is approximately [40,80] for these parameters, right? Seems reasonable.
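The arithmetic behind that [40,80] ballpark can be checked with a quick sketch. The `rowEstimate` value of roughly 256 bytes per row and the use of the 64 MB max range size for the lower bound are assumptions for illustration; the actual constants live in copy.go.

```go
// Back-of-the-envelope bounds on the expected range count: the high bound
// assumes ranges split at ~32 MB (half the max size), the low bound assumes
// every range is at the 64 MB max.
package main

import "fmt"

func expectedRanges(rows, rowEstimate int) (low, high int) {
	low = (rows * rowEstimate) / (64 << 20)  // all ranges at max size
	high = (rows * rowEstimate) / (32 << 20) // all ranges at half the max
	return low, high
}

func main() {
	low, high := expectedRanges(10000000, 256)
	fmt.Println(low, high) // prints 38 76
}
```

With 1E7 rows at ~256 bytes each (about 2.5 GB of data), that gives roughly [38,76] ranges, matching the estimate above.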


Comments from Reviewable

@nvb
Contributor Author

nvb commented Mar 7, 2018

Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


pkg/cmd/roachtest/copy.go, line 103 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

That range is approximately [40,80] for these parameters, right? Seems reasonable.

Yeah, and the test was consistently ending up with 56 ranges in the new table!


Comments from Reviewable

@nvb nvb merged commit f392321 into cockroachdb:master Mar 7, 2018
@nvb nvb deleted the nvanbenschoten/roachtestCopy branch March 7, 2018 06:01
@jseldess
Contributor

jseldess commented Mar 7, 2018

@tbg
Member

tbg commented Mar 7, 2018

@jseldess I don't think we can quite remove it, but it looks like it could use an update. I think the current state is:

  • kv.raft.command.max_size is still in effect, though @andreimatei mentioned that we would get better at batching deletions (and perhaps inserts too?), so fewer and fewer workloads would actually be able to create large individual Raft commands (note that this is not the same as having large transactions: many large txns will never create large Raft commands, but the reverse is usually true)
  • kv.transaction.max_intents has been removed; transactions that write lots of intents work a lot better now thanks to @spencerkimball's changes
  • still, long-running write-heavy transactions may never finish and it's good to avoid them, for example via pagination and breaking them up into multiple transactions whenever that is at all possible
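The last bullet's advice can be sketched as a loop that issues each page as its own statement (and hence its own implicit transaction). The small `execer` interface, the fake recorder, and the table names are assumptions chosen so the example runs without a live cluster; in practice the loop would run against a `*sql.DB`.

```go
// Sketch of breaking a large copy into many small transactions: one
// paginated statement per iteration, each committing independently so no
// single transaction's write footprint grows unbounded.
package main

import "fmt"

// execer abstracts the Exec method of a database handle (result ignored),
// letting the paging loop be exercised against a fake.
type execer interface {
	Exec(query string) error
}

func copyPaged(db execer, rows, pageSize int) error {
	for lo := 0; lo < rows; lo += pageSize {
		hi := lo + pageSize
		if hi > rows {
			hi = rows
		}
		stmt := fmt.Sprintf(
			`INSERT INTO bank2.bank (SELECT * FROM bank.bank WHERE id >= %d AND id < %d)`,
			lo, hi)
		if err := db.Exec(stmt); err != nil {
			return err
		}
	}
	return nil
}

// fakeDB records statements instead of executing them.
type fakeDB struct{ stmts []string }

func (f *fakeDB) Exec(q string) error { f.stmts = append(f.stmts, q); return nil }

func main() {
	f := &fakeDB{}
	if err := copyPaged(f, 1000000, 100000); err != nil {
		panic(err)
	}
	fmt.Println(len(f.stmts)) // prints 10
}
```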

@nvanbenschoten this test fails if you don't paginate, right? Ideally we just wouldn't ever send that large a batch down to Raft. It seems that the SQL layer could preclude that without too much trouble?

@nvb
Contributor Author

nvb commented Mar 8, 2018

transactions that write lots of intents work a lot better now thanks to @spencerkimball's changes

I meant to run these paginated INSERTs both inside and outside of a transaction. As you mentioned, the former should also work now. I'll make that change in a new PR.

this test fails if you don't paginate, right?

If I don't paginate and I don't increase kv.raft.command.max_size, then yes it fails.

Ideally we just wouldn't ever send that large a batch down to Raft. It seems that the SQL layer could preclude that without too much trouble?

As you mentioned, #23373 should get us most of the way here. I'll extend this test further once that change is in to see if it works without pagination.

nvb added a commit to nvb/cockroach that referenced this pull request Mar 8, 2018
Follow up to cockroachdb#23449.

This change adds a variant to the roachtest `copy` test that
runs the paginated copy from one table to another inside of
a transaction. The changes made in cockroachdb#21078 allow this to work
even with millions of rows in the source table.

Release note: None