Skip to content

storageccl: add setting to control export file size and paginate export#44482

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
ajwerner:ajwerner/paginate-export-request-2
Feb 7, 2020
Merged

storageccl: add setting to control export file size and paginate export#44482
craig[bot] merged 2 commits intocockroachdb:masterfrom
ajwerner:ajwerner/paginate-export-request-2

Conversation

@ajwerner
Copy link
Copy Markdown
Contributor

@ajwerner ajwerner commented Jan 29, 2020

This commit adopts the API change in #44440. It adds a hidden cluster setting
to control the target size. The testing is minimal.

This PR comes in two commits:

  1. Add a parameter to the ExportSST request to control the target size
  2. Add two cluster settings
    • Control the target size
    • Control how much over the target size before an error is generated

Closes #43356

Release note: None.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@ajwerner ajwerner force-pushed the ajwerner/paginate-export-request-2 branch from 921d52c to 375bd60 Compare January 29, 2020 16:00
@ajwerner
Copy link
Copy Markdown
Contributor Author

Relates to #43356

@ajwerner ajwerner requested a review from pbardea January 29, 2020 16:12
@ajwerner ajwerner marked this pull request as ready for review January 29, 2020 16:12
@ajwerner
Copy link
Copy Markdown
Contributor Author

@pbardea I plan on attacking both of the listed items above in separate PRs.

@ajwerner ajwerner force-pushed the ajwerner/paginate-export-request-2 branch 2 times, most recently from 1b37a85 to 82e123a Compare January 29, 2020 20:49
@ajwerner
Copy link
Copy Markdown
Contributor Author

@pbardea this is RFAL.

craig bot pushed a commit that referenced this pull request Jan 30, 2020
44489: sql: bugfixes around writing the old primary key in pk changes r=rohany a=rohany

This PR fixes two bugs:
* The logic for when to rewrite the old primary key was broken
  resulting in the old primary key not being rewritten in many cases.
* The old primary key being created was not properly dealing with
  its dangling interleave information.

This PR makes the design decision to not re-interleave the copy of the old primary index if it was interleaved.

Release note: None

44551: jobs: always set start time of a job r=spaskob a=spaskob

When starting a job via CreateAndStartJob if
making the job started fails the job will stil be
in system.jobs and can be adopted by another
node later but the started time will be 0 in this
case. We add a check and set it if necessary.

Release note: none.

44553: engine: redefine targetSize and add maxSize to ExportToSst r=itsbilal a=ajwerner

This PR is a follow-up of work from #44440 motivated by problems unearthed while typing #44482. The PR comes in two commits:

1) Re-define `targetSize` from being a target below which most requests would remain to being the size above which the export stops.
2) Add a `maxSize` parameter above which the `ExportToSst` call will fail.

See the individual commits for more details. 

Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
Co-authored-by: Spas Bojanov <spas@cockroachlabs.com>
Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
@ajwerner ajwerner force-pushed the ajwerner/paginate-export-request-2 branch from 82e123a to bd2ccc7 Compare February 3, 2020 21:30
@ajwerner
Copy link
Copy Markdown
Contributor Author

ajwerner commented Feb 3, 2020

@dt with @pbardea headed to vacation I'd love if you can step in on reviewing this one. It's not very complicated but it is currently an ugly frankenstein of arguments and cluster settings. I'm thinking I should make the MaxSize like the TargetSize on the request but I'd love to have you tell me that explicitly.

@ajwerner ajwerner requested a review from dt February 3, 2020 23:24
@ajwerner
Copy link
Copy Markdown
Contributor Author

ajwerner commented Feb 3, 2020

Turns out I have no idea what I'm talking about. Carry on.

@ajwerner ajwerner removed the request for review from dt February 3, 2020 23:29
Copy link
Copy Markdown
Contributor

@pbardea pbardea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r1, 2 of 3 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @pbardea)


pkg/ccl/storageccl/export_test.go, line 218 at r2 (raw file):

		res6 = exportAndSlurp(t, res5.end)
		expect(t, res6, 100, 100, 100, 100)
	})

Could we add a couple of tests here that:

  1. Sets the overage to 1b and expects a failure.
  2. Sets the target size to 0 and ensures that we still get 1 file from the table with 100 rows.

@ajwerner
Copy link
Copy Markdown
Contributor Author

ajwerner commented Feb 5, 2020

@pbardea did the mix of where the cluster settings get interpreted not bother you? I could decide whether or not I hate it.

@pbardea
Copy link
Copy Markdown
Contributor

pbardea commented Feb 6, 2020

That didn't bother me too much when I read through it. I interpreted the targetSize arg to act as a switch to enable a paginated request, but if we're always going to use the cluster setting (and I assume that we will since we're offering a cluster setting to control this knob) than it may make sense to change the args of the export request to take a bool enabling pagination instead. Then the determination of the target size could be handled entirely in the export request.

Copy link
Copy Markdown
Contributor

@pbardea pbardea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/ccl/storageccl/export.go, line 157 at r2 (raw file):

	var maxSize uint64
	if allowedOverhead := ExportRequestMaxAllowedFileSizeOverage.Get(&cArgs.EvalCtx.ClusterSettings().SV); allowedOverhead > 0 {
		maxSize = targetSize + uint64(allowedOverhead)

I also just wanted to check that it's fine for the maxSize to be set in the case that the targetSize is not set. I assume that it will just provide an unused overhead to an unlimited file size and will never come into play, but wanted to double check.

@ajwerner ajwerner force-pushed the ajwerner/paginate-export-request-2 branch from bd2ccc7 to 9b1c8ea Compare February 6, 2020 16:04
Copy link
Copy Markdown
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

I wanted to keep the target size on the request potentially for later interop with the DistSender after #44341 goes in.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @pbardea)


pkg/ccl/storageccl/export.go, line 157 at r2 (raw file):

Previously, pbardea (Paul Bardea) wrote…

I also just wanted to check that it's fine for the maxSize to be set in the case that the targetSize is not set. I assume that it will just provide an unused overhead to an unlimited file size and will never come into play, but wanted to double check.

Good catch.


pkg/ccl/storageccl/export_test.go, line 218 at r2 (raw file):

Previously, pbardea (Paul Bardea) wrote…

Could we add a couple of tests here that:

  1. Sets the overage to 1b and expects a failure.
  2. Sets the target size to 0 and ensures that we still get 1 file from the table with 100 rows.

Done.

@ajwerner
Copy link
Copy Markdown
Contributor Author

ajwerner commented Feb 6, 2020

@pbardea I'd like to bors this soon. If you want to take a pass please do it this afternoon

@pbardea
Copy link
Copy Markdown
Contributor

pbardea commented Feb 6, 2020

It's good to go! Sorry for the dealy.

Andrew Werner added 2 commits February 6, 2020 19:09
The TargetFileSize parameter dictates the target size for individual SST files
to be exported. If a range contains more data than the target size, multiple
files are produced.

Release note: None
This commit adopts the API change in cockroachdb#44440 and the previous commit. It adds
a hidden cluster setting to control the target size. There's not a ton of
testing but there's some.

I'm omitting a release note because the setting is hidden.

Release note: None.
@ajwerner ajwerner force-pushed the ajwerner/paginate-export-request-2 branch from 9b1c8ea to 08edafa Compare February 7, 2020 00:15
@ajwerner
Copy link
Copy Markdown
Contributor Author

ajwerner commented Feb 7, 2020

TFTR!

bors r=pbardea

craig bot pushed a commit that referenced this pull request Feb 7, 2020
44482: storageccl: add setting to control export file size and paginate export r=pbardea a=ajwerner

This commit adopts the API change in #44440. It adds a hidden cluster setting
to control the target size. The testing is minimal.

This PR comes in two commits:

1) Add a parameter to the ExportSST request to control the target size
2) Add two cluster settings
   * Control the target size
   * Control how much over the target size before an error is generated

Closes #43356

Release note: None.

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Feb 7, 2020

Build succeeded

@craig craig bot merged commit 08edafa into cockroachdb:master Feb 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

storage: paginate export requests with a size limit

3 participants