storageccl: add setting to control export file size and paginate export#44482
Conversation
921d52c to
375bd60
Compare
|
Relates to #43356 |
|
@pbardea I plan on attacking both of the listed items above in separate PRs. |
1b37a85 to
82e123a
Compare
|
@pbardea this is RFAL. |
44489: sql: bugfixes around writing the old primary key in pk changes r=rohany a=rohany This PR fixes two bugs: * The logic for when to rewrite the old primary key was broken resulting in the old primary key not being rewritten in many cases. * The old primary key being created was not properly dealing with its dangling interleave information. This PR makes the design decision to not re-interleave the copy of the old primary index if it was interleaved. Release note: None 44551: jobs: always set start time of a job r=spaskob a=spaskob When starting a job via CreateAndStartJob if making the job started fails the job will stil be in system.jobs and can be adopted by another node later but the started time will be 0 in this case. We add a check and set it if necessary. Release note: none. 44553: engine: redefine targetSize and add maxSize to ExportToSst r=itsbilal a=ajwerner This PR is a follow-up of work from #44440 motivated by problems unearthed while typing #44482. The PR comes in two commits: 1) Re-define `targetSize` from being a target below which most requests would remain to being the size above which the export stops. 2) Add a `maxSize` parameter above which the `ExportToSst` call will fail. See the individual commits for more details. Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu> Co-authored-by: Spas Bojanov <spas@cockroachlabs.com> Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
82e123a to
bd2ccc7
Compare
|
@dt with @pbardea headed to vacation I'd love if you can step in on reviewing this one. It's not very complicated but it is currently an ugly frankenstein of arguments and cluster settings. I'm thinking I should make the |
|
Turns out I have no idea what I'm talking about. Carry on. |
pbardea
left a comment
There was a problem hiding this comment.
Reviewed 4 of 4 files at r1, 2 of 3 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @pbardea)
pkg/ccl/storageccl/export_test.go, line 218 at r2 (raw file):
res6 = exportAndSlurp(t, res5.end) expect(t, res6, 100, 100, 100, 100) })
Could we add a couple of tests here that:
- Sets the overage to 1b and expects a failure.
- Sets the target size to 0 and ensures that we still get 1 file from the table with 100 rows.
|
@pbardea did the mix of where the cluster settings get interpreted not bother you? I could decide whether or not I hate it. |
|
That didn't bother me too much when I read through it. I interpreted the targetSize arg to act as a switch to enable a paginated request, but if we're always going to use the cluster setting (and I assume that we will since we're offering a cluster setting to control this knob) than it may make sense to change the args of the export request to take a bool enabling pagination instead. Then the determination of the target size could be handled entirely in the export request. |
pbardea
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)
pkg/ccl/storageccl/export.go, line 157 at r2 (raw file):
var maxSize uint64 if allowedOverhead := ExportRequestMaxAllowedFileSizeOverage.Get(&cArgs.EvalCtx.ClusterSettings().SV); allowedOverhead > 0 { maxSize = targetSize + uint64(allowedOverhead)
I also just wanted to check that it's fine for the maxSize to be set in the case that the targetSize is not set. I assume that it will just provide an unused overhead to an unlimited file size and will never come into play, but wanted to double check.
bd2ccc7 to
9b1c8ea
Compare
ajwerner
left a comment
There was a problem hiding this comment.
TFTR!
I wanted to keep the target size on the request potentially for later interop with the DistSender after #44341 goes in.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @pbardea)
pkg/ccl/storageccl/export.go, line 157 at r2 (raw file):
Previously, pbardea (Paul Bardea) wrote…
I also just wanted to check that it's fine for the maxSize to be set in the case that the targetSize is not set. I assume that it will just provide an unused overhead to an unlimited file size and will never come into play, but wanted to double check.
Good catch.
pkg/ccl/storageccl/export_test.go, line 218 at r2 (raw file):
Previously, pbardea (Paul Bardea) wrote…
Could we add a couple of tests here that:
- Sets the overage to 1b and expects a failure.
- Sets the target size to 0 and ensures that we still get 1 file from the table with 100 rows.
Done.
|
@pbardea I'd like to bors this soon. If you want to take a pass please do it this afternoon |
|
It's good to go! Sorry for the dealy. |
The TargetFileSize parameter dictates the target size for individual SST files to be exported. If a range contains more data than the target size, multiple files are produced. Release note: None
This commit adopts the API change in cockroachdb#44440 and the previous commit. It adds a hidden cluster setting to control the target size. There's not a ton of testing but there's some. I'm omitting a release note because the setting is hidden. Release note: None.
9b1c8ea to
08edafa
Compare
|
TFTR! bors r=pbardea |
44482: storageccl: add setting to control export file size and paginate export r=pbardea a=ajwerner This commit adopts the API change in #44440. It adds a hidden cluster setting to control the target size. The testing is minimal. This PR comes in two commits: 1) Add a parameter to the ExportSST request to control the target size 2) Add two cluster settings * Control the target size * Control how much over the target size before an error is generated Closes #43356 Release note: None. Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Build succeeded |
This commit adopts the API change in #44440. It adds a hidden cluster setting
to control the target size. The testing is minimal.
This PR comes in two commits:
Closes #43356
Release note: None.