Skip to content

sstable: add RewriteKeySuffixes function#1384

Merged
dt merged 3 commits intocockroachdb:masterfrom
dt:sst-rewrite-slow
Nov 28, 2021
Merged

sstable: add RewriteKeySuffixes function#1384
dt merged 3 commits intocockroachdb:masterfrom
dt:sst-rewrite-slow

Conversation

@dt
Copy link
Copy Markdown
Contributor

@dt dt commented Nov 23, 2021

Pulling this out of #1377, as it is just the Reader->Writer loop implementation,
along with the associated tests and benchmark, to get that benchmark in place
for any other Reader or Writer performance work to check, and so #1377 can then
be just about the parallel-block version.


This adds a function to the sstable package that can replace some suffix
with another suffix from every key in an sstable, assuming enforcing
that the sstable consists of only Sets.

For now, this is done by opening a new Writer and simply iterating every
key from the original SST using a reader, replacing the suffix and then
passing it to the writer. However a follow-up change may improve this
implementation by leveraging knowledge of the reader and writer internal
details, such as processing data blocks in parallel or copying, rather
than recomputing filters and props.

For now this simple implementation is added as-is, to get tests and an
initial benchmark in place.

name                                                       time/op
RewriteSST/NoCompression/keys=100/ReaderWriterLoop-10        29.8µs ± 2%
RewriteSST/NoCompression/keys=10000/ReaderWriterLoop-10      1.30ms ± 1%
RewriteSST/NoCompression/keys=1000000/ReaderWriterLoop-10     129ms ± 3%
RewriteSST/Snappy/keys=100/ReaderWriterLoop-10               31.7µs ± 2%
RewriteSST/Snappy/keys=10000/ReaderWriterLoop-10             1.40ms ± 1%
RewriteSST/Snappy/keys=1000000/ReaderWriterLoop-10            137ms ± 3%

name                                                       speed
RewriteSST/NoCompression/keys=100/ReaderWriterLoop-10       194MB/s ± 2%
RewriteSST/NoCompression/keys=10000/ReaderWriterLoop-10     368MB/s ± 1%
RewriteSST/NoCompression/keys=1000000/ReaderWriterLoop-10   371MB/s ± 3%
RewriteSST/Snappy/keys=100/ReaderWriterLoop-10             70.3MB/s ± 2%
RewriteSST/Snappy/keys=10000/ReaderWriterLoop-10           87.3MB/s ± 1%
RewriteSST/Snappy/keys=1000000/ReaderWriterLoop-10         88.1MB/s ± 3%

name                                                       alloc/op
RewriteSST/NoCompression/keys=100/ReaderWriterLoop-10         302kB ± 0%
RewriteSST/NoCompression/keys=10000/ReaderWriterLoop-10       551kB ± 0%
RewriteSST/NoCompression/keys=1000000/ReaderWriterLoop-10    25.5MB ± 0%
RewriteSST/Snappy/keys=100/ReaderWriterLoop-10                302kB ± 0%
RewriteSST/Snappy/keys=10000/ReaderWriterLoop-10              551kB ± 0%
RewriteSST/Snappy/keys=1000000/ReaderWriterLoop-10           25.5MB ± 0%

name                                                       allocs/op
RewriteSST/NoCompression/keys=100/ReaderWriterLoop-10          92.0 ± 0%
RewriteSST/NoCompression/keys=10000/ReaderWriterLoop-10         122 ± 0%
RewriteSST/NoCompression/keys=1000000/ReaderWriterLoop-10       168 ± 0%
RewriteSST/Snappy/keys=100/ReaderWriterLoop-10                 92.0 ± 0%
RewriteSST/Snappy/keys=10000/ReaderWriterLoop-10                122 ± 0%
RewriteSST/Snappy/keys=1000000/ReaderWriterLoop-10              168 ± 0%

dt added 2 commits November 23, 2021 20:36
This refactors the data-driven tests slightly. It adds the ability to
specify filters and a testing comparer that has a 4b suffix splitter,
while also adding a new command "get" that can test said filters and a
new "verbose" flag to layout that can be used for debugging.

Also pull the routine for reading options into a helper so that it can
be reused in a later change for reading rewriter options too.
This is just moving reading curKey from the function to its callsites.
@dt dt requested a review from jbowens November 23, 2021 20:51
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@dt dt force-pushed the sst-rewrite-slow branch 2 times, most recently from 1067583 to e171631 Compare November 23, 2021 21:12
Copy link
Copy Markdown
Contributor

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 3 files at r1, 1 of 1 files at r2, 4 of 5 files at r3, all commit messages.
Reviewable status: 6 of 7 files reviewed, 1 unresolved discussion (waiting on @dt and @jbowens)


sstable/suffix_rewriter.go, line 40 at r3 (raw file):

			if i := w.split(k.UserKey); i > len(k.UserKey)-len(from) {
				return nil, errors.New("cannot replace outside split suffix")
			}

This allows for a partial replace of the suffix (in the split sense), right? I think we should require whole suffix replacements to be a little less error prone.

Copy link
Copy Markdown
Contributor Author

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 6 of 7 files reviewed, 1 unresolved discussion (waiting on @jbowens)


sstable/suffix_rewriter.go, line 40 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

This allows for a partial replace of the suffix (in the split sense), right? I think we should require whole suffix replacements to be a little less error prone.

Is it an error though to only partially replace the suffix? that seems... fine? I mean, technically this implementation doesn't care what we replace at all, and this rule is mostly here for consistency with the future optimized version that copies rather than recomputes filters, but there too, it is fine if we only replace some suffix of the suffix right, since either way the prefix that is in the filter is unchanged?

This adds a function to the sstable package that can replace some suffix
with another suffix from every key in an sstable, assuming enforcing
that the sstable consists of only Sets.

For now, this is done by opening a new Writer and simply iterating every
key from the original SST using a reader, replacing the suffix and then
passing it to the writer. However a follow-up change may improve this
implementation by leveraging knowledge of the reader and writer internal
details, such as processing data blocks in parallel or copying, rather
than recomputing filters and props.

For now this simple implementation is added as-is, to get tests and an
initial benchmark in place.

```
name                                                       time/op
RewriteSST/NoCompression/keys=100/ReaderWriterLoop-10        29.8µs ± 2%
RewriteSST/NoCompression/keys=10000/ReaderWriterLoop-10      1.30ms ± 1%
RewriteSST/NoCompression/keys=1000000/ReaderWriterLoop-10     129ms ± 3%
RewriteSST/Snappy/keys=100/ReaderWriterLoop-10               31.7µs ± 2%
RewriteSST/Snappy/keys=10000/ReaderWriterLoop-10             1.40ms ± 1%
RewriteSST/Snappy/keys=1000000/ReaderWriterLoop-10            137ms ± 3%

name                                                       speed
RewriteSST/NoCompression/keys=100/ReaderWriterLoop-10       194MB/s ± 2%
RewriteSST/NoCompression/keys=10000/ReaderWriterLoop-10     368MB/s ± 1%
RewriteSST/NoCompression/keys=1000000/ReaderWriterLoop-10   371MB/s ± 3%
RewriteSST/Snappy/keys=100/ReaderWriterLoop-10             70.3MB/s ± 2%
RewriteSST/Snappy/keys=10000/ReaderWriterLoop-10           87.3MB/s ± 1%
RewriteSST/Snappy/keys=1000000/ReaderWriterLoop-10         88.1MB/s ± 3%

name                                                       alloc/op
RewriteSST/NoCompression/keys=100/ReaderWriterLoop-10         302kB ± 0%
RewriteSST/NoCompression/keys=10000/ReaderWriterLoop-10       551kB ± 0%
RewriteSST/NoCompression/keys=1000000/ReaderWriterLoop-10    25.5MB ± 0%
RewriteSST/Snappy/keys=100/ReaderWriterLoop-10                302kB ± 0%
RewriteSST/Snappy/keys=10000/ReaderWriterLoop-10              551kB ± 0%
RewriteSST/Snappy/keys=1000000/ReaderWriterLoop-10           25.5MB ± 0%

name                                                       allocs/op
RewriteSST/NoCompression/keys=100/ReaderWriterLoop-10          92.0 ± 0%
RewriteSST/NoCompression/keys=10000/ReaderWriterLoop-10         122 ± 0%
RewriteSST/NoCompression/keys=1000000/ReaderWriterLoop-10       168 ± 0%
RewriteSST/Snappy/keys=100/ReaderWriterLoop-10                 92.0 ± 0%
RewriteSST/Snappy/keys=10000/ReaderWriterLoop-10                122 ± 0%
RewriteSST/Snappy/keys=1000000/ReaderWriterLoop-10              168 ± 0%

```
@dt dt force-pushed the sst-rewrite-slow branch from e171631 to 8309953 Compare November 25, 2021 14:32
Copy link
Copy Markdown
Contributor Author

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 7 files reviewed, 1 unresolved discussion (waiting on @jbowens)


sstable/suffix_rewriter.go, line 40 at r3 (raw file):

Previously, dt (David Taylor) wrote…

Is it an error though to only partially replace the suffix? that seems... fine? I mean, technically this implementation doesn't care what we replace at all, and this rule is mostly here for consistency with the future optimized version that copies rather than recomputes filters, but there too, it is fine if we only replace some suffix of the suffix right, since either way the prefix that is in the filter is unchanged?

Eh, I guess it is always easier to be more restrictive for now and change our mind later. Switched to == len check and added too-short test case too now.

Copy link
Copy Markdown
Contributor

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 7 files reviewed, 1 unresolved discussion (waiting on @dt and @jbowens)


sstable/suffix_rewriter.go, line 40 at r3 (raw file):

Previously, dt (David Taylor) wrote…

Eh, I guess it is always easier to be more restrictive for now and change our mind later. Switched to == len check and added too-short test case too now.

👍 I think it's a clearer interface to have only one concept of suffix, the one defined by Split

@dt
Copy link
Copy Markdown
Contributor Author

dt commented Nov 28, 2021

TFTR!

@dt dt merged commit badc275 into cockroachdb:master Nov 28, 2021
@dt dt deleted the sst-rewrite-slow branch November 28, 2021 19:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants