sstable: add RewriteKeySuffixes function#1384
Conversation
This refactors the data-driven tests slightly. It adds the ability to specify filters and a testing comparer that has a 4b suffix splitter, while also adding a new command "get" that can test said filters and a new "verbose" flag to layout that can be used for debugging. Also pull the routine for reading options into a helper so that it can be reused in a later change for reading rewriter options too.
This is just moving reading curKey from the function to its callsites.
1067583 to
e171631
Compare
jbowens
left a comment
There was a problem hiding this comment.
Reviewed 1 of 3 files at r1, 1 of 1 files at r2, 4 of 5 files at r3, all commit messages.
Reviewable status: 6 of 7 files reviewed, 1 unresolved discussion (waiting on @dt and @jbowens)
sstable/suffix_rewriter.go, line 40 at r3 (raw file):
if i := w.split(k.UserKey); i > len(k.UserKey)-len(from) { return nil, errors.New("cannot replace outside split suffix") }
This allows for a partial replace of the suffix (in the split sense), right? I think we should require whole suffix replacements to be a little less error prone.
dt
left a comment
There was a problem hiding this comment.
Reviewable status: 6 of 7 files reviewed, 1 unresolved discussion (waiting on @jbowens)
sstable/suffix_rewriter.go, line 40 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
This allows for a partial replace of the suffix (in the
splitsense), right? I think we should require whole suffix replacements to be a little less error prone.
Is it an error though to only partially replace the suffix? that seems... fine? I mean, technically this implementation doesn't care what we replace at all, and this rule is mostly here for consistency with the future optimized version that copies rather than recomputes filters, but there too, it is fine if we only replace some suffix of the suffix right, since either way the prefix that is in the filter is unchanged?
This adds a function to the sstable package that can replace some suffix with another suffix from every key in an sstable, assuming enforcing that the sstable consists of only Sets. For now, this is done by opening a new Writer and simply iterating every key from the original SST using a reader, replacing the suffix and then passing it to the writer. However a follow-up change may improve this implementation by leveraging knowledge of the reader and writer internal details, such as processing data blocks in parallel or copying, rather than recomputing filters and props. For now this simple implementation is added as-is, to get tests and an initial benchmark in place. ``` name time/op RewriteSST/NoCompression/keys=100/ReaderWriterLoop-10 29.8µs ± 2% RewriteSST/NoCompression/keys=10000/ReaderWriterLoop-10 1.30ms ± 1% RewriteSST/NoCompression/keys=1000000/ReaderWriterLoop-10 129ms ± 3% RewriteSST/Snappy/keys=100/ReaderWriterLoop-10 31.7µs ± 2% RewriteSST/Snappy/keys=10000/ReaderWriterLoop-10 1.40ms ± 1% RewriteSST/Snappy/keys=1000000/ReaderWriterLoop-10 137ms ± 3% name speed RewriteSST/NoCompression/keys=100/ReaderWriterLoop-10 194MB/s ± 2% RewriteSST/NoCompression/keys=10000/ReaderWriterLoop-10 368MB/s ± 1% RewriteSST/NoCompression/keys=1000000/ReaderWriterLoop-10 371MB/s ± 3% RewriteSST/Snappy/keys=100/ReaderWriterLoop-10 70.3MB/s ± 2% RewriteSST/Snappy/keys=10000/ReaderWriterLoop-10 87.3MB/s ± 1% RewriteSST/Snappy/keys=1000000/ReaderWriterLoop-10 88.1MB/s ± 3% name alloc/op RewriteSST/NoCompression/keys=100/ReaderWriterLoop-10 302kB ± 0% RewriteSST/NoCompression/keys=10000/ReaderWriterLoop-10 551kB ± 0% RewriteSST/NoCompression/keys=1000000/ReaderWriterLoop-10 25.5MB ± 0% RewriteSST/Snappy/keys=100/ReaderWriterLoop-10 302kB ± 0% RewriteSST/Snappy/keys=10000/ReaderWriterLoop-10 551kB ± 0% RewriteSST/Snappy/keys=1000000/ReaderWriterLoop-10 25.5MB ± 0% name allocs/op RewriteSST/NoCompression/keys=100/ReaderWriterLoop-10 92.0 ± 0% RewriteSST/NoCompression/keys=10000/ReaderWriterLoop-10 122 ± 0% RewriteSST/NoCompression/keys=1000000/ReaderWriterLoop-10 168 ± 0% RewriteSST/Snappy/keys=100/ReaderWriterLoop-10 92.0 ± 0% RewriteSST/Snappy/keys=10000/ReaderWriterLoop-10 122 ± 0% RewriteSST/Snappy/keys=1000000/ReaderWriterLoop-10 168 ± 0% ```
dt
left a comment
There was a problem hiding this comment.
Reviewable status: 4 of 7 files reviewed, 1 unresolved discussion (waiting on @jbowens)
sstable/suffix_rewriter.go, line 40 at r3 (raw file):
Previously, dt (David Taylor) wrote…
Is it an error though to only partially replace the suffix? that seems... fine? I mean, technically this implementation doesn't care what we replace at all, and this rule is mostly here for consistency with the future optimized version that copies rather than recomputes filters, but there too, it is fine if we only replace some suffix of the suffix right, since either way the prefix that is in the filter is unchanged?
Eh, I guess it is always easier to be more restrictive for now and change our mind later. Switched to == len check and added too-short test case too now.
jbowens
left a comment
There was a problem hiding this comment.
Reviewable status: 4 of 7 files reviewed, 1 unresolved discussion (waiting on @dt and @jbowens)
sstable/suffix_rewriter.go, line 40 at r3 (raw file):
Previously, dt (David Taylor) wrote…
Eh, I guess it is always easier to be more restrictive for now and change our mind later. Switched to == len check and added too-short test case too now.
👍 I think it's a clearer interface to have only one concept of suffix, the one defined by Split
|
TFTR! |
Pulling this out of #1377, as it is just the Reader->Writer loop implementation,
along with the associated tests and benchmark, to get that benchmark in place
for any other Reader or Writer performance work to check, and so #1377 can then
be just about the parallel-block version.
This adds a function to the sstable package that can replace some suffix
with another suffix from every key in an sstable, assuming enforcing
that the sstable consists of only Sets.
For now, this is done by opening a new Writer and simply iterating every
key from the original SST using a reader, replacing the suffix and then
passing it to the writer. However a follow-up change may improve this
implementation by leveraging knowledge of the reader and writer internal
details, such as processing data blocks in parallel or copying, rather
than recomputing filters and props.
For now this simple implementation is added as-is, to get tests and an
initial benchmark in place.