internal/manifest: encode/decode range keys in manifest#1541
internal/manifest: encode/decode range keys in manifest#1541nicktrav merged 1 commit intocockroachdb:masterfrom
Conversation
itsbilal
left a comment
There was a problem hiding this comment.
great work! I like the fact that the format lets us instantiate a
FileMetadata without a Compare.
Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @jbowens, @nicktrav, and @sumeerbhola)
internal/manifest/version_edit.go, line 58 at r1 (raw file):
// Pebble tags. tagNewFileRangeKeys = 300
Nit: maybe we should call this tagNewFile4RangeKeys to be clearer that it extends tagNewFile4?
nicktrav
left a comment
There was a problem hiding this comment.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @jbowens, @nicktrav, and @sumeerbhola)
internal/manifest/version.go, line 248 at r1 (raw file):
) // boundsMarker returns a marker byte whose bits encode as the following:
Note to self: after talking with @sumeerbhola, using two bits seems like a better way to go than half a byte :) It gives us 6 bits that we could potentially use in the future, plus bitmasks are more standard than half byte masks :)
jbowens
left a comment
There was a problem hiding this comment.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @nicktrav and @sumeerbhola)
internal/manifest/version.go, line 248 at r1 (raw file):
Previously, nicktrav (Nick Travers) wrote…
Note to self: after talking with @sumeerbhola, using two bits seems like a better way to go than half a byte :) It gives us 6 bits that we could potentially use in the future, plus bitmasks are more standard than half byte masks :)
Maybe it's over optimizing, but could we use one of those extra bits for the key-pairs count?
I'm thinking one byte with three used bits:
maskContainsPointKeys— set if the file contains range keysmaskSmallest—1if the smallest comes from the point bound,0if it comes from the range bound. Required to be0ifb & maskContainsPointKeys == 0maskLargest—1if the largest comes from the point bound,0if it comes from the range bound. Required to be0ifb & maskContainsPointKeys == 0
nicktrav
left a comment
There was a problem hiding this comment.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @nicktrav and @sumeerbhola)
internal/manifest/version.go, line 248 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Maybe it's over optimizing, but could we use one of those extra bits for the key-pairs count?
I'm thinking one byte with three used bits:
maskContainsPointKeys— set if the file contains range keysmaskSmallest—1if the smallest comes from the point bound,0if it comes from the range bound. Required to be0ifb & maskContainsPointKeys == 0maskLargest—1if the largest comes from the point bound,0if it comes from the range bound. Required to be0ifb & maskContainsPointKeys == 0
Love it.
0808cb4 to
54f768c
Compare
nicktrav
left a comment
There was a problem hiding this comment.
Reviewable status: 2 of 5 files reviewed, all discussions resolved (waiting on @itsbilal and @sumeerbhola)
internal/manifest/version_edit.go, line 58 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Nit: maybe we should call this
tagNewFile4RangeKeysto be clearer that it extendstagNewFile4?
Done.
|
Going to keep open until we make the 22.1 branch cut. |
23a2bf3 to
47b0226
Compare
sumeerbhola
left a comment
There was a problem hiding this comment.
Reviewed 3 of 5 files at r3.
Reviewable status: 3 of 5 files reviewed, 2 unresolved discussions (waiting on @itsbilal and @nicktrav)
internal/manifest/version.go, line 267 at r3 (raw file):
// - if the table contains point keys // - if the table's smallest key is a point key // - if the table's largest key is a point key
The PR description needs to be updated for this revised scheme.
internal/manifest/version_edit.go, line 58 at r3 (raw file):
// Pebble tags. tagNewFile4RangeKeys = 300
I can't remember which tags belong to the same "space", i.e., which ones can't collide. And are all these encoded as unsigned varints?
I am wondering since it isn't clear to me why the value 300 was chosen, instead of 104?
May be a good time to add code comments for these tags.
47b0226 to
e2f4e1c
Compare
nicktrav
left a comment
There was a problem hiding this comment.
Reviewable status: 2 of 5 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)
internal/manifest/version.go, line 267 at r3 (raw file):
Previously, sumeerbhola wrote…
The PR description needs to be updated for this revised scheme.
Good catch. Update the PR description. The commit message was already updated.
internal/manifest/version_edit.go, line 58 at r3 (raw file):
And are all these encoded as unsigned varints?
I believe this to be true.
isn't clear to me why the value 300 was chosen, instead of 104?
No particular reason for 300 - we have the 1xx values for RocksDB tags that we inherited, and 2xx for column families. I was a little wary of using something in those ranges for fear of incompatibilities with stores being migrated from Cockroach to Pebble, but maybe that ship has sailed already. That said, I just took another look at the corresponding code in RocksDB, and their latest tag is still 103, so I guess we could use 104, and even revert the name of the tag identifier to tagNewFile5, as it's literally the "fifth new file format" (in the LevelDB -> RocksDB -> Pebble lineage).
I'll make the suggested changes, but happy to revert. @jbowens - any thoughts here?
May be a good time to add code comments for these tags.
Agree. It's also time we documented the encoding. I'll make sure this gets rolled in.
nicktrav
left a comment
There was a problem hiding this comment.
Reviewable status: 2 of 5 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)
internal/manifest/version_edit.go, line 58 at r3 (raw file):
Cockroach to Pebble
Woops. RocksDB to Pebble.
nicktrav
left a comment
There was a problem hiding this comment.
Now that the 22.1 branch is cut, this can land.
Reviewable status: 2 of 5 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)
sumeerbhola
left a comment
There was a problem hiding this comment.
Reviewed 1 of 5 files at r3, 1 of 1 files at r4.
Reviewable status: 4 of 5 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)
internal/manifest/version_edit.go, line 58 at r3 (raw file):
Previously, nicktrav (Nick Travers) wrote…
Cockroach to Pebble
Woops. RocksDB to Pebble.
I am fine if we want to be conservative wrt the tag numbering, as long as we document our reasoning.
I think a single byte unsigned varint can encode [0, 128).
jbowens
left a comment
There was a problem hiding this comment.
Reviewed 2 of 5 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: 4 of 5 files reviewed, 5 unresolved discussions (waiting on @itsbilal, @nicktrav, and @sumeerbhola)
-- commits, line 18 at r4:
nit: s/tagNewFile4RangeKeys/tagNewFile5/
-- commits, line 36 at r4:
nit: this sentence about 'two additional bytes' is out of date now, right?
internal/manifest/version_edit.go, line 58 at r3 (raw file):
Previously, sumeerbhola wrote…
I am fine if we want to be conservative wrt the tag numbering, as long as we document our reasoning.
I think a single byte unsigned varint can encode [0, 128).
104 sounds good to me
internal/manifest/version_test.go, line 463 at r4 (raw file):
} fmt.Fprintf(&b, "%s\n", m.DebugString(base.DefaultFormatter, true)) fmt.Fprintf(&b, " bounds: (smallest=%s,largest=%s) (0x%02x)\n", smallest, largest, bounds)
nit: could use %08b to format the bounds marker so that it's easier to interpret as a bit field
Currently, only point key bounds for a table are present in a manifest
entry. With the addition of range keys, the manifest needs to track the
point and range key bounds, in addition to providing a mechanism to
infer the overall bounds for the table.
Adopt the following encoding scheme to incorporate range keys into the
manifest:
- if the table only contains point keys, the existing encoding applies -
a new file entry is written with `tagNewFile4` and the smallest and
largest point keys are used as the smallest and largest bounds for the
table.
- else, the table contains range keys, and the new file is written with
a new tag, `tagNewFile5` that extends `tagNewFile4`:
- a "marker" byte is written that indicates if the table also has
point keys, and how the bounds can be used to reconstruct the
overall table bounds. In increasing order of bit significance:
- if the table contains point keys, `1`, else `0`
- if the smallest key is a point key, `1`, else `0`
- if the largest key is a point key, `1`, else `0`
- the smallest/largest pairs are written - first the point key bounds
(if present), then the range key bounds.
- any custom tags are written (e.g. table marked for compaction, etc.).
This new encoding scheme uses no additional space in the manifest if
there are only point keys, and remains backwards compatible with the
existing manifest format. If range keys are present, each new file
requires a single additional byte that encodes whether the table
contains point keys in addition to how to reconstruct the overall bounds
from the point and/or range keys in the table.
This approach also has the advantage of encoding enough information to
be able to fully reconstruct the overall table bounds at decode time,
rather than waiting until a `Compare` function is available to perform
the comparisons.
e2f4e1c to
0cbb980
Compare
nicktrav
left a comment
There was a problem hiding this comment.
Reviewable status: 3 of 5 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)
Previously, jbowens (Jackson Owens) wrote…
nit: s/
tagNewFile4RangeKeys/tagNewFile5/
Done.
Previously, jbowens (Jackson Owens) wrote…
nit: this sentence about 'two additional bytes' is out of date now, right?
Done.
internal/manifest/version_edit.go, line 58 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
104sounds good to me
👍
internal/manifest/version_test.go, line 463 at r4 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: could use
%08bto format the bounds marker so that it's easier to interpret as a bit field
Good call. Done.
|
TFTRs! |
Currently, only point key bounds for a table are present in a manifest
entry. With the addition of range keys, the manifest needs to track the
point and range key bounds, in addition to providing a mechanism to
infer the overall bounds for the table.
Adopt the following encoding scheme to incorporate range keys into the
manifest:
if the table only contains point keys, the existing encoding applies -
a new file entry is written with
tagNewFile4and the smallest andlargest point keys are used as the smallest and largest bounds for the
table.
else, the table contains range keys, and the new file is written with
a new tag,
tagNewFile4RangeKeysthat extendstagNewFile4:a "marker" byte is written that indicates if the table also has
point keys, and how the bounds can be used to reconstruct the
overall table bounds. In increasing order of bit significance:
1, else01, else01, else0the smallest/largest pairs are written - first the point key bounds
(if present), then the range key bounds.
any custom tags are written (e.g. table marked for compaction, etc.).
This new encoding scheme uses no additional space in the manifest if
there are only point keys, and remains backwards compatible with the
existing manifest format. If range keys are present, each new file
requires two additional bytes (a varint with value
1or2for thesmallest/largest keypair count, plus a byte for the bounds marker) on
top of what it takes to encode the smallest/largest point key bounds.
This approach also has the advantage of encoding enough information to
be able to fully reconstruct the overall table bounds at decode time,
rather than waiting until a
Comparefunction is available to performthe comparisons.