rowenc: Christmas cleanup#74357
Conversation
mgartner
left a comment
There was a problem hiding this comment.
Reviewed 23 of 23 files at r1, 15 of 15 files at r2, 2 of 2 files at r3, 135 of 135 files at r4, 53 of 53 files at r5, 2 of 2 files at r6, 55 of 55 files at r7, 23 of 23 files at r8, 11 of 11 files at r9, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @dt, @HonoreDB, @jordanlewis, @RaduBerinde, and @yuzefovich)
pkg/sql/rowenc/valueside/encode.go, line 105 at r8 (raw file):
// columns are encoded in a single value, the difference relative to the // previous column ID is encoded for each column (to minimize space usage). type ColumnIDDiff uint32
super-nit: a nice alternate name could be ColumnIDDelta
pkg/sql/rowenc/valueside/encode.go, line 110 at r8 (raw file):
// This is used when we use value encodings not to write KV Values but other // purposes, for example transferring a value over DistSQL (in the row engine). const NoColumnID = ColumnIDDiff(encoding.NoColumnID)
nit: NoColumnIDDiffwould make it more clear it's a ColumnIDDiff, not a ColumnID. Or maybe UnspecifiedColumnIDDiff?
c2e25b9 to
16619c7
Compare
RaduBerinde
left a comment
There was a problem hiding this comment.
TFTR and Happy New Year!
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @dt, @HonoreDB, @jordanlewis, @mgartner, and @yuzefovich)
pkg/sql/rowenc/valueside/encode.go, line 105 at r8 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
super-nit: a nice alternate name could be
ColumnIDDelta
Done.
pkg/sql/rowenc/valueside/encode.go, line 110 at r8 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit:
NoColumnIDDiffwould make it more clear it's a ColumnIDDiff, not a ColumnID. Or maybeUnspecifiedColumnIDDiff?
Meh, I left it, I think of it as the value for the "no column ID" situation. The type makes it clear what it is.
16619c7 to
d3bf164
Compare
pkg/ccl/partitionccl/partition.go
Outdated
| } | ||
| value, err = rowenc.EncodeTableValue( | ||
| value, descpb.ColumnID(encoding.NoColumnID), datum, scratch, | ||
| value, descpb.NoColumnID, datum, scratch, |
There was a problem hiding this comment.
I've been thinking that these ids are getting annoying from a dependency hierarchy perspective. I hate the relationship to tree too. I don't know exactly where to slot it, but I want the ID types to be extremely low in the import tree and then, maybe, get forwarded up to catalog or something. What I have in mind is a catid package which can be imported by tree, descpb, etc. I'm running into this now because I want to move the new schema changer state (scpb) below descpb so it can be imported and am being thwarted by these IDs. I'll make that change after this goes in and would love to move encoding.NoColumnID to catid.NoColumnID or perhaps even catid.NoColumn. Thoughts?
pkg/sql/inverted/expression.go
Outdated
| // If the inverted column stores an encoded datum, the encoding is | ||
| // DatumEncoding_ASCENDING_KEY, and is performed using | ||
| // EncodeTableKey(nil /* prefix */, val tree.Datum, encoding.Ascending). | ||
| // rowside.Encode(nil /* prefix */, val tree.Datum, encoding.Ascending). |
| // descriptors. | ||
| // | ||
| // See also: docs/tech-notes/encoding.md. | ||
| package keyside |
There was a problem hiding this comment.
why keyside, what's a side? keyenc?
| // SQL values (columns) per roachpb.Value. | ||
| // | ||
| // See also: docs/tech-notes/encoding.md. | ||
| package valueside |
There was a problem hiding this comment.
rowside is a bit clearer now in the context of valueside. I'm cool with the names but would also be cool with keyenc/valenc. I see that valueenc is frustrating with the ee in the middle.
mgartner
left a comment
There was a problem hiding this comment.
Reviewed 10 of 199 files at r10, 35 of 135 files at r13, 9 of 53 files at r14, 11 of 55 files at r16, 18 of 32 files at r17, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dt, @HonoreDB, @jordanlewis, @mgartner, @RaduBerinde, and @yuzefovich)
pkg/sql/rowenc/valueside/encode.go, line 110 at r8 (raw file):
Previously, RaduBerinde wrote…
Meh, I left it, I think of it as the value for the "no column ID" situation. The type makes it clear what it is.
👍
RaduBerinde
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @dt, @HonoreDB, @jordanlewis, @mgartner, @RaduBerinde, and @yuzefovich)
pkg/ccl/partitionccl/partition.go, line 118 at r11 (raw file):
Previously, ajwerner wrote…
I've been thinking that these ids are getting annoying from a dependency hierarchy perspective. I hate the relationship to
treetoo. I don't know exactly where to slot it, but I want the ID types to be extremely low in the import tree and then, maybe, get forwarded up tocatalogor something. What I have in mind is acatidpackage which can be imported by tree, descpb, etc. I'm running into this now because I want to move the new schema changer state (scpb) belowdescpbso it can be imported and am being thwarted by these IDs. I'll make that change after this goes in and would love to moveencoding.NoColumnIDtocatid.NoColumnIDor perhaps evencatid.NoColumn. Thoughts?
I agree they should be defined in a lower package I'd like the keyside/valueside code to not depend on the catalog or descpb.
Note though that the thing in encoding is not a simple column ID, it can also be a delta between column IDs. I introduce a type for that in valueside in the next-to-last commit in this PR.
This change reduces uses of the entire TableDescriptor. Release note: None
This change adds a `descpb.NoColumnID` constant so we don't have to use `descpb.ColumnID(encoding.NoColumnID)` everywhere. Release note: None
This commit removes the `DatumAlloc.scratch` field which is always nil. Release note: None
This commit moves `DatumAlloc` from `sql/rowenc` to `sql/sem/tree`. This is a fairly low-level construct and is not used exclusively for row encoding/decoding. Release note: None
The `rowenc` package contains a hefty mix of functions operating at different conceptual levels - some use higher level constructs like table and index descriptors, others are lower level utilities. The naming of these functions doesn't help, for example `EncodeTableKey` sounds like a top-level function but is actually a low level utility that appends a single value to a key This commit moves the lower level utilities for encoding datum values into keys to the `rowenc/keyside` package. Release note: None
The code around array decoding was confusing; we now have a single `decodeArray` variant which is the inverse of `encodeArray`. Release note: None
This commit moves the lower level utilities for encoding datum values into values to the `rowenc/valueside` package. Release note: None
When encoding multiple values, we encode the differences between the ColumnIDs. This difference is passed to `valueside.Encode`, but it is passed as a `descpb.ColumnID`. This commit introduces a new type to make this distinction more evident. Release note: None
We have a few copies of the same logic of decoding multiple SQL values from a Value, in particular in range feed handling code. This commit adds a helper type that centralizes this logic, reducing duplication. Note that the query executione engines retain their own specialized implementations. Release note: None
d3bf164 to
5b04801
Compare
RaduBerinde
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @dt, @HonoreDB, @jordanlewis, @mgartner, and @yuzefovich)
pkg/sql/inverted/expression.go, line 30 at r14 (raw file):
Previously, ajwerner wrote…
rowside?
Should be keyside, fixed.
pkg/sql/rowenc/valueside/doc.go, line 28 at r16 (raw file):
Previously, ajwerner wrote…
rowsideis a bit clearer now in the context ofvalueside. I'm cool with the names but would also be cool withkeyenc/valenc. I see thatvalueencis frustrating with theeein the middle.
I think rowenc/keyenc are better package names but not in actual usage, e.g. rowside.Encode() is better than rowenc.Encode() IMO. That said, I don't have a problem switching if someone feels strongly (if not, I'll leave them to avoid reworking the commits).
|
bors r+ |
|
Build succeeded: |
This PR contains assorted cleanups of the row encoding code. This code has evolved organically and it is pretty hard to navigate all the details.
rowenc: reduce reliance on the entire TableDescriptor
This change reduces uses of the entire TableDescriptor.
Release note: None
descpb: add and use NoColumnID constant
This change adds a
descpb.NoColumnIDconstant so we don't have touse
descpb.ColumnID(encoding.NoColumnID)everywhere.Release note: None
rowenc: remove DatumAlloc.scratch
This commit removes the
DatumAlloc.scratchfield which is alwaysnil.
Release note: None
sql: move
DatumAllocto sem/treeThis commit moves
DatumAllocfromsql/rowenctosql/sem/tree.This is a fairly low-level construct and is not used exclusively for
row encoding/decoding.
Release note: None
rowenc: move low-level key encoding functions to subpackage
The
rowencpackage contains a hefty mix of functions operating atdifferent conceptual levels - some use higher level constructs like
table and index descriptors, others are lower level utilities. The
naming of these functions doesn't help, for example
EncodeTableKeysounds like a top-level function but is actually a low level utility
that appends a single value to a key
This commit moves the lower level utilities for encoding datum values
into keys to the
rowenc/keysidepackage.Release note: None
rowenc: minor cleanup of array decoding code
The code around array decoding was confusing; we now have a single
decodeArrayvariant which is the inverse ofencodeArray.Release note: None
rowenc: move low-level value encoding functions to subpackage
This commit moves the lower level utilities for encoding datum values
into values to the
rowenc/valuesidepackage.Release note: None
valueside: introduce ColumnIDDiff type
When encoding multiple values, we encode the differences between the
ColumnIDs. This difference is passed to
valueside.Encode, but it ispassed as a
descpb.ColumnID. This commit introduces a new type tomake this distinction more evident.
Release note: None
valueside: add Decoder helper
We have a few copies of the same logic of decoding multiple SQL values
from a Value, in particular in range feed handling code.
This commit adds a helper type that centralizes this logic, reducing
duplication.
Note that the query executione engines retain their own specialized
implementations.
Release note: None