Skip to content

rowenc: Christmas cleanup#74357

Merged
craig[bot] merged 9 commits intocockroachdb:masterfrom
RaduBerinde:rowenc-cleanup
Jan 9, 2022
Merged

rowenc: Christmas cleanup#74357
craig[bot] merged 9 commits intocockroachdb:masterfrom
RaduBerinde:rowenc-cleanup

Conversation

@RaduBerinde
Copy link
Copy Markdown
Member

This PR contains assorted cleanups of the row encoding code. This code has evolved organically and it is pretty hard to navigate all the details.

rowenc: reduce reliance on the entire TableDescriptor

This change reduces uses of the entire TableDescriptor.

Release note: None

descpb: add and use NoColumnID constant

This change adds a descpb.NoColumnID constant so we don't have to
use descpb.ColumnID(encoding.NoColumnID) everywhere.

Release note: None

rowenc: remove DatumAlloc.scratch

This commit removes the DatumAlloc.scratch field which is always
nil.

Release note: None

sql: move DatumAlloc to sem/tree

This commit moves DatumAlloc from sql/rowenc to sql/sem/tree.
This is a fairly low-level construct and is not used exclusively for
row encoding/decoding.

Release note: None

rowenc: move low-level key encoding functions to subpackage

The rowenc package contains a hefty mix of functions operating at
different conceptual levels - some use higher level constructs like
table and index descriptors, others are lower level utilities. The
naming of these functions doesn't help, for example EncodeTableKey
sounds like a top-level function but is actually a low level utility
that appends a single value to a key

This commit moves the lower level utilities for encoding datum values
into keys to the rowenc/keyside package.

Release note: None

rowenc: minor cleanup of array decoding code

The code around array decoding was confusing; we now have a single
decodeArray variant which is the inverse of encodeArray.

Release note: None

rowenc: move low-level value encoding functions to subpackage

This commit moves the lower level utilities for encoding datum values
into values to the rowenc/valueside package.

Release note: None

valueside: introduce ColumnIDDiff type

When encoding multiple values, we encode the differences between the
ColumnIDs. This difference is passed to valueside.Encode, but it is
passed as a descpb.ColumnID. This commit introduces a new type to
make this distinction more evident.

Release note: None

valueside: add Decoder helper

We have a few copies of the same logic of decoding multiple SQL values
from a Value, in particular in range feed handling code.

This commit adds a helper type that centralizes this logic, reducing
duplication.

Note that the query executione engines retain their own specialized
implementations.

Release note: None

@RaduBerinde RaduBerinde requested review from a team as code owners December 31, 2021 03:23
@RaduBerinde RaduBerinde requested a review from a team December 31, 2021 03:23
@RaduBerinde RaduBerinde requested a review from a team as a code owner December 31, 2021 03:23
@RaduBerinde RaduBerinde requested review from dt and removed request for a team December 31, 2021 03:23
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@shermanCRL shermanCRL requested review from HonoreDB and removed request for a team December 31, 2021 15:08
Copy link
Copy Markdown
Contributor

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Very nice!

Reviewed 23 of 23 files at r1, 15 of 15 files at r2, 2 of 2 files at r3, 135 of 135 files at r4, 53 of 53 files at r5, 2 of 2 files at r6, 55 of 55 files at r7, 23 of 23 files at r8, 11 of 11 files at r9, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @dt, @HonoreDB, @jordanlewis, @RaduBerinde, and @yuzefovich)


pkg/sql/rowenc/valueside/encode.go, line 105 at r8 (raw file):

// columns are encoded in a single value, the difference relative to the
// previous column ID is encoded for each column (to minimize space usage).
type ColumnIDDiff uint32

super-nit: a nice alternate name could be ColumnIDDelta


pkg/sql/rowenc/valueside/encode.go, line 110 at r8 (raw file):

// This is used when we use value encodings not to write KV Values but other
// purposes, for example transferring a value over DistSQL (in the row engine).
const NoColumnID = ColumnIDDiff(encoding.NoColumnID)

nit: NoColumnIDDiffwould make it more clear it's a ColumnIDDiff, not a ColumnID. Or maybe UnspecifiedColumnIDDiff?

Copy link
Copy Markdown
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR and Happy New Year!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @dt, @HonoreDB, @jordanlewis, @mgartner, and @yuzefovich)


pkg/sql/rowenc/valueside/encode.go, line 105 at r8 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

super-nit: a nice alternate name could be ColumnIDDelta

Done.


pkg/sql/rowenc/valueside/encode.go, line 110 at r8 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: NoColumnIDDiffwould make it more clear it's a ColumnIDDiff, not a ColumnID. Or maybe UnspecifiedColumnIDDiff?

Meh, I left it, I think of it as the value for the "no column ID" situation. The type makes it clear what it is.

Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

}
value, err = rowenc.EncodeTableValue(
value, descpb.ColumnID(encoding.NoColumnID), datum, scratch,
value, descpb.NoColumnID, datum, scratch,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking that these ids are getting annoying from a dependency hierarchy perspective. I hate the relationship to tree too. I don't know exactly where to slot it, but I want the ID types to be extremely low in the import tree and then, maybe, get forwarded up to catalog or something. What I have in mind is a catid package which can be imported by tree, descpb, etc. I'm running into this now because I want to move the new schema changer state (scpb) below descpb so it can be imported and am being thwarted by these IDs. I'll make that change after this goes in and would love to move encoding.NoColumnID to catid.NoColumnID or perhaps even catid.NoColumn. Thoughts?

// If the inverted column stores an encoded datum, the encoding is
// DatumEncoding_ASCENDING_KEY, and is performed using
// EncodeTableKey(nil /* prefix */, val tree.Datum, encoding.Ascending).
// rowside.Encode(nil /* prefix */, val tree.Datum, encoding.Ascending).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rowside?

// descriptors.
//
// See also: docs/tech-notes/encoding.md.
package keyside
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why keyside, what's a side? keyenc?

// SQL values (columns) per roachpb.Value.
//
// See also: docs/tech-notes/encoding.md.
package valueside
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rowside is a bit clearer now in the context of valueside. I'm cool with the names but would also be cool with keyenc/valenc. I see that valueenc is frustrating with the ee in the middle.

Copy link
Copy Markdown
Contributor

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 199 files at r10, 35 of 135 files at r13, 9 of 53 files at r14, 11 of 55 files at r16, 18 of 32 files at r17, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dt, @HonoreDB, @jordanlewis, @mgartner, @RaduBerinde, and @yuzefovich)


pkg/sql/rowenc/valueside/encode.go, line 110 at r8 (raw file):

Previously, RaduBerinde wrote…

Meh, I left it, I think of it as the value for the "no column ID" situation. The type makes it clear what it is.

👍

Copy link
Copy Markdown
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @dt, @HonoreDB, @jordanlewis, @mgartner, @RaduBerinde, and @yuzefovich)


pkg/ccl/partitionccl/partition.go, line 118 at r11 (raw file):

Previously, ajwerner wrote…

I've been thinking that these ids are getting annoying from a dependency hierarchy perspective. I hate the relationship to tree too. I don't know exactly where to slot it, but I want the ID types to be extremely low in the import tree and then, maybe, get forwarded up to catalog or something. What I have in mind is a catid package which can be imported by tree, descpb, etc. I'm running into this now because I want to move the new schema changer state (scpb) below descpb so it can be imported and am being thwarted by these IDs. I'll make that change after this goes in and would love to move encoding.NoColumnID to catid.NoColumnID or perhaps even catid.NoColumn. Thoughts?

I agree they should be defined in a lower package I'd like the keyside/valueside code to not depend on the catalog or descpb.

Note though that the thing in encoding is not a simple column ID, it can also be a delta between column IDs. I introduce a type for that in valueside in the next-to-last commit in this PR.

This change reduces uses of the entire TableDescriptor.

Release note: None
This change adds a `descpb.NoColumnID` constant so we don't have to
use `descpb.ColumnID(encoding.NoColumnID)` everywhere.

Release note: None
This commit removes the `DatumAlloc.scratch` field which is always
nil.

Release note: None
This commit moves `DatumAlloc` from `sql/rowenc` to `sql/sem/tree`.
This is a fairly low-level construct and is not used exclusively for
row encoding/decoding.

Release note: None
The `rowenc` package contains a hefty mix of functions operating at
different conceptual levels - some use higher level constructs like
table and index descriptors, others are lower level utilities. The
naming of these functions doesn't help, for example `EncodeTableKey`
sounds like a top-level function but is actually a low level utility
that appends a single value to a key

This commit moves the lower level utilities for encoding datum values
into keys to the `rowenc/keyside` package.

Release note: None
The code around array decoding was confusing; we now have a single
`decodeArray` variant which is the inverse of `encodeArray`.

Release note: None
This commit moves the lower level utilities for encoding datum values
into values to the `rowenc/valueside` package.

Release note: None
When encoding multiple values, we encode the differences between the
ColumnIDs. This difference is passed to `valueside.Encode`, but it is
passed as a `descpb.ColumnID`. This commit introduces a new type to
make this distinction more evident.

Release note: None
We have a few copies of the same logic of decoding multiple SQL values
from a Value, in particular in range feed handling code.

This commit adds a helper type that centralizes this logic, reducing
duplication.

Note that the query executione engines retain their own specialized
implementations.

Release note: None
Copy link
Copy Markdown
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @dt, @HonoreDB, @jordanlewis, @mgartner, and @yuzefovich)


pkg/sql/inverted/expression.go, line 30 at r14 (raw file):

Previously, ajwerner wrote…

rowside?

Should be keyside, fixed.


pkg/sql/rowenc/valueside/doc.go, line 28 at r16 (raw file):

Previously, ajwerner wrote…

rowside is a bit clearer now in the context of valueside. I'm cool with the names but would also be cool with keyenc/valenc. I see that valueenc is frustrating with the ee in the middle.

I think rowenc/keyenc are better package names but not in actual usage, e.g. rowside.Encode() is better than rowenc.Encode() IMO. That said, I don't have a problem switching if someone feels strongly (if not, I'll leave them to avoid reworking the commits).

@RaduBerinde
Copy link
Copy Markdown
Member Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jan 9, 2022

Build succeeded:

@craig craig bot merged commit ad27839 into cockroachdb:master Jan 9, 2022
@RaduBerinde RaduBerinde deleted the rowenc-cleanup branch January 15, 2022 22:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants