rowenc: Christmas cleanup by RaduBerinde · Pull Request #74357 · cockroachdb/cockroach

RaduBerinde · 2021-12-31T03:23:00Z

This PR contains assorted cleanups of the row encoding code. This code has evolved organically and it is pretty hard to navigate all the details.

rowenc: reduce reliance on the entire TableDescriptor

This change reduces uses of the entire TableDescriptor.

Release note: None

descpb: add and use NoColumnID constant

This change adds a descpb.NoColumnID constant so we don't have to
use descpb.ColumnID(encoding.NoColumnID) everywhere.

Release note: None

rowenc: remove DatumAlloc.scratch

This commit removes the DatumAlloc.scratch field which is always
nil.

Release note: None

sql: move `DatumAlloc` to sem/tree

This commit moves DatumAlloc from sql/rowenc to sql/sem/tree.
This is a fairly low-level construct and is not used exclusively for
row encoding/decoding.

Release note: None

rowenc: move low-level key encoding functions to subpackage

The rowenc package contains a hefty mix of functions operating at
different conceptual levels - some use higher level constructs like
table and index descriptors, others are lower level utilities. The
naming of these functions doesn't help, for example EncodeTableKey
sounds like a top-level function but is actually a low level utility
that appends a single value to a key

This commit moves the lower level utilities for encoding datum values
into keys to the rowenc/keyside package.

Release note: None

rowenc: minor cleanup of array decoding code

The code around array decoding was confusing; we now have a single
decodeArray variant which is the inverse of encodeArray.

Release note: None

rowenc: move low-level value encoding functions to subpackage

This commit moves the lower level utilities for encoding datum values
into values to the rowenc/valueside package.

Release note: None

valueside: introduce ColumnIDDiff type

When encoding multiple values, we encode the differences between the
ColumnIDs. This difference is passed to valueside.Encode, but it is
passed as a descpb.ColumnID. This commit introduces a new type to
make this distinction more evident.

Release note: None

valueside: add Decoder helper

We have a few copies of the same logic of decoding multiple SQL values
from a Value, in particular in range feed handling code.

This commit adds a helper type that centralizes this logic, reducing
duplication.

Note that the query executione engines retain their own specialized
implementations.

Release note: None

cockroach-teamcity · 2021-12-31T03:23:15Z

This change is

mgartner

Very nice!

Reviewed 23 of 23 files at r1, 15 of 15 files at r2, 2 of 2 files at r3, 135 of 135 files at r4, 53 of 53 files at r5, 2 of 2 files at r6, 55 of 55 files at r7, 23 of 23 files at r8, 11 of 11 files at r9, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @dt, @HonoreDB, @jordanlewis, @RaduBerinde, and @yuzefovich)

pkg/sql/rowenc/valueside/encode.go, line 105 at r8 (raw file):

// columns are encoded in a single value, the difference relative to the
// previous column ID is encoded for each column (to minimize space usage).
type ColumnIDDiff uint32

super-nit: a nice alternate name could be ColumnIDDelta

pkg/sql/rowenc/valueside/encode.go, line 110 at r8 (raw file):

// This is used when we use value encodings not to write KV Values but other
// purposes, for example transferring a value over DistSQL (in the row engine).
const NoColumnID = ColumnIDDiff(encoding.NoColumnID)

nit: NoColumnIDDiffwould make it more clear it's a ColumnIDDiff, not a ColumnID. Or maybe UnspecifiedColumnIDDiff?

RaduBerinde

TFTR and Happy New Year!

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @dt, @HonoreDB, @jordanlewis, @mgartner, and @yuzefovich)

pkg/sql/rowenc/valueside/encode.go, line 105 at r8 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

super-nit: a nice alternate name could be ColumnIDDelta

Done.

pkg/sql/rowenc/valueside/encode.go, line 110 at r8 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: NoColumnIDDiffwould make it more clear it's a ColumnIDDiff, not a ColumnID. Or maybe UnspecifiedColumnIDDiff?

Meh, I left it, I think of it as the value for the "no column ID" situation. The type makes it clear what it is.

ajwerner

LGTM

ajwerner · 2022-01-03T06:30:34Z

pkg/ccl/partitionccl/partition.go

 		}
 		value, err = rowenc.EncodeTableValue(
-			value, descpb.ColumnID(encoding.NoColumnID), datum, scratch,
+			value, descpb.NoColumnID, datum, scratch,


I've been thinking that these ids are getting annoying from a dependency hierarchy perspective. I hate the relationship to tree too. I don't know exactly where to slot it, but I want the ID types to be extremely low in the import tree and then, maybe, get forwarded up to catalog or something. What I have in mind is a catid package which can be imported by tree, descpb, etc. I'm running into this now because I want to move the new schema changer state (scpb) below descpb so it can be imported and am being thwarted by these IDs. I'll make that change after this goes in and would love to move encoding.NoColumnID to catid.NoColumnID or perhaps even catid.NoColumn. Thoughts?

ajwerner · 2022-01-03T06:34:22Z

pkg/sql/inverted/expression.go

 // If the inverted column stores an encoded datum, the encoding is
 // DatumEncoding_ASCENDING_KEY, and is performed using
-// EncodeTableKey(nil /* prefix */, val tree.Datum, encoding.Ascending).
+// rowside.Encode(nil /* prefix */, val tree.Datum, encoding.Ascending).


ajwerner · 2022-01-03T06:36:24Z

pkg/sql/rowenc/keyside/doc.go

+// descriptors.
+//
+// See also: docs/tech-notes/encoding.md.
+package keyside


why keyside, what's a side? keyenc?

ajwerner · 2022-01-03T06:40:07Z

pkg/sql/rowenc/valueside/doc.go

+//    SQL values (columns) per roachpb.Value.
+//
+// See also: docs/tech-notes/encoding.md.
+package valueside


rowside is a bit clearer now in the context of valueside. I'm cool with the names but would also be cool with keyenc/valenc. I see that valueenc is frustrating with the ee in the middle.

mgartner

Reviewed 10 of 199 files at r10, 35 of 135 files at r13, 9 of 53 files at r14, 11 of 55 files at r16, 18 of 32 files at r17, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dt, @HonoreDB, @jordanlewis, @mgartner, @RaduBerinde, and @yuzefovich)

pkg/sql/rowenc/valueside/encode.go, line 110 at r8 (raw file):

Previously, RaduBerinde wrote…

Meh, I left it, I think of it as the value for the "no column ID" situation. The type makes it clear what it is.

👍

RaduBerinde

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @dt, @HonoreDB, @jordanlewis, @mgartner, @RaduBerinde, and @yuzefovich)

pkg/ccl/partitionccl/partition.go, line 118 at r11 (raw file):

Previously, ajwerner wrote…

I've been thinking that these ids are getting annoying from a dependency hierarchy perspective. I hate the relationship to tree too. I don't know exactly where to slot it, but I want the ID types to be extremely low in the import tree and then, maybe, get forwarded up to catalog or something. What I have in mind is a catid package which can be imported by tree, descpb, etc. I'm running into this now because I want to move the new schema changer state (scpb) below descpb so it can be imported and am being thwarted by these IDs. I'll make that change after this goes in and would love to move encoding.NoColumnID to catid.NoColumnID or perhaps even catid.NoColumn. Thoughts?

I agree they should be defined in a lower package I'd like the keyside/valueside code to not depend on the catalog or descpb.

Note though that the thing in encoding is not a simple column ID, it can also be a delta between column IDs. I introduce a type for that in valueside in the next-to-last commit in this PR.

This change reduces uses of the entire TableDescriptor. Release note: None

This change adds a `descpb.NoColumnID` constant so we don't have to use `descpb.ColumnID(encoding.NoColumnID)` everywhere. Release note: None

This commit removes the `DatumAlloc.scratch` field which is always nil. Release note: None

This commit moves `DatumAlloc` from `sql/rowenc` to `sql/sem/tree`. This is a fairly low-level construct and is not used exclusively for row encoding/decoding. Release note: None

The `rowenc` package contains a hefty mix of functions operating at different conceptual levels - some use higher level constructs like table and index descriptors, others are lower level utilities. The naming of these functions doesn't help, for example `EncodeTableKey` sounds like a top-level function but is actually a low level utility that appends a single value to a key This commit moves the lower level utilities for encoding datum values into keys to the `rowenc/keyside` package. Release note: None

The code around array decoding was confusing; we now have a single `decodeArray` variant which is the inverse of `encodeArray`. Release note: None

This commit moves the lower level utilities for encoding datum values into values to the `rowenc/valueside` package. Release note: None

When encoding multiple values, we encode the differences between the ColumnIDs. This difference is passed to `valueside.Encode`, but it is passed as a `descpb.ColumnID`. This commit introduces a new type to make this distinction more evident. Release note: None

We have a few copies of the same logic of decoding multiple SQL values from a Value, in particular in range feed handling code. This commit adds a helper type that centralizes this logic, reducing duplication. Note that the query executione engines retain their own specialized implementations. Release note: None

RaduBerinde

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @dt, @HonoreDB, @jordanlewis, @mgartner, and @yuzefovich)

pkg/sql/inverted/expression.go, line 30 at r14 (raw file):

Previously, ajwerner wrote…

rowside?

Should be keyside, fixed.

pkg/sql/rowenc/valueside/doc.go, line 28 at r16 (raw file):

Previously, ajwerner wrote…

rowside is a bit clearer now in the context of valueside. I'm cool with the names but would also be cool with keyenc/valenc. I see that valueenc is frustrating with the ee in the middle.

I think rowenc/keyenc are better package names but not in actual usage, e.g. rowside.Encode() is better than rowenc.Encode() IMO. That said, I don't have a problem switching if someone feels strongly (if not, I'll leave them to avoid reworking the commits).

RaduBerinde · 2022-01-09T15:31:14Z

bors r+

craig · 2022-01-09T16:11:17Z

Build succeeded:

GitHub CI (Cockroach)

RaduBerinde requested review from a team, ajwerner, jordanlewis and yuzefovich December 31, 2021 03:23

RaduBerinde requested review from a team as code owners December 31, 2021 03:23

RaduBerinde requested a review from a team December 31, 2021 03:23

RaduBerinde requested a review from a team as a code owner December 31, 2021 03:23

RaduBerinde requested review from dt and removed request for a team December 31, 2021 03:23

shermanCRL requested review from HonoreDB and removed request for a team December 31, 2021 15:08

mgartner approved these changes Dec 31, 2021

View reviewed changes

RaduBerinde force-pushed the rowenc-cleanup branch from c2e25b9 to 16619c7 Compare January 1, 2022 16:33

RaduBerinde commented Jan 1, 2022

View reviewed changes

RaduBerinde force-pushed the rowenc-cleanup branch from 16619c7 to d3bf164 Compare January 1, 2022 16:35

ajwerner approved these changes Jan 3, 2022

View reviewed changes

mgartner approved these changes Jan 3, 2022

View reviewed changes

RaduBerinde commented Jan 3, 2022

View reviewed changes

RaduBerinde added 7 commits January 7, 2022 12:38

rowenc: reduce reliance on the entire TableDescriptor

895f498

This change reduces uses of the entire TableDescriptor. Release note: None

descpb: add and use NoColumnID constant

87a274f

This change adds a `descpb.NoColumnID` constant so we don't have to use `descpb.ColumnID(encoding.NoColumnID)` everywhere. Release note: None

rowenc: remove DatumAlloc.scratch

2c3a7d5

This commit removes the `DatumAlloc.scratch` field which is always nil. Release note: None

sql: move DatumAlloc to sem/tree

2b7ba8c

This commit moves `DatumAlloc` from `sql/rowenc` to `sql/sem/tree`. This is a fairly low-level construct and is not used exclusively for row encoding/decoding. Release note: None

rowenc: minor cleanup of array decoding code

4252cfd

The code around array decoding was confusing; we now have a single `decodeArray` variant which is the inverse of `encodeArray`. Release note: None

rowenc: move low-level value encoding functions to subpackage

e1b8c05

This commit moves the lower level utilities for encoding datum values into values to the `rowenc/valueside` package. Release note: None

RaduBerinde added 2 commits January 7, 2022 12:40

RaduBerinde force-pushed the rowenc-cleanup branch from d3bf164 to 5b04801 Compare January 7, 2022 20:43

RaduBerinde commented Jan 7, 2022

View reviewed changes

craig bot merged commit ad27839 into cockroachdb:master Jan 9, 2022

RaduBerinde deleted the rowenc-cleanup branch January 15, 2022 22:58

ajwerner mentioned this pull request Jun 1, 2022

sql,kv,storage: push column batch generation into kvserver #82323

Open

10 tasks

Conversation

RaduBerinde commented Dec 31, 2021

rowenc: reduce reliance on the entire TableDescriptor

descpb: add and use NoColumnID constant

rowenc: remove DatumAlloc.scratch

sql: move DatumAlloc to sem/tree

rowenc: move low-level key encoding functions to subpackage

rowenc: minor cleanup of array decoding code

rowenc: move low-level value encoding functions to subpackage

valueside: introduce ColumnIDDiff type

valueside: add Decoder helper

Uh oh!

cockroach-teamcity commented Dec 31, 2021

Uh oh!

mgartner left a comment

Choose a reason for hiding this comment

Uh oh!

RaduBerinde left a comment

Choose a reason for hiding this comment

Uh oh!

ajwerner left a comment

Choose a reason for hiding this comment

Uh oh!

ajwerner Jan 3, 2022

Choose a reason for hiding this comment

Uh oh!

ajwerner Jan 3, 2022

Choose a reason for hiding this comment

Uh oh!

ajwerner Jan 3, 2022

Choose a reason for hiding this comment

Uh oh!

ajwerner Jan 3, 2022

Choose a reason for hiding this comment

Uh oh!

mgartner left a comment

Choose a reason for hiding this comment

Uh oh!

RaduBerinde left a comment

Choose a reason for hiding this comment

Uh oh!

RaduBerinde left a comment

Choose a reason for hiding this comment

Uh oh!

RaduBerinde commented Jan 9, 2022

Uh oh!

craig bot commented Jan 9, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

sql: move `DatumAlloc` to sem/tree