sql: introduce special type for inverted index keys by RaduBerinde · Pull Request #77501 · cockroachdb/cockroach

RaduBerinde · 2022-03-08T19:10:25Z

Note: this is intended to merge to master after the 22.1 branch is cut.

Currently we pass around inverted index keys as having Bytes type,
but these datums behave in subtly different ways. The
encoding/decoding is special-cased to be a no-op. This leads to sharp
edges; for example, we put these keys in EncodedDatums but if we
ever try to decode them (via EnsureDecodedd), we get obscure
decoding errors.

This commit cleans up this situation by adding a special EncodedKey
type and corresponding DEncodedKey datum. It behaves like
Bytes/DBytes in most cases, except key encoding and decoding.

Release note: None

cockroach-teamcity · 2022-03-08T19:10:35Z

This change is

yuzefovich

Reviewed 27 of 27 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @RaduBerinde, and @rytaft)

-- commits, line 8 at r1:
nit: s/Decodedd/Decoded/.

pkg/sql/sem/tree/datum.go, line 1626 at r1 (raw file):

// Format implements the NodeFormatter interface.
func (d *DEncodedKey) Format(ctx *FmtCtx) {
	f := ctx.flags

nit: it seems like it'd be good to extract a helper to be shared by two Bytes datums.

pkg/sql/sem/tree/datum_alloc.go, line 151 at r1 (raw file):

}

// NewDEncodedKey allocates a DBytes.

nit: s/DBytes/DEncodedKey/.

mgartner

Great idea!

To prevent poor row count estimations I think we'll need to include the new type you've created here:

cockroach/pkg/sql/opt/props/histogram.go

Line 835 in 9b385e6

case types.StringFamily, types.BytesFamily, types.UuidFamily, types.INetFamily:

Otherwise, if a query filters by a value in the middle of a histogram bucket, the estimated row count will be the bucket's num_range divided by two.

Does the histo_col_type field need to change in table statistics? Or is it fine to leave that as BYTES for inverted index columns.

Reviewed 27 of 27 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)

pkg/sql/types/types.go, line 468 at r1 (raw file):

	// EncodedKey is a special type used internally for inverted index keys, which
	// do not fully encode an object.
	EncodedKey = &T{

nit: Unless you think it's likely this will be used for other purposes in the future, it might make sense to make the name of the type and family more specific to inverted index keys.

mgartner · 2022-03-08T21:58:06Z

To prevent poor row count estimations...

See #68740 for some more context on this.

mgartner · 2022-03-08T22:03:02Z

To prevent poor row count estimations...

Hmm, I guess these datums are the upper and lower bound of the histogram buckets, so if the type remains as BYTES, perhaps no change is needed here.

RaduBerinde

The statistics creation and props code still uses DBytes (props.makeSpanFromInvertedSpan), I don't think it's worth changing that. I will add some more comments around the code though.

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @RaduBerinde, and @rytaft)

pkg/sql/types/types.go, line 468 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: Unless you think it's likely this will be used for other purposes in the future, it might make sense to make the name of the type and family more specific to inverted index keys.

I think I'd like to keep it, it's a sound concept of its own, maybe there will be some use case for passing around encoded keys. I could imagine potential usecases around composite types like collated strings.

RaduBerinde

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner, @rytaft, and @yuzefovich)

pkg/sql/sem/tree/datum.go, line 1626 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: it seems like it'd be good to extract a helper to be shared by two Bytes datums.

I just "rerouted" to the DBytes implementation.

mgartner

The statistics creation and props code still uses DBytes (props.makeSpanFromInvertedSpan), I don't think it's worth changing that. I will add some more comments around the code though.

SGTM. Thanks!

Reviewed 4 of 5 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner, @RaduBerinde, @rytaft, and @yuzefovich)

pkg/sql/sem/tree/datum.go, line 1564 at r2 (raw file):

}

// DEncodedKey is the bytes Datum. The underlying type is a string because we want

nit: DEncodedKey is not the bytes datum. It'd be helpful to explain the contrast between the type and DBytes here.

pkg/sql/types/types.go, line 468 at r1 (raw file):

Previously, RaduBerinde wrote…

I think I'd like to keep it, it's a sound concept of its own, maybe there will be some use case for passing around encoded keys. I could imagine potential usecases around composite types like collated strings.

Sounds good to me. In that case, I think the comment should be more general and explain the behavior of the type (e.g. "EncodedKey is similar to DBytes but has a different behavior for encoding and decoding ...") before mentioning the inverted index key use-case.

RaduBerinde

TFTRs!

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner, @RaduBerinde, @rytaft, and @yuzefovich)

pkg/sql/types/types.go, line 468 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Sounds good to me. In that case, I think the comment should be more general and explain the behavior of the type (e.g. "EncodedKey is similar to DBytes but has a different behavior for encoding and decoding ...") before mentioning the inverted index key use-case.

Done, PTAL

mgartner

Reviewed 1 of 5 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft and @yuzefovich)

pkg/sql/types/types.go, line 468 at r1 (raw file):

Previously, RaduBerinde wrote…

Done, PTAL

Looks great thanks!

Currently we pass around inverted index keys as having `Bytes` type, but these datums behave in subtly different ways. The encoding/decoding is special-cased to be a no-op. This leads to sharp edges; for example, we put these keys in `EncodedDatum`s but if we ever try to decode them (via `EnsureDecoded`), we get obscure decoding errors. This commit cleans up this situation by adding a special `EncodedKey` type and corresponding `DEncodedKey` datum. It behaves like `Bytes/DBytes` in most cases, except key encoding and decoding. Release note: None

RaduBerinde · 2022-03-15T22:17:57Z

bors r+

craig · 2022-03-16T00:27:48Z

Build succeeded:

GitHub CI (Cockroach)

RaduBerinde requested review from mgartner, rytaft and yuzefovich March 8, 2022 19:10

RaduBerinde requested a review from a team as a code owner March 8, 2022 19:10

RaduBerinde requested review from a team March 8, 2022 19:10

RaduBerinde requested a review from a team as a code owner March 8, 2022 19:10

yuzefovich approved these changes Mar 8, 2022

View reviewed changes

mgartner reviewed Mar 8, 2022

View reviewed changes

RaduBerinde commented Mar 9, 2022

View reviewed changes

RaduBerinde force-pushed the inv-key-type branch from faf90d1 to 49928f3 Compare March 10, 2022 04:03

RaduBerinde commented Mar 10, 2022

View reviewed changes

mgartner reviewed Mar 10, 2022

View reviewed changes

RaduBerinde force-pushed the inv-key-type branch from 49928f3 to 5b7b2da Compare March 10, 2022 18:03

RaduBerinde commented Mar 10, 2022

View reviewed changes

mgartner approved these changes Mar 10, 2022

View reviewed changes

RaduBerinde force-pushed the inv-key-type branch 3 times, most recently from 07f4098 to d17cef9 Compare March 13, 2022 05:32

RaduBerinde force-pushed the inv-key-type branch from d17cef9 to cd7e855 Compare March 15, 2022 17:29

craig bot merged commit be54b85 into cockroachdb:master Mar 16, 2022

RaduBerinde deleted the inv-key-type branch March 19, 2022 01:33

ajwerner mentioned this pull request Jun 1, 2022

sql,kv,storage: push column batch generation into kvserver #82323

Open

10 tasks

mgartner mentioned this pull request Jun 7, 2022

sql: support geospatial index for index recommendation indexrec #82293

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: introduce special type for inverted index keys#77501

sql: introduce special type for inverted index keys#77501
craig[bot] merged 1 commit intocockroachdb:masterfrom
RaduBerinde:inv-key-type

RaduBerinde commented Mar 8, 2022

Uh oh!

cockroach-teamcity commented Mar 8, 2022

Uh oh!

yuzefovich left a comment

Uh oh!

mgartner left a comment

Uh oh!

mgartner commented Mar 8, 2022

Uh oh!

mgartner commented Mar 8, 2022

Uh oh!

RaduBerinde left a comment

Uh oh!

RaduBerinde left a comment

Uh oh!

mgartner left a comment

Uh oh!

RaduBerinde left a comment

Uh oh!

mgartner left a comment

Uh oh!

RaduBerinde commented Mar 15, 2022

Uh oh!

craig bot commented Mar 16, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

RaduBerinde commented Mar 8, 2022

Uh oh!

cockroach-teamcity commented Mar 8, 2022

Uh oh!

yuzefovich left a comment

Choose a reason for hiding this comment

Uh oh!

mgartner left a comment

Choose a reason for hiding this comment

Uh oh!

mgartner commented Mar 8, 2022

Uh oh!

mgartner commented Mar 8, 2022

Uh oh!

RaduBerinde left a comment

Choose a reason for hiding this comment

Uh oh!

RaduBerinde left a comment

Choose a reason for hiding this comment

Uh oh!

mgartner left a comment

Choose a reason for hiding this comment

Uh oh!

RaduBerinde left a comment

Choose a reason for hiding this comment

Uh oh!

mgartner left a comment

Choose a reason for hiding this comment

Uh oh!

RaduBerinde commented Mar 15, 2022

Uh oh!

craig bot commented Mar 16, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants