sql: introduce special type for inverted index keys#77501
sql: introduce special type for inverted index keys#77501craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewed 27 of 27 files at r1, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @RaduBerinde, and @rytaft)
-- commits, line 8 at r1:
nit: s/Decodedd/Decoded/.
pkg/sql/sem/tree/datum.go, line 1626 at r1 (raw file):
// Format implements the NodeFormatter interface. func (d *DEncodedKey) Format(ctx *FmtCtx) { f := ctx.flags
nit: it seems like it'd be good to extract a helper to be shared by two Bytes datums.
pkg/sql/sem/tree/datum_alloc.go, line 151 at r1 (raw file):
} // NewDEncodedKey allocates a DBytes.
nit: s/DBytes/DEncodedKey/.
mgartner
left a comment
There was a problem hiding this comment.
Great idea!
To prevent poor row count estimations I think we'll need to include the new type you've created here:
cockroach/pkg/sql/opt/props/histogram.go
Line 835 in 9b385e6
Otherwise, if a query filters by a value in the middle of a histogram bucket, the estimated row count will be the bucket's num_range divided by two.
Does the histo_col_type field need to change in table statistics? Or is it fine to leave that as BYTES for inverted index columns.
Reviewed 27 of 27 files at r1, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)
pkg/sql/types/types.go, line 468 at r1 (raw file):
// EncodedKey is a special type used internally for inverted index keys, which // do not fully encode an object. EncodedKey = &T{
nit: Unless you think it's likely this will be used for other purposes in the future, it might make sense to make the name of the type and family more specific to inverted index keys.
See #68740 for some more context on this. |
Hmm, I guess these datums are the upper and lower bound of the histogram buckets, so if the type remains as |
RaduBerinde
left a comment
There was a problem hiding this comment.
The statistics creation and props code still uses DBytes (props.makeSpanFromInvertedSpan), I don't think it's worth changing that. I will add some more comments around the code though.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @RaduBerinde, and @rytaft)
pkg/sql/types/types.go, line 468 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: Unless you think it's likely this will be used for other purposes in the future, it might make sense to make the name of the type and family more specific to inverted index keys.
I think I'd like to keep it, it's a sound concept of its own, maybe there will be some use case for passing around encoded keys. I could imagine potential usecases around composite types like collated strings.
faf90d1 to
49928f3
Compare
RaduBerinde
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner, @rytaft, and @yuzefovich)
pkg/sql/sem/tree/datum.go, line 1626 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: it seems like it'd be good to extract a helper to be shared by two Bytes datums.
I just "rerouted" to the DBytes implementation.
mgartner
left a comment
There was a problem hiding this comment.
The statistics creation and props code still uses DBytes (props.makeSpanFromInvertedSpan), I don't think it's worth changing that. I will add some more comments around the code though.
SGTM. Thanks!
Reviewed 4 of 5 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner, @RaduBerinde, @rytaft, and @yuzefovich)
pkg/sql/sem/tree/datum.go, line 1564 at r2 (raw file):
} // DEncodedKey is the bytes Datum. The underlying type is a string because we want
nit: DEncodedKey is not the bytes datum. It'd be helpful to explain the contrast between the type and DBytes here.
pkg/sql/types/types.go, line 468 at r1 (raw file):
Previously, RaduBerinde wrote…
I think I'd like to keep it, it's a sound concept of its own, maybe there will be some use case for passing around encoded keys. I could imagine potential usecases around composite types like collated strings.
Sounds good to me. In that case, I think the comment should be more general and explain the behavior of the type (e.g. "EncodedKey is similar to DBytes but has a different behavior for encoding and decoding ...") before mentioning the inverted index key use-case.
49928f3 to
5b7b2da
Compare
RaduBerinde
left a comment
There was a problem hiding this comment.
TFTRs!
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner, @RaduBerinde, @rytaft, and @yuzefovich)
pkg/sql/types/types.go, line 468 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Sounds good to me. In that case, I think the comment should be more general and explain the behavior of the type (e.g. "EncodedKey is similar to DBytes but has a different behavior for encoding and decoding ...") before mentioning the inverted index key use-case.
Done, PTAL
mgartner
left a comment
There was a problem hiding this comment.
Reviewed 1 of 5 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft and @yuzefovich)
pkg/sql/types/types.go, line 468 at r1 (raw file):
Previously, RaduBerinde wrote…
Done, PTAL
Looks great thanks!
07f4098 to
d17cef9
Compare
Currently we pass around inverted index keys as having `Bytes` type, but these datums behave in subtly different ways. The encoding/decoding is special-cased to be a no-op. This leads to sharp edges; for example, we put these keys in `EncodedDatum`s but if we ever try to decode them (via `EnsureDecoded`), we get obscure decoding errors. This commit cleans up this situation by adding a special `EncodedKey` type and corresponding `DEncodedKey` datum. It behaves like `Bytes/DBytes` in most cases, except key encoding and decoding. Release note: None
d17cef9 to
cd7e855
Compare
|
bors r+ |
|
Build succeeded: |
Note: this is intended to merge to master after the 22.1 branch is cut.
Currently we pass around inverted index keys as having
Bytestype,but these datums behave in subtly different ways. The
encoding/decoding is special-cased to be a no-op. This leads to sharp
edges; for example, we put these keys in
EncodedDatums but if weever try to decode them (via
EnsureDecodedd), we get obscuredecoding errors.
This commit cleans up this situation by adding a special
EncodedKeytype and corresponding
DEncodedKeydatum. It behaves likeBytes/DBytesin most cases, except key encoding and decoding.Release note: None