sql: directly specify columns in TableReader#75114
sql: directly specify columns in TableReader#75114craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
e9abbf5 to
425f9be
Compare
yuzefovich
left a comment
There was a problem hiding this comment.
I'm very happy to see this happening, great work!
Reviewed 72 of 74 files at r2, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @stevendanna)
-- commits, line 35 at r2:
nit: it might be worth to call out this change with columns by name in a release note.
pkg/sql/colfetcher/colbatch_scan.go, line 210 at r2 (raw file):
// Make sure that render expressions are deserialized right away so that we // don't have to re-parse them multiple times. if post.RenderExprs != nil {
Did you add this because it was in populateTableArgsLegacy? I think we should not include this here - over there, we need the render expressions to be deserialized because we might remap the indexed vars, but IIUC this PR doesn't have to do that.
pkg/sql/execinfra/version.go, line 42 at r2 (raw file):
// ATTENTION: When updating these fields, add a brief description of what // changed to the version history below. const Version execinfrapb.DistSQLVersion = 55
These also need to be bumped (we forgot to do it for version 56).
pkg/sql/execinfrapb/flow_diagram.go, line 155 at r2 (raw file):
col, err := tbl.FindColumnWithID(colID) if i > 0 { b.WriteByte(',')
If we need to scan many columns, then the box will become pretty wide. Thoughts on writing one column per line? Or maybe have a limit of, say of up to 100 characters per line?
pkg/sql/row/fetcher.go, line 294 at r2 (raw file):
var err error // Scan through the entire columns map to see which columns are
nit: the comment needs an update.
pkg/sql/row/fetcher.go, line 1112 at r2 (raw file):
// NextRowInto calls NextRow and copies the results into the given EncDatumRow // slice according to the given column map. // mapping.
nit: leftover.
pkg/sql/row/fetcher.go, line 1167 at r2 (raw file):
// mapping. // // Values for columns that are not in the map are ignored. cDatums in
nit: s/cDatums/Datums/.
pkg/sql/row/fetcher.go, line 1193 at r2 (raw file):
continue } if err := encDatum.EnsureDecoded(col.GetType(), rf.alloc); err != nil {
Does col.GetType() return the correct type for the inverted column? It might be worth making sure that inverted_.* logic tests run through the row-by-row engine correctly (maybe they already do).
pkg/sql/opt/exec/execbuilder/testdata/aggregate, line 730 at r2 (raw file):
ORDER BY message LIKE 'fetched:%' DESC, ordinality ASC ---- fetched: /xyz/zyx/3.0/2/1 -> <undecoded>
This seems suspicious - we're scanning INDEX zyx (z, y, x), and only need the value of x, so I think the previous output was correct.
pkg/sql/opt/exec/execbuilder/testdata/explain_analyze_plans, line 338 at r2 (raw file):
distribution: <hidden> vectorized: <hidden> rows read from KV: 1 (8 B)
These changes to stats also seem suspicious.
pkg/sql/opt/exec/execbuilder/testdata/schema_change_in_txn_nonmetamorphic, line 34 at r2 (raw file):
CPut /Table/56/1/101/1/1 -> /TUPLE/4:4:Decimal/0.00/2:6:Decimal/1101.2 query T
nit: this seems like an unrelated addition of a test.
pkg/sql/opt/exec/execbuilder/testdata/vectorize_local, line 72 at r2 (raw file):
distribution: <hidden> vectorized: <hidden> rows read from KV: 2 (16 B)
ditto
425f9be to
fe3f650
Compare
RaduBerinde
left a comment
There was a problem hiding this comment.
TFTR!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @stevendanna and @yuzefovich)
pkg/sql/colfetcher/colbatch_scan.go, line 210 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Did you add this because it was in
populateTableArgsLegacy? I think we should not include this here - over there, we need the render expressions to be deserialized because we might remap the indexed vars, but IIUC this PR doesn't have to do that.
Yup, removed.
pkg/sql/execinfra/version.go, line 42 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
These also need to be bumped (we forgot to do it for version 56).
🤦
pkg/sql/execinfrapb/flow_diagram.go, line 155 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
If we need to scan many columns, then the box will become pretty wide. Thoughts on writing one column per line? Or maybe have a limit of, say of up to 100 characters per line?
Done.
pkg/sql/row/fetcher.go, line 1193 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Does
col.GetType()return the correct type for the inverted column? It might be worth making sure thatinverted_.*logic tests run through the row-by-row engine correctly (maybe they already do).
Yeah, we're passing in the full column descriptors and the callers do the inverted column type replacement before setting up the fetcher.
pkg/sql/opt/exec/execbuilder/testdata/aggregate, line 730 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
This seems suspicious - we're scanning
INDEX zyx (z, y, x), and only need the value ofx, so I think the previous output was correct.
In the optimizer, if a scan is constrained it must return the columns it is constrained on. This is why the plan above has the scan returning x,y,z and then it projects away column x. We could fix that in the optimizer if it's a problem - filed #75153.
pkg/sql/opt/exec/execbuilder/testdata/explain_analyze_plans, line 338 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
These changes to stats also seem suspicious.
Great catch, I didn't notice that! I did not override the new method in the rowFetcherStatCollector. I now unembedded the row.Fetcher so that we have to implement all methods explicitly.
pkg/sql/opt/exec/execbuilder/testdata/vectorize_local, line 72 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
ditto
Fixed.
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewed 2 of 74 files at r2, 10 of 10 files at r3, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde and @stevendanna)
pkg/sql/opt/exec/execbuilder/testdata/aggregate, line 730 at r2 (raw file):
Previously, RaduBerinde wrote…
In the optimizer, if a scan is constrained it must return the columns it is constrained on. This is why the plan above has the scan returning
x,y,zand then it projects away columnx. We could fix that in the optimizer if it's a problem - filed #75153.
I see, thanks.
pkg/sql/execinfrapb/flow_diagram_external_test.go, line 1 at r3 (raw file):
// Copyright 2018 The Cockroach Authors.
nit: s/2018/2022/.
pkg/sql/execinfrapb/flow_diagram_external_test.go, line 48 at r3 (raw file):
this_is_a_very_long_name_15 INT, this_is_a_very_long_name_16 INT, x INT,
super nit: the indentation is different.
fe3f650 to
673d88c
Compare
RaduBerinde
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @stevendanna and @yuzefovich)
pkg/sql/execinfrapb/flow_diagram_external_test.go, line 48 at r3 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
super nit: the indentation is different.
Done.
673d88c to
f8aecf2
Compare
The internal columns of the TableReader (as well as the row fetcher) are all the columns of the table, with only a subset of values actually produced. This design decision has been carried over way past the point where it makes sense (I admit, it's questionable whether it ever made sense). For one, "all the columns" is ambiguous (does it contain non-public columns? does it include system columns?) leading to various flags and inherent fragility. Second, it relies on the execution engine to figure out (based on the PostProcessSpec) which columns are actually needed, which the optimizer already figures out for us now. This commit changes the TableReader spec and the interface of row.Fetcher to always produce a given specific set of column IDs. The diagram for table readers now specifies the columns by name. The JoinReader, InvertedJoiner, ZigzagJoiner are not changed in this commit (but they should be cleaned up as well). Release note (sql change): The distributed plan diagram now lists scanned column names for TableReaders.
f8aecf2 to
059d5d8
Compare
|
bors r+ |
|
Build failed (retrying...): |
|
Build succeeded: |
Note: the first commit is #74922.The internal columns of the TableReader (as well as the row fetcher)
are all the columns of the table, with only a subset of values
actually produced. This design decision has been carried over way past
the point where it makes sense (I admit, it's questionable whether it
ever made sense). For one, "all the columns" is ambiguous (does it
contain non-public columns? does it include system columns?) leading
to various flags and inherent fragility. Second, it relies on the
execution engine to figure out (based on the PostProcessSpec) which
columns are actually needed, which the optimizer already figures out
for us now.
This commit changes the TableReader spec and the interface of
row.Fetcher to always produce a given specific set of column IDs. The
diagram for table readers now specifies the columns by name.
The JoinReader, InvertedJoiner, ZigzagJoiner are not changed in this
commit (but they should be cleaned up as well).
Release note: None