sql: directly specify columns in TableReader by RaduBerinde · Pull Request #75114 · cockroachdb/cockroach

RaduBerinde · 2022-01-18T23:16:49Z

~~Note: the first commit is #74922.~~

The internal columns of the TableReader (as well as the row fetcher)
are all the columns of the table, with only a subset of values
actually produced. This design decision has been carried over way past
the point where it makes sense (I admit, it's questionable whether it
ever made sense). For one, "all the columns" is ambiguous (does it
contain non-public columns? does it include system columns?) leading
to various flags and inherent fragility. Second, it relies on the
execution engine to figure out (based on the PostProcessSpec) which
columns are actually needed, which the optimizer already figures out
for us now.

This commit changes the TableReader spec and the interface of
row.Fetcher to always produce a given specific set of column IDs. The
diagram for table readers now specifies the columns by name.

The JoinReader, InvertedJoiner, ZigzagJoiner are not changed in this
commit (but they should be cleaned up as well).

Release note: None

cockroach-teamcity · 2022-01-18T23:16:56Z

This change is

yuzefovich

I'm very happy to see this happening, great work!

Reviewed 72 of 74 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @stevendanna)

-- commits, line 35 at r2:
nit: it might be worth to call out this change with columns by name in a release note.

pkg/sql/colfetcher/colbatch_scan.go, line 210 at r2 (raw file):

	// Make sure that render expressions are deserialized right away so that we
	// don't have to re-parse them multiple times.
	if post.RenderExprs != nil {

Did you add this because it was in populateTableArgsLegacy? I think we should not include this here - over there, we need the render expressions to be deserialized because we might remap the indexed vars, but IIUC this PR doesn't have to do that.

pkg/sql/execinfra/version.go, line 42 at r2 (raw file):

// ATTENTION: When updating these fields, add a brief description of what
// changed to the version history below.
const Version execinfrapb.DistSQLVersion = 55

These also need to be bumped (we forgot to do it for version 56).

pkg/sql/execinfrapb/flow_diagram.go, line 155 at r2 (raw file):

		col, err := tbl.FindColumnWithID(colID)
		if i > 0 {
			b.WriteByte(',')

If we need to scan many columns, then the box will become pretty wide. Thoughts on writing one column per line? Or maybe have a limit of, say of up to 100 characters per line?

pkg/sql/row/fetcher.go, line 294 at r2 (raw file):

	var err error

	// Scan through the entire columns map to see which columns are

nit: the comment needs an update.

pkg/sql/row/fetcher.go, line 1112 at r2 (raw file):

// NextRowInto calls NextRow and copies the results into the given EncDatumRow
// slice according to the given column map.
// mapping.

nit: leftover.

pkg/sql/row/fetcher.go, line 1167 at r2 (raw file):

// mapping.
//
// Values for columns that are not in the map are ignored. cDatums in

nit: s/cDatums/Datums/.

pkg/sql/row/fetcher.go, line 1193 at r2 (raw file):

			continue
		}
		if err := encDatum.EnsureDecoded(col.GetType(), rf.alloc); err != nil {

Does col.GetType() return the correct type for the inverted column? It might be worth making sure that inverted_.* logic tests run through the row-by-row engine correctly (maybe they already do).

pkg/sql/opt/exec/execbuilder/testdata/aggregate, line 730 at r2 (raw file):

 ORDER BY message LIKE 'fetched:%' DESC, ordinality ASC
----
fetched: /xyz/zyx/3.0/2/1 -> <undecoded>

This seems suspicious - we're scanning INDEX zyx (z, y, x), and only need the value of x, so I think the previous output was correct.

pkg/sql/opt/exec/execbuilder/testdata/explain_analyze_plans, line 338 at r2 (raw file):

distribution: <hidden>
vectorized: <hidden>
rows read from KV: 1 (8 B)

These changes to stats also seem suspicious.

pkg/sql/opt/exec/execbuilder/testdata/schema_change_in_txn_nonmetamorphic, line 34 at r2 (raw file):

CPut /Table/56/1/101/1/1 -> /TUPLE/4:4:Decimal/0.00/2:6:Decimal/1101.2

query T

nit: this seems like an unrelated addition of a test.

pkg/sql/opt/exec/execbuilder/testdata/vectorize_local, line 72 at r2 (raw file):

distribution: <hidden>
vectorized: <hidden>
rows read from KV: 2 (16 B)

ditto

RaduBerinde

TFTR!

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @stevendanna and @yuzefovich)

pkg/sql/colfetcher/colbatch_scan.go, line 210 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Did you add this because it was in populateTableArgsLegacy? I think we should not include this here - over there, we need the render expressions to be deserialized because we might remap the indexed vars, but IIUC this PR doesn't have to do that.

Yup, removed.

pkg/sql/execinfra/version.go, line 42 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

These also need to be bumped (we forgot to do it for version 56).

🤦

pkg/sql/execinfrapb/flow_diagram.go, line 155 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

If we need to scan many columns, then the box will become pretty wide. Thoughts on writing one column per line? Or maybe have a limit of, say of up to 100 characters per line?

Done.

pkg/sql/row/fetcher.go, line 1193 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Does col.GetType() return the correct type for the inverted column? It might be worth making sure that inverted_.* logic tests run through the row-by-row engine correctly (maybe they already do).

Yeah, we're passing in the full column descriptors and the callers do the inverted column type replacement before setting up the fetcher.

pkg/sql/opt/exec/execbuilder/testdata/aggregate, line 730 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

This seems suspicious - we're scanning INDEX zyx (z, y, x), and only need the value of x, so I think the previous output was correct.

In the optimizer, if a scan is constrained it must return the columns it is constrained on. This is why the plan above has the scan returning x,y,z and then it projects away column x. We could fix that in the optimizer if it's a problem - filed #75153.

pkg/sql/opt/exec/execbuilder/testdata/explain_analyze_plans, line 338 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

These changes to stats also seem suspicious.

Great catch, I didn't notice that! I did not override the new method in the rowFetcherStatCollector. I now unembedded the row.Fetcher so that we have to implement all methods explicitly.

pkg/sql/opt/exec/execbuilder/testdata/vectorize_local, line 72 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

ditto

Fixed.

yuzefovich

Reviewed 2 of 74 files at r2, 10 of 10 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde and @stevendanna)

pkg/sql/opt/exec/execbuilder/testdata/aggregate, line 730 at r2 (raw file):

Previously, RaduBerinde wrote…

In the optimizer, if a scan is constrained it must return the columns it is constrained on. This is why the plan above has the scan returning x,y,z and then it projects away column x. We could fix that in the optimizer if it's a problem - filed #75153.

I see, thanks.

pkg/sql/execinfrapb/flow_diagram_external_test.go, line 1 at r3 (raw file):

// Copyright 2018 The Cockroach Authors.

nit: s/2018/2022/.

pkg/sql/execinfrapb/flow_diagram_external_test.go, line 48 at r3 (raw file):

		  this_is_a_very_long_name_15 INT,
		  this_is_a_very_long_name_16 INT,
			x INT,

super nit: the indentation is different.

RaduBerinde

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @stevendanna and @yuzefovich)

pkg/sql/execinfrapb/flow_diagram_external_test.go, line 48 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

super nit: the indentation is different.

Done.

The internal columns of the TableReader (as well as the row fetcher) are all the columns of the table, with only a subset of values actually produced. This design decision has been carried over way past the point where it makes sense (I admit, it's questionable whether it ever made sense). For one, "all the columns" is ambiguous (does it contain non-public columns? does it include system columns?) leading to various flags and inherent fragility. Second, it relies on the execution engine to figure out (based on the PostProcessSpec) which columns are actually needed, which the optimizer already figures out for us now. This commit changes the TableReader spec and the interface of row.Fetcher to always produce a given specific set of column IDs. The diagram for table readers now specifies the columns by name. The JoinReader, InvertedJoiner, ZigzagJoiner are not changed in this commit (but they should be cleaned up as well). Release note (sql change): The distributed plan diagram now lists scanned column names for TableReaders.

RaduBerinde · 2022-01-20T20:10:03Z

bors r+

craig · 2022-01-20T21:19:09Z

Build failed (retrying...):

GitHub CI (Cockroach)

craig · 2022-01-21T03:16:37Z

Build succeeded:

GitHub CI (Cockroach)

RaduBerinde requested a review from yuzefovich January 18, 2022 23:16

RaduBerinde requested review from a team as code owners January 18, 2022 23:16

RaduBerinde requested a review from a team January 18, 2022 23:16

RaduBerinde requested a review from a team as a code owner January 18, 2022 23:16

RaduBerinde requested review from stevendanna and removed request for a team January 18, 2022 23:16

RaduBerinde force-pushed the tablereader-columns branch from e9abbf5 to 425f9be Compare January 19, 2022 02:00

yuzefovich reviewed Jan 19, 2022

View reviewed changes

RaduBerinde force-pushed the tablereader-columns branch from 425f9be to fe3f650 Compare January 19, 2022 16:41

RaduBerinde commented Jan 19, 2022

View reviewed changes

yuzefovich approved these changes Jan 19, 2022

View reviewed changes

RaduBerinde force-pushed the tablereader-columns branch from fe3f650 to 673d88c Compare January 19, 2022 20:39

RaduBerinde commented Jan 19, 2022

View reviewed changes

RaduBerinde force-pushed the tablereader-columns branch from 673d88c to f8aecf2 Compare January 20, 2022 01:55

RaduBerinde force-pushed the tablereader-columns branch from f8aecf2 to 059d5d8 Compare January 20, 2022 17:32

craig bot merged commit 488a06b into cockroachdb:master Jan 21, 2022

cockroach-teamcity mentioned this pull request Jan 21, 2022

sql: directly specify columns in TableReader cockroachdb/docs#12791

Closed

RaduBerinde deleted the tablereader-columns branch January 27, 2022 20:51

ajwerner mentioned this pull request Jun 1, 2022

sql,kv,storage: push column batch generation into kvserver #82323

Open

10 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: directly specify columns in TableReader#75114

sql: directly specify columns in TableReader#75114
craig[bot] merged 1 commit intocockroachdb:masterfrom
RaduBerinde:tablereader-columns

RaduBerinde commented Jan 18, 2022 •

edited

Loading

Uh oh!

cockroach-teamcity commented Jan 18, 2022

Uh oh!

yuzefovich left a comment

Uh oh!

RaduBerinde left a comment

Uh oh!

yuzefovich left a comment

Uh oh!

RaduBerinde left a comment

Uh oh!

RaduBerinde commented Jan 20, 2022

Uh oh!

craig bot commented Jan 20, 2022

Uh oh!

craig bot commented Jan 21, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

RaduBerinde commented Jan 18, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cockroach-teamcity commented Jan 18, 2022

Uh oh!

yuzefovich left a comment

Choose a reason for hiding this comment

Uh oh!

RaduBerinde left a comment

Choose a reason for hiding this comment

Uh oh!

yuzefovich left a comment

Choose a reason for hiding this comment

Uh oh!

RaduBerinde left a comment

Choose a reason for hiding this comment

Uh oh!

RaduBerinde commented Jan 20, 2022

Uh oh!

craig bot commented Jan 20, 2022

Uh oh!

craig bot commented Jan 21, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

RaduBerinde commented Jan 18, 2022 •

edited

Loading