Skip to content

sql: directly specify columns in TableReader#75114

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
RaduBerinde:tablereader-columns
Jan 21, 2022
Merged

sql: directly specify columns in TableReader#75114
craig[bot] merged 1 commit intocockroachdb:masterfrom
RaduBerinde:tablereader-columns

Conversation

@RaduBerinde
Copy link
Copy Markdown
Member

@RaduBerinde RaduBerinde commented Jan 18, 2022

Note: the first commit is #74922.

The internal columns of the TableReader (as well as the row fetcher)
are all the columns of the table, with only a subset of values
actually produced. This design decision has been carried over way past
the point where it makes sense (I admit, it's questionable whether it
ever made sense). For one, "all the columns" is ambiguous (does it
contain non-public columns? does it include system columns?) leading
to various flags and inherent fragility. Second, it relies on the
execution engine to figure out (based on the PostProcessSpec) which
columns are actually needed, which the optimizer already figures out
for us now.

This commit changes the TableReader spec and the interface of
row.Fetcher to always produce a given specific set of column IDs. The
diagram for table readers now specifies the columns by name.

The JoinReader, InvertedJoiner, ZigzagJoiner are not changed in this
commit (but they should be cleaned up as well).

Release note: None

@RaduBerinde RaduBerinde requested review from a team as code owners January 18, 2022 23:16
@RaduBerinde RaduBerinde requested a review from a team January 18, 2022 23:16
@RaduBerinde RaduBerinde requested a review from a team as a code owner January 18, 2022 23:16
@RaduBerinde RaduBerinde requested review from stevendanna and removed request for a team January 18, 2022 23:16
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very happy to see this happening, great work!

Reviewed 72 of 74 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @stevendanna)


-- commits, line 35 at r2:
nit: it might be worth to call out this change with columns by name in a release note.


pkg/sql/colfetcher/colbatch_scan.go, line 210 at r2 (raw file):

	// Make sure that render expressions are deserialized right away so that we
	// don't have to re-parse them multiple times.
	if post.RenderExprs != nil {

Did you add this because it was in populateTableArgsLegacy? I think we should not include this here - over there, we need the render expressions to be deserialized because we might remap the indexed vars, but IIUC this PR doesn't have to do that.


pkg/sql/execinfra/version.go, line 42 at r2 (raw file):

// ATTENTION: When updating these fields, add a brief description of what
// changed to the version history below.
const Version execinfrapb.DistSQLVersion = 55

These also need to be bumped (we forgot to do it for version 56).


pkg/sql/execinfrapb/flow_diagram.go, line 155 at r2 (raw file):

		col, err := tbl.FindColumnWithID(colID)
		if i > 0 {
			b.WriteByte(',')

If we need to scan many columns, then the box will become pretty wide. Thoughts on writing one column per line? Or maybe have a limit of, say of up to 100 characters per line?


pkg/sql/row/fetcher.go, line 294 at r2 (raw file):

	var err error

	// Scan through the entire columns map to see which columns are

nit: the comment needs an update.


pkg/sql/row/fetcher.go, line 1112 at r2 (raw file):

// NextRowInto calls NextRow and copies the results into the given EncDatumRow
// slice according to the given column map.
// mapping.

nit: leftover.


pkg/sql/row/fetcher.go, line 1167 at r2 (raw file):

// mapping.
//
// Values for columns that are not in the map are ignored. cDatums in

nit: s/cDatums/Datums/.


pkg/sql/row/fetcher.go, line 1193 at r2 (raw file):

			continue
		}
		if err := encDatum.EnsureDecoded(col.GetType(), rf.alloc); err != nil {

Does col.GetType() return the correct type for the inverted column? It might be worth making sure that inverted_.* logic tests run through the row-by-row engine correctly (maybe they already do).


pkg/sql/opt/exec/execbuilder/testdata/aggregate, line 730 at r2 (raw file):

 ORDER BY message LIKE 'fetched:%' DESC, ordinality ASC
----
fetched: /xyz/zyx/3.0/2/1 -> <undecoded>

This seems suspicious - we're scanning INDEX zyx (z, y, x), and only need the value of x, so I think the previous output was correct.


pkg/sql/opt/exec/execbuilder/testdata/explain_analyze_plans, line 338 at r2 (raw file):

distribution: <hidden>
vectorized: <hidden>
rows read from KV: 1 (8 B)

These changes to stats also seem suspicious.


pkg/sql/opt/exec/execbuilder/testdata/schema_change_in_txn_nonmetamorphic, line 34 at r2 (raw file):

CPut /Table/56/1/101/1/1 -> /TUPLE/4:4:Decimal/0.00/2:6:Decimal/1101.2

query T

nit: this seems like an unrelated addition of a test.


pkg/sql/opt/exec/execbuilder/testdata/vectorize_local, line 72 at r2 (raw file):

distribution: <hidden>
vectorized: <hidden>
rows read from KV: 2 (16 B)

ditto

Copy link
Copy Markdown
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @stevendanna and @yuzefovich)


pkg/sql/colfetcher/colbatch_scan.go, line 210 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Did you add this because it was in populateTableArgsLegacy? I think we should not include this here - over there, we need the render expressions to be deserialized because we might remap the indexed vars, but IIUC this PR doesn't have to do that.

Yup, removed.


pkg/sql/execinfra/version.go, line 42 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

These also need to be bumped (we forgot to do it for version 56).

🤦


pkg/sql/execinfrapb/flow_diagram.go, line 155 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

If we need to scan many columns, then the box will become pretty wide. Thoughts on writing one column per line? Or maybe have a limit of, say of up to 100 characters per line?

Done.


pkg/sql/row/fetcher.go, line 1193 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Does col.GetType() return the correct type for the inverted column? It might be worth making sure that inverted_.* logic tests run through the row-by-row engine correctly (maybe they already do).

Yeah, we're passing in the full column descriptors and the callers do the inverted column type replacement before setting up the fetcher.


pkg/sql/opt/exec/execbuilder/testdata/aggregate, line 730 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

This seems suspicious - we're scanning INDEX zyx (z, y, x), and only need the value of x, so I think the previous output was correct.

In the optimizer, if a scan is constrained it must return the columns it is constrained on. This is why the plan above has the scan returning x,y,z and then it projects away column x. We could fix that in the optimizer if it's a problem - filed #75153.


pkg/sql/opt/exec/execbuilder/testdata/explain_analyze_plans, line 338 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

These changes to stats also seem suspicious.

Great catch, I didn't notice that! I did not override the new method in the rowFetcherStatCollector. I now unembedded the row.Fetcher so that we have to implement all methods explicitly.


pkg/sql/opt/exec/execbuilder/testdata/vectorize_local, line 72 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

ditto

Fixed.

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 74 files at r2, 10 of 10 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde and @stevendanna)


pkg/sql/opt/exec/execbuilder/testdata/aggregate, line 730 at r2 (raw file):

Previously, RaduBerinde wrote…

In the optimizer, if a scan is constrained it must return the columns it is constrained on. This is why the plan above has the scan returning x,y,z and then it projects away column x. We could fix that in the optimizer if it's a problem - filed #75153.

I see, thanks.


pkg/sql/execinfrapb/flow_diagram_external_test.go, line 1 at r3 (raw file):

// Copyright 2018 The Cockroach Authors.

nit: s/2018/2022/.


pkg/sql/execinfrapb/flow_diagram_external_test.go, line 48 at r3 (raw file):

		  this_is_a_very_long_name_15 INT,
		  this_is_a_very_long_name_16 INT,
			x INT,

super nit: the indentation is different.

Copy link
Copy Markdown
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @stevendanna and @yuzefovich)


pkg/sql/execinfrapb/flow_diagram_external_test.go, line 48 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

super nit: the indentation is different.

Done.

The internal columns of the TableReader (as well as the row fetcher)
are all the columns of the table, with only a subset of values
actually produced. This design decision has been carried over way past
the point where it makes sense (I admit, it's questionable whether it
ever made sense). For one, "all the columns" is ambiguous (does it
contain non-public columns? does it include system columns?) leading
to various flags and inherent fragility. Second, it relies on the
execution engine to figure out (based on the PostProcessSpec) which
columns are actually needed, which the optimizer already figures out
for us now.

This commit changes the TableReader spec and the interface of
row.Fetcher to always produce a given specific set of column IDs. The
diagram for table readers now specifies the columns by name.

The JoinReader, InvertedJoiner, ZigzagJoiner are not changed in this
commit (but they should be cleaned up as well).

Release note (sql change): The distributed plan diagram now lists
scanned column names for TableReaders.
@RaduBerinde
Copy link
Copy Markdown
Member Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jan 20, 2022

Build failed (retrying...):

@craig craig bot merged commit 488a06b into cockroachdb:master Jan 21, 2022
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jan 21, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants