Skip to content

sql: use inaccessible columns for expression indexes#66620

Merged
craig[bot] merged 6 commits into
cockroachdb:masterfrom
mgartner:inaccessible-columns
Jun 23, 2021
Merged

sql: use inaccessible columns for expression indexes#66620
craig[bot] merged 6 commits into
cockroachdb:masterfrom
mgartner:inaccessible-columns

Conversation

@mgartner

@mgartner mgartner commented Jun 18, 2021

Copy link
Copy Markdown
Contributor

sql: rename expression-based indexes to expression indexes

This commit replaces the wordy references to "expression-based indexes"
with "expression indexes", matching the related RFC.

Release note: None

sql: add inaccessible columns

This commit adds the Inaccessible boolean field to column descriptors.
An inaccessible column does not appear in star expansion and cannot be
referenced in queries. It cannot be viewed when inspecting a table via
SHOW CREATE TABLE and is not shown as an attribute of its table in
pg_catalog.pg_attribute. A column cannot be both hidden and
inaccessible.

Inaccessible columns will be used for implementing expression indexes.

Release note: None

sql: use inaccessible columns for expression indexes

Expression indexes (a work-in-progress) now use inaccessible virtual
computed columns to represent expressions. This commit also makes some
expression index related errors more clear.

Release note: None

sql: improve type-related expression index errors

This commit improves error messages caused by mistyped expressions in
expression indexes.

Release note: None

sql: add docs URL to inverted index errors

This commit adds the inverted index documentation URL to errors related
to inverted indexes. Previously, the error message directed the user to
the documentation without a URL, putting the burden on them to track
down the correct webpage.

Release note: None

sql: add AccessibleColumns method to TableDescriptor interface

The TableDescriptor interface now includes an AccessibleColumns
method that returns all public column with Inaccessible=false. This
method simplifies code that needs to access only accessible columns.

Release note: None

@mgartner mgartner requested a review from a team June 18, 2021 04:21
@cockroach-teamcity

Copy link
Copy Markdown
Member

This change is Reviewable

@RaduBerinde RaduBerinde left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was playing around with the change but something isn't working:

root@127.67.216.62:26257/defaultdb> set experimental_enable_expression_indexes = true;
SET

Time: 0ms total (execution 0ms / network 0ms)

root@127.67.216.62:26257/defaultdb> create table t (a int, b int, index((a+b)));
ERROR: unimplemented: only simple columns are supported as index elements
SQLSTATE: 0A000
HINT: You have attempted to use a feature that is not yet implemented.
See: https://go.crdb.dev/issue-v/9682/v21.2

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner)


pkg/sql/opt_catalog.go, line 678 at r2 (raw file):

			if col.IsInaccessible() {
				visibility = cat.Inaccessible
			}

[nit] maybe swap the order and use else if


pkg/sql/logictest/testdata/logic_test/expression_index, line 59 at r4 (raw file):

CREATE INDEX err ON t ((comp + 10))

statement error type of index element NULL is ambiguous.*\nHINT: consider adding a type cast to the expression

We should add at least one test with one of these errors when the index is defined directly in CREATE TABLE.


pkg/sql/logictest/testdata/logic_test/expression_index, line 75 at r4 (raw file):


statement error index element a \+ b of type int is not allowed as the last column in an inverted index.*\nHINT: see the documentation for more information about inverted indexes
CREATE INVERTED INDEX err ON t (a, (a + b));

We should add a test with a tuple. ON (a, ((a+b))) or ON (a, (row (a,b))) might do it

@RaduBerinde RaduBerinde left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, maybe we don't yet support them in CREATE TABLE statements?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner)

@mgartner mgartner force-pushed the inaccessible-columns branch from fd68a86 to db67862 Compare June 18, 2021 17:18

@mgartner mgartner left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, maybe we don't yet support them in CREATE TABLE statements?

Correct. I'll add that in #65703. I wanted to bring the CREATE INDEX behavior up to spec before reworking that PR.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde)


pkg/sql/opt_catalog.go, line 678 at r2 (raw file):

Previously, RaduBerinde wrote…

[nit] maybe swap the order and use else if

Done.


pkg/sql/logictest/testdata/logic_test/expression_index, line 59 at r4 (raw file):

Previously, RaduBerinde wrote…

We should add at least one test with one of these errors when the index is defined directly in CREATE TABLE.

Absolutely. Will do in #65703.


pkg/sql/logictest/testdata/logic_test/expression_index, line 75 at r4 (raw file):

Previously, RaduBerinde wrote…

We should add a test with a tuple. ON (a, ((a+b))) or ON (a, (row (a,b))) might do it

Done.

@RaduBerinde RaduBerinde left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 9 of 15 files at r1, 2 of 22 files at r5, 10 of 10 files at r6, 8 of 10 files at r7, 5 of 5 files at r8.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)


pkg/sql/exec_util.go, line 369 at r8 (raw file):

var experimentalExpressionIndexesMode = settings.RegisterBoolSetting(
	"sql.defaults.experimental_expression_indexes.enabled",

We should add the old setting to retiredSettings in settings package.

@mgartner mgartner force-pushed the inaccessible-columns branch from db67862 to d897347 Compare June 19, 2021 00:31

@mgartner mgartner left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde)


pkg/sql/exec_util.go, line 369 at r8 (raw file):

Previously, RaduBerinde wrote…

We should add the old setting to retiredSettings in settings package.

Done. Looks like we forgot to do this for other retired experimental settings like the settings for partial indexes and partitioned inverted index 😬

@mgartner mgartner force-pushed the inaccessible-columns branch from d897347 to 42914ae Compare June 19, 2021 20:28
@mgartner mgartner requested a review from a team as a code owner June 19, 2021 20:28
@mgartner mgartner requested review from a team June 19, 2021 20:28

@rytaft rytaft left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 9 of 15 files at r1, 2 of 22 files at r5, 24 of 24 files at r9, 11 of 11 files at r10, 11 of 11 files at r11, 6 of 6 files at r12.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)


pkg/sql/create_index.go, line 396 at r12 (raw file):

							typ.Name(),
						),
						"see the documentation for more information about inverted indexes",

is there a specific docs page we can link to?


pkg/sql/show_create.go, line 87 at r10 (raw file):

	for i, col := range desc.PublicColumns() {
		// Inaccessible columns are not displayed in SHOW CREATE TABLE.
		if col.IsInaccessible() {

Is there a way to make these columns not get returned by desc.PublicColumns()? Or is that more trouble than it's worth?


pkg/sql/logictest/testdata/logic_test/pg_catalog, line 467 at r11 (raw file):

61          t6                                                        2332901747    0        0          1546506610  2631952481  0            0
4179599057  primary                                                   2332901747    0        0          1546506610  2631952481  0            0
4179599058  t6_crdb_internal_idx_expr_idx                             2332901747    0        0          1546506610  2631952481  0            0

Are you planning to remove crdb_internal from the index name? Seems like it adds a lot of unnecessary clunkiness.

@mgartner mgartner left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @rytaft)


pkg/sql/create_index.go, line 396 at r12 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

is there a specific docs page we can link to?

Good idea. Added the URL to these errors messages in a new commit.


pkg/sql/show_create.go, line 87 at r10 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Is there a way to make these columns not get returned by desc.PublicColumns()? Or is that more trouble than it's worth?

Great question. You might remember that I tried to do this when I originally attempted to create inaccessible columns about a month ago. It required some unsavory changes to the tabledesc.newColumnCache.

It seems that "public" designates where a column sits on the mutation axis: public, writable, deletable, not the visibility axis: visible, hidden, inaccessible. The documentation for the Column interface's Public method states that a public column is any "active, readable column in the table descriptor". I think the name "public" is causing confusion for us. Maybe "active" is a better description. I'm happy to make that change if we can get consensus on a more clear name.

For now, I've added an AccessibleColumns function that returns public, inaccessible=false columns. This cleans up a bit of code.


pkg/sql/logictest/testdata/logic_test/pg_catalog, line 467 at r11 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Are you planning to remove crdb_internal from the index name? Seems like it adds a lot of unnecessary clunkiness.

Yes. I'm planning on following Postgres's naming convention for anonymous expression indexes in a future PR:

marcus=# CREATE TABLE t (a INT, b INT);
CREATE TABLE

marcus=# CREATE INDEX ON t ((a + b));
CREATE INDEX

marcus=# CREATE INDEX ON t ((a + b), (b + 100));
CREATE INDEX

marcus=# \d t
                 Table "public.t"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 a      | integer |           |          |
 b      | integer |           |          |
Indexes:
    "t_expr_expr1_idx" btree ((a + b), (b + 100))
    "t_expr_idx" btree ((a + b))

@rytaft rytaft left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r13, 6 of 6 files at r14.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner)


pkg/sql/show_create.go, line 87 at r10 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Great question. You might remember that I tried to do this when I originally attempted to create inaccessible columns about a month ago. It required some unsavory changes to the tabledesc.newColumnCache.

It seems that "public" designates where a column sits on the mutation axis: public, writable, deletable, not the visibility axis: visible, hidden, inaccessible. The documentation for the Column interface's Public method states that a public column is any "active, readable column in the table descriptor". I think the name "public" is causing confusion for us. Maybe "active" is a better description. I'm happy to make that change if we can get consensus on a more clear name.

For now, I've added an AccessibleColumns function that returns public, inaccessible=false columns. This cleans up a bit of code.

Nice!


pkg/sql/catalog/tabledesc/table_desc.go, line 362 at r14 (raw file):

// VisibleColumns returns a slice of Column interfaces containing the
// table's visible columns, in the canonical order. Visible columns are public
// columns with Hidden=false.

Does this mean inaccessible columns would be included here? Is that desirable?


pkg/sql/logictest/testdata/logic_test/pg_catalog, line 467 at r11 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Yes. I'm planning on following Postgres's naming convention for anonymous expression indexes in a future PR:

marcus=# CREATE TABLE t (a INT, b INT);
CREATE TABLE

marcus=# CREATE INDEX ON t ((a + b));
CREATE INDEX

marcus=# CREATE INDEX ON t ((a + b), (b + 100));
CREATE INDEX

marcus=# \d t
                 Table "public.t"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 a      | integer |           |          |
 b      | integer |           |          |
Indexes:
    "t_expr_expr1_idx" btree ((a + b), (b + 100))
    "t_expr_idx" btree ((a + b))

Sounds good

@otan

otan commented Jun 21, 2021

Copy link
Copy Markdown
Contributor

pkg/sql/create_index.go, line 396 at r14 (raw file):

							typ.Name(),
						),
						"see the documentation for more information about inverted indexes: https://www.cockroachlabs.com/docs/stable/inverted-indexes.html",

nit: use docs.URL

snoop otan out

@mgartner mgartner left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @rytaft)


pkg/sql/create_index.go, line 396 at r14 (raw file):

Previously, otan (Oliver Tan) wrote…

nit: use docs.URL

snoop otan out

Brilliant. Thanks!


pkg/sql/catalog/tabledesc/table_desc.go, line 362 at r14 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Does this mean inaccessible columns would be included here? Is that desirable?

Good point. I don't think it is desirable. The only place this function is used is in the context of IMPORT statements, as far as I can tell. In those cases, it is used to determine which columns to write to. Inaccessible columns are all computed columns (at least for now), so it is invalid to write to them directly. I've updated the function so that it only returns hidden=false and inaccessible=false columns.

Following these code paths allowed to me to discover a bug whenIMPORT-ing into a table with virtual computed columns: #66694. I'm glad you drew our attention to this!

mgartner added 4 commits June 22, 2021 08:41
This commit replaces the wordy references to "expression-based indexes"
with "expression indexes", matching the related RFC.

Release note: None
This commit adds the `Inaccessible` boolean field to column descriptors.
An inaccessible column does not appear in star expansion and cannot be
referenced in queries. It cannot be viewed when inspecting a table via
SHOW CREATE TABLE and is not shown as an attribute of its table in
pg_catalog.pg_attribute. A column cannot be both hidden and
inaccessible.

Inaccessible columns will be used for implementing expression indexes.

Release note: None
Expression indexes (a work-in-progress) now use inaccessible virtual
computed columns to represent expressions. This commit also makes some
expression index related errors more clear.

Release note: None
This commit improves error messages caused by mistyped expressions in
expression indexes.

Release note: None
@mgartner mgartner force-pushed the inaccessible-columns branch from 9873fc4 to edcff6b Compare June 22, 2021 15:45
@mgartner

mgartner commented Jun 22, 2021

Copy link
Copy Markdown
Contributor Author

When I get the build to pass, I'll go ahead and merge this for now. I will address any late comments in future PRs—there's still a lot of little details I'll continue to work on to get expression indexes working as expected.

mgartner added 2 commits June 22, 2021 12:51
This commit adds the inverted index documentation URL to errors related
to inverted indexes. Previously, the error message directed the user to
the documentation without a URL, putting the burden on them to track
down the correct webpage.

Release note: None
The `TableDescriptor` interface now includes an `AccessibleColumns`
method that returns all public column with `Inaccessible=false`. This
method simplifies code that needs to access only accessible columns.

Release note: None
@mgartner mgartner force-pushed the inaccessible-columns branch from edcff6b to dd3019c Compare June 22, 2021 19:51

@rytaft rytaft left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 33 of 33 files at r15, 11 of 11 files at r16, 11 of 11 files at r17, 6 of 6 files at r18, 5 of 5 files at r19, 5 of 6 files at r20, 7 of 7 files at r21, 6 of 6 files at r22.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)


pkg/sql/catalog/tabledesc/table_desc.go, line 362 at r14 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Good point. I don't think it is desirable. The only place this function is used is in the context of IMPORT statements, as far as I can tell. In those cases, it is used to determine which columns to write to. Inaccessible columns are all computed columns (at least for now), so it is invalid to write to them directly. I've updated the function so that it only returns hidden=false and inaccessible=false columns.

Following these code paths allowed to me to discover a bug whenIMPORT-ing into a table with virtual computed columns: #66694. I'm glad you drew our attention to this!

Nice!

@mgartner

Copy link
Copy Markdown
Contributor Author

TFTRs!

bors r+

@mgartner

Copy link
Copy Markdown
Contributor Author

bors r+

@craig

craig Bot commented Jun 23, 2021

Copy link
Copy Markdown
Contributor

Already running a review

@craig

craig Bot commented Jun 23, 2021

Copy link
Copy Markdown
Contributor

Build succeeded:

@craig craig Bot merged commit ebc0046 into cockroachdb:master Jun 23, 2021
@mgartner mgartner deleted the inaccessible-columns branch June 23, 2021 03:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants