Skip to content

opt: synthesize check constraints on enum columns#49284

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
rohany:enum-checks
Jun 15, 2020
Merged

opt: synthesize check constraints on enum columns#49284
craig[bot] merged 1 commit intocockroachdb:masterfrom
rohany:enum-checks

Conversation

@rohany
Copy link
Copy Markdown
Contributor

@rohany rohany commented May 19, 2020

Fixes #49263.

This PR teaches the optimizer how to synthesize check constraints on
columns of an ENUM type, allowing queries like:

CREATE TYPE t AS ENUM ('howdy', 'hello');
CREATE TABLE tt (x t, y INT, PRIMARY KEY (x, y));
SELECT x, y FROM tt WHERE y = 2

to be planned using constrained spans on the enum values, rather than a
full table scan.

Release note (performance improvement): Allow the optimizer to use enum
information to generate better query plans.

@rohany rohany requested review from RaduBerinde and jordanlewis May 19, 2020 19:33
@rohany rohany requested a review from a team as a code owner May 19, 2020 19:33
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@rohany
Copy link
Copy Markdown
Contributor Author

rohany commented May 19, 2020

Left some todos in here that need to get resolved as well.

Copy link
Copy Markdown
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 7 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @RaduBerinde, and @rohany)


pkg/sql/opt_catalog.go, line 540 at r1 (raw file):

	inboundFKs  []optForeignKeyConstraint

	// checkConstraints is the set of synthesized check constraints for this

not just synthesized -- you've included all constraints below


pkg/sql/opt_catalog.go, line 663 at r1 (raw file):

	}
	// Move all existing and synthesized checks into the opt table.
	ot.checkConstraints = make([]cat.CheckConstraint, 0, len(desc.Checks)+len(synthesizedChecks))

why were we looking at desc.ActiveChecks() before, but now this is only looking at desc.Checks?


pkg/sql/resolver.go, line 178 at r1 (raw file):

		return nil, errors.AssertionFailedf("%s was not a type descriptor", rawDesc)
	}
	// TODO (rohany): Should we perform lookups on the parent database andschema

andschema -> and schema

Copy link
Copy Markdown
Contributor Author

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @RaduBerinde, and @rytaft)


pkg/sql/opt_catalog.go, line 540 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

not just synthesized -- you've included all constraints below

Done.


pkg/sql/opt_catalog.go, line 663 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

why were we looking at desc.ActiveChecks() before, but now this is only looking at desc.Checks?

Not sure -- looking at the code it seems to be the same thing, but I'll stay consistent.


pkg/sql/resolver.go, line 178 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

andschema -> and schema

Done.

Copy link
Copy Markdown
Contributor Author

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @RaduBerinde, and @rytaft)


pkg/sql/opt/exec/execbuilder/testdata/enums, line 54 at r2 (raw file):

      └── [/'hi' - /'hi']

# TODO (rohany): Why is the NOT NULL important here? I couldn't get this

Do you know the answer to this?

Copy link
Copy Markdown
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @RaduBerinde)


pkg/sql/opt/exec/execbuilder/testdata/enums, line 54 at r2 (raw file):

Previously, rohany (Rohan Yadav) wrote…

Do you know the answer to this?

I think we just haven't added support for it yet, but I think we probably could... @RaduBerinde probably knows better since he was working with check constraints and their interactions with nulls recently....

@rytaft
Copy link
Copy Markdown
Collaborator

rytaft commented May 21, 2020


pkg/sql/opt/exec/execbuilder/testdata/enums, line 54 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think we just haven't added support for it yet, but I think we probably could... @RaduBerinde probably knows better since he was working with check constraints and their interactions with nulls recently....

This comment is probably relevant:

		// Check constraints that are guaranteed to not evaluate to NULL
		// are the only ones converted into filters. This is because a NULL
		// constraint is interpreted as passing, whereas a NULL filter is not.

Copy link
Copy Markdown
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @rohany)


pkg/sql/opt/exec/execbuilder/testdata/enums, line 54 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

This comment is probably relevant:

		// Check constraints that are guaranteed to not evaluate to NULL
		// are the only ones converted into filters. This is because a NULL
		// constraint is interpreted as passing, whereas a NULL filter is not.

The problem is that CHECK (expr) "passes" if the expression evaluates to NULL so we can't blindly convert it to a filter. We could do a better job by adding an OR x IS NULL in this case (in general, in cases where the result is NULL iff a given variable is NULL).

Copy link
Copy Markdown
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @rohany)


pkg/sql/opt/exec/execbuilder/testdata/enums, line 54 at r2 (raw file):

Previously, RaduBerinde wrote…

The problem is that CHECK (expr) "passes" if the expression evaluates to NULL so we can't blindly convert it to a filter. We could do a better job by adding an OR x IS NULL in this case (in general, in cases where the result is NULL iff a given variable is NULL).

If you want to play around with that, this is where that logic would live: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/optbuilder/select.go#L599

@rohany
Copy link
Copy Markdown
Contributor Author

rohany commented May 21, 2020

TFTRs! I'm going to hold off on merging this until some other things I'm working on answer the TODO's in here.

If you want to play around with that, this is where that logic would live: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/optbuilder/select.go#L599

Sure, it would be fun to take a look at that after.

Copy link
Copy Markdown
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome - can't wait to see this working in concert with partitioning!

@rohany
Copy link
Copy Markdown
Contributor Author

rohany commented May 27, 2020

I've been thinking about how these checks can interact with the alter type command. For context, when we add a value to an enum type, it doesn't directly become "writeable". We first need to make sure that all nodes know how to interpret an enum value before we write it, so the enum members themselves go through a mini schema change, where they progress from "readable" to "writeable".

It seems like the check constraints in the optimizer can catch most cases of attempting to insert a value that is only in the readable state. However, we cannot restrict this check constraint to be only the readable values of the enum -- another node with the value in the writeable state could have inserted that value before the current node has the value in the writeable state. So it seems like we want to instead have a notion of "checks to enforce on reads" and "checks to enforce on writes" in the optimizer. Does this make sense?

@rohany
Copy link
Copy Markdown
Contributor Author

rohany commented Jun 1, 2020

hey @RaduBerinde, @rytaft did you get a chance to read my comment here?

@RaduBerinde
Copy link
Copy Markdown
Member

That sounds right to me, we would need the check constraints to enforce on writes to be different than the check constraints that we can use to make assumptions about existing data. They are kind of different already, in that the latter are a subset (because of non-validated constraints, or because of NULL handling in constraint conditions).

@RaduBerinde
Copy link
Copy Markdown
Member

As a potential (maybe temporary) solution to avoid correctness issues, we can use a big hammer and treat any constraint involving an enum that is being changed as non-validated.

@mgartner mgartner self-requested a review June 12, 2020 16:57
@rohany
Copy link
Copy Markdown
Contributor Author

rohany commented Jun 15, 2020

As a potential (maybe temporary) solution to avoid correctness issues, we can use a big hammer and treat any constraint involving an enum that is being changed as non-validated.

I think that this is the correct solution. It looks like the optimizer already has logic of using only validated constraints on reads, so I can just piggyback onto that here. I'm planning on just synthesizing an extra unvalidated constraint containing the "read only" enum members to be verified on mutation operations, and then keeping the validated constraint with all members of the enum to be used on reads.

Given that I don't have to do a bigger change here when ALTER TYPE is done, I'm comfortable with merging this.

Can y'all take a quick second look at this?

This PR teaches the optimizer how to synthesize check constraints on
columns of an ENUM type, allowing queries like:

```
CREATE TYPE t AS ENUM ('howdy', 'hello');
CREATE TABLE tt (x t, y INT, PRIMARY KEY (x, y));
SELECT x, y FROM tt WHERE y = 2
```

to be planned using constrained spans on the enum values, rather than a
full table scan.

Release note (performance improvement): Allow the optimizer to use enum
information to generate better query plans.
Copy link
Copy Markdown
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 7 of 7 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rohany)

@rohany
Copy link
Copy Markdown
Contributor Author

rohany commented Jun 15, 2020

TFTR!

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jun 15, 2020

Build succeeded

@craig craig bot merged commit 5cff000 into cockroachdb:master Jun 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

opt: teach the optimizer to generate check constraints on enum columns

5 participants