Skip to content

sql: implement virtual index lookup join#48226

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
jordanlewis:virtual-lookup
May 10, 2020
Merged

sql: implement virtual index lookup join#48226
craig[bot] merged 2 commits intocockroachdb:masterfrom
jordanlewis:virtual-lookup

Conversation

@jordanlewis
Copy link
Copy Markdown
Member

@jordanlewis jordanlewis commented Apr 30, 2020

This commit adds the capability to plan inner lookup joins into virtual
tables with indexes. This should be a big boon to certain complex,
chained joins that tend to come up from ORMs querying database metadata.

Closes #48178.

Release note (performance improvement): the optimizer and execution
engine can now plan lookup joins into virtual indexes, avoiding full
scans against virtual tables.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@rohany
Copy link
Copy Markdown
Contributor

rohany commented Apr 30, 2020

:pogchamp:

@jordanlewis
Copy link
Copy Markdown
Member Author

This still needs a lot of tests. Also, I need to figure out how to adjust the optimizer costing to nearly always prefer lookup joins against virtual indexes. Right now, for the big query we're testing, it still plans hash/merge joins.

The query is:

SELECT
        t2.oid::REGCLASS AS to_table,
        a1.attname AS column,
        a2.attname AS primary_key,
        c.conname AS name,
        c.confupdtype AS on_update,
        c.confdeltype AS on_delete,
        c.convalidated AS valid
FROM
        pg_constraint AS c
        JOIN pg_class AS t1 ON c.conrelid = t1.oid
        JOIN pg_class AS t2 ON c.confrelid = t2.oid
        JOIN pg_attribute AS a1 ON
                        (a1.attnum = c.conkey[1]) AND (a1.attrelid = t1.oid)
        JOIN pg_attribute AS a2 ON
                        (a2.attnum = c.confkey[1]) AND (a2.attrelid = t2.oid)
        JOIN pg_namespace AS t3 ON c.connamespace = t3.oid
WHERE
        ((c.contype = 'f') AND (t1.oid = 'b'::regclass))
        AND (t3.nspname = ANY (current_schemas(false)))
ORDER BY
        c.conname

cc @rafiss

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Apr 30, 2020

❌ The GitHub CI (Cockroach) build has failed on 8ca4b0e0.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Apr 30, 2020

❌ The GitHub CI (Cockroach) build has failed on d7936624.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@jordanlewis
Copy link
Copy Markdown
Member Author

@RaduBerinde, how would you recommend I adjust costs here? Perhaps the ideal thing would be to put some basic statistics in - we should have row counts on system.descriptor, and then maybe we could mark each of the virtual indexes that look up into system.descriptor about that row count.

Or, should we just up-weight the cost of scans in virtual tables overall?

Copy link
Copy Markdown
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For virtual tables it would make sense to not distinguish costs between "random" access and sequential access. But I'd rather not complicate things too much. With no stats, we pretend they have 1000 rows, which should be plenty to prefer lookup joins in most cases (and we can increase it for vtables if need be). The query you pasted above looks like this with your change:

  sort
   └── project
        ├── inner-join (hash)
        │    ├── select
        │    │    ├── scan t3
        │    │    └── filters
        │    │         └── nspname = ANY current_schemas(false)
        │    ├── inner-join (lookup pg_attribute@pg_attribute_attrelid_idx)
        │    │    ├── lookup columns are key
        │    │    ├── project
        │    │    │    ├── inner-join (lookup pg_attribute@pg_attribute_attrelid_idx)
        │    │    │    │    ├── lookup columns are key
        │    │    │    │    ├── project
        │    │    │    │    │    ├── inner-join (lookup pg_class@pg_class_oid_idx)
        │    │    │    │    │    │    ├── lookup columns are key
        │    │    │    │    │    │    ├── inner-join (lookup pg_class@pg_class_oid_idx)
        │    │    │    │    │    │    │    ├── lookup columns are key
        │    │    │    │    │    │    │    ├── select
        │    │    │    │    │    │    │    │    ├── scan c
        │    │    │    │    │    │    │    │    └── filters
        │    │    │    │    │    │    │    │         ├── conrelid = 'b'::REGCLASS
        │    │    │    │    │    │    │    │         └── contype = 'f'
        │    │    │    │    │    │    │    └── filters (true)
        │    │    │    │    │    │    └── filters
        │    │    │    │    │    │         └── t1.oid = 'b'::REGCLASS
        │    │    │    │    │    └── projections
        │    │    │    │    │         └── conkey[1]
        │    │    │    │    └── filters
        │    │    │    │         ├── column108 = a1.attnum
        │    │    │    │         └── a1.attrelid = 'b'::REGCLASS
        │    │    │    └── projections
        │    │    │         └── confkey[1]
        │    │    └── filters
        │    │         └── column131 = a2.attnum
        │    └── filters
        │         └── connamespace = t3.oid
        └── projections

It's all lookup joins except one at the top (pg_namespace has no index).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rohany)

@jordanlewis
Copy link
Copy Markdown
Member Author

Huh, that's not the plan I got when I tried it. I don't understand.

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented May 4, 2020

❌ The GitHub CI (Cockroach) build has failed on d2893cec.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented May 4, 2020

❌ The GitHub CI (Cockroach) build has failed on 55a4d540.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Copy link
Copy Markdown
Contributor

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No real comments on the SQL exec portion -- this is super slick!

@jordanlewis jordanlewis force-pushed the virtual-lookup branch 2 times, most recently from e179e52 to 96276f7 Compare May 7, 2020 04:04
@jordanlewis
Copy link
Copy Markdown
Member Author

@RaduBerinde would you mind taking a quick look at this guy? Thanks!

Copy link
Copy Markdown
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, I've been pretty swamped. This looks great, nice work!

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis)


pkg/sql/opt_exec_factory.go, line 718 at r1 (raw file):

		onCond = nil
	}
	var n exec.Node

[nit] can just do n := below


pkg/sql/opt_exec_factory.go, line 692 at r3 (raw file):

}

func (ef *execFactory) constructVirtualLookupJoin(

[nit] add a short comment or improve the naming to constructVirtualTableLookupJoin


pkg/sql/opt_exec_factory.go, line 707 at r3 (raw file):

	}
	if len(eqCols) > 1 {
		panic("vtable indexes with more than one column aren't supported yet")

why not just return an AssertionFailedf error


pkg/sql/virtual_schema.go, line 307 at r3 (raw file):

		return errors.AssertionFailedf("datum row count and column count differ: %d vs %d", r, c)
	}
	for i, col := range columns {

[nit] since this gets evaluated per row, maybe avoid the ResultColumn copy into col


pkg/sql/virtual_schema.go, line 320 at r3 (raw file):

	}
	return nil

[nit] extra line


pkg/sql/virtual_table.go, line 216 at r3 (raw file):

	// lookupCols is the projection on vtableCols to apply.
	lookupCols exec.TableColumnOrdinalSet

[nit] In other nodes, we separated the "runtime" state into another struct (which can be embedded). I think it has been helpful especially when just looking at planning/opt-factory side code.


pkg/sql/walk.go, line 945 at r3 (raw file):

	reflect.TypeOf(&valuesNode{}):            "values",
	reflect.TypeOf(&virtualTableNode{}):      "virtual table values",
	reflect.TypeOf(&vTableLookupJoinNode{}):  "virtual table lookup",

[nit] virtual table lookup join or virtual-table-lookup-join (for consistency with lookup-join)


pkg/sql/logictest/testdata/logic_test/orms, line 75 at r3 (raw file):

----
name              primary  unique  indkey  column_indexes  column_names  definition
customers_id_idx  false    false   2       {1,2}           {name,id}     CREATE INDEX customers_id_idx ON test.public.customers USING btree (id ASC)

Isn't this flaky, at least in principle? May want to add ORDER BY to the array_aggs


pkg/sql/opt/xform/custom_funcs.go, line 1323 at r3 (raw file):

	if md.Table(scanPrivate.Table).IsVirtualTable() {
		if joinType != opt.InnerJoinOp {
			return

It wouldn't be hard to support LeftJoin no? Not sure how useful though

@jordanlewis jordanlewis requested a review from a team as a code owner May 9, 2020 05:01
@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented May 9, 2020

❌ The GitHub CI (Cockroach) build has failed on 9e4f6f17.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented May 9, 2020

❌ The GitHub CI (Cockroach) build has failed on 14ba9109.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Copy link
Copy Markdown
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rohany)


pkg/sql/opt_exec_factory.go, line 707 at r3 (raw file):

Previously, RaduBerinde wrote…

why not just return an AssertionFailedf error

If this error fires it means the binary is basically corrupted, haha - if someone defines a vtable w/ composite index it should really blow up badly in tests rather than make a usable binary. I made this an assertion error though, i think we will fail with this same problem elsewhere loudly.


pkg/sql/virtual_table.go, line 274 at r2 (raw file):

Previously, rohany (Rohan Yadav) wrote…

[nit]: paren on next line

Done.


pkg/sql/virtual_table.go, line 216 at r3 (raw file):

Previously, RaduBerinde wrote…

[nit] In other nodes, we separated the "runtime" state into another struct (which can be embedded). I think it has been helpful especially when just looking at planning/opt-factory side code.

Good point, thanks. Done.


pkg/sql/logictest/testdata/logic_test/orms, line 75 at r3 (raw file):

Previously, RaduBerinde wrote…

Isn't this flaky, at least in principle? May want to add ORDER BY to the array_aggs

I want to keep this query as-is, it's a copy of a real world orm query. The fact that this changed is a little sad but I think ok.


pkg/sql/opt/xform/custom_funcs.go, line 1323 at r3 (raw file):

Previously, RaduBerinde wrote…

It wouldn't be hard to support LeftJoin no? Not sure how useful though

Fair, was just being lazy. Done, and added some tests (and a giant ORM query that uses outer lookup join, yay)

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented May 9, 2020

❌ The GitHub CI (Cockroach) build has failed on f0ad1733.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented May 9, 2020

❌ The GitHub CI (Cockroach) build has failed on f0ad1733.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Copy link
Copy Markdown
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @rohany)


pkg/sql/opt_exec_factory.go, line 707 at r3 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

If this error fires it means the binary is basically corrupted, haha - if someone defines a vtable w/ composite index it should really blow up badly in tests rather than make a usable binary. I made this an assertion error though, i think we will fail with this same problem elsewhere loudly.

Thanks! There could in principle be bugs anywhere along the way (in the catalog, in the optimizer, in the exec builder, in the exec factory). An assertion error should be no less visible than a panic, especially in tests.


pkg/sql/logictest/testdata/logic_test/orms, line 75 at r3 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I want to keep this query as-is, it's a copy of a real world orm query. The fact that this changed is a little sad but I think ok.

Fine with me, if you think the output is deterministic (I don't see why not since it's all executed on a single node).


pkg/sql/opt/xform/custom_funcs.go, line 1323 at r3 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Fair, was just being lazy. Done, and added some tests (and a giant ORM query that uses outer lookup join, yay)

Awesome!!j

Commit 4047d45 introduced a bug which
caused reads from pg_aggregate to mistakenly change the 0 OID to have a
"-" name alias, causing some inconsistent behavior in system tables.

Release note: None
This commit adds the capability to plan lookup joins into virtual
tables with indexes. This should be a big boon to certain complex,
chained joins that tend to come up from ORMs querying database metadata.

Release note (performance improvement): the optimizer and execution
engine can now plan lookup joins into virtual indexes, avoiding full
scans against virtual tables.
@jordanlewis
Copy link
Copy Markdown
Member Author

bors r+

1 similar comment
@jordanlewis
Copy link
Copy Markdown
Member Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented May 10, 2020

Build failed

@jordanlewis
Copy link
Copy Markdown
Member Author

Grr, flaky test "test missing log output".

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented May 10, 2020

Build failed

@jordanlewis
Copy link
Copy Markdown
Member Author

One more try:

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented May 10, 2020

Build failed

@jordanlewis
Copy link
Copy Markdown
Member Author

Ok, this was a different flake. We try again

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented May 10, 2020

Build succeeded

@craig craig bot merged commit c902bd9 into cockroachdb:master May 10, 2020
@jordanlewis
Copy link
Copy Markdown
Member Author

Yay

@rafiss we should retry some of the longer hibernate/django tests with this in, very curious to see if it makes a difference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql: support lookup joins into virtual table indexes

4 participants