[DNM] sql: use the correct type for inverted key columns during planning by rytaft · Pull Request #53202 · cockroachdb/cockroach

rytaft · 2020-08-21T15:14:05Z

This commit changes the column type for inverted key columns when constructing
scanNodes and TableReaderSpecs so that the type matches the data type actually
stored in the index. Prior to this commit, the type corresponded to the column
being indexed (e.g., geometry), rather than the actual type of the key column
(e.g., int). This is needed in order to enable vectorized execution with the
invertedFilterer processor.

Fixes #50695

Release note (performance improvement): Queries that use a geospatial inverted
index can now take advantage of vectorized execution for some parts of the
query plan, resulting in improved performance.

cockroach-teamcity · 2020-08-21T15:14:12Z

This change is

rytaft · 2020-08-21T15:15:27Z

This PR is incomplete, but I figured I would open it so I could get some eyes on it and get some help understanding what else I need to change in order to enable vectorized execution with some parts of the plan.

yuzefovich

One place that needs to be updated is removal of disabling of inverted filterer in colbuilder/execplan.go:579. I'd also try running some queries manually with SET vectorize=experimental_always to make sure that the vectorized flow does get planned.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola and @yuzefovich)

madelynnblue · 2020-08-21T16:34:02Z

One concern here: while the geo stuff happens to be Ints, the json stuff is not anything decodable, and so would break if ported to the new operator.

sumeerbhola

and even geospatial may become its own custom encoding and not DInt -- ongoing discussion at https://cockroachlabs.slack.com/archives/CPBH0NZHQ/p1598027962010700

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)

rytaft · 2020-08-21T19:30:03Z

One that needs to be updated is removal of disabling of inverted filterer in colbuilder/execplan.go:579. I'd also try running some queries manually with SET vectorize=experimental_always to make sure that the vectorized flow does get planned.

Seems like that worked, thanks!

One concern here: while the geo stuff happens to be Ints, the json stuff is not anything decodable, and so would break if ported to the new operator.

and even geospatial may become its own custom encoding and not DInt -- ongoing discussion at https://cockroachlabs.slack.com/archives/CPBH0NZHQ/p1598027962010700

Will the geospatial encoding likely change before we release 20.2? If not, maybe we could just use this simple hack for now and fix it later? Alternatively, is there a reason we can't just use DBytes for everything (including JSON and array)?

rytaft · 2020-08-23T19:02:25Z

So I finally got this working, but it required a lot more changes than I expected. I had to thread the types of the virtual columns through many different functions, and I had to modify the invertedFilterer to encode the key columns (since the vectorized engine only returned decoded columns). This all feels pretty ugly and feels like it's probably not the best way to fix this issue.

I'm not sure how much of a difference this will make in terms of performance and whether or not it's worth the added complexity and ugliness. Is there a particular benchmark we want to test here?

Let me know if people have suggestions or if I should just close this PR.

sumeerbhola

I'm not sure how much of a difference this will make in terms of performance and whether or not it's worth the added complexity and ugliness. Is there a particular benchmark we want to test here?

Thanks for working through this!
Can you try with https://github.com/cockroachlabs/sqlcmp and see if this helps -- I am very curious about the result.

The encoding of the inverted geo column will change in #53215 and no longer be the DInt encoding, so I think we'll need to figure out how to use DBytes for everything.

Reviewed 2 of 22 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)

pkg/sql/rowexec/inverted_joiner.go, line 263 at r1 (raw file):

		descpb.ScanLockingStrength_FOR_NONE, descpb.ScanLockingWaitPolicy_BLOCK,
		nil /* systemColumns */, nil, /* virtualColumns */
	)

Did the scan done by the invertedJoiner not return a decoded inverted column? I would have expected we would need to encode it before passing to the batchedInvertedExprEvaluator.

This commit changes the column type for inverted key columns when constructing scanNodes and TableReaderSpecs so that the type matches the data type actually stored in the index. Prior to this commit, the type corresponded to the column being indexed (e.g., geometry), rather than the actual type of the key column (e.g., int). This is needed in order to enable vectorized execution with the invertedFilterer processor. Fixes cockroachdb#50695 Release note (performance improvement): Queries that use a geospatial inverted index can now take advantage of vectorized execution for some parts of the query plan, resulting in improved performance.

rytaft · 2020-08-25T17:43:56Z

I ran the benchmark, and for these queries, vectorization doesn't seem to help (I've only included the queries that use invertedFilterer):

name                                   old ms     new ms     delta
Test/postgis_geography_tutorial/11.2b  3.61 ±41%  3.25 ±50%  -10.10%  (p=0.209 n=24+18)
Test/postgis_geography_tutorial/11.6   8.40 ±40%  8.72 ±23%   +3.82%  (p=0.578 n=16+14)
Test/postgis_geometry_tutorial/11.2b   4.87 ±28%  5.00 ±31%   +2.77%  (p=0.428 n=21+17)
Test/postgis_geometry_tutorial/11.6    5.92 ±45%  6.13 ±40%   +3.64%  (p=0.606 n=25+20)
Test/postgis_geometry_tutorial/12.1.2  3.85 ±33%  3.77 ±33%   -1.99%  (p=0.679 n=24+20)
Test/postgis_geometry_tutorial/12.2.3  5.95 ±66%  6.19 ±60%   +4.05%  (p=0.622 n=25+20)
Test/postgis_geometry_tutorial/12.2.4  6.86 ±36%  7.48 ±28%   +9.01%  (p=0.252 n=24+19)

This doesn't really surprise me, since the only thing these queries are using the vectorized execution engine for is the scan. If anything, vectorization is probably a bit worse here since there is probably some overhead to convert back and forth between the two data formats.

I bet that vectorization will be helpful for more complex queries, but given the complexity of this change and the lack of a compelling use case, seems like we should close this PR.

Any objections?

yuzefovich · 2020-08-25T18:18:19Z

I agree with your assessment about probably not getting much benefit (if any) from the vectorization on spatial queries and am in favor of closing this PR. Thanks for trying this out!

rytaft · 2020-08-25T18:32:39Z

Closing - thanks!

rytaft requested review from a team, sumeerbhola and yuzefovich August 21, 2020 15:14

yuzefovich reviewed Aug 21, 2020

View reviewed changes

sumeerbhola reviewed Aug 21, 2020

View reviewed changes

rytaft force-pushed the vectorized branch from 5d8a156 to 7663a38 Compare August 21, 2020 19:23

rytaft force-pushed the vectorized branch from 7663a38 to 92708db Compare August 23, 2020 18:54

rytaft force-pushed the vectorized branch from 92708db to 47449fd Compare August 23, 2020 21:58

rytaft requested a review from a team as a code owner August 23, 2020 21:58

sumeerbhola reviewed Aug 24, 2020

View reviewed changes

rytaft force-pushed the vectorized branch 2 times, most recently from afda217 to 000fb21 Compare August 25, 2020 14:24

rytaft force-pushed the vectorized branch from 000fb21 to 6cdb74a Compare August 25, 2020 16:45

rytaft closed this Aug 25, 2020

yuzefovich mentioned this pull request Sep 23, 2020

colexec/opt: add costing for the vectorized engine #49794

Closed

rytaft mentioned this pull request Dec 23, 2020

sql: use type bytes for virtual inverted columns #58241

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[DNM] sql: use the correct type for inverted key columns during planning#53202

[DNM] sql: use the correct type for inverted key columns during planning#53202
rytaft wants to merge 1 commit intocockroachdb:masterfrom
rytaft:vectorized

rytaft commented Aug 21, 2020

Uh oh!

cockroach-teamcity commented Aug 21, 2020

Uh oh!

rytaft commented Aug 21, 2020

Uh oh!

yuzefovich left a comment •

edited

Loading

Uh oh!

madelynnblue commented Aug 21, 2020

Uh oh!

sumeerbhola left a comment

Uh oh!

rytaft commented Aug 21, 2020

Uh oh!

rytaft commented Aug 23, 2020

Uh oh!

sumeerbhola left a comment

Uh oh!

rytaft commented Aug 25, 2020

Uh oh!

yuzefovich commented Aug 25, 2020

Uh oh!

rytaft commented Aug 25, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

rytaft commented Aug 21, 2020

Uh oh!

cockroach-teamcity commented Aug 21, 2020

Uh oh!

rytaft commented Aug 21, 2020

Uh oh!

yuzefovich left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

madelynnblue commented Aug 21, 2020

Uh oh!

sumeerbhola left a comment

Choose a reason for hiding this comment

Uh oh!

rytaft commented Aug 21, 2020

Uh oh!

rytaft commented Aug 23, 2020

Uh oh!

sumeerbhola left a comment

Choose a reason for hiding this comment

Uh oh!

rytaft commented Aug 25, 2020

Uh oh!

yuzefovich commented Aug 25, 2020

Uh oh!

rytaft commented Aug 25, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

yuzefovich left a comment •

edited

Loading