Skip to content

[DNM] sql: use the correct type for inverted key columns during planning#53202

Closed
rytaft wants to merge 1 commit intocockroachdb:masterfrom
rytaft:vectorized
Closed

[DNM] sql: use the correct type for inverted key columns during planning#53202
rytaft wants to merge 1 commit intocockroachdb:masterfrom
rytaft:vectorized

Conversation

@rytaft
Copy link
Copy Markdown
Collaborator

@rytaft rytaft commented Aug 21, 2020

This commit changes the column type for inverted key columns when constructing
scanNodes and TableReaderSpecs so that the type matches the data type actually
stored in the index. Prior to this commit, the type corresponded to the column
being indexed (e.g., geometry), rather than the actual type of the key column
(e.g., int). This is needed in order to enable vectorized execution with the
invertedFilterer processor.

Fixes #50695

Release note (performance improvement): Queries that use a geospatial inverted
index can now take advantage of vectorized execution for some parts of the
query plan, resulting in improved performance.

@rytaft rytaft requested review from a team, sumeerbhola and yuzefovich August 21, 2020 15:14
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@rytaft
Copy link
Copy Markdown
Collaborator Author

rytaft commented Aug 21, 2020

This PR is incomplete, but I figured I would open it so I could get some eyes on it and get some help understanding what else I need to change in order to enable vectorized execution with some parts of the plan.

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One place that needs to be updated is removal of disabling of inverted filterer in colbuilder/execplan.go:579. I'd also try running some queries manually with SET vectorize=experimental_always to make sure that the vectorized flow does get planned.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola and @yuzefovich)

@madelynnblue
Copy link
Copy Markdown
Contributor

One concern here: while the geo stuff happens to be Ints, the json stuff is not anything decodable, and so would break if ported to the new operator.

Copy link
Copy Markdown
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and even geospatial may become its own custom encoding and not DInt -- ongoing discussion at https://cockroachlabs.slack.com/archives/CPBH0NZHQ/p1598027962010700

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)

@rytaft
Copy link
Copy Markdown
Collaborator Author

rytaft commented Aug 21, 2020

One that needs to be updated is removal of disabling of inverted filterer in colbuilder/execplan.go:579. I'd also try running some queries manually with SET vectorize=experimental_always to make sure that the vectorized flow does get planned.

Seems like that worked, thanks!

One concern here: while the geo stuff happens to be Ints, the json stuff is not anything decodable, and so would break if ported to the new operator.

and even geospatial may become its own custom encoding and not DInt -- ongoing discussion at https://cockroachlabs.slack.com/archives/CPBH0NZHQ/p1598027962010700

Will the geospatial encoding likely change before we release 20.2? If not, maybe we could just use this simple hack for now and fix it later? Alternatively, is there a reason we can't just use DBytes for everything (including JSON and array)?

@rytaft
Copy link
Copy Markdown
Collaborator Author

rytaft commented Aug 23, 2020

So I finally got this working, but it required a lot more changes than I expected. I had to thread the types of the virtual columns through many different functions, and I had to modify the invertedFilterer to encode the key columns (since the vectorized engine only returned decoded columns). This all feels pretty ugly and feels like it's probably not the best way to fix this issue.

I'm not sure how much of a difference this will make in terms of performance and whether or not it's worth the added complexity and ugliness. Is there a particular benchmark we want to test here?

Let me know if people have suggestions or if I should just close this PR.

@rytaft rytaft requested a review from a team as a code owner August 23, 2020 21:58
Copy link
Copy Markdown
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how much of a difference this will make in terms of performance and whether or not it's worth the added complexity and ugliness. Is there a particular benchmark we want to test here?

Thanks for working through this!
Can you try with https://github.com/cockroachlabs/sqlcmp and see if this helps -- I am very curious about the result.

The encoding of the inverted geo column will change in #53215 and no longer be the DInt encoding, so I think we'll need to figure out how to use DBytes for everything.

Reviewed 2 of 22 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)


pkg/sql/rowexec/inverted_joiner.go, line 263 at r1 (raw file):

		descpb.ScanLockingStrength_FOR_NONE, descpb.ScanLockingWaitPolicy_BLOCK,
		nil /* systemColumns */, nil, /* virtualColumns */
	)

Did the scan done by the invertedJoiner not return a decoded inverted column? I would have expected we would need to encode it before passing to the batchedInvertedExprEvaluator.

@rytaft rytaft force-pushed the vectorized branch 2 times, most recently from afda217 to 000fb21 Compare August 25, 2020 14:24
This commit changes the column type for inverted key columns when constructing
scanNodes and TableReaderSpecs so that the type matches the data type actually
stored in the index. Prior to this commit, the type corresponded to the column
being indexed (e.g., geometry), rather than the actual type of the key column
(e.g., int). This is needed in order to enable vectorized execution with the
invertedFilterer processor.

Fixes cockroachdb#50695

Release note (performance improvement): Queries that use a geospatial inverted
index can now take advantage of vectorized execution for some parts of the
query plan, resulting in improved performance.
@rytaft
Copy link
Copy Markdown
Collaborator Author

rytaft commented Aug 25, 2020

I ran the benchmark, and for these queries, vectorization doesn't seem to help (I've only included the queries that use invertedFilterer):

name                                   old ms     new ms     delta
Test/postgis_geography_tutorial/11.2b  3.61 ±41%  3.25 ±50%  -10.10%  (p=0.209 n=24+18)
Test/postgis_geography_tutorial/11.6   8.40 ±40%  8.72 ±23%   +3.82%  (p=0.578 n=16+14)
Test/postgis_geometry_tutorial/11.2b   4.87 ±28%  5.00 ±31%   +2.77%  (p=0.428 n=21+17)
Test/postgis_geometry_tutorial/11.6    5.92 ±45%  6.13 ±40%   +3.64%  (p=0.606 n=25+20)
Test/postgis_geometry_tutorial/12.1.2  3.85 ±33%  3.77 ±33%   -1.99%  (p=0.679 n=24+20)
Test/postgis_geometry_tutorial/12.2.3  5.95 ±66%  6.19 ±60%   +4.05%  (p=0.622 n=25+20)
Test/postgis_geometry_tutorial/12.2.4  6.86 ±36%  7.48 ±28%   +9.01%  (p=0.252 n=24+19)

This doesn't really surprise me, since the only thing these queries are using the vectorized execution engine for is the scan. If anything, vectorization is probably a bit worse here since there is probably some overhead to convert back and forth between the two data formats.

I bet that vectorization will be helpful for more complex queries, but given the complexity of this change and the lack of a compelling use case, seems like we should close this PR.

Any objections?

@yuzefovich
Copy link
Copy Markdown
Member

I agree with your assessment about probably not getting much benefit (if any) from the vectorization on spatial queries and am in favor of closing this PR. Thanks for trying this out!

@rytaft
Copy link
Copy Markdown
Collaborator Author

rytaft commented Aug 25, 2020

Closing - thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

opt: change the key column of inverted indexes to use a new column id and type

5 participants