[DNM] sql: use the correct type for inverted key columns during planning #53202
rytaft wants to merge 1 commit into cockroachdb:master
Conversation
This PR is incomplete, but I figured I would open it so I could get some eyes on it and get some help understanding what else I need to change in order to enable vectorized execution with some parts of the plan.
One place that needs to be updated is removing the code that disables the inverted filterer in colbuilder/execplan.go:579. I'd also try running some queries manually with SET vectorize=experimental_always to make sure that the vectorized flow does get planned.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola and @yuzefovich)
One concern here: while the geo stuff happens to be Ints, the JSON stuff is not decodable as any SQL type, and so would break if ported to the new operator.
sumeerbhola
left a comment
And even geospatial may become its own custom encoding rather than DInt -- ongoing discussion at https://cockroachlabs.slack.com/archives/CPBH0NZHQ/p1598027962010700
Seems like that worked, thanks!
Will the geospatial encoding likely change before we release 20.2? If not, maybe we could just use this simple hack for now and fix it later? Alternatively, is there a reason we can't just use DBytes for everything (including JSON and array)?
So I finally got this working, but it required a lot more changes than I expected. I had to thread the types of the virtual columns through many different functions. I'm not sure how much of a difference this will make in terms of performance, or whether it's worth the added complexity and ugliness. Is there a particular benchmark we want to test here? Let me know if people have suggestions or if I should just close this PR.
sumeerbhola
left a comment
I'm not sure how much of a difference this will make in terms of performance and whether or not it's worth the added complexity and ugliness. Is there a particular benchmark we want to test here?
Thanks for working through this!
Can you try with https://github.com/cockroachlabs/sqlcmp and see if this helps -- I am very curious about the result.
The encoding of the inverted geo column will change in #53215 and no longer be the DInt encoding, so I think we'll need to figure out how to use DBytes for everything.
Reviewed 2 of 22 files at r1.
pkg/sql/rowexec/inverted_joiner.go, line 263 at r1 (raw file):
descpb.ScanLockingStrength_FOR_NONE, descpb.ScanLockingWaitPolicy_BLOCK, nil /* systemColumns */, nil /* virtualColumns */)
Did the scan done by the invertedJoiner not return a decoded inverted column? I would have expected we would need to encode it before passing to the batchedInvertedExprEvaluator.
Force-pushed from afda217 to 000fb21.
This commit changes the column type for inverted key columns when constructing scanNodes and TableReaderSpecs so that the type matches the data type actually stored in the index. Prior to this commit, the type corresponded to the column being indexed (e.g., geometry), rather than the actual type of the key column (e.g., int). This is needed in order to enable vectorized execution with the invertedFilterer processor.

Fixes cockroachdb#50695

Release note (performance improvement): Queries that use a geospatial inverted index can now take advantage of vectorized execution for some parts of the query plan, resulting in improved performance.
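The core of the commit is a mapping from "column being indexed" to "type actually stored in the inverted index key". A minimal sketch of that mapping, with hypothetical names (the real logic is threaded through CockroachDB's scanNode/TableReaderSpec construction, not a standalone function like this):

```go
package main

import "fmt"

// invertedKeyType returns the type the planner should use for an
// inverted index's key column. Hypothetical sketch: for geospatial
// indexes the stored key is currently an int (a cell ID); for other
// inverted columns (JSON, arrays) the key is an opaque encoding,
// modeled here as bytes.
func invertedKeyType(indexedColType string) string {
	switch indexedColType {
	case "geometry", "geography":
		return "int"
	default:
		return "bytes"
	}
}

func main() {
	// The key column's type is NOT the indexed column's type.
	fmt.Println(invertedKeyType("geometry")) // int, not geometry
	fmt.Println(invertedKeyType("jsonb"))    // bytes, not jsonb
}
```

Declaring the key column with its stored type is what lets the vectorized engine operate on it directly, instead of bailing out because it cannot represent a geometry-typed column.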
I ran the benchmark, and for these queries vectorization doesn't seem to help (I've only included the queries that use invertedFilterer). This doesn't really surprise me, since the only thing these queries use the vectorized execution engine for is the scan. If anything, vectorization is probably a bit worse here, since there is some overhead to convert back and forth between the two data formats. I bet vectorization would be helpful for more complex queries, but given the complexity of this change and the lack of a compelling use case, it seems like we should close this PR. Any objections?
I agree with your assessment that we probably won't get much benefit (if any) from vectorization on spatial queries, and I am in favor of closing this PR. Thanks for trying this out!
Closing - thanks!