sql: use distsql for local distinct#25873
Conversation
|
I've only had time to skim this so far. Replacing localsql distinct with distsql distinct wasn’t as involved as I expected. It’s worthwhile to see if there is any performance diff from the change (i.e. due to the glue code). Review status: 0 of 19 files reviewed at latest revision, all discussions resolved, some commit checks failed. Comments from Reviewable |
|
Yup, this was fairly quick work, until I got to tuples! There will be performance penalty due to the useless Comments from Reviewable |
|
I like the approach, looks pretty straightforward. Review status: 0 of 19 files reviewed at latest revision, all discussions resolved, some commit checks failed. pkg/sql/distsql_physical_planner.go, line 2196 at r3 (raw file):
unexport pkg/sql/plan_node_to_row_source.go, line 11 at r3 (raw file):
What is this to be used for? pkg/sql/row_source_to_plan_node.go, line 18 at r3 (raw file):
Does this need to be exported? pkg/sql/row_source_to_plan_node.go, line 50 at r3 (raw file):
The check should be for pkg/sql/row_source_to_plan_node.go, line 51 at r3 (raw file):
I think ConsumerClosed is to be called only once (at least according to its comment), it will get called again in Close pkg/sql/row_source_to_plan_node.go, line 59 at r3 (raw file):
This can be pre-allocated once (most planNodes do this) pkg/util/fast_int_set.go, line 196 at r3 (raw file):
This panic is not acceptable. We sometimes run tests with the "small set" cutoff set to a very small value to check there are no problems when falling back to large. The code below should work fine in the large mode. Comments from Reviewable |
|
Nice work. Have you got a rough plan for the long term merge of the two execution engines? I'm still kind of cloudy on the details but wondering if this work has made the path clearer for you. Review status: 0 of 19 files reviewed at latest revision, 7 unresolved discussions, some commit checks failed. pkg/sql/distinct.go, line 217 at r3 (raw file):
Comment explaining that pkg/sql/plan_node_to_row_source.go, line 58 at r3 (raw file):
Where is pkg/sql/plan_node_to_row_source.go, line 66 at r3 (raw file):
I think this currently violates the pkg/sql/plan_node_to_row_source.go, line 72 at r3 (raw file):
This might be a nice utility function to have in pkg/sql/row_source_to_plan_node.go, line 27 at r3 (raw file):
pkg/sql/row_source_to_plan_node.go, line 35 at r3 (raw file):
I think it's fine to assume that pkg/sql/distsqlrun/base.go, line 155 at r3 (raw file):
Any reason not to embed the Processor interface as well? Comments from Reviewable |
479c52f to
a9efe37
Compare
9b7d2a6 to
4c49d49
Compare
|
@asubiotto I updated #23901 with an extended plan. Review status: 0 of 26 files reviewed at latest revision, 12 unresolved discussions, some commit checks pending. pkg/sql/distinct.go, line 217 at r3 (raw file): Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Great point, thank you. Done. pkg/sql/distsql_physical_planner.go, line 2196 at r3 (raw file): Previously, RaduBerinde wrote…
Done. pkg/sql/plan_node_to_row_source.go, line 11 at r3 (raw file): Previously, RaduBerinde wrote…
Eventually when we have multiple planNodes that are rowSourced, we'd want to fuse their execution pipeline and then unfuse them at the end. Right now we don't, so this is not used code. I can leave it out of this PR if you prefer. pkg/sql/plan_node_to_row_source.go, line 58 at r3 (raw file): Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Huh, I was never using it. You're right, since the underlying RowSources track it internally, there's no need for us to redo that work. Removed. pkg/sql/plan_node_to_row_source.go, line 66 at r3 (raw file): Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Added a comment that the caller should close after calling this. pkg/sql/plan_node_to_row_source.go, line 72 at r3 (raw file): Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Done. Moved to pkg/sql/row_source_to_plan_node.go, line 18 at r3 (raw file): Previously, RaduBerinde wrote…
No. Unexported. pkg/sql/row_source_to_plan_node.go, line 27 at r3 (raw file): Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Done, and also unexported. pkg/sql/row_source_to_plan_node.go, line 35 at r3 (raw file): Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Done. Removed. Thanks. pkg/sql/row_source_to_plan_node.go, line 50 at r3 (raw file): Previously, RaduBerinde wrote…
Good point. Thank you. pkg/sql/row_source_to_plan_node.go, line 51 at r3 (raw file): Previously, RaduBerinde wrote…
Done. pkg/sql/row_source_to_plan_node.go, line 59 at r3 (raw file): Previously, RaduBerinde wrote…
Done. pkg/sql/distsqlrun/base.go, line 155 at r3 (raw file): Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
You can't embed both pkg/util/fast_int_set.go, line 196 at r3 (raw file): Previously, RaduBerinde wrote…
I ended up removing the whole function, it was just for rapid prototyping, and thus the code quality. Comments from Reviewable |
|
Review status: 0 of 10 files reviewed at latest revision, 11 unresolved discussions, some commit checks failed. pkg/sql/plan_node_to_row_source.go, line 11 at r3 (raw file): Previously, arjunravinarayan (Arjun Narayan) wrote…
Not sure I understand. Why would we need a row-source - to - planNode shim to fuse them? Anyway, I'd leave it out of this PR since it's just a distraction for now. Comments from Reviewable |
Prerequisite before embedding it in the planNodes in Local SQL. Release note: None
c1e1c5a to
6db8915
Compare
|
👍 this looks good! Review status: 0 of 8 files reviewed at latest revision, 11 unresolved discussions, all commit checks successful. pkg/sql/plan_node_to_row_source.go, line 11 at r3 (raw file): Previously, RaduBerinde wrote…
We do need both directions of conversion, at least at first. If there are 3 planNodes in a row, A->B->C, and B is implemented as a wrapper around a Processor, you need a rowSourceToPlanNode wrapper to wrap the processor so that A can consume from B as a planNode, and you need a planNodeToRowSource wrapper to wrap the planNode C so that B's inner processor can consume from C as a RowSource. Comments from Reviewable |
|
Comments from Reviewable |
|
Review status: pkg/sql/distinct.go, line 211 at r8 (raw file):
nit: I personally find it easier to read comments when they are on the line above, but feel free to ignore pkg/sql/plan_node_to_row_source.go, line 66 at r3 (raw file): Previously, arjunravinarayan (Arjun Narayan) wrote…
I'm kind of confused. I think we need to implement pkg/sql/row_source_to_plan_node.go, line 81 at r8 (raw file):
nit: We can assume pkg/sql/row_source_to_plan_node.go, line 86 at r8 (raw file):
nit: I would put this right below the struct definition pkg/sql/distsqlrun/base.go, line 155 at r3 (raw file): Previously, arjunravinarayan (Arjun Narayan) wrote…
This is fine Comments from Reviewable |
|
Review status: pkg/sql/plan_node_to_row_source.go, line 66 at r3 (raw file): Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
^ disregard this unfinished comment, I thought I had deleted it. I'm still confused but it seems that implementing this logic even with removing the Comments from Reviewable |
6db8915 to
b0e7220
Compare
|
I fixed the flow so that they're no longer no-ops. It makes more sense now. PTAL. Review status: Comments from Reviewable |
|
Review status: Comments from Reviewable |
Use the distsql distinct processor for processing local sql queries. Closes cockroachdb#23901. Release note: none
b0e7220 to
c74fb21
Compare
|
bors r+ |
25873: sql: use distsql for local distinct r=jordanlewis a=arjunravinarayan Use the distsql distinct processor for processing local sql queries. This is a work-in-progress to demonstrate and begin a discussion around the shims used. This PR is rebased on top of #25860, so only evaluate the last commit. The main interfaces on which I'd like discussion are RowSourceToPlanNode and PlanNodeToRowSource. The rest of this PR is exporting changes (2nd commit), some planning plumbing, and code removal in local distinct. Co-authored-by: Arjun Narayan <arjun@cockroachlabs.com>
Build succeeded |
Use the distsql distinct processor for processing local sql queries. This is a work-in-progress to demonstrate and begin a discussion around the shims used.
This PR is rebased on top of #25860, so only evaluate the last commit. The main interfaces on which I'd like discussion are RowSourceToPlanNode and PlanNodeToRowSource. The rest of this PR is exporting changes (2nd commit), some planning plumbing, and code removal in local distinct.