sql: replace walkPlan with Input and InputCount#137620
sql: replace walkPlan with Input and InputCount#137620craig[bot] merged 7 commits intocockroachdb:masterfrom
walkPlan with Input and InputCount#137620Conversation
|
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
|
Nice improvement! The approach makes sense to me, so I'm in favor of polishing it up. One suggestion for implementation I have is to define a couple of helper structs that will implement the new methods and will be embedded into |
Ha... You're really gonna make me work for this one. ;) But I do agree, that's a good idea. |
921a6a8 to
4c7a3f0
Compare
|
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
54cc717 to
5afe911
Compare
Input and InputCount planNode methods to reduce allocationsInput and InputCount planNode methods to reduce allocations
Input and InputCount planNode methods to reduce allocationswalkPlan with Input and InputCount planNode methods to reduce allocations
walkPlan with Input and InputCount planNode methods to reduce allocationswalkPlan with Input and InputCount
b513252 to
fb92aa6
Compare
|
The reduction in allocations I originally measured was some sort of fluke. I no longer measure that same reduction on the original prototype nor this polished-up version. However, I do measure a consistent speedup in EndToEnd benchmarks. I got a similar result across three separate 20-count runs: I think this is likely due to eliminating some of the overhead of the visitor. |
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewed 5 of 5 files at r1, 3 of 3 files at r2, 6 of 6 files at r3, 4 of 4 files at r4, 3 of 3 files at r5, 136 of 136 files at r6, 1 of 1 files at r7, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @mgartner)
-- commits line 65 at r7:
nit: consider updating the last commit title / message with the recent benchmark findings.
pkg/sql/schema_change_plan_node.go line 216 at r6 (raw file):
// the nodes that existed preceding the current statement with the output of // the built current statement. There maybe cases like CTE's, where we will // need to re-input if the lastState and the plannedState do not match, since
nit: "re-plan" makes more sense 😃
pkg/sql/distsql_physical_planner.go line 4171 at r6 (raw file):
} var err error // Continue walking until we find a node that has a DistSQL
nit: I'd preserve this comment, seems useful for context.
7b3e428 to
ee975d2
Compare
mgartner
left a comment
There was a problem hiding this comment.
The linting failure should be resolved once #137739 is merged.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @yuzefovich)
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: consider updating the last commit title / message with the recent benchmark findings.
Done.
pkg/sql/distsql_physical_planner.go line 4171 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: I'd preserve this comment, seems useful for context.
Done.
pkg/sql/schema_change_plan_node.go line 216 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: "re-plan" makes more sense 😃
Oops, done.
pkg/sql/distsql_physical_planner.go line 4178 at r11 (raw file):
return planFirstDistSQLNode(input) default: // Can't wrap plans with more than 1 input.
@yuzefovich is this true? Or are there cases where we might wrap one side of a join or set op but not the other? I'm not sure I understand exactly how the nParents check works today.
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewed 2 of 3 files at r9, 134 of 136 files at r10, 1 of 2 files at r12, 1 of 1 files at r13, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @mgartner)
pkg/sql/distsql_physical_planner.go line 4178 at r11 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
@yuzefovich is this true? Or are there cases where we might wrap one side of a join or set op but not the other? I'm not sure I understand exactly how the
nParentscheck works today.
It is true in a sense that the current infrastructure assumes either 0 or 1 inputs to the planNode-being-wrapped (see rowexec.NewProcessor for LocalPlanNode core).
I think it is confusingly named nParents - I'd say nChildren or nInputs would be more clear. (Consider renaming it while we're here.)
What we have before starting our walk when wrapPlan is called is the first planNode that we don't have a corresponding DistSQL processor for. We then traverse this planNode's descendants / inputs to find the first "stage" of planNodes that have the DistSQL equivalent. Almost all planNode implementations have 0 or 1 inputs. I think the main exceptions are joinNode and unionNode for which we do have equivalent DistSQL processors, so we would never be wrapping them. (There is also some discussion about this on the PR #28509 that introduced this logic.)
(Looks like we have another planNode that might have multiple inputs - hookFnNode since it has the notion of subplans. This was added in 1e72239, and on a quick look it seems like we actually don't have any hooks that use subplans functionality.)
Two methods have been added to the `planNode` interface that can be used for traversing a `planNode` tree: `Input` and `InputCount`. The `zeroInputPlanNode` and `singleInputPlanNode` structs are provided to avoid boilerplate code for the two most common types of plan nodes. The `Input` and `InputCount` methods allow for more flexibility in traversing the tree than the `planVisitor` pattern. With the visitor pattern general information can only be passed back to callsites by capturing local variables in visitor closures, which can incur incidental heap allocations. With `Input` and `InputCount`, callers can define custom recursive functions that pass information back to callsites by returning them. These methods are not yet used, but will be in future commits. Release note: None
`wrapPlan` now traverses the `planNode` tree with a recursive function that uses the `Input` and `InputCount` methods instead of the `planVisitor`. This reduces the overhead of traversal. Below are some benchmarks that show the improvement. ``` name old time/op new time/op delta EndToEnd/kv-read/vectorize=on/Simple-16 213µs ± 1% 205µs ± 1% -4.02% (p=0.000 n=20+20) EndToEnd/kv-read/vectorize=off/Simple-16 207µs ± 1% 199µs ± 1% -3.85% (p=0.000 n=19+20) EndToEnd/kv-read/vectorize=on/Prepared-16 146µs ± 1% 141µs ± 1% -2.98% (p=0.000 n=19+19) EndToEnd/kv-read/vectorize=off/Prepared-16 140µs ± 1% 136µs ± 1% -2.94% (p=0.000 n=20+18) name old alloc/op new alloc/op delta EndToEnd/kv-read/vectorize=on/Prepared-16 20.7kB ± 0% 20.6kB ± 0% -0.23% (p=0.000 n=19+19) EndToEnd/kv-read/vectorize=on/Simple-16 33.4kB ± 0% 33.4kB ± 0% ~ (p=0.670 n=19+19) EndToEnd/kv-read/vectorize=off/Simple-16 33.8kB ± 0% 33.7kB ± 0% ~ (p=0.760 n=18+18) EndToEnd/kv-read/vectorize=off/Prepared-16 21.9kB ± 0% 21.9kB ± 0% ~ (p=0.365 n=18+20) name old allocs/op new allocs/op delta EndToEnd/kv-read/vectorize=on/Simple-16 283 ± 0% 282 ± 0% ~ (p=0.105 n=20+20) EndToEnd/kv-read/vectorize=on/Prepared-16 190 ± 0% 189 ± 0% ~ (p=0.150 n=19+20) EndToEnd/kv-read/vectorize=off/Simple-16 270 ± 0% 270 ± 0% ~ (all equal) EndToEnd/kv-read/vectorize=off/Prepared-16 182 ± 0% 182 ± 0% ~ (p=0.447 n=18+20) ``` Release note: None
ee975d2 to
79bb063
Compare
|
Previously, yuzefovich (Yahor Yuzefovich) wrote…
👍 |
|
TFTR! bors r+ |
The Input and InputCount methods of the planNode interface are now used to walk plan node trees of post-query checks. This continues the effort to deprecate and remove the plan node walkers (see cockroachdb#137620 for more details on the motivation for this). Release note: None
The `Input` and `InputCount` methods of the `planNode` interface are now used to walk plan node trees of CDC expressions. This continues the effort to deprecate and remove the plan node walkers (see cockroachdb#137620 for more details on the motivation for this). Release note: None
The Input and InputCount methods of the planNode interface are now used to walk plan node trees of post-query checks. This continues the effort to deprecate and remove the plan node walkers (see cockroachdb#137620 for more details on the motivation for this). Release note: None
143464: sql: use InputCount and Input planNode methods to walk check plans r=mgartner a=mgartner #### sql: use InputCount and Input planNode methods to walk check plans The Input and InputCount methods of the planNode interface are now used to walk plan node trees of post-query checks. This continues the effort to deprecate and remove the plan node walkers (see #137620 for more details on the motivation for this). Epic: None Release note: None Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
The `Input` and `InputCount` methods of the `planNode` interface are now used to walk plan node trees of CDC expressions. This continues the effort to deprecate and remove the plan node walkers (see cockroachdb#137620 for more details on the motivation for this). Release note: None
143424: raft: use term cache for leader commit term check r=pav-kv a=hakuuww Previously, we avoided calling term(index) for leader commit term check by keeping the first entry index for the current leader term in memory, and comparing against that index. (see #137826) Now, since we have implemented term cache (which stores info of the current leader term's first entryID in this case), the term cache is used for that check to reduce the complexity of our code. (see #142239 ) Part of #136296 Fixes: #143362 Epic: None Release note: None 143463: sql: refactor PlanCDCExpression r=mgartner a=mgartner #### sql: collect CDC presentation outside of plan walker The column presentation of the top node is collected to determine the output columns for the CDC expression. There is no need to do this within the plan node walker, so it has been moved outside. Release note: None #### sql: use InputCount and Input planNode methods to walk CDC plans The `Input` and `InputCount` methods of the `planNode` interface are now used to walk plan node trees of CDC expressions. This continues the effort to deprecate and remove the plan node walkers (see #137620 for more details on the motivation for this). Epic: None Release note: None #### sql: refactor return statements in PlanCDCExpression The `return` statements for error cases now explicitly return an empty `CDCExpressionPlan` for clarity. Release note: None Co-authored-by: Anthony Xu <anthony.xu@cockroachlabs.com> Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
The `(*planNodeToRowSource).SetInput` method now uses the `InputCount`, `Input`, and `SetInput` methods of `planNode`, instead of `walkPlan`, to traverse and replace nodes in the plan node tree. This continues the effort to deprecate and remove the plan node walkers (see cockroachdb#137620 for more details on the motivation for this). Release note: None
CDC plan node trees that previously used plan visitors and observers now use the `InputCount`, `Input`, and `SetInput` methods of the `planNode` interface to traverse and manipulate the plan node tree. This continues the effort to deprecate and remove the plan node walkers (see cockroachdb#137620 for more details on the motivation for this). Release note: None
143503: sql: remove plan node visitor and observer r=mgartner a=mgartner #### sql: add SetInput method to planNode interface The `SetInput` method has been added to the planNode interface. It will be used in future commits. Release note: None #### sql: use planNode methods to replace unwrapped planNodes The `(*planNodeToRowSource).SetInput` method now uses the `InputCount`, `Input`, and `SetInput` methods of `planNode`, instead of `walkPlan`, to traverse and replace nodes in the plan node tree. This continues the effort to deprecate and remove the plan node walkers (see #137620 for more details on the motivation for this). Release note: None #### sql: use planNode methods to traverse CDC planNodes CDC plan node trees that previously used plan visitors and observers now use the `InputCount`, `Input`, and `SetInput` methods of the `planNode` interface to traverse and manipulate the plan node tree. This continues the effort to deprecate and remove the plan node walkers (see #137620 for more details on the motivation for this). Release note: None #### sql: remove unused planVisitor and planObserver Release note: None #### sql: rename walk.go to plan_names.go This file has been renamed to reflect that it no longer contains any logic for walking plan node trees. It only contains logic related to plan node names. Epic: None Release note: None 143506: explain: fix plan gist decoding with nil catalog r=yuzefovich a=yuzefovich Previously, when catalog wasn't specified during plan gist decoding, we forgot to decode the table ID. This would lead to unexpected remainder of the gist that could result in unexpected behavior, and this is now fixed. Only the "external" variant of decoding is affected, so I decided to not include the release note. Epic: None Release note: None 143515: build: upgrade bazel to 7.6.0 r=rail a=rickystewart 7.2.1 has a remote-execution related bug which should be fixed in this version. Epic: none Release note: None 143522: kvserver: use correct tracer in a test r=rickystewart a=tbg Fixes #143270, which would previously crash ~immediately under stress. Epic: none Release note: None Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com> Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
sql: remove planDataSource field in renderNode
The
planDataSourcefield ofrenderNodehad acolumnssub-fieldthat was only set some of the time, causing confusion with the
renderNode'scolumnsfield.planDataSourcehas been replaced withplanNodeto avoid confusion and misuse.Release note: None
sql: remove planDataSource field in applyJoinNode
The
planDataSourcefield ofapplyJoinNodehad acolumnssub-fieldthat was redundant with the
applyJoinNode'scolumnsfield.planDataSourcehas been replaced withplanNodeto avoid confusionand misuse.
Release note: None
sql: remove planDataSource field in filterNode
The
planDataSourcefield offilterNodehas been replaced by aplanNodeandcolumnsfield.Release note: None
sql: replace planDataSource fields in joinNode
The
planDataSourcefields ofjoinNodehave been replaced byplanNodefields.Release note: None
sql: remove planDataSource
The
planDataSourcestruct is no longer needed. It has been removed.Release note: None
sql: add Input and InputCount to planNode interface
Two methods have been added to the
planNodeinterface that can be usedfor traversing a
planNodetree:InputandInputCount. ThezeroInputPlanNodeandsingleInputPlanNodestructs are provided toavoid boilerplate code for the two most common types of plan nodes.
The
InputandInputCountmethods allow for more flexibility intraversing the tree than the
planVisitorpattern. With the visitorpattern general information can only be passed back to callsites by
capturing local variables in visitor closures, which can incur
incidental heap allocations. With
InputandInputCount, callers candefine custom recursive functions that pass information back to
callsites by returning them.
These methods are not yet used, but will be in future commits.
Release note: None
sql: new traversal method of planNodes in wrapPlan
wrapPlannow traverses theplanNodetree with a recursive functionthat uses the
InputandInputCountmethods instead of theplanVisitor. This reduces the overhead of traversal.Below are some benchmarks that show the improvement.
Epic: None
Release note: None