-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-13482: [C++][Compute] Refactoring away from hard coded ExecNode factories to a registry #10793
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
lidavidm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks. Overall this looks cleaner than hand-constructing the graphs before, at least for the simple ones we have in tests.
nirandaperera
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we could split the exec_plan.cc based on each ExecNode impl? It would be much easier to navigate then
|
Will do, thanks! |
15dcf79 to
211b1da
Compare
|
Is there any plan for documenting these ExecNode factories in a standard way? Does this registry give us opportunities to achieve ARROW-13227 in a modular way? Apologies if this is a silly question; I don't fully understand all the mechanics here. |
|
@ianmcook I'll add a note to ARROW-13227 describing the impact of this PR. In short, ARROW-13227 should now document these factories instead of specific subclasses of ExecNode (and will look a lot like |
211b1da to
87c10c6
Compare
87c10c6 to
1b2274e
Compare
Added factory for SourceExecNode and replacing old MakeProjectNode by the new registry use in plan_test Formatting.. add Declaration helper class to allow declarative construction of ExecPlans msvc: export classes and structs add operator| for dplyr-like pipelines make labels optional for ExecNodes remove operator| add example of registering a custom ExecNode msvc: try explicitly constructing std::function instead of relying on operator= remove usage of hard coded factories from plan_test.cc remove all usages of hard coded factory functions nit: simplify Declaration::Sequence slightly msvc: it's callable, really msvc: export nested structs too msvc: try the copy constructor, why not add comments for Declaration add support for AggregateNodeOptions::targets to ScalarAggregateNode revert msvc: try the copy constructor, why not extract exec node implementations to dedicated source files add output field names to aggregate options
9a329a0 to
1d304cd
Compare
1d304cd to
73e5ef7
Compare
lidavidm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for this (massive) refactor!
I left one small nit.
| namespace internal { | ||
|
|
||
| Result<std::vector<const HashAggregateKernel*>> GetKernels( | ||
| ExecContext* ctx, const std::vector<internal::Aggregate>& aggregates, | ||
| const std::vector<ValueDescr>& in_descrs); | ||
|
|
||
| Result<std::vector<std::unique_ptr<KernelState>>> InitKernels( | ||
| const std::vector<const HashAggregateKernel*>& kernels, ExecContext* ctx, | ||
| const std::vector<internal::Aggregate>& aggregates, | ||
| const std::vector<ValueDescr>& in_descrs); | ||
|
|
||
| Result<FieldVector> ResolveKernels( | ||
| const std::vector<internal::Aggregate>& aggregates, | ||
| const std::vector<const HashAggregateKernel*>& kernels, | ||
| const std::vector<std::unique_ptr<KernelState>>& states, ExecContext* ctx, | ||
| const std::vector<ValueDescr>& descrs); | ||
|
|
||
| } // namespace internal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth moving these to an actual _internal header?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking I'd replace these with usage of Kernel::InitAll or other public & tested utilities in https://issues.apache.org/jira/browse/ARROW-13451. I've added a note to this effect to that JIRA
|
+1, merging |
… factories to a registry - An extensible registry of exec node factories (`std::function<Result<ExecNode*>(ExecPlan* plan, std::vector<ExecNode*> inputs, const ExecNodeOptions& options)>`) is provided - Hard coded factories like `compute::MakeSinkNode`, `dataset::MakeScanNode` are replaced by factories in the registry named "sink", "scan", etc - `arrow::compute::Declaration` is provided to represent an unconstructed set of `ExecNode`s, which can be validated and emplaced into an `ExecPlan` as a unit Closes apache#10793 from bkietz/exec-node-factory-registry Authored-by: Benjamin Kietzman <bengilgit@gmail.com> Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
std::function<Result<ExecNode*>(ExecPlan* plan, std::vector<ExecNode*> inputs, const ExecNodeOptions& options)>) is providedcompute::MakeSinkNode,dataset::MakeScanNodeare replaced by factories in the registry named "sink", "scan", etcarrow::compute::Declarationis provided to represent an unconstructed set ofExecNodes, which can be validated and emplaced into anExecPlanas a unit