Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Jan 20, 2021

Problem:

  • There are several places in the DataFusion codebase where a walk of an Expression tree is needed
  • The logic of how to walk the tree is replicated
  • Adding new expression types often require many mechanically different but semantically the same changes in many places where no special treatment of such types is needed

This PR introduces a ExpressionVisitor trait and the Expr::accept function to consolidate this walking of the expression tree. It does not intend to change any functionality.

If folks like this pattern, I have ideas for a similar type of trait ExpressionRewriter which can be used to rewrite expressions (much like clone_with_replacement) as a subsquent PR. I think this was mentioned by @Dandandan in the Rust roadmap

cc @jorgecarleitao @Dandandan and @andygrove

@alamb alamb changed the title ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encod ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encode expression walking Jan 20, 2021

/// Recursively walk an expression tree, collecting the unique set of column names
/// referenced in the expression
pub fn expr_to_column_names(expr: &Expr, accum: &mut HashSet<String>) -> Result<()> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty good example of the kind of repetition that can be removed using this visitor pattern.
Note that I still left all expr types enumerated so that anyone who adds a new Expr type need to update this code, and (hopefully) think if they need to add special handling for that new expr types

@github-actions
Copy link

@andygrove
Copy link
Member

I took a quick look and this looks good to me. I will try and review in more detail tonight.

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is beautiful. Thank you, @alamb

@andygrove
Copy link
Member

I just wanted to add that there is no need to wait for me to review before merging.

Copy link
Contributor

@Dandandan Dandandan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
This should make it a bit less verbose / and error prone to write things like optimization rules.
I think we may just want to try to use it, if we find another pattern / approach works better we can adapt / improve it.

Performance-wise, it doesn't compile to a handwritten for loop of course (or even plain recursion), but for most foreseeable use cases, except some extreme examples, I think that should be fine for now.

@alamb
Copy link
Contributor Author

alamb commented Jan 20, 2021

Thanks @jorgecarleitao and @Dandandan . I am personally quite excited at using this same idea for expression rewriting.

Co-authored-by: Jorge Leitao <jorgecarleitao@gmail.com>
@alamb alamb closed this in 4601c02 Jan 21, 2021
kszucs pushed a commit that referenced this pull request Jan 25, 2021
…ssion walking

## Problem:
* There are several places in the DataFusion codebase where a walk of an Expression tree is needed
* The logic of how to walk the tree is replicated
* Adding new expression types often require many mechanically different but semantically the same changes in many places where no special treatment of such types is needed

This PR introduces a `ExpressionVisitor` trait and the `Expr::accept` function to consolidate this walking of the expression tree. It does not intend to change any functionality.

If folks like this pattern, I have ideas for a similar type of trait `ExpressionRewriter` which can be used to rewrite expressions (much like `clone_with_replacement`) as a subsquent PR. I think this was mentioned by @Dandandan  in the [Rust roadmap](https://docs.google.com/document/d/1qspsOM_dknOxJKdGvKbC1aoVoO0M3i6x1CIo58mmN2Y/edit#heading=h.kstb571j5g5j)

cc @jorgecarleitao @Dandandan and @andygrove

Closes #9278 from alamb/alamb/expression_visitor

Authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants