-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Open
Description
Currently DataFusion expressions (Expr) do not know their type. The type can be calculated via <Expr as ExprSchemable::get_type function.
This has drawbacks unfortunately
- design: DF being reusable, it needs to account for different coercion rules and type logic of systems building on top of DF. Explicit (provided at construction time) expression type would better support this goal.
- perf: as we add more and more optimizers, asking for expression type is going to be a common operation, slowing down planning process
- anecdotally, this was a problem for Trino expressions and recently Trino introduced separate IR expressions that carry explicit type
I believe the current design of expressions come from the fact they are used during initial AST (abstract syntax tree) processing where types aren't known yet. The pair (AST, AST → LogicalPlan conversion function) is an implementation of a language. DF implements its own variant of SQL (#12357) and systems building on top of DF may or may not use it (e.g. Ballista probably uses it and Comet wants to be closely aligned with Spark SQL's semantics).
Proposal plan
- introduce datafusion's internal AST layer to support conversion from sqlparser's AST to LogicalPlan
- or maybe simply change the conversion so that it can calculate the types correctly
- let
Exprcarry the type explicitly (that would be "logical type" from [Proposal] Decouple logical from physical types #11513)
jonahgaonotfilippo, jayzhan211, goldmedal and haohuaijin
Metadata
Metadata
Assignees
Labels
No labels