Proposal: introduce typed expressions, separate AST and IR #12604

@findepi

Currently, DataFusion expressions (Expr) do not know their type. The type can be calculated via the <Expr as ExprSchemable>::get_type function.

Unfortunately, this has drawbacks:

  • design: because DF is meant to be reusable, it needs to account for the different coercion rules and type logic of systems built on top of it. An explicit expression type, provided at construction time, would better support this goal.
  • perf: as we add more and more optimizers, asking for an expression's type is going to become a common operation, slowing down the planning process
    • anecdotally, this was a problem for Trino expressions and recently Trino introduced separate IR expressions that carry explicit type
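
To make the perf concern concrete, here is a toy sketch in Rust. It is not DataFusion's actual Expr or ExprSchemable: the `DataType` and `Expr` enums, the widening rule, and the visit counter are all hypothetical. It only illustrates that deriving a type on demand means walking the expression tree on every request.

```rust
use std::cell::Cell;

// Hypothetical, heavily simplified type universe (not DataFusion's DataType).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

// Hypothetical untyped expression: no node stores its own result type.
#[allow(dead_code)]
enum Expr {
    Column(DataType), // toy shortcut: the column's type stands in for a schema lookup
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
}

// Derives the type by recursing over the whole subtree.
// `visits` counts how many nodes each request has to touch.
fn get_type(expr: &Expr, visits: &Cell<usize>) -> DataType {
    visits.set(visits.get() + 1);
    match expr {
        Expr::Column(ty) => *ty,
        Expr::Literal(_) => DataType::Int64, // assumed rule: integer literals are Int64
        Expr::Add(left, right) => {
            let left_ty = get_type(left, visits);
            let right_ty = get_type(right, visits);
            // Assumed rule: widen to Int64 if either side is Int64.
            if left_ty == DataType::Int64 || right_ty == DataType::Int64 {
                DataType::Int64
            } else {
                DataType::Int32
            }
        }
    }
}
```

In this sketch, every call to `get_type` on the root visits every node beneath it, so an optimizer pass that asks for the type repeatedly pays for the full walk each time. A node that stored its type would answer in constant time.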

I believe the current design of expressions comes from the fact that they are used during initial AST (abstract syntax tree) processing, where types aren't known yet. The pair (AST, AST → LogicalPlan conversion function) is an implementation of a language. DF implements its own variant of SQL (#12357), and systems building on top of DF may or may not use it (e.g. Ballista probably uses it, and Comet wants to be closely aligned with Spark SQL's semantics).

Proposal plan

  • introduce DataFusion's internal AST layer to support conversion from sqlparser's AST to LogicalPlan
    • or maybe simply change the conversion so that it can calculate the types correctly
  • let Expr carry the type explicitly (that would be "logical type" from [Proposal] Decouple logical from physical types #11513)
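
One possible shape for this plan, sketched in Rust under stated assumptions: `AstExpr`, `TypedExpr`, `TypedKind`, `LogicalType`, `CoercionRules`, `WideningRules`, and `lower` are all hypothetical names, not existing DataFusion APIs. The untyped AST is lowered into an IR whose nodes carry their type from construction, and the coercion logic is supplied by the caller so that a system built on DF could plug in its own rules.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the "logical type" mentioned above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LogicalType {
    Int32,
    Int64,
}

// Hypothetical untyped AST, as produced right after parsing.
#[allow(dead_code)]
enum AstExpr {
    Column(String),
    Int(i64),
    Add(Box<AstExpr>, Box<AstExpr>),
}

// Hypothetical typed IR: every node stores its type explicitly.
#[allow(dead_code)]
#[derive(Debug)]
enum TypedKind {
    Column(String),
    Int(i64),
    Add(Box<TypedExpr>, Box<TypedExpr>),
}

#[allow(dead_code)]
#[derive(Debug)]
struct TypedExpr {
    kind: TypedKind,
    ty: LogicalType,
}

// Type logic is pluggable, so each frontend can bring its own rules.
trait CoercionRules {
    fn literal_type(&self, value: i64) -> LogicalType;
    fn add_type(&self, left: LogicalType, right: LogicalType) -> LogicalType;
}

// One example rule set (an assumption, not DataFusion's actual coercion).
struct WideningRules;

impl CoercionRules for WideningRules {
    fn literal_type(&self, _value: i64) -> LogicalType {
        LogicalType::Int64
    }

    fn add_type(&self, left: LogicalType, right: LogicalType) -> LogicalType {
        if left == LogicalType::Int64 || right == LogicalType::Int64 {
            LogicalType::Int64
        } else {
            LogicalType::Int32
        }
    }
}

// AST -> typed IR: types are computed once, during lowering.
fn lower(
    ast: &AstExpr,
    schema: &HashMap<String, LogicalType>,
    rules: &dyn CoercionRules,
) -> Result<TypedExpr, String> {
    match ast {
        AstExpr::Column(name) => {
            let ty = *schema
                .get(name)
                .ok_or_else(|| format!("unknown column: {name}"))?;
            Ok(TypedExpr { kind: TypedKind::Column(name.clone()), ty })
        }
        AstExpr::Int(value) => Ok(TypedExpr {
            kind: TypedKind::Int(*value),
            ty: rules.literal_type(*value),
        }),
        AstExpr::Add(left, right) => {
            let left = lower(left, schema, rules)?;
            let right = lower(right, schema, rules)?;
            let ty = rules.add_type(left.ty, right.ty);
            Ok(TypedExpr { kind: TypedKind::Add(Box::new(left), Box::new(right)), ty })
        }
    }
}
```

After lowering, reading `typed.ty` is a field access rather than a tree walk, and swapping `WideningRules` for another `CoercionRules` implementation changes the typing without touching the IR definition.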
