
@peasee commented on Oct 30, 2024

This pull request introduces several fixes and enhancements to the DataFusion SQL unparser. The key changes are new methods for retrieving `order_by` and `sort_by` expressions, a utility function that removes dangling identifiers, and new test cases that cover this functionality.

Enhancements to expression retrieval:

  • Added a `get_order_by` method to `QueryBuilder` that returns a cloned list of the `order_by` expressions. (datafusion/sql/src/unparser/ast.rs)
  • Added a `get_sort_by` method to `SelectBuilder` that returns a cloned list of the `sort_by` expressions. (datafusion/sql/src/unparser/ast.rs)
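A minimal sketch of what the two getters might look like, assuming each builder stores its expressions in a plain `Vec` field. The `SortExpr` type below is hypothetical and stands in for the sqlparser AST expression types the real builders hold; the `order_by`/`sort_by` setters are simplified versions of the builder pattern. Returning a clone lets a caller rewrite the expressions and hand them back through the setter without holding a borrow on the builder.

```rust
/// Hypothetical stand-in for the sqlparser AST expression types.
#[derive(Clone, Debug, PartialEq)]
struct SortExpr(String);

/// Simplified model of the unparser's QueryBuilder (ORDER BY only).
#[derive(Default)]
struct QueryBuilder {
    order_by: Vec<SortExpr>,
}

impl QueryBuilder {
    /// Builder-style setter: replaces the ORDER BY list.
    fn order_by(&mut self, value: Vec<SortExpr>) -> &mut Self {
        self.order_by = value;
        self
    }

    /// Returns a cloned copy of the current ORDER BY expressions.
    fn get_order_by(&self) -> Vec<SortExpr> {
        self.order_by.clone()
    }
}

/// Simplified model of the unparser's SelectBuilder (SORT BY only).
#[derive(Default)]
struct SelectBuilder {
    sort_by: Vec<SortExpr>,
}

impl SelectBuilder {
    /// Builder-style setter: replaces the SORT BY list.
    fn sort_by(&mut self, value: Vec<SortExpr>) -> &mut Self {
        self.sort_by = value;
        self
    }

    /// Returns a cloned copy of the current SORT BY expressions.
    fn get_sort_by(&self) -> Vec<SortExpr> {
        self.sort_by.clone()
    }
}

fn main() {
    let mut query = QueryBuilder::default();
    query.order_by(vec![SortExpr("t1.id ASC".to_string())]);

    // Mutating the clone does not touch the builder's own list.
    let mut copy = query.get_order_by();
    copy.push(SortExpr("name DESC".to_string()));
    assert_eq!(query.get_order_by().len(), 1);

    let mut select = SelectBuilder::default();
    select.sort_by(vec![SortExpr("a".to_string())]);
    println!("{:?} {:?}", query.get_order_by(), select.get_sort_by());
}
```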

Utility function for identifier management:

  • Introduced a `remove_dangling_identifiers` function that cleans up identifiers that do not correspond to any available table. (datafusion/sql/src/unparser/rewrite.rs)
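The following is a simplified sketch of the idea behind `remove_dangling_identifiers`, not the actual implementation in rewrite.rs. It assumes the function receives the set of table names or aliases that are still in scope, and that cleaning up a dangling identifier means dropping its qualifier and keeping only the column name. The `Expr` enum is a hypothetical stand-in for `sqlparser::ast::Expr`, reduced to three variants.

```rust
use std::collections::HashSet;

/// Hypothetical, heavily reduced stand-in for `sqlparser::ast::Expr`.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    /// A bare column name, e.g. `id`.
    Identifier(String),
    /// A qualified name, e.g. `t1.id` stored as ["t1", "id"].
    CompoundIdentifier(Vec<String>),
    /// A parenthesized expression.
    Nested(Box<Expr>),
}

/// Assumed behavior: if a compound identifier's qualifier is not one of
/// `available_tables`, replace it with a bare identifier holding only the
/// final part (the column name). Recurses into nested expressions.
fn remove_dangling_identifiers(expr: &mut Expr, available_tables: &HashSet<String>) {
    let replacement = match expr {
        Expr::CompoundIdentifier(parts) if parts.len() > 1 => {
            // Qualifier = every part except the trailing column name.
            let qualifier = parts[..parts.len() - 1].join(".");
            if available_tables.contains(&qualifier) {
                None
            } else {
                parts.last().cloned().map(Expr::Identifier)
            }
        }
        Expr::Nested(inner) => {
            remove_dangling_identifiers(inner, available_tables);
            None
        }
        _ => None,
    };
    // Assign after the match so no borrow of `expr` is still active.
    if let Some(new_expr) = replacement {
        *expr = new_expr;
    }
}

fn main() {
    let available: HashSet<String> = ["t1".to_string()].into_iter().collect();
    let mut exprs = vec![
        Expr::CompoundIdentifier(vec!["t1".into(), "id".into()]),
        Expr::CompoundIdentifier(vec!["dropped_alias".into(), "name".into()]),
        Expr::Nested(Box::new(Expr::CompoundIdentifier(vec!["gone".into(), "ts".into()]))),
    ];
    for e in exprs.iter_mut() {
        remove_dangling_identifiers(e, &available);
    }
    println!("{exprs:?}");
}
```

With `t1` as the only available table, `t1.id` keeps its qualifier, while `dropped_alias.name` and the nested `gone.ts` are reduced to `name` and `ts`.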

Plan unparser updates:

  • Modified the `Unparser` implementation to apply the new `remove_dangling_identifiers` function to `order_by` and `sort_by` expressions. (datafusion/sql/src/unparser/plan.rs)
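A sketch of how the unparser could apply such a rewrite to a builder's `order_by` list: clone the expressions with the getter, rewrite each one, and store the result back through the setter. Everything here is a hypothetical simplification. Expressions are plain `qualifier.column` strings, the set of available tables is given directly, and the closure in `main` is a string-based stand-in for `remove_dangling_identifiers`. The same round trip would apply to `SelectBuilder`'s `sort_by` list.

```rust
use std::collections::HashSet;

/// Hypothetical minimal builder; the real `QueryBuilder` holds sqlparser
/// AST nodes rather than strings.
#[derive(Default)]
struct QueryBuilder {
    order_by: Vec<String>,
}

impl QueryBuilder {
    fn order_by(&mut self, value: Vec<String>) -> &mut Self {
        self.order_by = value;
        self
    }

    fn get_order_by(&self) -> Vec<String> {
        self.order_by.clone()
    }
}

/// Clone the current ORDER BY list, rewrite every entry, then store it back.
fn rewrite_order_by<F: Fn(&mut String)>(query: &mut QueryBuilder, rewrite: F) {
    let mut exprs = query.get_order_by();
    for expr in exprs.iter_mut() {
        rewrite(expr);
    }
    query.order_by(exprs);
}

fn main() {
    let available: HashSet<&str> = ["t1"].into_iter().collect();
    let mut query = QueryBuilder::default();
    query.order_by(vec!["t1.id".to_string(), "old_alias.name".to_string()]);

    // Hypothetical string-based stand-in for remove_dangling_identifiers:
    // keep `qualifier.column` only when `qualifier` is an available table.
    rewrite_order_by(&mut query, |expr| {
        if let Some((qualifier, column)) = expr.split_once('.') {
            if !available.contains(qualifier) {
                *expr = column.to_string();
            }
        }
    });
    println!("{:?}", query.get_order_by());
}
```

Passing the rewrite as a closure keeps the get, rewrite, set round trip separate from the identifier logic itself.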

Test case updates:

  • Added new test cases that verify the handling of `order_by` expressions and the removal of dangling identifiers. (datafusion/sql/tests/cases/plan_to_sql.rs)

@peasee added the bug (Something isn't working) label Oct 30, 2024
@peasee self-assigned this Oct 30, 2024
@sgrebnov merged commit 179c25a into spiceai-42 Oct 31, 2024
@sgrebnov deleted the fix/more-dangling-references branch October 31, 2024 00:37
phillipleblanc pushed a commit that referenced this pull request Nov 12, 2024
* fix: More dangling references

* test: Add tests for remove_dangling_identifiers
phillipleblanc pushed a commit that referenced this pull request Nov 12, 2024
* fix: More dangling references

* test: Add tests for remove_dangling_identifiers
phillipleblanc pushed a commit that referenced this pull request Apr 8, 2025
fix: More dangling references (#54)

* fix: More dangling references

* test: Add tests for remove_dangling_identifiers

UPSTREAM NOTE: An attempt was made to upstream this PR in apache#13405, but it was not accepted because of the complexity it introduced. Phillip needs to find a good solution that solves our problem and can be upstreamed.
phillipleblanc pushed a commit that referenced this pull request Apr 25, 2025
fix: More dangling references (#54)

* fix: More dangling references

* test: Add tests for remove_dangling_identifiers

UPSTREAM NOTE: An attempt was made to upstream this PR in apache#13405, but it was not accepted because of the complexity it introduced. Phillip needs to find a good solution that solves our problem and can be upstreamed.
sgrebnov pushed a commit that referenced this pull request May 22, 2025
fix: More dangling references (#54)

* fix: More dangling references

* test: Add tests for remove_dangling_identifiers

UPSTREAM NOTE: An attempt was made to upstream this PR in apache#13405, but it was not accepted because of the complexity it introduced. Phillip needs to find a good solution that solves our problem and can be upstreamed.

# Conflicts:
#	datafusion/sql/src/unparser/ast.rs
#	datafusion/sql/tests/cases/plan_to_sql.rs
sgrebnov pushed a commit that referenced this pull request May 26, 2025
fix: More dangling references (#54)

UPSTREAM NOTE: An attempt was made to upstream this PR in apache#13405, but it was not accepted because of the complexity it introduced. Phillip needs to find a good solution that solves our problem and can be upstreamed.
kczimm pushed a commit that referenced this pull request Aug 19, 2025
fix: More dangling references (#54)

UPSTREAM NOTE: An attempt was made to upstream this PR in apache#13405, but it was not accepted because of the complexity it introduced. Phillip needs to find a good solution that solves our problem and can be upstreamed.
kczimm pushed a commit that referenced this pull request Aug 21, 2025
fix: More dangling references (#54)

UPSTREAM NOTE: An attempt was made to upstream this PR in apache#13405, but it was not accepted because of the complexity it introduced. Phillip needs to find a good solution that solves our problem and can be upstreamed.
Jeadie pushed a commit that referenced this pull request Sep 9, 2025
fix: More dangling references (#54)

UPSTREAM NOTE: An attempt was made to upstream this PR in apache#13405, but it was not accepted because of the complexity it introduced. Phillip needs to find a good solution that solves our problem and can be upstreamed.
Jeadie pushed a commit that referenced this pull request Sep 12, 2025
fix: More dangling references (#54)

UPSTREAM NOTE: An attempt was made to upstream this PR in apache#13405, but it was not accepted because of the complexity it introduced. Phillip needs to find a good solution that solves our problem and can be upstreamed.
peasee added a commit that referenced this pull request Oct 27, 2025
fix: More dangling references (#54)

UPSTREAM NOTE: An attempt was made to upstream this PR in apache#13405, but it was not accepted because of the complexity it introduced. Phillip needs to find a good solution that solves our problem and can be upstreamed.
peasee added a commit that referenced this pull request Oct 27, 2025
* fix: Ensure only tables or aliases that exist are projected (#52)
fix: More dangling references (#54)

UPSTREAM NOTE: An attempt was made to upstream this PR in apache#13405, but it was not accepted because of the complexity it introduced. Phillip needs to find a good solution that solves our problem and can be upstreamed.

* Support for metadata columns (`location`, `size`, `last_modified`)  in ListingTableProvider (#74)

UPSTREAM NOTE: An attempt was made to upstream this PR, but it was not accepted, so it needs to be applied manually.
apache#15181

* Infer placeholder datatype for `Expr::InSubquery` (#80)

UPSTREAM NOTE: An upstream PR has been created but not yet merged. It should be available in DF49.
apache#15980

* Infer placeholder datatype after `LIMIT` clause as `DataType::Int64` (#81)

UPSTREAM NOTE: An upstream PR has been created but not yet merged. It should be available in DF49.
apache#15980

* Do not double alias Exprs

UPSTREAM NOTE: A fix was attempted in apache#15008, but that PR was closed.

The tracking issue on DataFusion is apache#14895.

* Add prefix to location metadata column (#82)

UPSTREAM NOTE: This will not be upstreamed as is.

* Infer placeholder types for CASE expressions (#87)

UPSTREAM NOTE: This has not been submitted upstream yet.

* Expand `infer_placeholder_types` to infer all possible placeholder types based on their expression (#88)

UPSTREAM NOTE: This has not been submitted upstream yet.

* Fix `Expr::infer_placeholder_types` inference to not fail (#89)

UPSTREAM NOTE: This has not been submitted upstream yet.

* cherry-pick parquet patch (#94)

* Fix array types coercion: preserve child element nullability for list types (#96)

UPSTREAM NOTE: This was submitted upstream and should be available in DF50.

apache#17306

* Expand `infer_placeholder_types` to infer all possible placeholder types based on their expression (#88)

UPSTREAM NOTE: This has not been submitted upstream yet.

* do not enforce type guarantees on all Expr traversed in infer_placeholder_types (#97)

* Use UDTF function args in `LogicalPlan::TableScan` name (#98)

* use UDTF function args in LogicalPlan::TableScan name

* update test snapshots

* Implement timestamp_cast_dtype for SqliteDialect (#99)

* Use text for sqlite timestamp

* Add test

* Custom timestamp format for DuckDB (#102)

* Revert "cherry-pick parquet patch (#94)"

This reverts commit d780cc2.

* Support ExprNamed arguments to Scalar UDFs (#104)

* support ExprNamed until 17379 ships

* add same exprnamed lifting to udtf

* resolve projection against `ListingTable` table_schema incl. partition columns (#106)

* fix: Ensure ListingTable partitions are pruned when filters are not used (#108)

* fix: Prune partitions when no filters are defined

* fix: Backport for DF49:

* review: Address comments

* FileScanConfig: Preserve schema metadata across serde boundary (#107)

* FileScanConfig: preserve schema metadata across serde boundary

* add test

* Merge conflict fixes

UPSTREAM NOTE: This should not be upstreamed. It contains conflict fixes from various cherry-picks and differences in v50.

* update arrow-rs fork

UPSTREAM NOTE: This should not be upstreamed.

---------

Co-authored-by: Phillip LeBlanc <phillip@leblanc.tech>
Co-authored-by: Kevin Zimmerman <4733573+kczimm@users.noreply.github.com>
Co-authored-by: sgrebnov <sergei.grebnov@gmail.com>
Co-authored-by: jeadie <jack@spice.ai>
Co-authored-by: Jack Eadie <jack.eadie0@gmail.com>
Co-authored-by: Viktor Yershov <krinart@gmail.com>
Co-authored-by: Viktor Yershov <viktor@spice.ai>
Co-authored-by: David Stancu <david@spice.ai>