@y-f-u y-f-u commented Jul 16, 2024

Why this happens:

For a derived table with a column list, the AST parser generates a logical plan with two stacked projections:

roundtrip sql: SELECT c.id FROM (SELECT j1.j1_id FROM j1) AS c (id)
plan Projection: c.id
  SubqueryAlias: c
    Projection: j1.j1_id AS id
      Projection: j1.j1_id
        TableScan: j1

However, the unparser could not recognize that the two stacked projections are the result of a derived table's column list.
In contrast, a derived table that aliases its columns inline generates the plan below, which the unparser was already able to handle.

roundtrip sql: SELECT c.id FROM (SELECT j1.j1_id AS id FROM j1) AS c
plan Projection: c.id
  SubqueryAlias: c
    Projection: j1.j1_id AS id
      TableScan: j1

The fix adds a strict rule that detects whether a subquery plan matches the derived-table-with-columns pattern. When it does, the first-layer alias projection is unwrapped into a table alias with columns, which the AST already supports.
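As a rough illustration of the rule described above, here is a minimal, self-contained sketch. It uses toy `Plan` and `Expr` enums that are hypothetical stand-ins, not DataFusion's `LogicalPlan` or `Expr`, and the function name `derived_table_columns` is made up for this example. It only shows the shape of the check: a `SubqueryAlias` over a projection whose every expression is an alias of the matching expression in the projection directly beneath it.

```rust
/// Toy expression type (hypothetical, not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Alias { expr: Box<Expr>, name: String },
}

/// Toy logical plan (hypothetical, not DataFusion's `LogicalPlan`).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    TableScan { table: String },
    Projection { exprs: Vec<Expr>, input: Box<Plan> },
    SubqueryAlias { alias: String, input: Box<Plan> },
}

/// Strictly matches `SubqueryAlias -> Projection(aliases) -> Projection`.
/// On a match, returns the table alias, the column names taken from the
/// outer alias projection, and the inner projection to unparse as the
/// subquery body. Any deviation from the pattern yields `None`.
fn derived_table_columns(plan: &Plan) -> Option<(&str, Vec<String>, &Plan)> {
    let Plan::SubqueryAlias { alias, input } = plan else {
        return None;
    };
    let Plan::Projection { exprs: outer, input: inner } = &**input else {
        return None;
    };
    let Plan::Projection { exprs: inner_exprs, .. } = &**inner else {
        return None;
    };
    if outer.len() != inner_exprs.len() {
        return None;
    }
    let mut columns = Vec::with_capacity(outer.len());
    for (outer_expr, inner_expr) in outer.iter().zip(inner_exprs) {
        match outer_expr {
            // The outer expression must alias exactly the inner expression.
            Expr::Alias { expr, name } if **expr == *inner_expr => {
                columns.push(name.clone());
            }
            _ => return None,
        }
    }
    Some((alias.as_str(), columns, &**inner))
}

/// Renders the table alias with its column list, e.g. `c (id)`.
fn table_alias_sql(alias: &str, columns: &[String]) -> String {
    format!("{} ({})", alias, columns.join(", "))
}
```

With the first plan above, this sketch would return the alias `c`, the column list `[id]`, and the inner `Projection: j1.j1_id`. The second plan has only one projection under the `SubqueryAlias`, so it does not match and would go through the existing code path.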

Background

The error was detected while round-tripping TPC-H query 13:

select
    c_count,
    count(*) as custdist
from
    (
        select
            c_custkey,
            count(o_orderkey)
        from
            customer left outer join orders on
                        c_custkey = o_custkey
                    and o_comment not like '%special%requests%'
        group by
            c_custkey
    ) as c_orders (c_custkey, c_count)
group by
    c_count
order by
    custdist desc,
    c_count desc;

Related to the issue in spiceai/datafusion-federation#11.

@y-f-u y-f-u force-pushed the unparser-for-tpch-query branch 2 times, most recently from 903ffd1 to 37fc43f Compare July 17, 2024 01:58
@y-f-u y-f-u marked this pull request as ready for review July 17, 2024 01:58
@y-f-u y-f-u force-pushed the unparser-for-tpch-query branch from 37fc43f to 51156f1 Compare July 17, 2024 01:59
@y-f-u y-f-u changed the title from "reproduce test for nested select without alias" to "fix: unparser generates wrong sql for derived table with columns" Jul 17, 2024
@y-f-u y-f-u force-pushed the unparser-for-tpch-query branch from 51156f1 to 9ab73c5 Compare July 17, 2024 02:31
@y-f-u y-f-u merged commit 1d5fe15 into spiceai-40 Jul 17, 2024
@y-f-u y-f-u deleted the unparser-for-tpch-query branch July 17, 2024 05:33
y-f-u added a commit that referenced this pull request Jul 17, 2024
* fix unparser for derived table with columns

* refactoring

* renaming

* case in tests
y-f-u added a commit that referenced this pull request Jul 22, 2024
apache#11505)

* fix unparser for derived table with columns

* refactoring

* renaming

* case in tests
phillipleblanc pushed a commit that referenced this pull request Apr 22, 2025
… ParquetOpener (apache#15561)

* parquet reader: move pruning predicate creation from ParquetSource to ParquetOpener

* use file schema, avoid loading page index if unecessary

* Add comment

* add comment

* Add comment

* remove check

* fix clippy

* update sqllogictest

* restore to explain plans

* reverted

* modify access

* Fix ArrowReaderOptions should read with physical_file_schema so we do… (#17)

* Fix ArrowReaderOptions should read with physical_file_schema so we don't need to cast back to utf8

* Fix fmt

* Update opener.rs

* Always apply per-file schema during parquet read (#18)

* Update datafusion/datasource-parquet/src/opener.rs

---------

Co-authored-by: Qi Zhu <821684824@qq.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>