Skip to content

Conversation

@adriangb
Copy link
Contributor

@adriangb adriangb commented Sep 11, 2025

Summary

Problem

PR #17295 added support for FROM-first syntax (e.g., FROM table without SELECT). To enable this, it added logic to insert a wildcard projection when the projection list was empty.

However, this caused a regression: SELECT FROM table (which is valid SQL meaning "select no columns") was also getting the wildcard projection added, causing it to return all columns instead of an empty result set.

Solution

This PR removes the automatic wildcard projection addition for empty projections. This fixes the regression where SELECT FROM table was incorrectly returning all columns.

Note: This temporarily breaks FROM-first syntax support. A proper fix would require distinguishing between FROM table and SELECT FROM table at the parser level, which sqlparser currently doesn't support. This should be addressed in a follow-up PR.

Test Changes

The test in projection.slt now correctly shows that SELECT FROM table WHERE ... produces an empty Projection node in the logical plan, while the optimizer still pushes down column requirements to the TableScan for efficiency.

🤖 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com

When using 'SELECT FROM table WHERE condition', the query should create
an empty projection (no columns) while still filtering rows. This was
broken by PR apache#17295 which added FROM-first syntax support.

The issue was that both 'FROM table' and 'SELECT FROM table' resulted
in empty projection lists, making them indistinguishable. The fix checks
for the presence of a WHERE clause to differentiate:
- 'FROM table' (no WHERE) -> add wildcard projection (all columns)
- 'SELECT FROM table WHERE ...' -> keep empty projection

Also updates the test expectation to correctly show the empty Projection
node in the query plan.

Fixes apache#17513
@github-actions github-actions bot added sql SQL Planner sqllogictest SQL Logic Tests (.slt) labels Sep 11, 2025
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix before merging

@adriangb
Copy link
Contributor Author

@simonvandel and @Jefffrey could you review please?

@Jefffrey
Copy link
Contributor

I wasn't aware we allowed SELECT FROM t1 WHERE ... syntax, omitting columns entirely. I checked out your branch and observed this behaviour which doesn't seem consistent:

> select from t1;
+---+---+
| a | b |
+---+---+
| 1 | 2 |
+---+---+
1 row(s) fetched.
Elapsed 0.007 seconds.

> select from t1 where 1=0;
0 row(s) fetched.
Elapsed 0.006 seconds.
  • Using the same t1 table as from the added slt test

So SELECT FROM t1 would return all columns but adding a WHERE clause now omits all columns? Is this intended?

@adriangb
Copy link
Contributor Author

Hmm good point. I agree with you. And actually select a from t1 where t1.a > 1; and select b from t1 where t1.a > 1; work as expected.

@xudong963 I think your original report may be a non issue?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me -- thank you @adriangb

@adriangb adriangb changed the title Fix regression in SELECT FROM syntax with WHERE clause Fix regression: SELECT FROM syntax should return empty projection Sep 11, 2025
Removes automatic wildcard projection for empty projections, fixing
the regression where `SELECT FROM table` incorrectly returned all
columns instead of empty projection.

Note: This temporarily breaks FROM-first syntax. A proper fix would
require distinguishing between `FROM table` and `SELECT FROM table`
at the parser level.

Fixes apache#17513

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a "non-negotiable" test

// incorrectly returned all columns instead of empty projection.
// For now, we don't add wildcard, which fixes the regression but breaks FROM-first.
// A proper fix would need to distinguish between "FROM table" and "SELECT FROM table"
// at the parser level, which sqlparser currently doesn't support.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure the comment is needed. The test is an automated comment.
This PR is logically a revert + regression test.


query TT
explain format indent
select from t1 where t1.a > 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plan tests are cool, but plan changes are easy to overlook. They need manual verification.
Query results-based tests are less disputable.

Can we also run this query and assert the result?
sqllogic doesn't support this (maybe we would need to change the rendering or result lines to make them non-empty).
If slt doesn't allow testing this, then we need a test in Rust.

@adriangb
Copy link
Contributor Author

Yeah it seems like what this is coming to is:

  • We can't differentiate between SELECT FROM t; and FROM t;
  • We want the opposite behavior between them (agreed?)
  • Thus we need to revert Support from-first SQL syntax #17295 and add a regression test

I'd like to get some input from the authors of #17295 before we proceed down that path. Then I agree @findepi we can add a better regression test that asserts against the relevant bit of the plan in Rust.

@github-actions github-actions bot added the core Core DataFusion crate label Sep 11, 2025
@adriangb adriangb changed the title Fix regression: SELECT FROM syntax should return empty projection Revert #17295 (Support from-first SQL syntax) Sep 11, 2025
Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @adriangb some minors

adriangb and others added 2 commits September 11, 2025 10:51
Co-authored-by: Oleks V <comphead@users.noreply.github.com>
Co-authored-by: Oleks V <comphead@users.noreply.github.com>
Copy link
Contributor

@simonvandel simonvandel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing the regression I introduced!

@Jefffrey
Copy link
Contributor

Need to revert doc changes too:

The `FROM` clause can also come before the `SELECT` clause.
Example:
```sql
FROM table AS t
SELECT t.a
```
If the `SELECT` clause is omitted, the `FROM` clause will return all columns from the table.
```sql
FROM table
```

@findepi
Copy link
Member

findepi commented Sep 12, 2025

  • We want the opposite behavior between them (agreed?)

agreed.

BTW Trino has the following syntax:

-- equivalent of SELECT * FROM my_table
TABLE my_table;

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Sep 12, 2025
@adriangb
Copy link
Contributor Author

Need to revert doc changes too:

done!

@alamb
Copy link
Contributor

alamb commented Sep 12, 2025

Thank you @adriangb

@alamb alamb merged commit 5fd831f into apache:main Sep 12, 2025
29 checks passed
alamb pushed a commit to alamb/datafusion that referenced this pull request Sep 12, 2025
* Add failing test

* Fix regression in SELECT FROM syntax with WHERE clause

When using 'SELECT FROM table WHERE condition', the query should create
an empty projection (no columns) while still filtering rows. This was
broken by PR apache#17295 which added FROM-first syntax support.

The issue was that both 'FROM table' and 'SELECT FROM table' resulted
in empty projection lists, making them indistinguishable. The fix checks
for the presence of a WHERE clause to differentiate:
- 'FROM table' (no WHERE) -> add wildcard projection (all columns)
- 'SELECT FROM table WHERE ...' -> keep empty projection

Also updates the test expectation to correctly show the empty Projection
node in the query plan.

Fixes apache#17513

* Revert

* Fix regression: SELECT FROM syntax should return empty projection

Removes automatic wildcard projection for empty projections, fixing
the regression where `SELECT FROM table` incorrectly returned all
columns instead of empty projection.

Note: This temporarily breaks FROM-first syntax. A proper fix would
require distinguishing between `FROM table` and `SELECT FROM table`
at the parser level.

Fixes apache#17513

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* add a better regression test

* remove comment

* fmt

* Update datafusion/sqllogictest/test_files/projection.slt

Co-authored-by: Oleks V <comphead@users.noreply.github.com>

* Update datafusion/core/tests/sql/select.rs

Co-authored-by: Oleks V <comphead@users.noreply.github.com>

* revert docs

* fmt

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Oleks V <comphead@users.noreply.github.com>
@alamb
Copy link
Contributor

alamb commented Sep 12, 2025

This one looks good thanks everyone!

I also made a backport PR to branch-50 branch so we can include it in the RC

@adriangb
Copy link
Contributor Author

@simonvandel I'm sorry we had to revert your contribution. I hope you're able to contribute again and now that we have the regression test and clarity about the behavior it should be okay to re-add your feature.

@adriangb adriangb deleted the fix-proj branch September 12, 2025 20:18
@xudong963
Copy link
Member

Thank you all! Sorry, I didn't get a chance to review it.

xudong963 pushed a commit that referenced this pull request Sep 13, 2025
* Add failing test

* Fix regression in SELECT FROM syntax with WHERE clause

When using 'SELECT FROM table WHERE condition', the query should create
an empty projection (no columns) while still filtering rows. This was
broken by PR #17295 which added FROM-first syntax support.

The issue was that both 'FROM table' and 'SELECT FROM table' resulted
in empty projection lists, making them indistinguishable. The fix checks
for the presence of a WHERE clause to differentiate:
- 'FROM table' (no WHERE) -> add wildcard projection (all columns)
- 'SELECT FROM table WHERE ...' -> keep empty projection

Also updates the test expectation to correctly show the empty Projection
node in the query plan.

Fixes #17513

* Revert

* Fix regression: SELECT FROM syntax should return empty projection

Removes automatic wildcard projection for empty projections, fixing
the regression where `SELECT FROM table` incorrectly returned all
columns instead of empty projection.

Note: This temporarily breaks FROM-first syntax. A proper fix would
require distinguishing between `FROM table` and `SELECT FROM table`
at the parser level.

Fixes #17513

🤖 Generated with [Claude Code](https://claude.ai/code)



* add a better regression test

* remove comment

* fmt

* Update datafusion/sqllogictest/test_files/projection.slt



* Update datafusion/core/tests/sql/select.rs



* revert docs

* fmt

---------

Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Oleks V <comphead@users.noreply.github.com>
samueleresca pushed a commit to samueleresca/datafusion that referenced this pull request Sep 15, 2025
* Add failing test

* Fix regression in SELECT FROM syntax with WHERE clause

When using 'SELECT FROM table WHERE condition', the query should create
an empty projection (no columns) while still filtering rows. This was
broken by PR apache#17295 which added FROM-first syntax support.

The issue was that both 'FROM table' and 'SELECT FROM table' resulted
in empty projection lists, making them indistinguishable. The fix checks
for the presence of a WHERE clause to differentiate:
- 'FROM table' (no WHERE) -> add wildcard projection (all columns)
- 'SELECT FROM table WHERE ...' -> keep empty projection

Also updates the test expectation to correctly show the empty Projection
node in the query plan.

Fixes apache#17513

* Revert

* Fix regression: SELECT FROM syntax should return empty projection

Removes automatic wildcard projection for empty projections, fixing
the regression where `SELECT FROM table` incorrectly returned all
columns instead of empty projection.

Note: This temporarily breaks FROM-first syntax. A proper fix would
require distinguishing between `FROM table` and `SELECT FROM table`
at the parser level.

Fixes apache#17513

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* add a better regression test

* remove comment

* fmt

* Update datafusion/sqllogictest/test_files/projection.slt

Co-authored-by: Oleks V <comphead@users.noreply.github.com>

* Update datafusion/core/tests/sql/select.rs

Co-authored-by: Oleks V <comphead@users.noreply.github.com>

* revert docs

* fmt

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Oleks V <comphead@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate documentation Improvements or additions to documentation sql SQL Planner sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Regression: projection pushdown doesn't work as expected in DF50

7 participants