Skip to content

Conversation

@kosiew
Copy link
Contributor

@kosiew kosiew commented Nov 21, 2025

Which issue does this PR close?

Rationale for this change

This PR addresses failures in Substrait round‑trip mode where Virtual Tables were limited to accepting only literals or aliased literals. Several SQLLogicTests—especially those involving nested or scalar functions such as map()—failed because expressions were rejected prematurely.

Allowing expression-based rows when exporting Virtual Tables is essential to making Substrait → DataFusion → Substrait round‑trips stable and representative of real workloads. Fixing this improves test coverage, unblocks Substrait integration efforts, and eliminates a major source of round‑trip inconsistencies.

Before


❯ cargo test --test sqllogictests -- --substrait-round-trip map.slt:388
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.66s
     Running bin/sqllogictests.rs (target/debug/deps/sqllogictests-d29eafa7e841495b)
Completed 1 test files in 0 seconds                                                                  External error: 2 errors in file /Users/kosiew/GitHub/datafusion/datafusion/sqllogictest/test_files/map.slt

1. query failed: DataFusion error: This feature is not implemented: Unsupported plan type: DescribeTable { schema: Schema ...
[SQL] describe data;
at /Users/kosiew/GitHub/datafusion/datafusion/sqllogictest/test_files/map.slt:43


2. query failed: DataFusion error: Substrait error: Only literal types and aliases are supported in Virtual Tables, got: ScalarFunction
[SQL] VALUES (MAP(['a'], [1])), (MAP(['b'], [2])), (MAP(['c', 'a'], [3, 1]))
at /Users/kosiew/GitHub/datafusion/datafusion/sqllogictest/test_files/map.slt:388


After

❯ cargo test --test sqllogictests -- --substrait-round-trip map.slt:388
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.50s
     Running bin/sqllogictests.rs (target/debug/deps/sqllogictests-62df8e0045f6176f)
Completed 1 test files in 0 seconds

What changes are included in this PR?

  • Introduces convert_literal_rows and convert_expression_rows helpers to properly distinguish literal-only rows from expression-based rows.
  • Adds support for generating nested Struct expressions (Substrait NestedStruct) for non-literal Virtual Table values.
  • Updates from_values to dynamically choose between literal structs and nested structs.
  • Ensures schema consistency validation for expression-based rows.
  • Adds new round‑trip test cases, including roundtrip_values_with_scalar_function.
  • Refactors tests to use a new substrait_roundtrip helper to reduce duplication.
  • Adds LogicalPlan::DescribeTable(_) handling in the round‑trip engine.

Are these changes tested?

Yes. New test coverage includes:

  • A dedicated round‑trip test for scalar functions inside VALUES clauses.
  • Use of substrait_roundtrip across multiple existing tests to ensure consistent behavior.
  • Validation of expression-based row handling, alias stripping, and schema equivalence.

Are there any user-facing changes?

No direct API-breaking changes. The behavior of Substrait round‑trip conversion improves and becomes more permissive regarding expressions in Virtual Tables. This is internal to Substrait/DataFusion integration and does not change SQL semantics or user-level APIs.

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

Add support for literal rows in VirtualTable.values while
routing general expressions through handle_expr. Implement
schema width validation to align with the consumer path.
Introduce a regression test for round-tripping VALUES plans
with scalar functions, normalizing aliases for comparison
with the original schema and structure.
@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) substrait Changes to the substrait crate labels Nov 21, 2025
| LogicalPlan::Explain(_)
| LogicalPlan::Dml(_)
| LogicalPlan::Copy(_)
| LogicalPlan::DescribeTable(_)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To fix this error

This feature is not implemented: Unsupported plan type: DescribeTable { schema: Schema ...
[SQL] describe data;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a test for converting DESCRIBE to substrait?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kosiew -- the code looks good to me

I left some suggestions on how to make the tests easier to read but I don't think they are required to merge


let actual = substrait_roundtrip(&plan, &ctx).await?;

let normalize = |plan: &LogicalPlan| match plan {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some comments about why normalize is needed? It looks to me like it just clones the input expressions and I had to look carefully to find the one line that removes aliases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to strip_aliases_from_values

.iter()
.map(|row| {
row.iter()
.map(|expr| match expr {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it is stripping the aliases -- maybe you could use https://docs.rs/datafusion/latest/datafusion/prelude/enum.Expr.html#method.unalias instead?

You could probably also use the tree node APIs, for example https://docs.rs/datafusion-expr/latest/datafusion_expr/logical_plan/enum.LogicalPlan.html#method.map_expressions to rewrite all expressions in the node more succinctly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL
Thanks

| LogicalPlan::Explain(_)
| LogicalPlan::Dml(_)
| LogicalPlan::Copy(_)
| LogicalPlan::DescribeTable(_)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a test for converting DESCRIBE to substrait?

#[tokio::test]
async fn roundtrip_values_with_scalar_function() -> Result<()> {
let ctx = create_context().await?;
let expr = map(vec![lit("a")], vec![lit(1)]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might help to call out here that map is a function call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

//  datafusion::functions_nested::map::map;

@kosiew kosiew marked this pull request as draft November 24, 2025 11:49
Add the missing import for Transformed in datafusion.
Change the closure to return `Ok(Transformed::yes(expr.unalias()))`
to match the expected Result type. Adjust result handling by
using .map to extract LogicalPlan from Transformed, cloning
the plan before passing it to map_expressions to ensure
ownership is correctly handled.
@kosiew
Copy link
Contributor Author

kosiew commented Nov 24, 2025

@alamb
Thanks for the review and feedback.

@kosiew kosiew marked this pull request as ready for review November 24, 2025 23:46
@kosiew kosiew added this pull request to the merge queue Nov 25, 2025
Merged via the queue into apache:main with commit 2dbd9c7 Nov 25, 2025
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sqllogictest SQL Logic Tests (.slt) substrait Changes to the substrait crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[substrait] [sqllogictest] Only literal types and aliases are supported in Virtual Tables, got: ScalarFunction

2 participants