-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Support Non-Literal Expressions in Substrait VirtualTable Values and Improve Round-Trip Robustness #18866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Non-Literal Expressions in Substrait VirtualTable Values and Improve Round-Trip Robustness #18866
Conversation
Add support for literal rows in VirtualTable.values while routing general expressions through handle_expr. Implement schema width validation to align with the consumer path. Introduce a regression test for round-tripping VALUES plans with scalar functions, normalizing aliases for comparison with the original schema and structure.
… clarity and maintainability
…ing nested expression rows
| | LogicalPlan::Explain(_) | ||
| | LogicalPlan::Dml(_) | ||
| | LogicalPlan::Copy(_) | ||
| | LogicalPlan::DescribeTable(_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To fix this error
This feature is not implemented: Unsupported plan type: DescribeTable { schema: Schema ...
[SQL] describe data;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a test for converting DESCRIBE to substrait?
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kosiew -- the code looks good to me
I left some suggestions on how to make the tests easier to read but I don't think they are required to merge
|
|
||
| let actual = substrait_roundtrip(&plan, &ctx).await?; | ||
|
|
||
| let normalize = |plan: &LogicalPlan| match plan { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some comments about why normalize is needed? It looks to me like it just clones the input expressions and I had to look carefully to find the one line that removes aliases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to strip_aliases_from_values
| .iter() | ||
| .map(|row| { | ||
| row.iter() | ||
| .map(|expr| match expr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it is stripping the aliases -- maybe you could use https://docs.rs/datafusion/latest/datafusion/prelude/enum.Expr.html#method.unalias instead?
You could probably also use the tree node APIs, for example https://docs.rs/datafusion-expr/latest/datafusion_expr/logical_plan/enum.LogicalPlan.html#method.map_expressions to rewrite all expressions in the node more succinctly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL
Thanks
| | LogicalPlan::Explain(_) | ||
| | LogicalPlan::Dml(_) | ||
| | LogicalPlan::Copy(_) | ||
| | LogicalPlan::DescribeTable(_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a test for converting DESCRIBE to substrait?
| #[tokio::test] | ||
| async fn roundtrip_values_with_scalar_function() -> Result<()> { | ||
| let ctx = create_context().await?; | ||
| let expr = map(vec![lit("a")], vec![lit(1)]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might help to call out here that map is a function call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
// datafusion::functions_nested::map::map;
Add the missing import for Transformed in datafusion. Change the closure to return `Ok(Transformed::yes(expr.unalias()))` to match the expected Result type. Adjust result handling by using .map to extract LogicalPlan from Transformed, cloning the plan before passing it to map_expressions to ensure ownership is correctly handled.
|
@alamb |
Which issue does this PR close?
Rationale for this change
This PR addresses failures in Substrait round‑trip mode where Virtual Tables were limited to accepting only literals or aliased literals. Several SQLLogicTests—especially those involving nested or scalar functions such as
map()—failed because expressions were rejected prematurely.Allowing expression-based rows when exporting Virtual Tables is essential to making Substrait → DataFusion → Substrait round‑trips stable and representative of real workloads. Fixing this improves test coverage, unblocks Substrait integration efforts, and eliminates a major source of round‑trip inconsistencies.
Before
After
What changes are included in this PR?
convert_literal_rowsandconvert_expression_rowshelpers to properly distinguish literal-only rows from expression-based rows.Structexpressions (SubstraitNestedStruct) for non-literal Virtual Table values.from_valuesto dynamically choose between literal structs and nested structs.roundtrip_values_with_scalar_function.substrait_roundtriphelper to reduce duplication.LogicalPlan::DescribeTable(_)handling in the round‑trip engine.Are these changes tested?
Yes. New test coverage includes:
VALUESclauses.substrait_roundtripacross multiple existing tests to ensure consistent behavior.Are there any user-facing changes?
No direct API-breaking changes. The behavior of Substrait round‑trip conversion improves and becomes more permissive regarding expressions in Virtual Tables. This is internal to Substrait/DataFusion integration and does not change SQL semantics or user-level APIs.
LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.