Skip to content

Conversation

@kosiew
Copy link
Contributor

@kosiew kosiew commented Dec 19, 2025

Which issue does this PR close?

Rationale for this change

Substrait round-trip tests and ~35 sqllogictest cases were failing with errors like "No table named 'generate_series()'" when deserializing a Substrait plan back into a DataFusion logical plan.

The root cause is that table function invocations (e.g. generate_series, range) were being serialized as ordinary ReadRel::NamedTable scans, so the consumer attempted to resolve a table named like the function call rather than reconstructing a table function scan.

This change adds a stable mechanism to preserve table-function intent and evaluated arguments in the Substrait plan, enabling correct reconstruction during consumption and eliminating the "No table named ..." failures.

What changes are included in this PR?

  • Catalog API extension:

    • Added TableProvider::table_function_details() (default None) and a TableFunctionDetails struct carrying:

      • function name (&'static str)
      • evaluated argument literals (&[ScalarValue])
    • Kept a convenience table_function_name() helper that delegates to table_function_details().

  • Expose shared scalar constant:

    • Made datafusion_common::scalar::NANOS_PER_DAY public and reused it in date handling to avoid local redefinition.
  • Built-in table function provider updates:

    • Updated GenerateSeriesTable to cache evaluated arguments and to implement table_function_details() so the producer can serialize table function calls without downcasting.
    • Refactored argument parsing to consistently treat NULL / missing inputs and to always represent table function calls with a complete 3-argument list (start, end, step) where applicable.
  • Substrait producer support (logical plan):

    • Added a ReadRel.advanced_extension payload when a TableScan originates from a table function provider.

    • Introduced a dedicated protobuf message TableFunctionReadRelExtension encoded as google.protobuf.Any using a shared type_url constant:

      • type.googleapis.com/datafusion.substrait.TableFunctionReadRel
    • This payload records the table function name and evaluated argument literals (after defaults/coercions).

  • Substrait consumer support (logical plan):

    • Added parsing of the ReadRel.advanced_extension payload and reconstruction of the table function scan by:

      • looking up a registered table function by name (SubstraitConsumer::get_table_function)
      • calling create_table_provider with literal Expr arguments
    • Added a provider_override path so a reconstructed provider can be used even when the NamedTable reference would not resolve.

  • Code organization:

    • Moved duplicated VirtualTable literal conversion helpers into logical_plan::utils (producer + consumer), reducing duplication and improving maintainability.
  • Dependencies:

    • Added datafusion-functions-table dependency to datafusion-substrait so built-in table function behavior (and tests) are available in round-trip contexts.
  • Tests:

    • Added a producer unit test ensuring generate_series table scans emit the expected advanced_extension payload and argument normalization.

    • Added an integration test table_function_roundtrip.rs validating round-trip correctness for:

      • generate_series(1, 4)
      • range(2, 8, 3)
      • generate_series(NULL, 5)

Are these changes tested?

Before

❯ cargo test --test sqllogictests -- --substrait-round-trip array.slt:6472
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.78s
     Running bin/sqllogictests.rs (target/debug/deps/sqllogictests-3a351d3176e1df99)
Completed 2 test files in 1 second                                                   External error: 1 errors in file /Users/kosiew/GitHub/datafusion/datafusion/sqllogictest/test_files/array.slt

1. query failed: DataFusion error: Error during planning: No table named 'generate_series()'

After

❯ cargo test --test sqllogictests -- --substrait-round-trip array.slt:6472
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.58s
     Running bin/sqllogictests.rs (target/debug/deps/sqllogictests-3a351d3176e1df99)
Completed 2 test files in 1 second

Yes.

  • Added a producer-side unit test that verifies the ReadRel.advanced_extension is present, uses the expected type URL, and encodes normalized arguments for generate_series(1, 3).
  • Added new Substrait round-trip integration tests that serialize → deserialize → execute and compare results for generate_series and range, including a NULL-argument case.

These tests exercise both the new producer metadata emission path and the new consumer reconstruction path, covering the scenario that caused the sqllogictest failures.

Are there any user-facing changes?

  • Improved Substrait round-trip behavior for queries using built-in table functions like generate_series and range. Previously, such plans could fail to round-trip with "No table named ..." errors; with this change they can be reconstructed and executed correctly.

  • New (optional) API hook for custom table providers:

    • TableProvider::table_function_details() allows custom table-function providers to participate in Substrait serialization without needing producer-side downcasting.

If this new method is considered a public API surface change, the PR should carry the api change label.

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

Update GenerateSeries table functions to log evaluated argument
lists, including defaults and nulls. Expose them through
table_function_details for accurate scans.

Enhance Substrait ReadRel serialization to support generate_series
scans by emitting an advanced extension with function name and
arguments, documenting the payload format.

Revise physical plan deserialization to initialize
generate_series tables with evaluated arguments and null
metadata, maintaining context of the table function.
Parse advanced extension metadata to detect table function
invocations, resolve arguments, and build scans with the
resulting providers. Introduce a lookup hook in
SubstraitConsumer for table functions with a default
implementation linked to session state registrations.
Implement tests for generate_series and range, ensuring the
correctness of serialized plan metadata and query results.
Register the new suite in the Substrait test module manifest.
Use the public byte slice accessor to decode table function
metadata, eliminating the need for private APIs. Directly return
planning errors for missing table functions to avoid nested
Result types and potential compilation failures.
- Change read location from NamedTable.advanced_extension
  to ReadRel.advanced_extension.
- Rename TableFunctionMetadata to
  TableFunctionReadRelExtension for consistency.
- Update argument type from Vec<Expression> to
  Vec<Literal>.
- Implement type URL validation to check
  type_url == TABLE_FUNCTION_TYPE_URL before decoding.
- Properly convert literals using from_substrait_literal()
  and wrap in Expr::Literal(value, None).
Eliminate the arg_count field from GenSeriesArgs::ContainsNull
and streamline the code. Update evaluated_args() to use a fixed
count of 3 NULLs, in line with the protobuf schema. Also unify
error handling by replacing plan_datafusion_err! with
plan_err! in read_rel.rs. Fix README formatting by removing
empty backticks for improved readability.
@github-actions github-actions bot added substrait Changes to the substrait crate catalog Related to the catalog crate common Related to common crate proto Related to proto crate functions Changes to functions implementation labels Dec 19, 2025
@github-actions github-actions bot removed the proto Related to proto crate label Dec 19, 2025
@kosiew kosiew marked this pull request as ready for review December 22, 2025 07:02
@kosiew kosiew requested a review from gabotechs December 22, 2025 07:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

catalog Related to the catalog crate common Related to common crate functions Changes to functions implementation substrait Changes to the substrait crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[substrait] [sqllogictest] Error during planning: No table named 'generate_series()'

1 participant