Skip to content

TVF Part 1. Pushdown to connector.#25484

Closed
mohsaka wants to merge 5 commits into
prestodb:masterfrom
mohsaka:tvf-part-1
Closed

TVF Part 1. Pushdown to connector.#25484
mohsaka wants to merge 5 commits into
prestodb:masterfrom
mohsaka:tvf-part-1

Conversation

@mohsaka

@mohsaka mohsaka commented Jul 2, 2025

Copy link
Copy Markdown
Contributor

This is the first part towards implementing Table functions in Presto. Please find the RFC at https://github.com/prestodb/rfcs/pull/39/files

In this PR, the Table function syntax is added to the Presto Parser, AST and RelationPlanner code is added. At this point, we only compile Table functions into TableScan of tables in the respective connector. There will be subsequent PRs to handle Table functions as operators (appearing mid-pipeline) and leaf operators.

Contains changes from #25032.

From commit
Support table function invocation in grammar and AST

to commit
Analyze table function arguments and cleanup

This includes all of the TVF changes required to use table functions with pushdown to connector. This PR also includes as much refactoring changes as I could from future commits.

Description

Allows table functions via pushdown to connector. Documentation will come at a later point after all of the code has been pushed in. Until then, this should be considered experimental. There is very little interaction with existing code.

Documentation from Trino can be followed as we wanted to match their existing API for portability.
https://trino.io/docs/current/functions/table.html

Motivation and Context

Impact

Test Plan

TVF test cases added in TestTableFunctionInvocation.
Parser tests cases added in TestSqlParser and TestSqlParserErrorHandling.
JDBC failure test cases added in TestJdbcDistributedQueries.

Further testing is done in the complete PR.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

If release note is NOT required, use:

== NO RELEASE NOTE ==

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Jul 2, 2025
@mohsaka mohsaka changed the title [DNR]Support table function invocation in grammar and AST [DNR] TVF Part 1. Pushdown to connector. Jul 3, 2025
@mohsaka mohsaka force-pushed the tvf-part-1 branch 4 times, most recently from df1b21a to 3fbe3a4 Compare July 3, 2025 20:49
@mohsaka mohsaka marked this pull request as ready for review July 4, 2025 00:04
@prestodb-ci prestodb-ci requested review from a team, ScrapCodes and xin-zhang2 and removed request for a team July 4, 2025 00:04
@github-actions

github-actions Bot commented Jul 4, 2025

Copy link
Copy Markdown

Codenotify: Notifying subscribers in CODENOTIFY files for diff 3545b1d...25fd88e.

Notify File(s)
@aditi-pandit presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
@elharo presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
@kaikalur presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
@rschlussel presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4

@mohsaka mohsaka changed the title [DNR] TVF Part 1. Pushdown to connector. TVF Part 1. Pushdown to connector. Jul 8, 2025
@mohsaka mohsaka requested a review from aditi-pandit July 11, 2025 17:37

@aditi-pandit aditi-pandit left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mohsaka and @xin-zhang2... Have a round of review comments.

* As part of function-specific validation, the Table Function's author might want to:
* - check if the descriptors which reference input tables contain a correct number of column references
* - check if the referenced input columns have appropriate types to fit the function's logic // TODO return request for coercions to the Analyzer in the TableFunctionAnalysis object
* - if there is a descriptor which describes the function's output, check if it matches the shape of the actual function's output

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think it should be describes the functions input and not output. Please can you confirm.

@mohsaka mohsaka Jul 30, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Descriptor arguments represent the column name/types of the input not the output. Ex.

exclude_columns(input => table, columns => descriptor) → table
We check to make sure that the columns are present in the table.

        if (!inputNames.containsAll(excludedNames)) {
            String missingColumns = Sets.difference(excludedNames, inputNames).stream()
                    .collect(joining(", ", "[", "]"));
            throw new TrinoException(INVALID_FUNCTION_ARGUMENT, format("Excluded columns: %s not present in the table", missingColumns));
        }

Updated comment.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


import static java.util.Objects.requireNonNull;

public abstract class AbstractConnectorTableFunction

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you write an introductory paragraph about the usage of this class ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


public static class Field
{
private final Optional<String> name;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is field name optional ? What's the use-case for it ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used in ReturnType to return anonymous columns.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Can you leave a comment about this in the class ?

return new Builder();
}

public static final class Builder

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess you'll are adding Builders for all classes, else having a one field Builder doesn't seem worth it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for example, ScalarArgument, DescriptorArgument, TableArgument all have builders. So for consistency we have one here.

.map(MockConnectorColumnHandle::getType)
.collect(toImmutableList());
Map<String, Integer> columnIndexes = getColumnIndexes(tableName);
/*

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove this code if not needed ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@Override
public TableFunctionAnalysis analyze(ConnectorSession session, ConnectorTransactionHandle transaction, Map<String, Argument> arguments)
{
ScalarArgument argument = (ScalarArgument) arguments.get("COLUMN");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you do with the IGNORED column ? Can we remove it if not required ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do nothing with the column. It is an optional column used in the test and will be ingored during table function analysis.

public PlanNode visitTableFunction(TableFunctionNode node, RewriteContext<Void> context)
{
// TODO rewrite sources, and tableArgumentProperties when we add support for input tables
/*

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this comment if not needed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return processChildren(node, context);
}

/*

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this code ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jaystarshot jaystarshot left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please squash commits into readable parts, could be

  1. Parser changes
  2. Analysis + planner changes
  3. SPI (table function registry etc)
    these can also be separate PRs

@xin-zhang2

Copy link
Copy Markdown
Contributor

@jaystarshot Thanks for the review. I've reorganized the PR into 5 commits.

  1. Syntax and parser support
  2. SPI support
  3. Analysis
  4. Planning and optimization
  5. Test coverage

Could you please take a look? Thanks.

@mohsaka mohsaka force-pushed the tvf-part-1 branch 5 times, most recently from 9d09628 to f76b692 Compare August 1, 2025 06:52
@mohsaka

mohsaka commented Aug 1, 2025

Copy link
Copy Markdown
Contributor Author

@aditi-pandit Addressed the comments. Please take a second look when you have a chance. The commit addressing comments will be merged into their respective commits once complete. Thanks!

@mohsaka mohsaka requested a review from aditi-pandit August 1, 2025 19:07

@aditi-pandit aditi-pandit left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mohsaka, @xin-zhang2 : Still looking through all the files. But had 2 comments.

Please also merge the last commit into their respective prior commits.

public final class CatalogSchemaFunctionName
{
private final String catalogName;
private final SchemaFunctionName schemaFunctionName;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need such a special class in the SPI ? CatalogSchemaFunctionName can have all 3 as individual elements.


public static class Field
{
private final Optional<String> name;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Can you leave a comment about this in the class ?

@mohsaka

mohsaka commented Aug 5, 2025

Copy link
Copy Markdown
Contributor Author

@aditi-pandit
For this
Why do we need such a special class in the SPI ? CatalogSchemaFunctionName can have all 3 as individual elements.

It's mainly because we use SchemaFunctionName in the TableFunctionRegistry. So I guess 2 reasons.

  1. We have the SchemaFunctionName class already that is widely used in TableFunctionRegistry. This is because, while in a Catalogs context, the catalog is no longer needed.
  2. It allows for us to access this object directly rather than creating a SchemaFunctionName object each time.

Will add comment for descriptor.

Changes adapted from trino/PR#11336, 12910
Original commit:
2e00c8e64c32d6fdd813999b2e04b3b3415235c8
f0508a7ab420449c6e2960ecf1d0a8d7058242da
Author: kasiafi

Modifications were made to adapt to Presto including:
Addition of KEEP in the parser.
Adjustment of the TestSqlParser.java to apply to Presto concepts.
Switch from Trino's DataType based datatypes to Presto's String based datatypes.

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
mohsaka and others added 4 commits August 5, 2025 13:42
Changes adapted from trino/PR#11336, 12350, 12407, 12476, 12531, 12813
Original commits:
395fd91c6480a993241eeabd9599873d0d05b24b
998315075343beecef962657b8cbf440d53cc13b
712b8e98ff8a726f95295ac539159fc532628273
131dc44af97b31a2fa8115028d98d06671641bfa
0da095e14b0855f89af3c4f254a5a60280fc7170
5310671f80291394b12ba2ea746e4e60051aaff4
18bb60262cb0850cf839c2b20b434344921f5122
4a7d72afb64f93a9748a4c6b4defc2d42bbae000
Author: kasiafi

Modifications were made to adapt to Presto including:
Removal of ConnectorExpression.

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
Changes adapted from trino/PR#11336, 12910
Original commits:
9e8d51ad45f57267f5f7fa6bf8e8c4ec56103dda
f0508a7ab420449c6e2960ecf1d0a8d7058242da
Author: kasiafi

Modifications were made to adapt to Presto including:
Removal of Node Location from TrinoException
Added new SemanticErrorCodes
Changed Void context to SqlPlannerContext in RelationPlanner.java
Add newUnqualified to Field class with Presto specification
Add getCanonicalValue to Identifier.java

Co-authored-by: Pratik Joseph Dabre <pdabre12@gmail.com>
Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
Changes adapted from trino/PR#11336
Original commit:
d4c73389bbdb6b48c24a0969b259286b05a99ade
565700985baff0c4b29fdb1e3e26139a29318b9e
ec8b9fd2b2cc9c8bc78c0ca1317dc34fcf2c48c7
Author: kasiafi

Modifications were made to adapt to Presto including:
Change CatalogName to ConnectorId
Change Symbol to VariableReferenceExpression
TableFunctionNode extends InternalPlanNode instead of PlanNode.
Add applyTableFunction to all implementations of Metadata
Add empty ConnectorTableLayoutHandle to TableHandle in MetadataManger::applyTableFunction
Removal of PlannerContext and replaced with Metadata

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
Changes adapted from trino/PR#12951, 14175

Original commits:
98fc1ee8b29fca86f2a1b3abe4989524940333a6
1aea489884346822c812b1a242acc286e3e1248e
8bd17171a8469b9351e2fd7d9f2f49f4af9ea209
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
Co-authored-by: Pratik Joseph Dabre <pdabre12@gmail.com>
Co-authored-by: Xin Zhang <desertsxin@gmail.com>

@pdabre12 pdabre12 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mohsaka Thanks for this code.
Reviewed commit: Support table function invocation in grammar and AST.

@pdabre12 pdabre12 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mohsaka some more comments on the commit which adds SPI support.

function.getName().toLowerCase(ENGLISH)),
new TableFunctionMetadata(catalogName, function));
}
checkState(tableFunctions.putIfAbsent(catalogName, builder.buildOrThrow()) == null, "Table functions already registered for catalog: " + catalogName);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to move this check before we validate all the table functions.
Something simple like,

if (tableFunctions.containsKey(catalogName) {
     error out saying ("Table functions already registered for catalog:");
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in new PR

}
}

return null;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a huge fan of returning null, is it possible to change the method to return an Optional instead?

INVALID_ROW_FILTER(0x0000_0034, USER_ERROR),
INVALID_COLUMN_MASK(0x0000_0035, USER_ERROR),
DATATYPE_MISMATCH(0x0000_0036, USER_ERROR),
MISSING_CATALOG_NAME(0x0000_0037, USER_ERROR),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error code is too vague, i think the name needs to be tied to the table function registry for clarity

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to SESSION_CATALOG_NOT_SET

{
Analysis.TableFunctionInvocationAnalysis functionAnalysis = analysis.getTableFunctionAnalysis(node);

// TODO handle input relations:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its not clear yo me but looks like only scalar arguments work and table within table won't so can we fail gracefully for unsupported cases?

import static com.google.common.base.Preconditions.checkState;
import static java.util.Objects.requireNonNull;

public class RewriteTableFunctionToTableScan

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this rule and corresponding SPI calls? I think we can use Connector optimizers (https://prestodb.io/blog/2019/12/23/improve-presto-planner/) which are present in presto and not trino so we can potentially skip a lot of additions to metadata?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess i get the changes here, metadata might be the better interface to override in this case

@Override
public PlanWithProperties visitTableFunction(TableFunctionNode node, PreferredProperties preferredProperties)
{
throw new UnsupportedOperationException("execution by operator is not yet implemented for table function " + node.getName());

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we need this, by default it will throw unsupported

checkArgument(!tableFunction.getName().isEmpty(), "table function name is empty");
checkArgument(!tableFunction.getSchema().isEmpty(), "table function schema name is empty");

Set<String> argumentNames = new HashSet<>();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code seems inefficient having 2 for loops doing same thing

`Set seenNames = new HashSet<>();
int tableArgumentsWithRowSemantics = 0;

for (ArgumentSpecification spec : tableFunction.getArguments()) {
    // Check for duplicate names
    if (!seenNames.add(spec.getName())) {
        throw new IllegalArgumentException("duplicate argument name: " + spec.getName());
    }
    
    // Count table arguments with row semantics
    if (spec instanceof TableArgumentSpecification && 
        ((TableArgumentSpecification) spec).isRowSemantics()) {
        tableArgumentsWithRowSemantics++;
    }
}

checkArgument(tableArgumentsWithRowSemantics <= 1, "more than one table argument with row semantics");

`

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. We could also exit early if tableArgumentsWithRowSemantics > 1 but I think we should keep it where it throws all duplicate arguments first. So followed your example.

@jaystarshot jaystarshot left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the PR gets too big to review if we had a separate PR for parser only etc we could involve a lot more folks like @kaikalur for the review/advice

@mohsaka

mohsaka commented Aug 13, 2025

Copy link
Copy Markdown
Contributor Author

I think the PR gets too big to review if we had a separate PR for parser only etc we could involve a lot more folks like @kaikalur for the review/advice

Thank you for the review. Will separate it out into further PRs

@mohsaka

mohsaka commented Aug 18, 2025

Copy link
Copy Markdown
Contributor Author

Closed as PR will be opened in multiple parts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:IBM PR from IBM

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants