TVF Part 1. Pushdown to connector.#25484
Conversation
df1b21a to
3fbe3a4
Compare
|
Codenotify: Notifying subscribers in CODENOTIFY files for diff 3545b1d...25fd88e.
|
aditi-pandit
left a comment
There was a problem hiding this comment.
Thanks @mohsaka and @xin-zhang2... Have a round of review comments.
| * As part of function-specific validation, the Table Function's author might want to: | ||
| * - check if the descriptors which reference input tables contain a correct number of column references | ||
| * - check if the referenced input columns have appropriate types to fit the function's logic // TODO return request for coercions to the Analyzer in the TableFunctionAnalysis object | ||
| * - if there is a descriptor which describes the function's output, check if it matches the shape of the actual function's output |
There was a problem hiding this comment.
Think it should be describes the functions input and not output. Please can you confirm.
There was a problem hiding this comment.
Agreed. Descriptor arguments represent the column name/types of the input not the output. Ex.
exclude_columns(input => table, columns => descriptor) → table
We check to make sure that the columns are present in the table.
if (!inputNames.containsAll(excludedNames)) {
String missingColumns = Sets.difference(excludedNames, inputNames).stream()
.collect(joining(", ", "[", "]"));
throw new TrinoException(INVALID_FUNCTION_ARGUMENT, format("Excluded columns: %s not present in the table", missingColumns));
}
Updated comment.
|
|
||
| import static java.util.Objects.requireNonNull; | ||
|
|
||
| public abstract class AbstractConnectorTableFunction |
There was a problem hiding this comment.
Can you write an introductory paragraph about the usage of this class ?
|
|
||
| public static class Field | ||
| { | ||
| private final Optional<String> name; |
There was a problem hiding this comment.
Why is field name optional ? What's the use-case for it ?
There was a problem hiding this comment.
It's used in ReturnType to return anonymous columns.
There was a problem hiding this comment.
Makes sense. Can you leave a comment about this in the class ?
| return new Builder(); | ||
| } | ||
|
|
||
| public static final class Builder |
There was a problem hiding this comment.
Guess you'll are adding Builders for all classes, else having a one field Builder doesn't seem worth it.
There was a problem hiding this comment.
Yes, for example, ScalarArgument, DescriptorArgument, TableArgument all have builders. So for consistency we have one here.
| .map(MockConnectorColumnHandle::getType) | ||
| .collect(toImmutableList()); | ||
| Map<String, Integer> columnIndexes = getColumnIndexes(tableName); | ||
| /* |
There was a problem hiding this comment.
Can you remove this code if not needed ?
| @Override | ||
| public TableFunctionAnalysis analyze(ConnectorSession session, ConnectorTransactionHandle transaction, Map<String, Argument> arguments) | ||
| { | ||
| ScalarArgument argument = (ScalarArgument) arguments.get("COLUMN"); |
There was a problem hiding this comment.
What do you do with the IGNORED column ? Can we remove it if not required ?
There was a problem hiding this comment.
We do nothing with the column. It is an optional column used in the test and will be ingored during table function analysis.
| public PlanNode visitTableFunction(TableFunctionNode node, RewriteContext<Void> context) | ||
| { | ||
| // TODO rewrite sources, and tableArgumentProperties when we add support for input tables | ||
| /* |
There was a problem hiding this comment.
Remove this comment if not needed.
| return processChildren(node, context); | ||
| } | ||
|
|
||
| /* |
jaystarshot
left a comment
There was a problem hiding this comment.
Please squash commits into readable parts, could be
- Parser changes
- Analysis + planner changes
- SPI (table function registry etc)
these can also be separate PRs
|
@jaystarshot Thanks for the review. I've reorganized the PR into 5 commits.
Could you please take a look? Thanks. |
9d09628 to
f76b692
Compare
|
@aditi-pandit Addressed the comments. Please take a second look when you have a chance. The commit addressing comments will be merged into their respective commits once complete. Thanks! |
aditi-pandit
left a comment
There was a problem hiding this comment.
@mohsaka, @xin-zhang2 : Still looking through all the files. But had 2 comments.
Please also merge the last commit into their respective prior commits.
| public final class CatalogSchemaFunctionName | ||
| { | ||
| private final String catalogName; | ||
| private final SchemaFunctionName schemaFunctionName; |
There was a problem hiding this comment.
Why do we need such a special class in the SPI ? CatalogSchemaFunctionName can have all 3 as individual elements.
|
|
||
| public static class Field | ||
| { | ||
| private final Optional<String> name; |
There was a problem hiding this comment.
Makes sense. Can you leave a comment about this in the class ?
|
@aditi-pandit It's mainly because we use SchemaFunctionName in the TableFunctionRegistry. So I guess 2 reasons.
Will add comment for descriptor. |
Changes adapted from trino/PR#11336, 12910 Original commit: 2e00c8e64c32d6fdd813999b2e04b3b3415235c8 f0508a7ab420449c6e2960ecf1d0a8d7058242da Author: kasiafi Modifications were made to adapt to Presto including: Addition of KEEP in the parser. Adjustment of the TestSqlParser.java to apply to Presto concepts. Switch from Trino's DataType based datatypes to Presto's String based datatypes. Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
Changes adapted from trino/PR#11336, 12350, 12407, 12476, 12531, 12813 Original commits: 395fd91c6480a993241eeabd9599873d0d05b24b 998315075343beecef962657b8cbf440d53cc13b 712b8e98ff8a726f95295ac539159fc532628273 131dc44af97b31a2fa8115028d98d06671641bfa 0da095e14b0855f89af3c4f254a5a60280fc7170 5310671f80291394b12ba2ea746e4e60051aaff4 18bb60262cb0850cf839c2b20b434344921f5122 4a7d72afb64f93a9748a4c6b4defc2d42bbae000 Author: kasiafi Modifications were made to adapt to Presto including: Removal of ConnectorExpression. Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
Changes adapted from trino/PR#11336, 12910 Original commits: 9e8d51ad45f57267f5f7fa6bf8e8c4ec56103dda f0508a7ab420449c6e2960ecf1d0a8d7058242da Author: kasiafi Modifications were made to adapt to Presto including: Removal of Node Location from TrinoException Added new SemanticErrorCodes Changed Void context to SqlPlannerContext in RelationPlanner.java Add newUnqualified to Field class with Presto specification Add getCanonicalValue to Identifier.java Co-authored-by: Pratik Joseph Dabre <pdabre12@gmail.com> Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
Changes adapted from trino/PR#11336 Original commit: d4c73389bbdb6b48c24a0969b259286b05a99ade 565700985baff0c4b29fdb1e3e26139a29318b9e ec8b9fd2b2cc9c8bc78c0ca1317dc34fcf2c48c7 Author: kasiafi Modifications were made to adapt to Presto including: Change CatalogName to ConnectorId Change Symbol to VariableReferenceExpression TableFunctionNode extends InternalPlanNode instead of PlanNode. Add applyTableFunction to all implementations of Metadata Add empty ConnectorTableLayoutHandle to TableHandle in MetadataManger::applyTableFunction Removal of PlannerContext and replaced with Metadata Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
Changes adapted from trino/PR#12951, 14175 Original commits: 98fc1ee8b29fca86f2a1b3abe4989524940333a6 1aea489884346822c812b1a242acc286e3e1248e 8bd17171a8469b9351e2fd7d9f2f49f4af9ea209 Author: kasiafi Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com> Co-authored-by: Pratik Joseph Dabre <pdabre12@gmail.com> Co-authored-by: Xin Zhang <desertsxin@gmail.com>
| function.getName().toLowerCase(ENGLISH)), | ||
| new TableFunctionMetadata(catalogName, function)); | ||
| } | ||
| checkState(tableFunctions.putIfAbsent(catalogName, builder.buildOrThrow()) == null, "Table functions already registered for catalog: " + catalogName); |
There was a problem hiding this comment.
Would it make sense to move this check before we validate all the table functions.
Something simple like,
if (tableFunctions.containsKey(catalogName) {
error out saying ("Table functions already registered for catalog:");
}
There was a problem hiding this comment.
Addressed in new PR
| } | ||
| } | ||
|
|
||
| return null; |
There was a problem hiding this comment.
Not a huge fan of returning null, is it possible to change the method to return an Optional instead?
| INVALID_ROW_FILTER(0x0000_0034, USER_ERROR), | ||
| INVALID_COLUMN_MASK(0x0000_0035, USER_ERROR), | ||
| DATATYPE_MISMATCH(0x0000_0036, USER_ERROR), | ||
| MISSING_CATALOG_NAME(0x0000_0037, USER_ERROR), |
There was a problem hiding this comment.
This error code is too vague, i think the name needs to be tied to the table function registry for clarity
There was a problem hiding this comment.
Updated to SESSION_CATALOG_NOT_SET
| { | ||
| Analysis.TableFunctionInvocationAnalysis functionAnalysis = analysis.getTableFunctionAnalysis(node); | ||
|
|
||
| // TODO handle input relations: |
There was a problem hiding this comment.
its not clear yo me but looks like only scalar arguments work and table within table won't so can we fail gracefully for unsupported cases?
| import static com.google.common.base.Preconditions.checkState; | ||
| import static java.util.Objects.requireNonNull; | ||
|
|
||
| public class RewriteTableFunctionToTableScan |
There was a problem hiding this comment.
Why do we need this rule and corresponding SPI calls? I think we can use Connector optimizers (https://prestodb.io/blog/2019/12/23/improve-presto-planner/) which are present in presto and not trino so we can potentially skip a lot of additions to metadata?
There was a problem hiding this comment.
I guess i get the changes here, metadata might be the better interface to override in this case
| @Override | ||
| public PlanWithProperties visitTableFunction(TableFunctionNode node, PreferredProperties preferredProperties) | ||
| { | ||
| throw new UnsupportedOperationException("execution by operator is not yet implemented for table function " + node.getName()); |
There was a problem hiding this comment.
Not sure we need this, by default it will throw unsupported
| checkArgument(!tableFunction.getName().isEmpty(), "table function name is empty"); | ||
| checkArgument(!tableFunction.getSchema().isEmpty(), "table function schema name is empty"); | ||
|
|
||
| Set<String> argumentNames = new HashSet<>(); |
There was a problem hiding this comment.
This code seems inefficient having 2 for loops doing same thing
`Set seenNames = new HashSet<>();
int tableArgumentsWithRowSemantics = 0;
for (ArgumentSpecification spec : tableFunction.getArguments()) {
// Check for duplicate names
if (!seenNames.add(spec.getName())) {
throw new IllegalArgumentException("duplicate argument name: " + spec.getName());
}
// Count table arguments with row semantics
if (spec instanceof TableArgumentSpecification &&
((TableArgumentSpecification) spec).isRowSemantics()) {
tableArgumentsWithRowSemantics++;
}
}
checkArgument(tableArgumentsWithRowSemantics <= 1, "more than one table argument with row semantics");
`
There was a problem hiding this comment.
Good catch. We could also exit early if tableArgumentsWithRowSemantics > 1 but I think we should keep it where it throws all duplicate arguments first. So followed your example.
jaystarshot
left a comment
There was a problem hiding this comment.
I think the PR gets too big to review if we had a separate PR for parser only etc we could involve a lot more folks like @kaikalur for the review/advice
Thank you for the review. Will separate it out into further PRs |
|
Closed as PR will be opened in multiple parts. |
This is the first part towards implementing Table functions in Presto. Please find the RFC at https://github.com/prestodb/rfcs/pull/39/files
In this PR, the Table function syntax is added to the Presto Parser, AST and RelationPlanner code is added. At this point, we only compile Table functions into TableScan of tables in the respective connector. There will be subsequent PRs to handle Table functions as operators (appearing mid-pipeline) and leaf operators.
Contains changes from #25032.
From commit
Support table function invocation in grammar and AST
to commit
Analyze table function arguments and cleanup
This includes all of the TVF changes required to use table functions with pushdown to connector. This PR also includes as much refactoring changes as I could from future commits.
Description
Allows table functions via pushdown to connector. Documentation will come at a later point after all of the code has been pushed in. Until then, this should be considered experimental. There is very little interaction with existing code.
Documentation from Trino can be followed as we wanted to match their existing API for portability.
https://trino.io/docs/current/functions/table.html
Motivation and Context
Impact
Test Plan
TVF test cases added in
TestTableFunctionInvocation.Parser tests cases added in
TestSqlParserandTestSqlParserErrorHandling.JDBC failure test cases added in
TestJdbcDistributedQueries.Further testing is done in the complete PR.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.
If release note is NOT required, use: