Revert stream pattern method in V2 and implement SIMPLE_PATTERN in Calcite#3553
Conversation
Signed-off-by: Songkan Tang <songkant@amazon.com>
Signed-off-by: Songkan Tang <songkant@amazon.com>
| .toString() | ||
| .toLowerCase(Locale.ROOT); | ||
| if (patternMethod.equalsIgnoreCase(PatternMethod.SIMPLE_PATTERN.getName())) { | ||
| return new Parse(ParseMethod.PATTERNS, sourceField, pattern, arguments); |
There was a problem hiding this comment.
Shall we rename the Parse node and change its doc? It's weird to use a node which says /** AST node represent Parse with regex operation. */ in its doc for PatternCommand.
Looks like our definition of UnresolvedPlan is something between AST and LogicalPlan. Most are more like AST(e.g. Parse, Rename, Eval), while others are more like LogicalPlan(e.g. Window, Aggregation, Project, Filter).
Under current convention, seems we cannot figure out a proper name for it or determine which group it belong to.
There was a problem hiding this comment.
I would prefer not doing this refactoring in this PR to introduce minimum change. I left comment to call out why do we keep that Parse plan for legacy patterns method. And call out there is opportunity of refactoring it into just Project operator.
| java.util.Map<String, Literal> arguments = node.getArguments(); | ||
| String pattern = (String) node.getPattern().getValue(); | ||
| // TODO: Support Grok parse method | ||
| Pair<SqlOperator, String> operatorPatternPair = |
There was a problem hiding this comment.
Could this be registered as functions in PPLFuncImpTable?Seems we will have more method in future and it should be a true function in Calcite.
There was a problem hiding this comment.
Done. Registered related calcite REGEXP functions into PPLFuncImpTable
Signed-off-by: Songkan Tang <songkant@amazon.com>
| REGEXP_EXTRACT(FunctionName.of("regexp_extract")), | ||
| REGEXP_REPLACE_2(FunctionName.of("regexp_replace_2")), |
There was a problem hiding this comment.
Are these functions all existing UDF in V2?
There was a problem hiding this comment.
As discussed, change them to internal functions with additional boolean to describe them.
| RelNode root = getRelNode(ppl); | ||
|
|
||
| String expectedLogical = | ||
| "LogicalProject(ENAME=[$1], patterns_field=[REGEXP_REPLACE($1, '[A-H]')])\n" |
There was a problem hiding this comment.
I do think the '[A-H]' here should change to '[A-H]':VARCHAR if #3558 merged first Just call out for reminding.
There was a problem hiding this comment.
Done. In visitParse, it's changed to makeLiteral(pattern, VARCHAR).
| ParseMethod parseMethod = node.getParseMethod(); | ||
| java.util.Map<String, Literal> arguments = node.getArguments(); | ||
| String pattern = (String) node.getPattern().getValue(); | ||
| String patternValue = (String) node.getPattern().getValue(); |
There was a problem hiding this comment.
patterns is a command, I didn't see the implementation code visitPatterns here? How PatternsIT works?
There was a problem hiding this comment.
It will be changed in part II. Part I just reverts previous changes to V2. V2's patterns representation is Parse.
Signed-off-by: Songkan Tang <songkant@amazon.com>
Signed-off-by: Songkan Tang <songkant@amazon.com>
Signed-off-by: Songkan Tang <songkant@amazon.com>
Signed-off-by: Songkan Tang <songkant@amazon.com>
There was a problem hiding this comment.
Could you please add tests for explain on patterns in ExplainIT as well? May need 2 tests for cases w/wo stats command after patterns.
As discussed offline, it only supports pushdown for patterns with stats command after it. It would help us verify the feature of pushing down patterns on Calcite in the future.
Signed-off-by: Songkan Tang <songkant@amazon.com>
Sure, I added suggested two cases. But I feel that IT is a bit strange. It's better to assert on actual plan change. The pushed down script is encoded string that is not readable. |
Signed-off-by: Songkan Tang <songkant@amazon.com>
|
Please update |
Signed-off-by: Songkan Tang <songkant@amazon.com>
Signed-off-by: Songkan Tang <songkant@amazon.com>
Done. And fixed failed IT. |
LantaoJin
left a comment
There was a problem hiding this comment.
Could you add testPatterns in PPLQueryDataAnonymizerTest
Signed-off-by: Songkan Tang <songkant@amazon.com>
Added |
…lcite (#3553) * Revert simple_pattern window function change to recover pushdown ability Signed-off-by: Songkan Tang <songkant@amazon.com> * Add SIMPLE_PATTERN patterns command support based on parse command Signed-off-by: Songkan Tang <songkant@amazon.com> * Address minor comments Signed-off-by: Songkan Tang <songkant@amazon.com> * Address comments part 2 Signed-off-by: Songkan Tang <songkant@amazon.com> * Make allowCast for pattern VARCHAR literal Signed-off-by: Songkan Tang <songkant@amazon.com> * Fix spotless Signed-off-by: Songkan Tang <songkant@amazon.com> * Minor ut failure fix Signed-off-by: Songkan Tang <songkant@amazon.com> * Add patterns command push down cases in ExplainIT Signed-off-by: Songkan Tang <songkant@amazon.com> * Fix patterns ExplainIT failure Signed-off-by: Songkan Tang <songkant@amazon.com> * Rename visitWindow to visitPatterns Signed-off-by: Songkan Tang <songkant@amazon.com> * Correct ExplainIT expected plan for patterns Signed-off-by: Songkan Tang <songkant@amazon.com> * Add patterns anonymizer test for SIMPLE_PATTERN Signed-off-by: Songkan Tang <songkant@amazon.com> --------- Signed-off-by: Songkan Tang <songkant@amazon.com> Signed-off-by: xinyual <xinyual@amazon.com>
Description
Revert stream pattern method in V2 and implement SIMPLE_PATTERN in Calcite
Related Issues
Resolves #3552
Check List
--signoff.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.