Skip to content

Revert stream pattern method in V2 and implement SIMPLE_PATTERN in Calcite#3553

Merged
LantaoJin merged 13 commits intoopensearch-project:mainfrom
songkant-aws:calcite-patterns-command-revert-stream-pattern
Apr 30, 2025
Merged

Revert stream pattern method in V2 and implement SIMPLE_PATTERN in Calcite#3553
LantaoJin merged 13 commits intoopensearch-project:mainfrom
songkant-aws:calcite-patterns-command-revert-stream-pattern

Conversation

@songkant-aws
Copy link
Copy Markdown
Collaborator

@songkant-aws songkant-aws commented Apr 16, 2025

Description

Revert stream pattern method in V2 and implement SIMPLE_PATTERN in Calcite

Related Issues

Resolves #3552

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Songkan Tang <songkant@amazon.com>
Signed-off-by: Songkan Tang <songkant@amazon.com>
.toString()
.toLowerCase(Locale.ROOT);
if (patternMethod.equalsIgnoreCase(PatternMethod.SIMPLE_PATTERN.getName())) {
return new Parse(ParseMethod.PATTERNS, sourceField, pattern, arguments);
Copy link
Copy Markdown
Collaborator

@qianheng-aws qianheng-aws Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we rename the Parse node and change its doc? It's weird to use a node which says /** AST node represent Parse with regex operation. */ in its doc for PatternCommand.

Looks like our definition of UnresolvedPlan is something between AST and LogicalPlan. Most are more like AST(e.g. Parse, Rename, Eval), while others are more like LogicalPlan(e.g. Window, Aggregation, Project, Filter).

Under current convention, seems we cannot figure out a proper name for it or determine which group it belong to.

Copy link
Copy Markdown
Collaborator Author

@songkant-aws songkant-aws Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not doing this refactoring in this PR to introduce minimum change. I left comment to call out why do we keep that Parse plan for legacy patterns method. And call out there is opportunity of refactoring it into just Project operator.

java.util.Map<String, Literal> arguments = node.getArguments();
String pattern = (String) node.getPattern().getValue();
// TODO: Support Grok parse method
Pair<SqlOperator, String> operatorPatternPair =
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be registered as functions in PPLFuncImpTable?Seems we will have more method in future and it should be a true function in Calcite.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Registered related calcite REGEXP functions into PPLFuncImpTable

@LantaoJin LantaoJin added the calcite calcite migration releated label Apr 16, 2025
Signed-off-by: Songkan Tang <songkant@amazon.com>
Comment on lines +207 to +208
REGEXP_EXTRACT(FunctionName.of("regexp_extract")),
REGEXP_REPLACE_2(FunctionName.of("regexp_replace_2")),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these functions all existing UDF in V2?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, change them to internal functions with additional boolean to describe them.

RelNode root = getRelNode(ppl);

String expectedLogical =
"LogicalProject(ENAME=[$1], patterns_field=[REGEXP_REPLACE($1, '[A-H]')])\n"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think the '[A-H]' here should change to '[A-H]':VARCHAR if #3558 merged first Just call out for reminding.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. In visitParse, it's changed to makeLiteral(pattern, VARCHAR).

@LantaoJin LantaoJin added calcite calcite migration releated and removed calcite calcite migration releated labels Apr 18, 2025
ParseMethod parseMethod = node.getParseMethod();
java.util.Map<String, Literal> arguments = node.getArguments();
String pattern = (String) node.getPattern().getValue();
String patternValue = (String) node.getPattern().getValue();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

patterns is a command, I didn't see the implementation code visitPatterns here? How PatternsIT works?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be changed in part II. Part I just reverts previous changes to V2. V2's patterns representation is Parse.

Signed-off-by: Songkan Tang <songkant@amazon.com>
Signed-off-by: Songkan Tang <songkant@amazon.com>
Signed-off-by: Songkan Tang <songkant@amazon.com>
Signed-off-by: Songkan Tang <songkant@amazon.com>
Copy link
Copy Markdown
Collaborator

@qianheng-aws qianheng-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add tests for explain on patterns in ExplainIT as well? May need 2 tests for cases w/wo stats command after patterns.

As discussed offline, it only supports pushdown for patterns with stats command after it. It would help us verify the feature of pushing down patterns on Calcite in the future.

Signed-off-by: Songkan Tang <songkant@amazon.com>
@songkant-aws
Copy link
Copy Markdown
Collaborator Author

Could you please add tests for explain on patterns in ExplainIT as well? May need 2 tests for cases w/wo stats command after patterns.

As discussed offline, it only supports pushdown for patterns with stats command after it. It would help us verify the feature of pushing down patterns on Calcite in the future.

Sure, I added suggested two cases. But I feel that IT is a bit strange. It's better to assert on actual plan change. The pushed down script is encoded string that is not readable.

Signed-off-by: Songkan Tang <songkant@amazon.com>
@LantaoJin
Copy link
Copy Markdown
Member

Please update visitWindow to visitPatterns in Analyzer.

@songkant-aws
Copy link
Copy Markdown
Collaborator Author

Please update visitWindow to visitPatterns in Analyzer.

Done. And fixed failed IT.

Copy link
Copy Markdown
Member

@LantaoJin LantaoJin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add testPatterns in PPLQueryDataAnonymizerTest

Signed-off-by: Songkan Tang <songkant@amazon.com>
@songkant-aws
Copy link
Copy Markdown
Collaborator Author

songkant-aws commented Apr 29, 2025

Could you add testPatterns in PPLQueryDataAnonymizerTest

Added testPatterns for SIMPLE_PATTERN method. The BRAIN method will be added in another PR because it needs some signature change of unresolved plan.

@LantaoJin LantaoJin merged commit 788da98 into opensearch-project:main Apr 30, 2025
22 checks passed
@songkant-aws songkant-aws deleted the calcite-patterns-command-revert-stream-pattern branch June 6, 2025 05:48
penghuo pushed a commit that referenced this pull request Jun 16, 2025
…lcite (#3553)

* Revert simple_pattern window function change to recover pushdown ability

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Add SIMPLE_PATTERN patterns command support based on parse command

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Address minor comments

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Address comments part 2

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Make allowCast for pattern VARCHAR literal

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Fix spotless

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Minor ut failure fix

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Add patterns command push down cases in ExplainIT

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Fix patterns ExplainIT failure

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Rename visitWindow to visitPatterns

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Correct ExplainIT expected plan for patterns

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Add patterns anonymizer test for SIMPLE_PATTERN

Signed-off-by: Songkan Tang <songkant@amazon.com>

---------

Signed-off-by: Songkan Tang <songkant@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

calcite calcite migration releated

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Revert stream pattern method to recover pushdown ability in V2 and implement SIMPLE_PATTERN in Calcite

3 participants