Skip to content

Add unified query parser API#5274

Merged
dai-chen merged 1 commit intoopensearch-project:mainfrom
dai-chen:feature/unified-query-parser
Mar 27, 2026
Merged

Add unified query parser API#5274
dai-chen merged 1 commit intoopensearch-project:mainfrom
dai-chen:feature/unified-query-parser

Conversation

@dai-chen
Copy link
Copy Markdown
Collaborator

@dai-chen dai-chen commented Mar 26, 2026

Description

Extract parsing logic from UnifiedQueryPlanner into a UnifiedQueryParser<R> interface with language-specific implementations: PPLQueryParser (returns UnresolvedPlan) and CalciteSqlQueryParser (returns SqlNode).

Implementation Notes

  • Context-owned parser: Following the Spark/Flink pattern, UnifiedQueryContext creates and owns the parser, ensuring consistent configuration across all components.
  • No parse with visitor API yet: A language-neutral parse(query, visitor) was explored but deferred. PPL's custom AST and Calcite's SqlNode tree have fundamentally different node structures. Until this tension resolves, callers use native visitors on the typed parse(query) result.

Related Issues

Part of #5248

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dai-chen dai-chen self-assigned this Mar 26, 2026
@dai-chen dai-chen added the enhancement New feature or request label Mar 26, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

@dai-chen dai-chen force-pushed the feature/unified-query-parser branch from 0676f32 to 9ed26fe Compare March 26, 2026 22:29
@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

@dai-chen dai-chen force-pushed the feature/unified-query-parser branch from 9ed26fe to eb41b88 Compare March 26, 2026 22:45
@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

@dai-chen dai-chen force-pushed the feature/unified-query-parser branch from eb41b88 to 0fc6009 Compare March 26, 2026 22:48
@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

penghuo
penghuo previously approved these changes Mar 27, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

Extract parsing logic from UnifiedQueryPlanner into a UnifiedQueryParser
interface with language-specific implementations: PPLQueryParser (returns
UnresolvedPlan) and CalciteSqlQueryParser (returns SqlNode).

UnifiedQueryContext owns the parser instance, created eagerly by the
builder which has direct access to query type and future SQL config.
Each implementation receives only its required dependencies:
PPLQueryParser takes Settings, CalciteSqlQueryParser takes
CalcitePlanContext. UnifiedQueryPlanner.CustomVisitorStrategy now obtains
the parser from the context via the interface type.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen force-pushed the feature/unified-query-parser branch from 9f2bf0c to 7110f63 Compare March 27, 2026 19:05
@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

@dai-chen dai-chen requested a review from penghuo March 27, 2026 20:06
@dai-chen
Copy link
Copy Markdown
Collaborator Author

@penghuo Addressed conflicts. Please approve again. Thanks!

@dai-chen dai-chen enabled auto-merge (squash) March 27, 2026 20:11
@dai-chen dai-chen merged commit 8a7524c into opensearch-project:main Mar 27, 2026
38 checks passed
ahkcs added a commit that referenced this pull request Mar 30, 2026
* Init CLAUDE.md (#5259)

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Add label to exempt specific PRs from stalled labeling (#5263)

* Implement `reverse` performance optimization (#4775)

Co-authored-by: Jialiang Liang <jiallian@amazon.com>

* Add songkant-aws as maintainer (#5244)

* Move some maintainers from active to Emeritus (#5260)

* Move inactive current maintainers to Emeritus

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Remove affiliation column for emeritus maintainers

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* formatted

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix formatting in MAINTAINERS.md

Signed-off-by: Simeon Widdis <sawiddis@gmail.com>

Signed-off-by: Simeon Widdis <sawiddis@gmail.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@gmail.com>
Co-authored-by: Simeon Widdis <sawiddis@gmail.com>

* Add query cancellation support via _tasks/_cancel API for PPL queries (#5254)

* Add query cancellation support via _tasks/_cancel API for PPL queries

Signed-off-by: Sunil Ramchandra Pawar <pawar_sr@apple.com>

* Refactor PPL query cancellation to cooperative model and other PR suggestions.

Signed-off-by: Sunil Ramchandra Pawar <pawar_sr@apple.com>

---------

Signed-off-by: Sunil Ramchandra Pawar <pawar_sr@apple.com>

* Add Calcite native SQL planning in UnifiedQueryPlanner (#5257)

* feat(api): Add Calcite native SQL planning path in UnifiedQueryPlanner

Add SQL support to the unified query API using Calcite's native parser
pipeline (SqlParser → SqlValidator → SqlToRelConverter → RelNode),
bypassing the ANTLR parser used by PPL.

Changes:
- UnifiedQueryPlanner: use PlanningStrategy to dispatch
  CalciteNativeStrategy vs CustomVisitorStrategy
- CalciteNativeStrategy: Calcite Planner with try-with-resources
  for ANSI SQL
- CustomVisitorStrategy: ANTLR-based path for PPL (and future SQL V2)
- UnifiedQueryContext: SqlParser.Config with Casing.UNCHANGED to
  preserve lowercase OpenSearch index names

Signed-off-by: Chen Dai <daichen@amazon.com>

* test(api): Add SQL planner tests and refactor test base for multi-language support

- Refactor UnifiedQueryTestBase with queryType() hook for subclass override
- Add UnifiedSqlQueryPlannerTest covering SELECT, WHERE, GROUP BY, JOIN,
  ORDER BY, subquery, case sensitivity, namespaces, and error handling
- Update UnifiedQueryContextTest to verify SQL context creation

Signed-off-by: Chen Dai <daichen@amazon.com>

* perf(benchmarks): Add SQL queries to UnifiedQueryBenchmark

Add language (PPL/SQL) and queryPattern param dimensions for
side-by-side comparison of equivalent queries across both languages.
Remove separate UnifiedSqlQueryBenchmark in favor of unified class.

Signed-off-by: Chen Dai <daichen@amazon.com>

* docs(api): Update README to reflect SQL support in UnifiedQueryPlanner

Signed-off-by: Chen Dai <daichen@amazon.com>

* fix(api): Normalize trailing whitespace in assertPlan comparison

RelOptUtil.toString() appends a trailing newline after the last plan
node, which doesn't match Java text block expectations. Also add
\r\n normalization for Windows CI compatibility, consistent with
the existing pattern in core module tests.

Signed-off-by: Chen Dai <daichen@amazon.com>

---------

Signed-off-by: Chen Dai <daichen@amazon.com>

* [Feature] Support graphLookup with literal value as its start (#5253)

* [Feature] Support graphLookup as top-level PPL command (#5243)

Add support for graphLookup as the first command in a PPL query with
literal start values, instead of requiring piped input from source=.

Syntax:
  graphLookup table start="value" edge=from-->to as output
  graphLookup table start=("v1", "v2") edge=from-->to as output

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Spotless check

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Ignore child pipe if using start value

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Add graphLookup integration tests per PPL command checklist

- Add explain plan tests in CalciteExplainIT with YAML assertions
- Add v2-unsupported tests in NewAddedCommandsIT
- Add CalcitePPLGraphLookupIT to CalciteNoPushdownIT suite
- Skip graphLookup tests when pushdown is disabled (required by impl)
- Add expected plan YAML files for piped and top-level graphLookup

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Remove brace of start value list

Signed-off-by: Heng Qian <qianheng@amazon.com>

---------

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Apply docs website feedback to ppl functions (#5207)

* apply doc website feedback to ppl functions

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* take out comments

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix json_append example

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix json_append example

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix links

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

---------

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>

* feat(api): Add profiling support to unified query API (#5268)

Add query profiling infrastructure that measures time spent in each
query phase (analyze, optimize, execute, format). Profiling is opt-in
via UnifiedQueryContext.builder().profiling(true) and uses thread-local
context to avoid passing profiling state through every method.

Key changes:
- QueryProfiling/ProfileContext for thread-local profiling lifecycle
- UnifiedQueryContext.measure() API for timing arbitrary phases
- Auto-profiling in UnifiedQueryPlanner (analyze) and compiler (optimize)
- UnifiedQueryTestBase shared test fixture for unified query tests
- Comprehensive profiling tests with non-flaky >= 0 timing assertions

Signed-off-by: Chen Dai <daichen@amazon.com>

* Add UnifiedQueryParser with language-specific implementations (#5274)

Extract parsing logic from UnifiedQueryPlanner into a UnifiedQueryParser
interface with language-specific implementations: PPLQueryParser (returns
UnresolvedPlan) and CalciteSqlQueryParser (returns SqlNode).

UnifiedQueryContext owns the parser instance, created eagerly by the
builder which has direct access to query type and future SQL config.
Each implementation receives only its required dependencies:
PPLQueryParser takes Settings, CalciteSqlQueryParser takes
CalcitePlanContext. UnifiedQueryPlanner.CustomVisitorStrategy now obtains
the parser from the context via the interface type.

Signed-off-by: Chen Dai <daichen@amazon.com>

* Fix flaky TPC-H Q1 test due to bugs in `MatcherUtils.closeTo()` (#5283)

* Fix the flaky tpch Q1

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Change to ULP-aware to handle floating-point precision differences

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@gmail.com>
Signed-off-by: Sunil Ramchandra Pawar <pawar_sr@apple.com>
Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Co-authored-by: qianheng <qianheng@amazon.com>
Co-authored-by: Simeon Widdis <sawiddis@gmail.com>
Co-authored-by: Jialiang Liang <jiallian@amazon.com>
Co-authored-by: Lantao Jin <ltjin@amazon.com>
Co-authored-by: Sunil Ramchandra Pawar <pawar_sr@apple.com>
Co-authored-by: Chen Dai <daichen@amazon.com>
Co-authored-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>
ahkcs added a commit to ahkcs/sql that referenced this pull request Mar 30, 2026
…rch-project#5247)

Task 1: Enable profiling (opensearch-project#5268)
- Add .profiling(pplRequest.profile()) to UnifiedQueryContext.builder()
  in both doExecute and doExplain

Task 2: Migrate to UnifiedQueryParser for index extraction (opensearch-project#5274)
- Replace StubIndexDetector regex with PPLQueryParser AST-based
  extraction: parse query, walk AST to find Relation node, extract
  table name via getTableQualifiedName()
- Delete StubIndexDetector
- isAnalyticsIndex() is now an instance method (needs PPLQueryParser)
- Constructor takes Settings for PPLQueryParser

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
ahkcs added a commit that referenced this pull request Mar 31, 2026
* [Mustang] Enable profiling and migrate to UnifiedQueryParser (#5247)

Task 1: Enable profiling (#5268)
- Add .profiling(pplRequest.profile()) to UnifiedQueryContext.builder()
  in both doExecute and doExplain

Task 2: Migrate to UnifiedQueryParser for index extraction (#5274)
- Replace StubIndexDetector regex with PPLQueryParser AST-based
  extraction: parse query, walk AST to find Relation node, extract
  table name via getTableQualifiedName()
- Delete StubIndexDetector
- isAnalyticsIndex() is now an instance method (needs PPLQueryParser)
- Constructor takes Settings for PPLQueryParser

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Switch to SimpleJsonResponseFormatter for profiling support

Switch from JdbcResponseFormatter to SimpleJsonResponseFormatter so
profiling data is included in the response when profile=true. The
SimpleJsonResponseFormatter calls QueryProfiling.current().finish()
to populate the profile field.

Update test assertions to match SimpleJsonResponseFormatter type names
(PPL_SPEC: INTEGER -> "int", STRING -> "string") and remove status
field check (not included by SimpleJsonResponseFormatter).

Add integration test verifying profile field appears in response.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Use context parser for index extraction instead of standalone PPLQueryParser

Create UnifiedQueryContext upfront in isAnalyticsIndex() and use
context.getParser() for index name extraction. This reuses the
context-owned parser which supports both PPL and SQL, making it ready
for unified SQL support without code changes.

Remove standalone PPLQueryParser field and Settings constructor param.
isAnalyticsIndex() now takes QueryType to create the right context.
extractIndexName() handles UnresolvedPlan (PPL) with a TODO for
SqlNode (SQL) when unified SQL is enabled.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Use AST visitor for index name extraction

Replace manual tree walking (findRelation) with IndexNameExtractor
visitor extending AbstractNodeVisitor. The visitor's visitRelation()
is called automatically by the AST accept/visitChildren pattern,
which handles tree traversal.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Wrap execute and explain with context.measure() for profiling

Wrap analyticsEngine.execute() and analyticsEngine.explain() calls
with context.measure(MetricName.EXECUTE, ...) so execution time is
captured in the profiling metrics. Planning is auto-profiled by
UnifiedQueryPlanner.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Fix EXECUTE profiling metric by recording inside AnalyticsExecutionEngine

Move EXECUTE metric recording into AnalyticsExecutionEngine.execute(),
between the actual execution (planExecutor + row conversion) and the
listener.onResponse() call. This ensures the metric is written before
SimpleJsonResponseFormatter calls QueryProfiling.finish() to snapshot.

Previously context.measure() was used in RestUnifiedQueryAction, but
finish() was called inside the listener callback (synchronously) before
measure()'s finally block could write the metric, resulting in 0ms.

Add IT assertion that execute phase time_ms > 0 to catch this bug.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants