Skip to content

Enable profiling and migrate to UnifiedQueryParser#5285

Merged
ahkcs merged 6 commits intoopensearch-project:feature/mustang-ppl-integrationfrom
ahkcs:pr/mustang-profiling-and-parser-migration
Mar 31, 2026
Merged

Enable profiling and migrate to UnifiedQueryParser#5285
ahkcs merged 6 commits intoopensearch-project:feature/mustang-ppl-integrationfrom
ahkcs:pr/mustang-profiling-and-parser-migration

Conversation

@ahkcs
Copy link
Copy Markdown
Collaborator

@ahkcs ahkcs commented Mar 30, 2026

Summary

Implements two tasks from #5247 comment:

Task 1: Enable profiling (#5268)

Add .profiling(pplRequest.profile()) to UnifiedQueryContext.builder() in both doExecute and doExplain. When profile=true is requested, profiling metrics are collected during query planning and execution via the unified query API's auto-profiling support.

Task 2: Migrate to UnifiedQueryParser (#5274)

Replace StubIndexDetector (regex-based index extraction) with PPLQueryParser (AST-based). Parse the query to get the UnresolvedPlan, walk the AST tree to find the Relation node, and extract the table name via getTableQualifiedName(). This is more robust than regex and uses the same parser infrastructure as the rest of the pipeline.

Changed Files

File Change
plugin/.../rest/RestUnifiedQueryAction.java Add profiling, replace regex with PPLQueryParser for index extraction, remove manual latency logging
plugin/.../transport/TransportPPLQueryAction.java Pass Settings to RestUnifiedQueryAction, use instance method
plugin/.../rest/analytics/stub/StubIndexDetector.java Deleted -- replaced by PPLQueryParser
plugin/.../rest/RestUnifiedQueryActionTest.java Updated for instance method

Test plan

  • 2 unit tests -- routing detection (parquet vs non-parquet, null/empty)
  • 9 integration tests -- AnalyticsPPLIT (full pipeline)
  • 6 integration tests -- AnalyticsExplainIT (explain plans)

@github-actions
Copy link
Copy Markdown
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 1e683b6.

'Diff too large, requires skip by maintainers after manual review'


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

…rch-project#5247)

Task 1: Enable profiling (opensearch-project#5268)
- Add .profiling(pplRequest.profile()) to UnifiedQueryContext.builder()
  in both doExecute and doExplain

Task 2: Migrate to UnifiedQueryParser for index extraction (opensearch-project#5274)
- Replace StubIndexDetector regex with PPLQueryParser AST-based
  extraction: parse query, walk AST to find Relation node, extract
  table name via getTableQualifiedName()
- Delete StubIndexDetector
- isAnalyticsIndex() is now an instance method (needs PPLQueryParser)
- Constructor takes Settings for PPLQueryParser

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs force-pushed the pr/mustang-profiling-and-parser-migration branch from 1e683b6 to 5addc89 Compare March 30, 2026 18:45
@ahkcs ahkcs changed the title [Mustang] Enable profiling and migrate to UnifiedQueryParser Enable profiling and migrate to UnifiedQueryParser Mar 30, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

@ahkcs ahkcs added PPL Piped processing language enhancement New feature or request labels Mar 30, 2026
@ahkcs ahkcs marked this pull request as ready for review March 30, 2026 20:43
Copy link
Copy Markdown
Collaborator

@dai-chen dai-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the profiling, is the result automatically populated into REST response? Is it possible to add integration test or verify manually?


analyticsEngine.execute(
plan, planContext, createQueryListener(queryType, planTime, listener));
analyticsEngine.execute(plan, planContext, createQueryListener(queryType, listener));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrap by context.measure?

Copy link
Copy Markdown
Collaborator Author

@ahkcs ahkcs Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Wrapped both analyticsEngine.execute() and analyticsEngine.explain() with context.measure(MetricName.EXECUTE, ...). Planning is auto-profiled by UnifiedQueryPlanner internally.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated reply: Moved EXECUTE metric recording into AnalyticsExecutionEngine.execute() instead of wrapping with context.measure(). The issue with context.measure() is that SimpleJsonResponseFormatter calls QueryProfiling.finish() inside the listener callback (synchronously), which snapshots all metrics before context.measure()'s finally block writes the EXECUTE metric — resulting in 0ms.

The fix records the metric between the actual execution (planExecutor + row conversion) and the listener.onResponse() call, so it's captured before finish() snapshots. Added IT that asserts execute phase time_ms > 0 to catch this

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I feel it shouldn't do that. For now, is format metric missing?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FORMAT metric is not missing — it's recorded by SimpleJsonResponseFormatter.buildJsonObject() itself, before calling finish(). The IT output confirms all phases:
analyze: 13.69ms, execute: 126.32ms, format: 1.29ms, optimize: 0ms

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahkcs ^^

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SimpleJsonResponseFormatter calls finish() inside buildJsonObject() to snapshot metrics. Any metric written after that (like context.measure()'s finally block) is missed — that's why we record EXECUTE manually before the listener fires.

The main PPL path uses the same SimpleJsonResponseFormatter with the same finish() pattern, so this isn't specific to our analytics path.

To fully adopt context.measure(), I think the formatter would need to stop calling finish() internally. Should we make this a follow-up PR?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 30, 2026

PR Reviewer Guide 🔍

(Review updated until commit 7347fef)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections

🔀 Multiple PR themes

Sub-PR theme: Enable profiling support in AnalyticsExecutionEngine

Relevant files:

  • core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java
  • integ-test/src/test/java/org/opensearch/sql/ppl/AnalyticsPPLIT.java

Sub-PR theme: Migrate index routing from StubIndexDetector to UnifiedQueryParser

Relevant files:

  • plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java
  • plugin/src/main/java/org/opensearch/sql/plugin/transport/TransportPPLQueryAction.java
  • plugin/src/main/java/org/opensearch/sql/plugin/rest/analytics/stub/StubIndexDetector.java
  • plugin/src/test/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryActionTest.java

⚡ Recommended focus areas for review

Resource Leak

In isAnalyticsIndex, a UnifiedQueryContext is created via buildContext and used in a try-with-resources block. However, the comment in the Javadoc notes that creating a separate context for routing is expensive and involves a Calcite JDBC connection. If this method is called frequently (e.g., on every request), it could cause performance issues or connection pool exhaustion. The TODO mentions moving this to the sql-worker thread, but there is no tracking issue or follow-up mechanism.

public boolean isAnalyticsIndex(String query, QueryType queryType) {
  if (query == null || query.isEmpty()) {
    return false;
  }
  try (UnifiedQueryContext context = buildContext(queryType, false)) {
    String indexName = extractIndexName(query, context);
    if (indexName == null) {
      return false;
    }
    int lastDot = indexName.lastIndexOf('.');
    String tableName = lastDot >= 0 ? indexName.substring(lastDot + 1) : indexName;
    return tableName.startsWith("parquet_");
  } catch (Exception e) {
    return false;
  }
}
Incomplete SQL Support

The extractIndexName method only handles UnresolvedPlan (PPL AST) and has a TODO for SQL SqlNode table extraction. If a SQL query is routed through isAnalyticsIndex, it will always return null and thus false, potentially causing incorrect routing for SQL queries targeting analytics indices.

private static String extractIndexName(String query, UnifiedQueryContext context) {
  Object parseResult = context.getParser().parse(query);
  if (parseResult instanceof UnresolvedPlan unresolvedPlan) {
    return unresolvedPlan.accept(new IndexNameExtractor(), null);
  }
  // TODO: handle SQL SqlNode for table extraction when unified SQL is enabled
  return null;
}
AST Visitor Incomplete

IndexNameExtractor only overrides visitRelation. If the PPL query has a more complex AST (e.g., nested plans, subqueries, or joins), the visitor may not traverse child nodes and could return null even for valid queries with a Relation node deeper in the tree. The AbstractNodeVisitor default behavior should be verified to ensure it recurses into children.

private static class IndexNameExtractor extends AbstractNodeVisitor<String, Void> {
  @Override
  public String visitRelation(Relation node, Void context) {
    return node.getTableQualifiedName().toString();
  }
}
Profiling Coupling

The comment explains that execMetric.set(...) must be called before listener.onResponse(...) because the listener triggers QueryProfiling.finish(). This creates a fragile ordering dependency. If the listener implementation changes or profiling is refactored, this ordering constraint may be silently violated, leading to missing or incorrect profiling data.

// Record EXECUTE metric before calling listener, because the listener's onResponse
// triggers SimpleJsonResponseFormatter which calls QueryProfiling.finish() to snapshot
// all metrics. The metric must be written before that snapshot.
ProfileMetric execMetric = QueryProfiling.current().getOrCreateMetric(MetricName.EXECUTE);
long execStart = System.nanoTime();

Iterable<Object[]> rows = planExecutor.execute(plan, null);

List<RelDataTypeField> fields = plan.getRowType().getFieldList();
List<ExprValue> results = convertRows(rows, fields);
Schema schema = buildSchema(fields);

execMetric.set(System.nanoTime() - execStart);

listener.onResponse(new QueryResponse(schema, results, Cursor.None));

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 30, 2026

PR Code Suggestions ✨

Latest suggestions up to 7347fef

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Visitor must recurse into children to find nested Relation nodes

The IndexNameExtractor visitor only overrides visitRelation but does not override a
default/visit-children method. For compound PPL plans (e.g., source = parquet_logs |
fields ts | where ...), the Relation node may be nested inside a chain of unary
nodes. If the base AbstractNodeVisitor does not automatically recurse into children,
the visitor will return null for any query with pipe commands, causing
isAnalyticsIndex to always return false for such queries. Ensure the visitor
traverses child nodes to find the Relation.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [178-183]

 private static class IndexNameExtractor extends AbstractNodeVisitor<String, Void> {
   @Override
   public String visitRelation(Relation node, Void context) {
     return node.getTableQualifiedName().toString();
   }
+
+  @Override
+  public String visitChildren(UnresolvedPlan node, Void context) {
+    for (UnresolvedPlan child : node.getChild()) {
+      String result = child.accept(this, context);
+      if (result != null) {
+        return result;
+      }
+    }
+    return null;
+  }
 }
Suggestion importance[1-10]: 7

__

Why: This is a valid concern: if AbstractNodeVisitor doesn't recurse by default, the IndexNameExtractor would fail to find the Relation node in queries with pipe commands (e.g., source = parquet_logs | fields ts), causing isAnalyticsIndex to always return false for such queries. The integration tests in AnalyticsPPLIT use piped queries, so this would be a real bug if the base visitor doesn't recurse.

Medium
Ensure profiling metric is always recorded on failure

If convertRows or buildSchema throws an exception, execMetric.set(...) is never
called, leaving the metric unset before listener.onFailure is invoked. This could
cause inconsistent profiling state. The metric should be set in a finally block to
ensure it is always recorded, even on failure.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [76-87]

 ProfileMetric execMetric = QueryProfiling.current().getOrCreateMetric(MetricName.EXECUTE);
 long execStart = System.nanoTime();
+try {
+  Iterable<Object[]> rows = planExecutor.execute(plan, null);
 
-Iterable<Object[]> rows = planExecutor.execute(plan, null);
+  List<RelDataTypeField> fields = plan.getRowType().getFieldList();
+  List<ExprValue> results = convertRows(rows, fields);
+  Schema schema = buildSchema(fields);
 
-List<RelDataTypeField> fields = plan.getRowType().getFieldList();
-List<ExprValue> results = convertRows(rows, fields);
-Schema schema = buildSchema(fields);
+  listener.onResponse(new QueryResponse(schema, results, Cursor.None));
+} finally {
+  execMetric.set(System.nanoTime() - execStart);
+}
 
-execMetric.set(System.nanoTime() - execStart);
-
-listener.onResponse(new QueryResponse(schema, results, Cursor.None));
-
Suggestion importance[1-10]: 6

__

Why: If convertRows or buildSchema throws, execMetric.set(...) is skipped and the outer catch (Exception e) calls listener.onFailure(e) without recording the metric. Using a finally block ensures the profiling metric is always captured, which is a legitimate correctness concern for the profiling feature.

Low
General
Log routing exceptions instead of silently swallowing them

The isAnalyticsIndex method creates a full UnifiedQueryContext (which includes a
Calcite JDBC connection) on every routing check, even for queries that will
ultimately be routed to the Lucene engine. This is wasteful and the Javadoc itself
acknowledges it may become expensive. Since the routing check is a hot path called
for every query, silently swallowing all exceptions via catch (Exception e) { return
false; } also hides potential parsing errors that could cause incorrect routing.
Consider at minimum logging the exception at debug level so routing failures are
diagnosable.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [69-84]

 public boolean isAnalyticsIndex(String query, QueryType queryType) {
   if (query == null || query.isEmpty()) {
     return false;
   }
   try (UnifiedQueryContext context = buildContext(queryType, false)) {
     String indexName = extractIndexName(query, context);
-    ...
+    if (indexName == null) {
+      return false;
+    }
+    int lastDot = indexName.lastIndexOf('.');
+    String tableName = lastDot >= 0 ? indexName.substring(lastDot + 1) : indexName;
+    return tableName.startsWith("parquet_");
   } catch (Exception e) {
+    LOG.debug("[unified] Failed to extract index name for routing, defaulting to Lucene. Query: {}", query, e);
     return false;
   }
 }
Suggestion importance[1-10]: 4

__

Why: Adding debug logging for swallowed exceptions is a reasonable improvement for diagnosability, but the core concern about performance (creating a UnifiedQueryContext per routing check) is acknowledged in the Javadoc as a known limitation. The logging improvement is minor but valid.

Low

Previous suggestions

Suggestions up to commit 1e4fba8
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix visitor to traverse nested AST nodes

The IndexNameExtractor visitor only overrides visitRelation but does not override
the default visit method for child traversal. If the Relation node is not the root
of the AST (e.g., it is nested under a Filter or Project node), the visitor will
never reach it and extractIndexName will return null, causing all non-trivial
queries to be misrouted to the Lucene engine.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [173-178]

 private static class IndexNameExtractor extends AbstractNodeVisitor<String, Void> {
   @Override
   public String visitRelation(Relation node, Void context) {
     return node.getTableQualifiedName().toString();
   }
+
+  @Override
+  public String visitChildren(org.opensearch.sql.ast.Node node, Void context) {
+    for (org.opensearch.sql.ast.Node child : node.getChild()) {
+      String result = child.accept(this, context);
+      if (result != null) {
+        return result;
+      }
+    }
+    return null;
+  }
 }
Suggestion importance[1-10]: 7

__

Why: This is a valid concern — if Relation is not the root AST node (e.g., in queries with | fields or other pipes), the visitor won't find it without child traversal. However, in PPL the Relation is typically the leaf/source node and traversal may work differently depending on AbstractNodeVisitor's default behavior, so the actual impact depends on the AST structure.

Medium
General
Ensure profiling metric is recorded on exception

If convertRows or buildSchema throws an exception, execMetric.set(...) is never
called, leaving the metric unset before QueryProfiling.finish() is triggered. Use a
try-finally block to ensure the metric is always recorded, even on failure.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [76-85]

 ProfileMetric execMetric = QueryProfiling.current().getOrCreateMetric(MetricName.EXECUTE);
 long execStart = System.nanoTime();
+try {
+  Iterable<Object[]> rows = planExecutor.execute(plan, null);
 
-Iterable<Object[]> rows = planExecutor.execute(plan, null);
+  List<RelDataTypeField> fields = plan.getRowType().getFieldList();
+  List<ExprValue> results = convertRows(rows, fields);
+  Schema schema = buildSchema(fields);
 
-List<RelDataTypeField> fields = plan.getRowType().getFieldList();
-List<ExprValue> results = convertRows(rows, fields);
-Schema schema = buildSchema(fields);
+  execMetric.set(System.nanoTime() - execStart);
 
-execMetric.set(System.nanoTime() - execStart);
+  listener.onResponse(new QueryResponse(schema, results, Cursor.None));
+} catch (Exception e) {
+  execMetric.set(System.nanoTime() - execStart);
+  throw e;
+}
Suggestion importance[1-10]: 6

__

Why: This is a valid correctness concern — if convertRows or buildSchema throws, the metric is never set. The try-finally pattern ensures the metric is always recorded. However, the outer catch (Exception e) block already handles failures gracefully, so the impact is limited to incomplete profiling data on error paths.

Low
Log routing parse failures for debuggability

The isAnalyticsIndex method creates a full UnifiedQueryContext (with schema,
catalog, and planner setup) just to parse the index name for routing. This is called
on every incoming query before execution, making it expensive. Consider using a
lightweight parse-only path or caching the parsed index name to avoid redundant
context creation.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [68-78]

 try (UnifiedQueryContext context = buildContext(queryType, false)) {
   String indexName = extractIndexName(query, context);
-  if (indexName == null) {
+  if (indexName == null || indexName.isEmpty()) {
     return false;
   }
   int lastDot = indexName.lastIndexOf('.');
   String tableName = lastDot >= 0 ? indexName.substring(lastDot + 1) : indexName;
   return tableName.startsWith("parquet_");
 } catch (Exception e) {
+  LOG.debug("Failed to extract index name for routing from query, defaulting to non-analytics", e);
   return false;
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion adds a debug log on parse failure and a null-check on indexName.isEmpty(), which is a minor improvement. The isEmpty() check is redundant since an empty string wouldn't start with "parquet_" anyway, and the debug log is a minor observability improvement.

Low
Suggestions up to commit ba3efd7
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix visitor to traverse nested AST nodes

The IndexNameExtractor visitor only overrides visitRelation but does not override
the default visit method for child traversal. If the Relation node is not the root
of the AST (e.g., it's nested under a Project or Filter node), the visitor will
never reach it and extractIndexName will return null, causing isAnalyticsIndex to
incorrectly return false. Add a default implementation that visits children.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [173-178]

 private static class IndexNameExtractor extends AbstractNodeVisitor<String, Void> {
   @Override
   public String visitRelation(Relation node, Void context) {
     return node.getTableQualifiedName().toString();
   }
+
+  @Override
+  public String visitChildren(org.opensearch.sql.ast.Node node, Void context) {
+    for (org.opensearch.sql.ast.Node child : node.getChild()) {
+      String result = child.accept(this, context);
+      if (result != null) {
+        return result;
+      }
+    }
+    return null;
+  }
 }
Suggestion importance[1-10]: 7

__

Why: This is a legitimate concern - if the Relation node is nested (e.g., under Project from | fields), the visitor won't find it without child traversal. The fix is logically sound and addresses a real potential bug where isAnalyticsIndex could return false for queries with pipe commands.

Medium
General
Ensure profiling metric is always recorded on failure

If convertRows or buildSchema throws an exception, execMetric.set(...) will never be
called, leaving the metric unset before the profiling snapshot. The timing should be
recorded in a finally block to ensure it is always captured, even on failure.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [76-85]

 ProfileMetric execMetric = QueryProfiling.current().getOrCreateMetric(MetricName.EXECUTE);
 long execStart = System.nanoTime();
+try {
+  Iterable<Object[]> rows = planExecutor.execute(plan, null);
 
-Iterable<Object[]> rows = planExecutor.execute(plan, null);
+  List<RelDataTypeField> fields = plan.getRowType().getFieldList();
+  List<ExprValue> results = convertRows(rows, fields);
+  Schema schema = buildSchema(fields);
 
-List<RelDataTypeField> fields = plan.getRowType().getFieldList();
-List<ExprValue> results = convertRows(rows, fields);
-Schema schema = buildSchema(fields);
+  execMetric.set(System.nanoTime() - execStart);
 
-execMetric.set(System.nanoTime() - execStart);
+  listener.onResponse(new QueryResponse(schema, results, Cursor.None));
+} catch (Exception e) {
+  execMetric.set(System.nanoTime() - execStart);
+  throw e;
+}
Suggestion importance[1-10]: 6

__

Why: Valid correctness concern - if convertRows or buildSchema throws, the metric is never set. The finally block approach (or try-catch as shown) ensures the metric is captured. However, the outer catch (Exception e) in the existing code calls listener.onFailure(e) and swallows the exception, so re-throwing in the improved code changes behavior slightly.

Low
Avoid heavyweight context creation for routing check

The isAnalyticsIndex method creates a full UnifiedQueryContext (with schema,
catalog, and profiling setup) just to parse the query for index name extraction.
This is wasteful and may have side effects (e.g., initializing profiling state).
Consider using a lightweight parse-only context or a dedicated parser instance that
doesn't set up the full execution environment.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [68-78]

-try (UnifiedQueryContext context = buildContext(queryType, false)) {
+try (UnifiedQueryContext context = UnifiedQueryContext.builder()
+    .language(queryType)
+    .build()) {
   String indexName = extractIndexName(query, context);
   if (indexName == null) {
     return false;
   }
   int lastDot = indexName.lastIndexOf('.');
   String tableName = lastDot >= 0 ? indexName.substring(lastDot + 1) : indexName;
   return tableName.startsWith("parquet_");
 } catch (Exception e) {
   return false;
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion is valid in principle - creating a full UnifiedQueryContext with schema and catalog just for index name extraction is wasteful. However, the improved_code assumes UnifiedQueryContext.builder() can build without .catalog() and .defaultNamespace(), which may not be valid if the parser requires catalog setup. The suggestion is speculative without knowing the full API contract.

Low
Suggestions up to commit ce4f6c5
CategorySuggestion                                                                                                                                    Impact
Possible issue
Traverse nested AST nodes to find relation

The IndexNameExtractor visitor only overrides visitRelation but does not override
the default visit method. If the AST root is not a Relation (e.g., it's a Project or
Filter wrapping a Relation), the visitor will return null without traversing
children. Consider adding a default implementation that visits children to find the
nested Relation.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [187-192]

 private static class IndexNameExtractor extends AbstractNodeVisitor<String, Void> {
   @Override
   public String visitRelation(Relation node, Void context) {
     return node.getTableQualifiedName().toString();
   }
+
+  @Override
+  public String visit(Node node, Void context) {
+    for (Node child : node.getChild()) {
+      String result = child.accept(this, context);
+      if (result != null) {
+        return result;
+      }
+    }
+    return null;
+  }
 }
Suggestion importance[1-10]: 7

__

Why: This is a valid concern — if the AST root is a Project or Filter wrapping a Relation, the current IndexNameExtractor would return null without traversing children. Adding a default child-traversal implementation would make the extractor more robust for complex queries.

Medium
Fix async profiling measurement for explain

The doExplain method wraps the analyticsEngine.explain call inside context.measure,
but explain is asynchronous and delivers its result via listener. The measure call
will complete before the explain response is received, so profiling metrics will not
capture the actual explain execution time. The profiling measurement should wrap the
full async lifecycle, not just the invocation.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [151-157]

-context.measure(
-    MetricName.EXECUTE,
-    () -> {
-      analyticsEngine.explain(finalPlan, pplRequest.mode(), planContext, listener);
-      return null;
-    });
+analyticsEngine.explain(finalPlan, pplRequest.mode(), planContext, listener);
Suggestion importance[1-10]: 6

__

Why: The concern about async profiling is valid — context.measure will complete before the async explain response is received, so timing metrics won't capture actual execution time. However, the same pattern is used in doExecute and fixing only doExplain would be inconsistent. The improved code simply removes the measurement wrapper without addressing the underlying async profiling design.

Low
General
Use safe JSON construction for request body

The query string is embedded directly into a JSON string using String.format, which
will break if the query contains special JSON characters (quotes, backslashes,
etc.). Use a JSONObject to safely construct the request body.

integ-test/src/test/java/org/opensearch/sql/ppl/AnalyticsPPLIT.java [121-126]

 Request request = new Request("POST", QUERY_API_ENDPOINT);
-request.setJsonEntity(
-    String.format(
-        Locale.ROOT,
-        "{\n  \"query\": \"%s\",\n  \"profile\": true\n}",
-        "source = opensearch.parquet_logs"));
+JSONObject body = new JSONObject();
+body.put("query", "source = opensearch.parquet_logs");
+body.put("profile", true);
+request.setJsonEntity(body.toString());
Suggestion importance[1-10]: 5

__

Why: Using JSONObject to construct the request body is safer and more maintainable than embedding strings in a format template. For the current hardcoded query string this is low risk, but the suggestion improves robustness for future changes where the query might contain special characters.

Low
Avoid heavy context creation for routing checks

The isAnalyticsIndex method creates a full UnifiedQueryContext (including schema
building) just to parse the index name for routing decisions. This is called on
every request before execution, making it expensive. Consider using a lightweight
parse-only path or caching the parsed index name to avoid redundant context
creation.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [65-80]

 public boolean isAnalyticsIndex(String query, QueryType queryType) {
   if (query == null || query.isEmpty()) {
     return false;
   }
-  try (UnifiedQueryContext context = buildContext(queryType, false)) {
+  try (UnifiedQueryContext context = UnifiedQueryContext.builder()
+      .language(queryType)
+      .build()) {
     String indexName = extractIndexName(query, context);
-    ...
+    if (indexName == null) {
+      return false;
+    }
+    int lastDot = indexName.lastIndexOf('.');
+    String tableName = lastDot >= 0 ? indexName.substring(lastDot + 1) : indexName;
+    return tableName.startsWith("parquet_");
   } catch (Exception e) {
     return false;
   }
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion raises a valid performance concern about creating a full UnifiedQueryContext with schema building for every routing decision. However, the improved_code uses a builder without a catalog/schema which may not be valid or may fail to parse queries that require schema context. The suggestion is speculative without knowing if UnifiedQueryContext supports schema-less construction.

Low
Suggestions up to commit 2545d34
CategorySuggestion                                                                                                                                    Impact
Possible issue
Recurse into AST children for nested relation nodes

The IndexNameExtractor visitor only overrides visitRelation but does not override
the default visit method for child traversal. If the Relation node is not the root
of the AST (e.g., it is nested under a Filter or Project node), the visitor will
never reach it and extractIndexName will return null, causing valid analytics
queries to be misrouted to Lucene. Consider overriding the default visitor method to
recurse into children.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [187-192]

 private static class IndexNameExtractor extends AbstractNodeVisitor<String, Void> {
   @Override
   public String visitRelation(Relation node, Void context) {
     return node.getTableQualifiedName().toString();
   }
+
+  @Override
+  public String visitChildren(org.opensearch.sql.ast.Node node, Void context) {
+    for (org.opensearch.sql.ast.Node child : node.getChild()) {
+      String result = child.accept(this, context);
+      if (result != null) {
+        return result;
+      }
+    }
+    return null;
+  }
 }
Suggestion importance[1-10]: 7

__

Why: This is a valid concern — if Relation is not the root AST node (e.g., it's nested under Project or Filter), the visitor won't find it and queries would be misrouted. The improved_code correctly adds child traversal logic to handle this case.

Medium
Fix async profiling measurement timing inaccuracy

The context.measure call wraps the async analyticsEngine.execute, but the execution
completes asynchronously via the listener. The metric measurement will end
immediately after execute returns (before the query actually finishes), making the
timing data inaccurate. The profiling/timing should be captured inside the
onResponse/onFailure callbacks of the listener instead.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [127-133]

-context.measure(
-    MetricName.EXECUTE,
-    () -> {
-      analyticsEngine.execute(
-          finalPlan, planContext, createQueryListener(queryType, listener));
-      return null;
-    });
+long executeStart = System.nanoTime();
+analyticsEngine.execute(
+    finalPlan, planContext, createQueryListener(queryType, listener, executeStart));
Suggestion importance[1-10]: 6

__

Why: The concern about async timing is valid — context.measure will complete before the async execution finishes. However, the improved_code changes the method signature of createQueryListener which is not fully shown, making the suggestion incomplete. The issue itself is real but the fix requires broader changes than shown.

Low
General
Log swallowed routing exceptions for observability

The isAnalyticsIndex method silently swallows all exceptions and returns false,
which could mask real parsing errors or misroute queries to the wrong engine.
Consider at minimum logging the exception at debug level before returning false, so
routing failures are observable during troubleshooting.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [65-80]

 public boolean isAnalyticsIndex(String query, QueryType queryType) {
   if (query == null || query.isEmpty()) {
     return false;
   }
   try (UnifiedQueryContext context = buildContext(queryType, false)) {
     String indexName = extractIndexName(query, context);
-    ...
+    if (indexName == null) {
+      return false;
+    }
+    int lastDot = indexName.lastIndexOf('.');
+    String tableName = lastDot >= 0 ? indexName.substring(lastDot + 1) : indexName;
+    return tableName.startsWith("parquet_");
   } catch (Exception e) {
+    LOG.debug("[unified] Failed to detect analytics index for query routing: {}", e.getMessage(), e);
     return false;
   }
 }
Suggestion importance[1-10]: 5

__

Why: Logging swallowed exceptions at debug level is a reasonable observability improvement, but it's a minor enhancement. The improved_code accurately reflects the change and the suggestion is contextually valid.

Low
Suggestions up to commit 2cf5183
CategorySuggestion                                                                                                                                    Impact
Possible issue
Traverse child nodes to find nested Relation

The IndexNameExtractor visitor only overrides visitRelation but does not override
the default visit method. If the AST has wrapper nodes (e.g., Filter, Project) above
the Relation, the visitor will return null without traversing children. Consider
delegating to children in the default case to ensure the Relation node is found
regardless of its depth in the tree.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [173-178]

 private static class IndexNameExtractor extends AbstractNodeVisitor<String, Void> {
   @Override
   public String visitRelation(Relation node, Void context) {
     return node.getTableQualifiedName().toString();
   }
+
+  @Override
+  public String visit(Node node, Void context) {
+    for (Node child : node.getChild()) {
+      String result = child.accept(this, context);
+      if (result != null) {
+        return result;
+      }
+    }
+    return null;
+  }
 }
Suggestion importance[1-10]: 6

__

Why: This is a valid concern — if the AST has wrapper nodes above Relation (e.g., from | fields projection), the visitor won't traverse children and will return null, causing routing to fail. However, the fix depends on AbstractNodeVisitor's API which may differ from what's shown.

Low
General
Avoid heavy context creation for routing check

The isAnalyticsIndex method creates a full UnifiedQueryContext (including schema,
catalog, and profiling setup) just to extract an index name for routing. This is
wasteful and may have side effects. Consider creating a lightweight context or a
dedicated parsing utility that only parses the query AST without building the full
execution context.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [64-79]

 public boolean isAnalyticsIndex(String query, QueryType queryType) {
   if (query == null || query.isEmpty()) {
     return false;
   }
-  try (UnifiedQueryContext context = buildContext(queryType, false)) {
+  try (UnifiedQueryContext context = UnifiedQueryContext.builder()
+      .language(queryType)
+      .build()) {
     String indexName = extractIndexName(query, context);
-    ...
+    if (indexName == null) {
+      return false;
+    }
+    int lastDot = indexName.lastIndexOf('.');
+    String tableName = lastDot >= 0 ? indexName.substring(lastDot + 1) : indexName;
+    return tableName.startsWith("parquet_");
   } catch (Exception e) {
     return false;
   }
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion has merit in principle (avoiding unnecessary schema/catalog setup for routing), but the improved_code may not be valid since UnifiedQueryContext.builder() without .catalog() and .defaultNamespace() might fail or produce an incomplete context. The suggestion is speculative without knowing the full API contract.

Low
Validate profiling data content, not just presence

The test only checks for the presence of the profile key but does not validate that
the profiling data is non-null or has meaningful content. A malformed or empty
profile object would still pass this test. Consider also asserting that the profile
value is a non-empty JSONObject.

integ-test/src/test/java/org/opensearch/sql/ppl/AnalyticsPPLIT.java [131-134]

 Response response = client().performRequest(request);
 JSONObject result = new JSONObject(getResponseBody(response, true));
 
 assertTrue("Response should have 'profile' field when profile=true", result.has("profile"));
+assertFalse("Profile field should not be empty", result.isNull("profile"));
Suggestion importance[1-10]: 3

__

Why: The suggestion improves test coverage by checking that the profile field is non-null, but assertFalse(result.isNull("profile")) is a minor addition that doesn't significantly strengthen the test's ability to catch real issues.

Low

Switch from JdbcResponseFormatter to SimpleJsonResponseFormatter so
profiling data is included in the response when profile=true. The
SimpleJsonResponseFormatter calls QueryProfiling.current().finish()
to populate the profile field.

Update test assertions to match SimpleJsonResponseFormatter type names
(PPL_SPEC: INTEGER -> "int", STRING -> "string") and remove status
field check (not included by SimpleJsonResponseFormatter).

Add integration test verifying profile field appears in response.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs force-pushed the pr/mustang-profiling-and-parser-migration branch from ce2589b to b0ecfc2 Compare March 30, 2026 22:19
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit b0ecfc2

@ahkcs
Copy link
Copy Markdown
Collaborator Author

ahkcs commented Mar 30, 2026

For the profiling, is the result automatically populated into REST response? Is it possible to add integration test or verify manually?

The profiling result is now populated into the REST response. Switched from JdbcResponseFormatter to SimpleJsonResponseFormatter which calls QueryProfiling.current().finish() to include the profile field. This also aligns the response format with the existing Lucene PPL path which already uses SimpleJsonResponseFormatter.

The EXECUTE metric is recorded inside AnalyticsExecutionEngine.execute(), between the actual execution and the listener.onResponse() call. This is necessary because SimpleJsonResponseFormatter calls QueryProfiling.finish() inside the listener callback — if we used context.measure() in RestUnifiedQueryAction, the metric would be written after finish() already took the snapshot, resulting in 0ms.

Added integration test that verifies:

  • profile field exists when profile=true
  • phases.analyze.time_ms > 0 (planning took real time)
  • phases.execute.time_ms > 0 (execution took real time)

…yParser

Create UnifiedQueryContext upfront in isAnalyticsIndex() and use
context.getParser() for index name extraction. This reuses the
context-owned parser which supports both PPL and SQL, making it ready
for unified SQL support without code changes.

Remove standalone PPLQueryParser field and Settings constructor param.
isAnalyticsIndex() now takes QueryType to create the right context.
extractIndexName() handles UnresolvedPlan (PPL) with a TODO for
SqlNode (SQL) when unified SQL is enabled.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 022461a

Replace manual tree walking (findRelation) with IndexNameExtractor
visitor extending AbstractNodeVisitor. The visitor's visitRelation()
is called automatically by the AST accept/visitChildren pattern,
which handles tree traversal.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 2cf5183

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 2545d34

Wrap analyticsEngine.execute() and analyticsEngine.explain() calls
with context.measure(MetricName.EXECUTE, ...) so execution time is
captured in the profiling metrics. Planning is auto-profiled by
UnifiedQueryPlanner.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs force-pushed the pr/mustang-profiling-and-parser-migration branch from 2545d34 to ce4f6c5 Compare March 30, 2026 22:54
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit ce4f6c5

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit ba3efd7

@ahkcs ahkcs force-pushed the pr/mustang-profiling-and-parser-migration branch from ba3efd7 to 1e4fba8 Compare March 30, 2026 23:52
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 1e4fba8

…gine

Move EXECUTE metric recording into AnalyticsExecutionEngine.execute(),
between the actual execution (planExecutor + row conversion) and the
listener.onResponse() call. This ensures the metric is written before
SimpleJsonResponseFormatter calls QueryProfiling.finish() to snapshot.

Previously context.measure() was used in RestUnifiedQueryAction, but
finish() was called inside the listener callback (synchronously) before
measure()'s finally block could write the metric, resulting in 0ms.

Add IT assertion that execute phase time_ms > 0 to catch this bug.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs force-pushed the pr/mustang-profiling-and-parser-migration branch from 1e4fba8 to 7347fef Compare March 31, 2026 00:14
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 7347fef

@ahkcs ahkcs requested a review from dai-chen March 31, 2026 16:49
@ahkcs
Copy link
Copy Markdown
Collaborator Author

ahkcs commented Mar 31, 2026

Did verification on commands like DESCRIBE and SHOW DATASOURCES produce ASTs without a Relation node, so extractIndexName returns null and isAnalyticsIndex returns false — they fall through to the existing Lucene/PPLService path automatically.

@ahkcs ahkcs merged commit 151cffc into opensearch-project:feature/mustang-ppl-integration Mar 31, 2026
36 checks passed
@ahkcs ahkcs deleted the pr/mustang-profiling-and-parser-migration branch March 31, 2026 23:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants