Add query cancellation support via _tasks/_cancel API for PPL queries by sunil9977 · Pull Request #5254 · opensearch-project/sql

sunil9977 · 2026-03-23T11:14:10Z

PPL queries now register as CancellableTask, making them visible in GET /_tasks and cancellable via POST /_tasks/{task_id}/_cancel
Supports optional queryId in request body for task identification

Signed-off-by: Sunil Ramchandra Pawar <pawar_sr@apple.com>

github-actions · 2026-03-23T11:15:21Z

PR Reviewer Guide 🔍

(Review updated until commit `b2f0dc0`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes Sub-PR theme: Add queryId support to PPL request/transport layer Relevant files: ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java plugin/src/main/java/org/opensearch/sql/plugin/request/PPLQueryRequestFactory.java plugin/src/main/java/org/opensearch/sql/plugin/transport/TransportPPLQueryRequest.java Sub-PR theme: Implement CancellableTask registration and cooperative cancellation for PPL queries Relevant files: plugin/src/main/java/org/opensearch/sql/plugin/transport/PPLQueryTask.java plugin/src/main/java/org/opensearch/sql/plugin/transport/TransportPPLQueryAction.java opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexEnumerator.java plugin/src/test/java/org/opensearch/sql/plugin/transport/PPLQueryTaskTest.java
⚡ Recommended focus areas for review ThreadLocal Leak In `submit()`, `cancellableTask.get()` is called and then `cancellableTask.remove()` is called before scheduling. However, `setCancellableTask(cancelTask)` is called again inside the scheduled `Runnable` (line 81), and `clearCancellableTask()` is only called in the `finally` block. If the scheduled task throws an exception before reaching the `finally` block (e.g., during `withCurrentContext` wrapping), the ThreadLocal is never cleared on the worker thread, potentially leaking the `CancellableTask` reference for future tasks reusing that thread. setCancellableTask(cancelTask); try { task.run(); timeoutTask.cancel(); // Clear any leftover thread interrupts to keep the thread pool clean Thread.interrupted(); } catch (Exception e) { timeoutTask.cancel(); // Special-case handling of timeout-related interruptions if (Thread.interrupted() \|\| e.getCause() instanceof InterruptedException) { LOG.error("Query was interrupted due to timeout after {}", timeout); throw new OpenSearchTimeoutException( "Query execution timed out after " + timeout); } throw e; } finally { clearCancellableTask(); Race Condition `setCancellableTask(pplQueryTask)` is called in `TransportPPLQueryAction.doExecute()` on the transport thread, and then `cancellableTask.get()` is read in `submit()` — both on the same thread, which is fine. However, the static `ThreadLocal` is set in `doExecute` before `submit` is called. If `submit` is ever called from a different thread than `doExecute` (e.g., due to future refactoring or async dispatch), the `ThreadLocal` value will be `null` on the worker thread. The design is fragile because it relies on implicit same-thread execution ordering between `doExecute` and `submit`. CancellableTask cancelTask = cancellableTask.get(); cancellableTask.remove(); schedule(nodeClient, queryPlan::execute, timeout, cancelTask); Cancellation Check Timing The cancellation check in `moveNext()` only fires when iterating rows. If a query is cancelled while waiting inside `bgScanner.fetchNextBatch()` (a blocking call), the cancellation won't be detected until the next call to `moveNext()` after the batch returns. For long-running or slow fetches, this means cancellation may be significantly delayed. This is a design limitation worth documenting, but could also be a real usability issue. if (cancellableTask != null && cancellableTask.isCancelled()) { throw new TaskCancelledException("The task is cancelled."); }

github-actions · 2026-03-23T11:15:53Z

PR Code Suggestions ✨

Latest suggestions up to b2f0dc0

Explore these optional code suggestions:

Category	Suggestion	Impact
General	Inject dependency instead of static coupling `OpenSearchIndexEnumerator` is tightly coupled to `OpenSearchQueryManager` by directly calling its static `getCancellableTask()`. This makes the class harder to test and violates separation of concerns. The `CancellableTask` should be injected as a constructor parameter instead. opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexEnumerator.java [86-88] this.bgScanner = new BackgroundSearchScanner(client, maxResultWindow, queryBucketSize); this.bgScanner.startScanning(request); -this.cancellableTask = OpenSearchQueryManager.getCancellableTask(); +this.cancellableTask = cancellableTask; // injected via constructor parameter Suggestion importance[1-10]: 5 __ Why: The suggestion correctly identifies a design concern where `OpenSearchIndexEnumerator` is tightly coupled to `OpenSearchQueryManager` via a static call, making it harder to test in isolation. Injecting `CancellableTask` as a constructor parameter would improve testability and separation of concerns.	Low
General	Add early cancellation check before task execution `setCancellableTask(cancelTask)` is called inside the worker thread, but `OpenSearchIndexEnumerator` is constructed during `task.run()` and calls `getCancellableTask()` in its constructor. Since `setCancellableTask` is called just before `task.run()`, this ordering is correct only if no enumerator is constructed before this point. However, if the task was already cancelled before execution starts, the cancellation check in `moveNext()` will catch it, but there's no early exit before `task.run()` begins. Consider checking `cancelTask.isCancelled()` before starting execution. opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java [81-85] setCancellableTask(cancelTask); + +if (cancelTask != null && cancelTask.isCancelled()) { + clearCancellableTask(); + return; +} try { task.run(); timeoutTask.cancel(); Suggestion importance[1-10]: 4 __ Why: Adding an early cancellation check before `task.run()` is a valid defensive improvement to avoid unnecessary work when a task is already cancelled. However, this is a minor optimization since the cancellation check in `moveNext()` would catch it shortly after anyway.	Low
Security	Truncate query in task description The `pplQuery` field could be very long and is exposed verbatim in the task description, which is visible via the `_tasks` API. This could leak sensitive query content. Consider truncating `pplQuery` to a reasonable maximum length in the description. plugin/src/main/java/org/opensearch/sql/plugin/transport/TransportPPLQueryRequest.java [166-170] @Override public String getDescription() { String prefix = (queryId != null) ? "PPL [queryId=" + queryId + "]: " : "PPL: "; - return prefix + pplQuery; + String truncated = pplQuery != null && pplQuery.length() > 1000 + ? pplQuery.substring(0, 1000) + "..." + : pplQuery; + return prefix + truncated; } Suggestion importance[1-10]: 4 __ Why: Truncating the query in the task description is a reasonable defensive measure to prevent very long queries from being exposed verbatim via the `_tasks` API. However, this is a minor concern as the `_tasks` API is typically admin-only and the query content is not inherently more sensitive than other logged data.	Low
Possible issue	Fix cross-thread ThreadLocal race condition The `setCancellableTask` stores the task in a `ThreadLocal`, but `doExecute` runs on a transport thread while `submit` (which reads and clears it) runs on a different thread pool thread. This creates a race condition where the ThreadLocal value set on the transport thread is not visible on the worker thread. The task should be passed directly rather than through a ThreadLocal. opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java [113-115] +// Pass cancelTask directly to submit or store it on the queryPlan instead of using ThreadLocal +// across thread boundaries. For example, store it on the AbstractPlan before scheduling. if (task instanceof PPLQueryTask pplQueryTask) { - OpenSearchQueryManager.setCancellableTask(pplQueryTask); + queryPlan.setCancellableTask(pplQueryTask); // or pass via another thread-safe mechanism } Suggestion importance[1-10]: 2 __ Why: The suggestion claims a race condition, but looking at the code flow: `setCancellableTask` is called on the transport thread before `submit`, then in `submit` the value is read and cleared immediately (`cancellableTask.get()` + `cancellableTask.remove()`), and then passed directly as `cancelTask` to `schedule`. Inside `schedule`, `setCancellableTask(cancelTask)` is called again on the worker thread before `task.run()`. The `OpenSearchIndexEnumerator` constructor reads it on the same worker thread. The flow is actually correct. The suggestion's `improved_code` references a non-existent `setCancellableTask` method on `AbstractPlan`.	Low

Previous suggestions

Suggestions up to commit fd7642c

Category	Suggestion	Impact
General	Verify enumerator construction timing relative to ThreadLocal set `setCancellableTask` is called inside the scheduled runnable, which sets the `ThreadLocal` on the worker thread. However, `OpenSearchIndexEnumerator` reads `getCancellableTask()` in its constructor, which is called during `task.run()`. This ordering is correct only if the enumerator is constructed inside `task.run()`. If the enumerator is constructed before `task.run()` is invoked, the `cancellableTask` will be null. Verify that the enumerator is always constructed within the execution of `task.run()`. opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java [81-84] +setCancellableTask(cancelTask); +try { + task.run(); + timeoutTask.cancel(); Suggestion importance[1-10]: 5 __ Why: This is a valid concern about ordering — `setCancellableTask` must be called before the enumerator constructor runs. The suggestion correctly identifies that if the enumerator is constructed inside `task.run()`, the ordering is correct. However, the `existing_code` and `improved_code` are identical, making this more of a verification note than an actionable fix.	Low
General	Add null-check logging for missing cancellable task `submit` is called on the transport thread, and `cancellableTask.get()` reads the `ThreadLocal` of that thread. But `setCancellableTask` is called in `doExecute` on the same transport thread, so this part is actually consistent. However, if `submit` is ever called from a different thread than where `setCancellableTask` was called, the value will be null. Consider adding a null-check or logging when `cancelTask` is null to aid debugging. opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java [54-56] CancellableTask cancelTask = cancellableTask.get(); cancellableTask.remove(); +if (cancelTask == null) { + LOG.debug("No cancellable task found in ThreadLocal for query submission."); +} schedule(nodeClient, queryPlan::execute, timeout, cancelTask); Suggestion importance[1-10]: 2 __ Why: Adding a debug log for a null `cancelTask` is a minor improvement with low impact. The code already handles null gracefully since `cancelTask` is passed to `schedule` and checked in the enumerator with a null guard.	Low
Possible issue	Ensure cancellation exception is unchecked in enumerator `TaskCancelledException` is thrown inside `moveNext()`, but Calcite's `Enumerator` interface does not declare checked exceptions. If `TaskCancelledException` is a checked exception, this will cause a compilation error. Verify that `TaskCancelledException` extends `RuntimeException`; if it does not, wrap it in an unchecked exception. opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexEnumerator.java [121-123] if (cancellableTask != null && cancellableTask.isCancelled()) { - throw new TaskCancelledException("The task is cancelled."); + throw new NonFallbackCalciteException(new TaskCancelledException("The task is cancelled.")); } Suggestion importance[1-10]: 4 __ Why: This is a valid concern — if `TaskCancelledException` is a checked exception, it would cause a compilation error in `moveNext()`. However, `org.opensearch.core.tasks.TaskCancelledException` in OpenSearch extends `RuntimeException`, so this is not actually a problem in practice, making the suggestion less critical.	Low
Possible issue	ThreadLocal misused for cross-thread task passing The `setCancellableTask` stores the task in a `ThreadLocal`, but `doExecute` runs on the transport thread while `submit` (which reads and clears it) runs on a different thread pool thread. This means the `ThreadLocal` value set here will never be visible to the thread that calls `cancellableTask.get()` in `submit`. The task should be passed directly rather than relying on `ThreadLocal` for cross-thread communication. opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java [113-115] -if (task instanceof PPLQueryTask pplQueryTask) { - OpenSearchQueryManager.setCancellableTask(pplQueryTask); -} +// Pass pplQueryTask directly to the plan or service layer instead of using ThreadLocal +// e.g., pplService.execute(request, pplQueryTask, listener); Suggestion importance[1-10]: 2 __ Why: The suggestion claims `setCancellableTask` and `submit` run on different threads, but looking at the code, `setCancellableTask` is called in `doExecute` on the transport thread, and `submit` is also called synchronously from that same thread before scheduling. The `ThreadLocal` is read in `submit` on the same thread it was set, so the concern is largely unfounded. The `improved_code` is also not actionable code.	Low

Suggestions up to commit 024a8ca

Category	Suggestion	Impact
Possible issue	Guard against pre-execution cancellation race condition There is a race condition: if the task is cancelled between `onExecutionThreadAvailable` and the start of actual query execution, the interrupt may be missed. Additionally, if the task is already cancelled before `onExecutionThreadAvailable` is called, the query will still proceed. You should check `callBack.isCancelled()` immediately after registering the thread and throw early if already cancelled. opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java [71-78] if (callBack != null) { callBack.onExecutionThreadAvailable(executionThread); + if (callBack.isCancelled()) { + LOG.info("Query was cancelled before execution started"); + throw new OpenSearchException("Query was cancelled."); + } } Scheduler.ScheduledCancellable timeoutTask = threadPool.schedule( () -> { LOG.warn( Suggestion importance[1-10]: 6 __ Why: The suggestion identifies a valid race condition where a task cancelled before execution starts would not be detected. Adding an `isCancelled()` check after `onExecutionThreadAvailable` is a reasonable defensive measure, though the window is small in practice.	Low
Possible issue	Ensure ThreadLocal cleanup on exception paths If an exception is thrown between `setCancellationCallback` and the point where `cancellationCallBackThreadLocal.remove()` is called in `submit()`, the `ThreadLocal` may not be cleaned up, causing a memory leak or stale callback in the thread pool. A try-finally block should be used to ensure cleanup, or the removal should be guaranteed even on failure paths. plugin/src/main/java/org/opensearch/sql/plugin/transport/TransportPPLQueryAction.java [113-131] if (task instanceof SQLQueryTask sqlQueryTask) { - OpenSearchQueryManager.setCancellationCallback(new OpenSearchQueryManager.CancellationCallBack() { @Override public void onExecutionThreadAvailable(Thread thread) { sqlQueryTask.setExecutionThread(thread); } @Override public void onExecutionComplete() { sqlQueryTask.clearExecutionThread(); } @Override public boolean isCancelled() { return sqlQueryTask.isCancelled(); } }); } +try { Suggestion importance[1-10]: 3 __ Why: The concern about `ThreadLocal` cleanup is valid in theory, but the `improved_code` is incomplete (ends with an unclosed `try {`) and doesn't show the full solution. The suggestion is partially correct but the implementation is not actionable as provided.	Low
General	Restrict direct access to ThreadLocal field The `cancellationCallBackThreadLocal` field is `public` and mutable, which allows external code to directly manipulate the `ThreadLocal` bypassing the provided setter/cleaner methods. It should be `private` to enforce encapsulation and prevent misuse. opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java [43] -public static ThreadLocal<CancellationCallBack> cancellationCallBackThreadLocal = new ThreadLocal<>(); +private static final ThreadLocal<CancellationCallBack> cancellationCallBackThreadLocal = new ThreadLocal<>(); Suggestion importance[1-10]: 5 __ Why: Making `cancellationCallBackThreadLocal` `public` and non-final is a valid encapsulation concern. Making it `private static final` enforces proper access through the provided setter/cleaner methods and prevents external misuse.	Low
General	Handle null query in description method If `pplQuery` is `null`, the method returns `prefix + null` (i.e., the string `"PPL:` `null"`), which is misleading. A null check should be added to handle this case gracefully. plugin/src/main/java/org/opensearch/sql/plugin/transport/TransportPPLQueryRequest.java [167-176] public String getDescription() { String prefix = (queryId != null) ? "PPL [queryId=" + queryId + "]: " : "PPL: "; - if (pplQuery != null && pplQuery.length() > 512) { - return prefix + pplQuery.substring(0,512) + "..."; + if (pplQuery == null) { + return prefix; + } + + if (pplQuery.length() > 512) { + return prefix + pplQuery.substring(0, 512) + "..."; } return prefix + pplQuery; } Suggestion importance[1-10]: 4 __ Why: The suggestion correctly identifies that a `null` `pplQuery` would result in the string `"PPL: null"` being returned. Adding a null check improves robustness, though `pplQuery` being null is an edge case given the existing constructors.	Low

ahkcs · 2026-03-23T18:06:11Z

Hi @sunil9977, thanks for the changes! Please take a look at the CI failures

ahkcs · 2026-03-23T21:19:56Z

plugin/src/main/java/org/opensearch/sql/plugin/transport/SQLQueryTask.java

@@ -0,0 +1,44 @@
+/*


We can rename this file to PPLQueryTask

ahkcs · 2026-03-23T21:21:03Z

CI failure seems to be caused by spotless check, please run ./gradlew spotlessApply to fix this

ahkcs · 2026-03-23T21:27:04Z

During e2e testing with the OSD cancel button, we noticed that Thread.interrupt() only takes effect when the execution thread hits a blocking call (I/O, Future.get(), etc.). For CPU-bound queries like large joins, this can mean a 50+ second delay between clicking cancel and the query actually stopping.

Adding an interrupt check in OpenSearchIndexEnumerator.moveNext() would make cancellation near-instant, since moveNext() is called for every row:

// opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexEnumerator.java

@Override
public boolean moveNext() {
    if (Thread.interrupted()) {
        throw new NonFallbackCalciteException(
            String.format("Query was cancelled after processing %d rows.", queryCount));
    }

    if (queryCount >= maxResponseSize) {
        return false;
    }
    // ... rest of method
}

This is outside the scope of this PR but would be a nice follow-up to make the cancel experience snappy. In our testing it brought cancel response time from ~50s down to <5s.

Swiddis · 2026-03-23T21:50:09Z

plugin/src/main/java/org/opensearch/sql/plugin/transport/TransportPPLQueryRequest.java

+      String prefix = (queryId != null) ? "PPL [queryId=" + queryId + "]: " : "PPL: ";
+
+      if (pplQuery != null && pplQuery.length() > 512) {
+          return prefix + pplQuery.substring(0,512) + "...";


suggestion: I don't think this is worth truncating.

The only way to access this description is to directly check the task by ID. Seeing the full query might be useful debug context. Similar debug logic in OSD core exists and doesn't truncate.

Swiddis · 2026-03-23T22:06:42Z

opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java

                if (Thread.interrupted() || e.getCause() instanceof InterruptedException) {
+                    if (callBack != null && callBack.isCancelled()) {
+                        LOG.info("Query was cancelled");
+                        throw new OpenSearchException("Query was cancelled.");


issue: This exception will propagate as a 500, which is bad for people monitoring errors.

OpenSearchException is the generic top-level exception class, it defaults as internal server error when the status isn't set. You want to use one of its subclasses.

TaskCancelledException seems like a better choice.

Swiddis · 2026-03-23T22:10:09Z

During e2e testing with the OSD cancel button, we noticed that Thread.interrupt() only takes effect when the execution thread hits a blocking call (I/O, Future.get(), etc.). For CPU-bound queries like large joins, this can mean a 50+ second delay between clicking cancel and the query actually stopping.

That's on me lol, I thought when I implemented it that the CPU part of joins would be fast (the bulk of the time on most joins is waiting for batches), so it'd be overkill to check every iteration and we could just throw when we request new batches. Happy to take it up if there's a followup task. Do you have more precise repro steps? Since I haven't noticed this issue before.

I'm also interested in if doing this on every row would have a negative effect on benchmarks.

Swiddis · 2026-03-23T22:41:54Z

plugin/src/main/java/org/opensearch/sql/plugin/transport/SQLQueryTask.java

+
+    @Override
+    public boolean shouldCancelChildrenOnCancellation() {
+        return false;


Should this be false? Query tasks tend to spawn background IO tasks, but I don't know if those count as children.

Yes, updated it to true.

ahkcs · 2026-03-23T22:42:57Z

Do you have more precise repro steps?

I was doing a join query from OSD using POC OSD side change for the cancel button. And after query cancelled, the stacktrace showed up 50 seconds later

ahkcs · 2026-03-23T23:13:15Z

Based on @Swiddis's observation that almost no core code overrides onCancelled() — it looks like core's cancellation model is cooperative: cancel() flips isCancelled() to true, and the task is expected to poll that flag and stop itself (e.g., search tasks check task.isCancelled() per Lucene segment).

I think a simpler approach might work here: pass the CancellableTask reference to the worker thread, and check task.isCancelled() in OpenSearchIndexEnumerator.moveNext(). This would remove the need for the CancellationCallBack interface, the ThreadLocal callback, onCancelled() override, thread interruption, and the AtomicReference<Thread> — all replaced by one isCancelled() check per row.

Swiddis · 2026-03-23T23:32:21Z

Oh, interesting. If I'm understanding right they implemented their own thread interruption mechanism? I wonder why not use regular interrupt(). Given we already implemented timeouts via interrupt, I wonder if it's worthwhile to refactor everything or we should just have cancellation point to interrupt. Open to either but given the current impl it'd be less work to stick with interrupts in the core places.

Swiddis · 2026-03-23T23:43:23Z

Ok, talked with the core folks, task cancellation is the better approach. Since opensearch uses a thread pool system, interrupting the thread has potential to poison the whole pool. It works for us because we're careful to uninterrupt the thread when handling interruption errors, but there might theoretically be a path that leaves a perpetually-interrupted thread (either currently or in the future). Cancellation also has the benefit of propagating across clusters better (which is more relevant to Core which is fanning out requests to many data nodes regularly).

So the correct approach is change our thread interrupts to use the native task cancellation feature. If all that understanding is correct, it means we can also clean up the logic that un-interrupts the thread, but I think it's only like 2 lines anyway so not a massive win.

github-actions · 2026-03-24T11:36:15Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit b2f0dc0.

Path	Line	Severity	Description
plugin/src/main/java/org/opensearch/sql/plugin/transport/TransportPPLQueryRequest.java	161	low	User-supplied 'queryId' from the request payload (parsed in PPLQueryRequestFactory.java) is embedded unsanitized into task description strings via getDescription(). While task descriptions are informational and not used for security decisions, arbitrary user input in system-visible task metadata could enable log injection or information disclosure if descriptions are surfaced in monitoring/audit tooling. This has a plausible legitimate explanation (task identification), but warrants review of how task descriptions are consumed downstream.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

github-actions · 2026-03-24T11:37:09Z

Persistent review updated to latest commit fd7642c

sunil9977 · 2026-03-24T11:40:36Z

Thank you all for reviews.
I've refactored it to use existing methods rather than using thread.interrupt().
Please let me know if my understanding is correct.

Swiddis · 2026-03-24T17:00:41Z

opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java

                        LOG.warn(
                            "Query execution timed out after {}. Interrupting execution thread.",
                            timeout);
                        executionThread.interrupt();


future PR: this interrupt needs to be changed to cancellation as well (and all code paths that depend on it)

Swiddis · 2026-03-24T17:02:25Z

@sunil9977 LGTM, one more chore: please add DCO signoffs to the commits as described in https://github.com/opensearch-project/sql/pull/5254/checks?check_run_id=68346380999 & https://github.com/opensearch-project/.github/blob/main/CONTRIBUTING.md#developer-certificate-of-origin.

…gestions. Signed-off-by: Sunil Ramchandra Pawar <pawar_sr@apple.com>

github-actions · 2026-03-25T03:05:25Z

Persistent review updated to latest commit b2f0dc0

Enable cancellation of in-flight PPL queries from the Discover UI cancel button by leveraging the OpenSearch task framework. - Generate a client-side UUID (queryId) for each synchronous PPL query - Pass queryId through the full request pipeline to OpenSearch PPL API - Add POST /api/enhancements/ppl/cancel server route that lists PPL tasks via _tasks API, matches by queryId in task description, and cancels all matching tasks - Wire the existing Discover abort flow to call the cancel route on abort - Cancel all matching tasks (data + histogram queries share the same queryId) Depends on opensearch-project/sql#5254 for backend task registration. Signed-off-by: Andrew Kim <andrewkcs@gmail.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Kai Huang <ahkcs@amazon.com>

Enable cancellation of in-flight PPL queries from the Discover UI cancel button by leveraging the OpenSearch task framework. - Generate a client-side UUID (queryId) for each synchronous PPL query - Pass queryId through the full request pipeline to OpenSearch PPL API - Add POST /api/enhancements/ppl/cancel server route that lists PPL tasks via _tasks API, matches by queryId in task description, and cancels all matching tasks - Wire the existing Discover abort flow to call the cancel route on abort - Cancel all matching tasks (data + histogram queries share the same queryId) Depends on opensearch-project/sql#5254 for backend task registration. Signed-off-by: Kai Huang <ahkcs@amazon.com>

* [Query] Add PPL query cancellation support via Discover cancel button Enable cancellation of in-flight PPL queries from the Discover UI cancel button by leveraging the OpenSearch task framework. - Generate a client-side UUID (queryId) for each synchronous PPL query - Pass queryId through the full request pipeline to OpenSearch PPL API - Add POST /api/enhancements/ppl/cancel server route that lists PPL tasks via _tasks API, matches by queryId in task description, and cancels all matching tasks - Wire the existing Discover abort flow to call the cancel route on abort - Cancel all matching tasks (data + histogram queries share the same queryId) Depends on opensearch-project/sql#5254 for backend task registration. Signed-off-by: Kai Huang <ahkcs@amazon.com> * Fire-and-forget PPL cancel request to avoid blocking UI Remove await from the cancel HTTP call so the AbortError propagates immediately. The cancel is a best-effort notification to the backend and should not block the frontend from moving on. Signed-off-by: Kai Huang <ahkcs@amazon.com> * Use exact PPL action name instead of wildcard in task filter Use 'cluster:admin/opensearch/ppl' instead of '*ppl*' to avoid matching unrelated tasks. The _tasks API does not support filtering by description or queryId, so client-side matching remains necessary. Signed-off-by: Kai Huang <ahkcs@amazon.com> * Revert action filter back to wildcard *ppl* The exact action name is less resilient to future changes. The wildcard is sufficient since we still match by queryId in the task description. Signed-off-by: Kai Huang <ahkcs@amazon.com> --------- Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Init CLAUDE.md (#5259) Signed-off-by: Heng Qian <qianheng@amazon.com> * Add label to exempt specific PRs from stalled labeling (#5263) * Implement `reverse` performance optimization (#4775) Co-authored-by: Jialiang Liang <jiallian@amazon.com> * Add songkant-aws as maintainer (#5244) * Move some maintainers from active to Emeritus (#5260) * Move inactive current maintainers to Emeritus Signed-off-by: Lantao Jin <ltjin@amazon.com> * Remove affiliation column for emeritus maintainers Signed-off-by: Lantao Jin <ltjin@amazon.com> * formatted Signed-off-by: Lantao Jin <ltjin@amazon.com> * Fix formatting in MAINTAINERS.md Signed-off-by: Simeon Widdis <sawiddis@gmail.com> Signed-off-by: Simeon Widdis <sawiddis@gmail.com> --------- Signed-off-by: Lantao Jin <ltjin@amazon.com> Signed-off-by: Simeon Widdis <sawiddis@gmail.com> Co-authored-by: Simeon Widdis <sawiddis@gmail.com> * Add query cancellation support via _tasks/_cancel API for PPL queries (#5254) * Add query cancellation support via _tasks/_cancel API for PPL queries Signed-off-by: Sunil Ramchandra Pawar <pawar_sr@apple.com> * Refactor PPL query cancellation to cooperative model and other PR suggestions. Signed-off-by: Sunil Ramchandra Pawar <pawar_sr@apple.com> --------- Signed-off-by: Sunil Ramchandra Pawar <pawar_sr@apple.com> * Add Calcite native SQL planning in UnifiedQueryPlanner (#5257) * feat(api): Add Calcite native SQL planning path in UnifiedQueryPlanner Add SQL support to the unified query API using Calcite's native parser pipeline (SqlParser → SqlValidator → SqlToRelConverter → RelNode), bypassing the ANTLR parser used by PPL. Changes: - UnifiedQueryPlanner: use PlanningStrategy to dispatch CalciteNativeStrategy vs CustomVisitorStrategy - CalciteNativeStrategy: Calcite Planner with try-with-resources for ANSI SQL - CustomVisitorStrategy: ANTLR-based path for PPL (and future SQL V2) - UnifiedQueryContext: SqlParser.Config with Casing.UNCHANGED to preserve lowercase OpenSearch index names Signed-off-by: Chen Dai <daichen@amazon.com> * test(api): Add SQL planner tests and refactor test base for multi-language support - Refactor UnifiedQueryTestBase with queryType() hook for subclass override - Add UnifiedSqlQueryPlannerTest covering SELECT, WHERE, GROUP BY, JOIN, ORDER BY, subquery, case sensitivity, namespaces, and error handling - Update UnifiedQueryContextTest to verify SQL context creation Signed-off-by: Chen Dai <daichen@amazon.com> * perf(benchmarks): Add SQL queries to UnifiedQueryBenchmark Add language (PPL/SQL) and queryPattern param dimensions for side-by-side comparison of equivalent queries across both languages. Remove separate UnifiedSqlQueryBenchmark in favor of unified class. Signed-off-by: Chen Dai <daichen@amazon.com> * docs(api): Update README to reflect SQL support in UnifiedQueryPlanner Signed-off-by: Chen Dai <daichen@amazon.com> * fix(api): Normalize trailing whitespace in assertPlan comparison RelOptUtil.toString() appends a trailing newline after the last plan node, which doesn't match Java text block expectations. Also add \r\n normalization for Windows CI compatibility, consistent with the existing pattern in core module tests. Signed-off-by: Chen Dai <daichen@amazon.com> --------- Signed-off-by: Chen Dai <daichen@amazon.com> * [Feature] Support graphLookup with literal value as its start (#5253) * [Feature] Support graphLookup as top-level PPL command (#5243) Add support for graphLookup as the first command in a PPL query with literal start values, instead of requiring piped input from source=. Syntax: graphLookup table start="value" edge=from-->to as output graphLookup table start=("v1", "v2") edge=from-->to as output Signed-off-by: Heng Qian <qianheng@amazon.com> * Spotless check Signed-off-by: Heng Qian <qianheng@amazon.com> * Ignore child pipe if using start value Signed-off-by: Heng Qian <qianheng@amazon.com> * Add graphLookup integration tests per PPL command checklist - Add explain plan tests in CalciteExplainIT with YAML assertions - Add v2-unsupported tests in NewAddedCommandsIT - Add CalcitePPLGraphLookupIT to CalciteNoPushdownIT suite - Skip graphLookup tests when pushdown is disabled (required by impl) - Add expected plan YAML files for piped and top-level graphLookup Signed-off-by: Heng Qian <qianheng@amazon.com> * Remove brace of start value list Signed-off-by: Heng Qian <qianheng@amazon.com> --------- Signed-off-by: Heng Qian <qianheng@amazon.com> * Apply docs website feedback to ppl functions (#5207) * apply doc website feedback to ppl functions Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com> * take out comments Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com> * fix json_append example Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com> * fix json_append example Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com> * fix links Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com> --------- Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com> Signed-off-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com> * feat(api): Add profiling support to unified query API (#5268) Add query profiling infrastructure that measures time spent in each query phase (analyze, optimize, execute, format). Profiling is opt-in via UnifiedQueryContext.builder().profiling(true) and uses thread-local context to avoid passing profiling state through every method. Key changes: - QueryProfiling/ProfileContext for thread-local profiling lifecycle - UnifiedQueryContext.measure() API for timing arbitrary phases - Auto-profiling in UnifiedQueryPlanner (analyze) and compiler (optimize) - UnifiedQueryTestBase shared test fixture for unified query tests - Comprehensive profiling tests with non-flaky >= 0 timing assertions Signed-off-by: Chen Dai <daichen@amazon.com> * Add UnifiedQueryParser with language-specific implementations (#5274) Extract parsing logic from UnifiedQueryPlanner into a UnifiedQueryParser interface with language-specific implementations: PPLQueryParser (returns UnresolvedPlan) and CalciteSqlQueryParser (returns SqlNode). UnifiedQueryContext owns the parser instance, created eagerly by the builder which has direct access to query type and future SQL config. Each implementation receives only its required dependencies: PPLQueryParser takes Settings, CalciteSqlQueryParser takes CalcitePlanContext. UnifiedQueryPlanner.CustomVisitorStrategy now obtains the parser from the context via the interface type. Signed-off-by: Chen Dai <daichen@amazon.com> * Fix flaky TPC-H Q1 test due to bugs in `MatcherUtils.closeTo()` (#5283) * Fix the flaky tpch Q1 Signed-off-by: Lantao Jin <ltjin@amazon.com> * Change to ULP-aware to handle floating-point precision differences Signed-off-by: Lantao Jin <ltjin@amazon.com> --------- Signed-off-by: Lantao Jin <ltjin@amazon.com> --------- Signed-off-by: Heng Qian <qianheng@amazon.com> Signed-off-by: Lantao Jin <ltjin@amazon.com> Signed-off-by: Simeon Widdis <sawiddis@gmail.com> Signed-off-by: Sunil Ramchandra Pawar <pawar_sr@apple.com> Signed-off-by: Chen Dai <daichen@amazon.com> Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com> Signed-off-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com> Signed-off-by: Kai Huang <ahkcs@amazon.com> Co-authored-by: qianheng <qianheng@amazon.com> Co-authored-by: Simeon Widdis <sawiddis@gmail.com> Co-authored-by: Jialiang Liang <jiallian@amazon.com> Co-authored-by: Lantao Jin <ltjin@amazon.com> Co-authored-by: Sunil Ramchandra Pawar <pawar_sr@apple.com> Co-authored-by: Chen Dai <daichen@amazon.com> Co-authored-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>

Add query cancellation support via _tasks/_cancel API for PPL queries

024a8ca

Signed-off-by: Sunil Ramchandra Pawar <pawar_sr@apple.com>

ahkcs added PPL Piped processing language enhancement New feature or request labels Mar 23, 2026

ahkcs reviewed Mar 23, 2026

View reviewed changes

Swiddis requested changes Mar 23, 2026

View reviewed changes

Swiddis reviewed Mar 23, 2026

View reviewed changes

Swiddis reviewed Mar 24, 2026

View reviewed changes

Refactor PPL query cancellation to cooperative model and other PR sug…

b2f0dc0

…gestions. Signed-off-by: Sunil Ramchandra Pawar <pawar_sr@apple.com>

sunil9977 force-pushed the issue-4887 branch from fd7642c to b2f0dc0 Compare March 25, 2026 02:55

sunil9977 requested a review from songkant-aws as a code owner March 25, 2026 02:55

Swiddis self-requested a review March 25, 2026 03:13

Swiddis approved these changes Mar 25, 2026

View reviewed changes

Swiddis enabled auto-merge (squash) March 25, 2026 07:43

ahkcs approved these changes Mar 25, 2026

View reviewed changes

Swiddis merged commit d903f24 into opensearch-project:main Mar 25, 2026
38 checks passed

This was referenced Mar 25, 2026

Add PPL query cancellation via Discover cancel button opensearch-project/OpenSearch-Dashboards#11593

Merged

Classify TaskCancelledException as client error instead of system error #5273

Draft

Conversation

sunil9977 commented Mar 23, 2026

Uh oh!

github-actions bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit b2f0dc0)

Uh oh!

github-actions bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

ahkcs commented Mar 23, 2026

Uh oh!

ahkcs Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

ahkcs commented Mar 23, 2026

Uh oh!

ahkcs commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Swiddis Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

Swiddis Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

Swiddis commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Swiddis Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

sunil9977 Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

ahkcs commented Mar 23, 2026

Uh oh!

ahkcs commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Swiddis commented Mar 23, 2026

Uh oh!

Swiddis commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Analyzer ❗

Uh oh!

github-actions bot commented Mar 24, 2026

Uh oh!

sunil9977 commented Mar 24, 2026

Uh oh!

Swiddis Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Swiddis commented Mar 24, 2026

Uh oh!

github-actions bot commented Mar 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions bot commented Mar 23, 2026 •

edited

Loading

(Review updated until commit `b2f0dc0`)

github-actions bot commented Mar 23, 2026 •

edited

Loading

ahkcs commented Mar 23, 2026 •

edited

Loading

Swiddis commented Mar 23, 2026 •

edited

Loading

ahkcs commented Mar 23, 2026 •

edited

Loading

Swiddis commented Mar 23, 2026 •

edited

Loading

github-actions bot commented Mar 24, 2026 •

edited

Loading

Swiddis Mar 24, 2026 •

edited

Loading