Abort pending blocking search tasks when we drop/invalidate the search by timvisee · Pull Request #7530 · qdrant/qdrant

timvisee · 2025-11-13T16:37:59Z

Critical fix for large clusters with huge search load.

Searches spawn blocking search tasks on a dedicated thread pool. The pool size is limited, so these search tasks may be queued. If there is a very large number of incoming searches, this queue may grow without bound.

When a search is invalidated (timed out, or completed through fan out), we drop the async task. The idea is that we abort all work related to this search to release resources.

It turns out that these spawned tasks will always remain queued and will be run to completion, even if the owning future was already dropped and aborted. Instead, we must explicitly cancel these pending spawned blocking tasks.

Luckily, tokio-util provides the `AbortOnDropHandle` utility, which is what I've used in this PR.
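
To illustrate the behaviour described above, here is a minimal std-only model of a blocking pool with a queue. It is a sketch, not tokio's implementation and not the code in this PR: `Pool`, `AbortOnDrop` and `jobs_run` are hypothetical names, and the pool has a single worker so that jobs are guaranteed to queue. The guard plays the role that `AbortOnDropHandle` plays for a `spawn_blocking` task: dropping it only prevents a job that has not started yet from running.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::mpsc::{channel, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// A queued job plus the flag that lets its owner abort it before it starts.
struct Job {
    aborted: Arc<AtomicBool>,
    work: Box<dyn FnOnce() + Send>,
}

/// Hypothetical single-worker "blocking pool": jobs wait in a FIFO queue.
pub struct Pool {
    queue: Option<Sender<Job>>,
    worker: Option<JoinHandle<()>>,
}

/// Guard returned by `Pool::spawn`; dropping it aborts the job if still queued.
pub struct AbortOnDrop {
    aborted: Arc<AtomicBool>,
}

impl Drop for AbortOnDrop {
    fn drop(&mut self) {
        self.aborted.store(true, Ordering::SeqCst);
    }
}

impl Pool {
    pub fn new() -> Self {
        let (queue, jobs) = channel::<Job>();
        let worker = thread::spawn(move || {
            for job in jobs {
                // Skip jobs whose owner went away while they were queued.
                if !job.aborted.load(Ordering::SeqCst) {
                    (job.work)();
                }
            }
        });
        Pool { queue: Some(queue), worker: Some(worker) }
    }

    pub fn spawn(&self, work: impl FnOnce() + Send + 'static) -> AbortOnDrop {
        let aborted = Arc::new(AtomicBool::new(false));
        let job = Job { aborted: aborted.clone(), work: Box::new(work) };
        self.queue.as_ref().unwrap().send(job).unwrap();
        AbortOnDrop { aborted }
    }

    /// Close the queue and wait for the worker to drain it.
    pub fn shutdown(mut self) {
        drop(self.queue.take());
        self.worker.take().unwrap().join().unwrap();
    }
}

/// Queue `n` jobs behind a blocker and return how many of them actually ran.
pub fn jobs_run(n: usize, drop_guards: bool) -> usize {
    let pool = Pool::new();
    let ran = Arc::new(AtomicUsize::new(0));

    // Keep the only worker busy so the following jobs stay queued.
    let (release, gate) = channel::<()>();
    let blocker = pool.spawn(move || {
        let _ = gate.recv();
    });

    let mut kept = Vec::new();
    for _ in 0..n {
        let ran = ran.clone();
        let guard = pool.spawn(move || {
            ran.fetch_add(1, Ordering::SeqCst);
        });
        if drop_guards {
            drop(guard); // the "search" was invalidated while still queued
        } else {
            kept.push(guard);
        }
    }

    release.send(()).unwrap();
    pool.shutdown();
    drop((blocker, kept));
    ran.load(Ordering::SeqCst)
}
```

Calling `jobs_run(5, true)` models invalidated searches whose queued work is skipped, while `jobs_run(5, false)` models the old behaviour where every queued job still runs. Unlike `AbortOnDropHandle`, this guard cannot be awaited for a result; that part is left out to keep the model small.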

On a huge cluster that accumulates a massive number of pending searches, it was possible to keep segments busy for more than an hour, even though each search itself might only be a few seconds of work. This can break optimizations if they cannot release old segments for over an hour. This PR helps prevent this issue by cancelling all invalidated searches early. More specifically, it helps prevent this error:

Service internal error: Removing proxy segment which is still in use
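
As a rough back-of-the-envelope check of how searches that each take seconds can add up to more than an hour, the drain time of a backlog can be estimated as below. The function name and all numbers are hypothetical and only illustrate the arithmetic; they are not measurements from a real cluster.

```rust
/// Estimated seconds until a backlog of `queued` tasks has drained, assuming
/// each task takes `secs_per_task` and `threads` tasks run in parallel.
pub fn drain_secs(queued: u64, secs_per_task: u64, threads: u64) -> u64 {
    assert!(threads > 0, "the pool needs at least one thread");
    // Number of "rounds" the pool needs, rounded up.
    let rounds = (queued + threads - 1) / threads;
    rounds * secs_per_task
}
```

With assumed values of 10,000 queued searches, 3 seconds per search and 8 pool threads, `drain_secs(10_000, 3, 8)` gives 3750 seconds, or about 62 minutes, during which the segments involved stay in use.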

All Submissions:

Contributions should target the dev branch. Did you create your branch from dev?
Have you followed the guidelines in our Contributing document?
Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

Does your submission pass tests?
Have you formatted your code locally using the `cargo +nightly fmt --all` command prior to submission?
Have you checked your code using the `cargo clippy --all --all-features` command?

agourlay

I trust that you have tested this manually 👍

timvisee · 2025-11-14T07:47:41Z

> I trust that you have tested this manually 👍

Correct.

We could add a test for it, but I'd consider that redundant, because then we'd be testing tokio fundamentals. The `spawn_*` functions are intended to keep the task running even if the caller is dropped. It was simply an oversight on our side when implementing this.
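
The same convention exists in the standard library, which makes for a small illustration: dropping a `std::thread::JoinHandle` detaches the thread rather than cancelling it. The sketch below uses only std, and the function name is hypothetical; it is an analogy for the `spawn_*` behaviour described above, not tokio code.

```rust
use std::sync::mpsc::channel;
use std::thread;
use std::time::Duration;

/// Spawn work, drop the handle immediately, and report whether the work
/// still ran to completion.
pub fn runs_after_handle_dropped() -> bool {
    let (done_tx, done_rx) = channel::<()>();
    let handle = thread::spawn(move || {
        // Stand-in for a blocking search.
        thread::sleep(Duration::from_millis(20));
        let _ = done_tx.send(());
    });
    drop(handle); // detaches the thread; it is not cancelled
    done_rx.recv_timeout(Duration::from_secs(5)).is_ok()
}
```

The function returns `true` because the detached work still completes, which is why an explicit abort is needed when the caller goes away.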

* Prematurely abort blocking task in `spawn_cancel_on_drop` on drop
  These tasks are intended to be cancellable. Now we prematurely abort the task if the future was dropped before the task is executed.
* Prematurely abort blocking task in `spawn_cancel_on_token` on cancel
  These tasks are intended to be cancellable. Now we prematurely abort the task if the cancellation token is triggered before the task is executed.
* Prematurely abort blocking task for fetching telemetry
* Prematurely abort stoppable task on drop, all are safe to abort early
* Make `move_dir` either move everything, or nothing at all
  That is with the exception of file IO errors in which case data may be partially moved. Before this PR it was possible for the new target directory to be created without moving all data into it. Now we either do all, or nothing.
* Prematurely abort task for creating full snapshot
  It is fine to either create it, or not at all.
* Prematurely abort blocking task for waiting on consensus leader
* Prematurely abort blocking cardinality estimation and shard info tasks
* Prematurely abort blocking point deduplication task
* Prematurely abort blocking task for checking available disk space
* Prematurely abort blocking shard read operations
  All shard read operations, such as retrieve, scroll, facets and more can be safely aborted prematurely. Related to: <#7530>
* Prematurely abort blocking task for waiting on replica state
* Prematurely abort blocking task for waiting on transfer replica states
* Prematurely abort blocking task for loading segment
  This can safely be aborted before the task is started
* Prematurely abort blocking task waiting for replica states
* Prematurely abort blocking task for creating snapshot file
  Safe because it aborts before writing any snapshot files to disk
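
The "abort if the cancellation token is triggered before the task is executed" idea can be sketched as a wrapper that checks the token right before the work starts. This is a std-only illustration under stated assumptions: `Token` and `cancellable` are hypothetical stand-ins and do not reproduce the signature of `spawn_cancel_on_token` or any real cancellation token type.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a cancellation token.
#[derive(Clone, Default)]
pub struct Token(Arc<AtomicBool>);

impl Token {
    pub fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    pub fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Wrap `work` so that it returns `None` without running if the token was
/// cancelled by the time the wrapped closure is finally executed.
pub fn cancellable<T>(
    token: Token,
    work: impl FnOnce(&Token) -> T,
) -> impl FnOnce() -> Option<T> {
    move || {
        if token.is_cancelled() {
            return None; // cancelled while queued: skip the work entirely
        }
        // The work also receives the token so it can poll it while running.
        Some(work(&token))
    }
}
```

A job built with `cancellable(token.clone(), |_| 42)` yields `None` when `token.cancel()` is called before the job runs, and `Some(42)` otherwise.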

Cancel pending blocking search tasks when we drop/invalidate the search

37a9511

timvisee added the bug (Something isn't working) and release:1.16.0 (Pull requests that should be merged for the Qdrant 1.16.0 release) labels Nov 13, 2025

Mention PR with explanation in comment

4ccad86

timvisee requested review from agourlay, ffuugoo and generall November 13, 2025 16:38

ffuugoo approved these changes Nov 13, 2025

qdrant deleted a comment from coderabbitai bot Nov 13, 2025

agourlay approved these changes Nov 13, 2025

timvisee changed the title ~~Cancel pending blocking search tasks when we drop/invalidate the search~~ Abort pending blocking search tasks when we drop/invalidate the search Nov 14, 2025

timvisee merged commit c86aa1a into dev Nov 14, 2025
16 checks passed

timvisee deleted the cancel-search-tasks-on-drop branch November 14, 2025 07:49

timvisee added a commit that referenced this pull request Nov 14, 2025

Prematurely abort blocking shard read operations

bf2a254

All shard read operations, such as retrieve, scroll, facets and more can be safely aborted prematurely. Related to: <#7530>

timvisee mentioned this pull request Nov 14, 2025

Audit all spawn blocking calls, prematurely abort them #7533

Merged

6 tasks

timvisee added a commit that referenced this pull request Nov 14, 2025

Abort pending blocking search tasks when we drop/invalidate the search (#7530)

4c10db4

* Cancel pending blocking search tasks when we drop/invalidate the search
* Mention PR with explanation in comment

timvisee mentioned this pull request Nov 14, 2025

Bump version to 1.16.0 #7535

Merged

timvisee merged 2 commits into dev from cancel-search-tasks-on-drop

3 participants
