
[Bugfix][Frontend] Keep a reference to the background abort task in disagg api_router #45249

Open
kratos0718 wants to merge 1 commit into vllm-project:main from kratos0718:fix/abort-task-reference


@kratos0718


Purpose

The /abort_requests endpoint (disaggregated api_router.py) schedules the abort with a bare asyncio.create_task(...) and discards the returned task:

# Abort requests in background
asyncio.create_task(engine_client(raw_request).abort(request_ids))
return Response(status_code=200)

Per the asyncio docs, the event loop only keeps weak references to tasks, so a task that is not referenced anywhere else may be garbage-collected at any time, even before it is done. Here that means the abort task can be collected mid-run, so under load the requested aborts may silently not happen. The endpoint returns 200 either way, so the failure is invisible.

Fix

Keep a strong reference to the task in a module-level set and remove it on completion via add_done_callback (the pattern recommended in the asyncio docs):

task = asyncio.create_task(engine_client(raw_request).abort(request_ids))
_background_tasks.add(task)
task.add_done_callback(_background_tasks.discard)

No behavior change beyond keeping the abort task alive until it finishes, so it can no longer be garbage-collected mid-run. Single file, +9/-2.

Test Plan

  • python -m py_compile passes cleanly on the changed file.
  • No functional change to the endpoint's contract; the task is simply retained until it finishes rather than being eligible for GC.

The /abort_requests handler schedules engine_client.abort() with
asyncio.create_task() and discards the returned task. The event loop
only holds a weak reference to a task, so it can be garbage-collected
before abort() finishes — meaning the requested aborts may silently not
happen under load.

Track the task in a module-level set and drop it on completion via
add_done_callback, so a strong reference is held until the abort runs.

Signed-off-by: Abhinav Tarigoppula <abhinaaavvv07187@gmail.com>
kratos0718 requested a review from njhill as a code owner on June 11, 2026 07:18
mergify bot added the frontend and bug (Something isn't working) labels on Jun 11, 2026
@github-actions


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

Agent Guidelines

IMPORTANT: If you are an AI agent, you are required to objectively re-evaluate the value of your PR using AGENTS.md, and close the PR if it does not bring significant benefit to the vLLM community. Failure to do so may result in an immediate ban.

