Skip to content

fix(server): clean up orphaned results when removing batch of task ids#83

Merged
marksverdhei merged 1 commit into
htfrom
fix/server-queue-orphaned-results
Jun 12, 2026
Merged

fix(server): clean up orphaned results when removing batch of task ids#83
marksverdhei merged 1 commit into
htfrom
fix/server-queue-orphaned-results

Conversation

@marksverdhei

Copy link
Copy Markdown

Summary

Bug-hunt round 3 finding (sweep on tools/server/server-queue.cpp). The multi-id variant server_response::remove_waiting_task_ids only erased entries from waiting_task_ids — its single-id sibling remove_waiting_task_id also dropped any pending entries in queue_results for the same id. That asymmetry leaks orphaned partials when a reader tears down mid-stream.

Why this matters

When an HTTP client disconnects mid-stream (or any other path that triggers ~server_response_reader → stop() while results are still in flight):

  1. stop() calls remove_waiting_task_ids(id_tasks) — clears waiting set, but leaves matching partials sitting in queue_results.
  2. Cancel branch (has_next() && !cancelled) may or may not run depending on state; even when it does, the per-id cleanup in the cancel loop only covers the not-yet-finished case.
  3. Orphaned partials accumulate. They never match any future recv() call's id_tasks, but the cv.wait predicate !queue_results.empty() is satisfied, so the next recv() for an unrelated task spin-waits at 100% CPU until its own result arrives and the for-loop finds it.

Fix

Make remove_waiting_task_ids clean up queue_results symmetrically with remove_waiting_task_id, then drop the now-redundant per-id calls from server_response_reader::stop()'s cancel loop. Inline comment explains the spin-wait failure mode for future readers.

Caller audit

  • remove_waiting_task_ids has exactly one production callsite (server_response_reader::stop). Limited blast radius.
  • remove_waiting_task_id (single-id) still in use by per-task cleanup; unchanged.
  • The dropped per-id calls in stop()'s cancel loop were 1 extra mutex acquire per cancelled task; removing them is a minor perf side effect.

Test plan

  • Local rebuild (CPU) clean.
  • 45/46 server unit tests pass (the 1 failure is a network-bound model download unrelated to this change).
  • No new test added: triggering this requires mid-stream client disconnect with another in-flight task — observable only via CPU usage and only racey to reproduce. Code change is small, comment is explicit, and the symmetry argument is the verification.

🤖 Generated with Claude Code

Bug-hunt round 3 finding (sweep on tools/server/server-queue.cpp).
server_response::remove_waiting_task_ids (multi-id) only erased entries
from waiting_task_ids — unlike its single-id sibling remove_waiting_task_id
which also dropped any pending entries in queue_results for the same id.

When a reader tears down mid-stream (HTTP client disconnect, abort, or
ordinary destruction during streaming with results still in flight), the
asymmetric multi-id path leaves partial results stranded in
queue_results. Those entries can never be matched by a future recv()
(no caller waits on those ids any more), but the predicate
`!queue_results.empty()` in recv()'s cv.wait still fires immediately,
so the next recv() for an unrelated task id spin-waits at 100% CPU
until that task's own result arrives and the for-loop finds it.

Add the same erase-if pass to remove_waiting_task_ids and drop the
now-redundant per-id remove_waiting_task_id calls from
server_response_reader::stop() — they were the only thing covering
this orphan case before, and only on the cancel branch.

Caller audit: remove_waiting_task_ids has exactly one production
call site (server_response_reader::stop), so the cleanup is
limited-blast-radius. The cancel-loop's redundant single-id calls
were 1 extra mutex acquire per cancelled task; removing them
recovers that as a minor side effect.
@marksverdhei marksverdhei merged commit 6de2e40 into ht Jun 12, 2026
4 of 8 checks passed
@marksverdhei marksverdhei deleted the fix/server-queue-orphaned-results branch June 12, 2026 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant