fix(server): clean up orphaned results when removing batch of task ids#83
Merged
Merged
Conversation
Bug-hunt round 3 finding (sweep on tools/server/server-queue.cpp). server_response::remove_waiting_task_ids (multi-id) only erased entries from waiting_task_ids — unlike its single-id sibling remove_waiting_task_id which also dropped any pending entries in queue_results for the same id. When a reader tears down mid-stream (HTTP client disconnect, abort, or ordinary destruction during streaming with results still in flight), the asymmetric multi-id path leaves partial results stranded in queue_results. Those entries can never be matched by a future recv() (no caller waits on those ids any more), but the predicate `!queue_results.empty()` in recv()'s cv.wait still fires immediately, so the next recv() for an unrelated task id spin-waits at 100% CPU until that task's own result arrives and the for-loop finds it. Add the same erase-if pass to remove_waiting_task_ids and drop the now-redundant per-id remove_waiting_task_id calls from server_response_reader::stop() — they were the only thing covering this orphan case before, and only on the cancel branch. Caller audit: remove_waiting_task_ids has exactly one production call site (server_response_reader::stop), so the cleanup is limited-blast-radius. The cancel-loop's redundant single-id calls were 1 extra mutex acquire per cancelled task; removing them recovers that as a minor side effect.
This was referenced Jun 5, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Bug-hunt round 3 finding (sweep on
tools/server/server-queue.cpp). The multi-id variantserver_response::remove_waiting_task_idsonly erased entries fromwaiting_task_ids— its single-id siblingremove_waiting_task_idalso dropped any pending entries inqueue_resultsfor the same id. That asymmetry leaks orphaned partials when a reader tears down mid-stream.Why this matters
When an HTTP client disconnects mid-stream (or any other path that triggers
~server_response_reader → stop()while results are still in flight):stop()callsremove_waiting_task_ids(id_tasks)— clears waiting set, but leaves matching partials sitting inqueue_results.has_next() && !cancelled) may or may not run depending on state; even when it does, the per-id cleanup in the cancel loop only covers the not-yet-finished case.recv()call'sid_tasks, but the cv.wait predicate!queue_results.empty()is satisfied, so the nextrecv()for an unrelated task spin-waits at 100% CPU until its own result arrives and the for-loop finds it.Fix
Make
remove_waiting_task_idsclean upqueue_resultssymmetrically withremove_waiting_task_id, then drop the now-redundant per-id calls fromserver_response_reader::stop()'s cancel loop. Inline comment explains the spin-wait failure mode for future readers.Caller audit
remove_waiting_task_idshas exactly one production callsite (server_response_reader::stop). Limited blast radius.remove_waiting_task_id(single-id) still in use by per-task cleanup; unchanged.Test plan
🤖 Generated with Claude Code