fix(api-server): persist response snapshot on client disconnect AND server cancellation (salvages #15171) #15392

Merged: teknium1 merged 3 commits into main from hermes/hermes-1070584a on Apr 24, 2026

@teknium1

Contributor

Salvages @UgwujaGeorge's PR #15171 with a follow-up for the asyncio.CancelledError path.

Summary

Streamed /v1/responses calls with store=true now leave a retrievable snapshot in ResponseStore whether the stream ends cleanly, on client disconnect, OR on server-side cancellation (shutdown, request-level timeout). Previously only the clean path wrote to the store, so GET /v1/responses/{id} and previous_response_id chaining 404'd after any abrupt termination.

Changes

  • Commit 1 (@UgwujaGeorge, cherry-picked from #15171, "fix(api-server): persist response snapshot on client disconnect when store=True"): persists an in_progress snapshot immediately after response.created, updates it to completed/failed on terminal events, and writes an incomplete snapshot on ConnectionResetError / BrokenPipeError / OSError.
  • Commit 2 (follow-up): also persists an incomplete snapshot on asyncio.CancelledError (server shutdown, request timeout). Factors the incomplete-snapshot build into _persist_incomplete_if_needed() so both branches share one implementation; the cancellation handler re-raises to preserve cooperative-cancel semantics. Adds two tests that call _write_sse_responses directly, because a TestClient-level disconnect races the server handler and makes an end-to-end assertion flaky.
  • Commit 3: AUTHOR_MAP entry for @UgwujaGeorge so release notes attribute correctly.

Validation

scripts/run_tests.sh tests/gateway/test_api_server.py
119 passed, 74 warnings in 3.38s

Closes #15171.

UgwujaGeorge and others added 3 commits April 24, 2026 15:16
…r too

Extends PR #15171 to also cover the server-side cancellation path (aiohttp
shutdown, request-level timeout) — previously only ConnectionResetError
triggered the incomplete-snapshot write, so cancellations left the store
stuck at the in_progress snapshot written on response.created.

Factors the incomplete-snapshot build into a _persist_incomplete_if_needed()
helper called from both the ConnectionResetError and CancelledError
branches; the CancelledError handler re-raises so cooperative cancellation
semantics are preserved.

Adds two regression tests that drive _write_sse_responses directly (the
TestClient disconnect path races the server handler, which makes the
end-to-end assertion flaky).
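
The direct-drive testing approach mentioned above might look roughly like the following sketch. It does not reproduce the tests added in this PR: `FakeStream`, `writer_under_test`, and the dict used as a store are hypothetical stand-ins, chosen so that the disconnect and the cancellation are triggered deterministically rather than by racing a real client against the handler.

```python
import asyncio


class FakeStream:
    """Hypothetical response stand-in that fails after `ok_writes` writes."""

    def __init__(self, ok_writes):
        self.ok_writes = ok_writes
        self.written = []

    async def write(self, chunk):
        if len(self.written) >= self.ok_writes:
            raise ConnectionResetError("client went away")
        self.written.append(chunk)


async def writer_under_test(stream, store, chunks):
    """Tiny stand-in for _write_sse_responses, just enough to test against."""
    store["status"] = "in_progress"
    try:
        for chunk in chunks:
            await stream.write(chunk)
            await asyncio.sleep(0)  # yield to the loop between chunks
        store["status"] = "completed"
    except OSError:  # also covers ConnectionResetError and BrokenPipeError
        store["status"] = "incomplete"
    except asyncio.CancelledError:
        store["status"] = "incomplete"
        raise


async def check_disconnect():
    # The fake stream raises on the second write, so no real socket is needed.
    store = {}
    await writer_under_test(FakeStream(ok_writes=1), store, [b"a", b"b", b"c"])
    return store["status"]


async def check_cancellation():
    store = {}
    task = asyncio.create_task(
        writer_under_test(FakeStream(ok_writes=10), store, [b"a"] * 5)
    )
    await asyncio.sleep(0)  # let the writer reach its first await
    task.cancel()
    try:
        await task
        reraised = False
    except asyncio.CancelledError:
        reraised = True
    return store["status"], reraised


if __name__ == "__main__":
    print(asyncio.run(check_disconnect()))
    print(asyncio.run(check_cancellation()))
```

Calling the writer coroutine directly lets each test assert on the stored status right after the failure is injected, and the cancellation check also confirms that `CancelledError` still reaches the caller.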
@teknium1 merged commit c7d62b3 into main on Apr 24, 2026
9 of 11 checks passed
@teknium1 deleted the hermes/hermes-1070584a branch on April 24, 2026 22:22
@alt-glitch added the type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), and comp/gateway (Gateway runner, session dispatch, delivery) labels on Apr 24, 2026