fix(api-server): Implement background response run recovery #15492
zhboner wants to merge 2 commits into
Hi, just want to follow up on this PR. I know it is not a small one, so I completely understand if it is not easy to review quickly. I didn't break it down because all of the changes serve one goal: complete background task execution. The changes this PR makes are listed below and explained in the rest of this comment:
1.
What does this PR do?
This PR implements true server-side background execution for streaming POST /v1/responses requests.
Previously, a stream=true Responses API request was tightly coupled to the SSE client connection. If the client disconnected, the agent execution could be interrupted and the response could not be reliably recovered. This PR decouples response execution from the SSE subscriber lifecycle, so the server continues running the response in the background and allows clients to recover state through GET /v1/responses/{response_id}.
Current limitations
This PR does not attempt to make stream and store fully orthogonal across all /v1/responses modes.
The issue addressed here is specific to stream=true: durable streaming responses should continue running after SSE disconnect and should be recoverable by response ID. This PR therefore unifies the two streaming modes around ResponseRun:
- stream=true + store=true: durable/background/recoverable
- stream=true + store=false: ephemeral/connection-owned, cancelled on disconnect
The existing stream=false synchronous path is left unchanged.
Making all four combinations fully orthogonal would require moving non-streaming execution onto ResponseRun as well. That is a larger lifecycle refactor touching cancellation semantics, conversation active-state tracking, idempotency, error handling, tests, and client expectations. Since that is outside the scope of the reported background streaming issue, it is deferred to a future PR.
Related Issue
Fixes #15026
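To make the decoupling described under "What does this PR do?" concrete, here is a minimal asyncio sketch. ResponseRunSketch, its start/attach/detach methods, the in-memory event list and the shape of the response.snapshot payload are hypothetical and only illustrate the idea. They are not the PR's ResponseRun or ResponseRunManager API. In this sketch the run owns the agent task, a client disconnect only detaches that client's queue, and a client that attaches later first receives a snapshot of the events emitted so far.

```python
import asyncio
from typing import List, Optional, Set, Tuple


class ResponseRunSketch:
    """Hypothetical stand-in for ResponseRun: the agent task belongs to the run,
    not to any SSE connection, so subscribers can attach and detach freely."""

    def __init__(self, response_id: str) -> None:
        self.response_id = response_id
        self.status = "in_progress"
        self.events: List[dict] = []            # buffer used to build snapshots
        self.subscribers: Set[asyncio.Queue] = set()
        self.task: Optional[asyncio.Task] = None

    def start(self, steps: int) -> None:
        # Created once by the run; nothing in detach() ever cancels it.
        self.task = asyncio.create_task(self._run(steps))

    async def _run(self, steps: int) -> None:
        self._publish({"type": "response.created", "id": self.response_id})
        for i in range(steps):
            await asyncio.sleep(0)                # stand-in for agent work
            self._publish({"type": "response.output_text.delta", "index": i})
        self.status = "completed"
        self._publish({"type": "response.completed", "id": self.response_id})

    def _publish(self, event: dict) -> None:
        # Record the event, then fan it out to whoever is currently attached.
        self.events.append(event)
        for queue in self.subscribers:
            queue.put_nowait(event)

    def attach(self) -> asyncio.Queue:
        # A (re)attaching client first gets a snapshot of everything so far.
        queue: asyncio.Queue = asyncio.Queue()
        queue.put_nowait({"type": "response.snapshot", "events": list(self.events)})
        self.subscribers.add(queue)
        return queue

    def detach(self, queue: asyncio.Queue) -> None:
        # SSE disconnect: forget the subscriber only; the run keeps going.
        self.subscribers.discard(queue)


async def demo() -> Tuple[str, int]:
    run = ResponseRunSketch("resp_123")
    run.start(steps=3)
    first_client = run.attach()
    await first_client.get()                    # receives the (empty) snapshot
    run.detach(first_client)                    # client disconnects, e.g. Ctrl+C
    assert run.task is not None
    await run.task                              # the run still completes
    late_client = run.attach()                  # stand-in for later recovery by id
    snapshot = late_client.get_nowait()
    return run.status, len(snapshot["events"])


print(asyncio.run(demo()))  # prints ('completed', 5)
```

Because detach() only touches the subscriber set, cancelling a store=false ephemeral stream on disconnect would have to be an explicit extra step in this design rather than a side effect of the connection closing. How the PR wires that up is not shown here.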
Type of Change
Changes Made
- ResponseRun/ResponseRunManager to own background /v1/responses execution lifecycle.
- /v1/responses behavior so SSE client disconnects only detach the subscriber and do not cancel the underlying agent run.
- GET /v1/responses/{response_id}.
- store=true for stream=true Responses API requests. stream=true + store=false now returns 400 store_required.
- Idempotency-Key support.
- 409 idempotency_key_conflict.
- response.snapshot SSE event for active-run reattach/idempotent retry.
- previous_response_id and conversation chaining so only completed responses are accepted.
- latest_completed_response_id
- active_response_id
- conversations.response_id schema.
- _write_sse_responses(...) path in favor of ResponseRun.
How to Test
1. Run the targeted test suite
~/.hermes/hermes-agent/venv/bin/python -m pytest -o addopts= tests/gateway/test_api_server.py tests/gateway/test_sse_agent_cancel.py -q
Expected result:
2. Verify stream=true + store=false works as ephemeral streaming
Run:
Expected result:
- 200
- response.created
- response.completed
Copy the response.id from the stream, then run:
Expected result:
- 404
- store=false responses are ephemeral and not recoverable
3. Verify stream=true + store=false rejects Idempotency-Key
Run:
Expected result:
- 400
- "idempotency_requires_store"
This confirms that idempotent streaming replay/reattach is only available for durable store=true streams.
4. Verify stream=true + store=true survives client disconnect
Start a durable streaming response:
After receiving the first response.created event, stop the client with Ctrl+C.
Expected server behavior:
Then recover the response:
Expected result:
- status: "in_progress"
5. Verify durable streaming idempotency retry
Start a durable streaming request with an idempotency key:
Retry the exact same request with the same Idempotency-Key:
Expected result:
- response.id
6. Verify idempotency conflict handling
Reuse the same idempotency key with a different request body:
Expected result:
- 409
- "idempotency_key_conflict"
7. Verify conversation behavior for store=false streaming
First create a stored checkpoint in a conversation:
Then send an ephemeral streaming response in the same conversation:
Expected result:
- latest_completed_response_id
- active_response_id
- GET /v1/responses/{response_id}
Then send another stored request in the same conversation:
Expected result:
8. Verify active conversation protection
Start a durable background stream in a conversation:
While it is still running, send another request to the same conversation:
Expected result:
- 409
- "conversation_response_not_completed"
This confirms that conversation ordering is protected while a durable response is active.
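The request-admission rules exercised by steps 3 and 5 through 8 can be summarized in a minimal, hypothetical sketch. AdmissionSketch, ApiError, the SHA-256 fingerprint of the JSON body, treating a missing store field as true, and the order in which the checks run are all assumptions made for illustration. Only the status codes and error strings come from the expected results in the test steps. The sketch only tracks which runs are admitted. It does not execute them.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


class ApiError(Exception):
    """Hypothetical error carrying an HTTP status and an error code string."""

    def __init__(self, status: int, code: str) -> None:
        super().__init__(f"{status} {code}")
        self.status = status
        self.code = code


@dataclass
class Conversation:
    latest_completed_response_id: Optional[str] = None
    active_response_id: Optional[str] = None


@dataclass
class AdmissionSketch:
    conversations: Dict[str, Conversation] = field(default_factory=dict)
    # Idempotency-Key -> (fingerprint of the request body, response id)
    idempotency: Dict[str, Tuple[str, str]] = field(default_factory=dict)
    counter: int = 0

    def admit(self, body: dict, idempotency_key: Optional[str] = None) -> str:
        store = body.get("store", True)  # assumption: missing store means durable
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if idempotency_key is not None:
            if not store:
                raise ApiError(400, "idempotency_requires_store")
            seen = self.idempotency.get(idempotency_key)
            if seen is not None:
                if seen[0] != digest:
                    raise ApiError(409, "idempotency_key_conflict")
                return seen[1]  # same key and body: reattach to the existing run
        conv_id = body.get("conversation")
        conv = self.conversations.setdefault(conv_id, Conversation()) if conv_id else None
        if conv is not None and conv.active_response_id is not None:
            raise ApiError(409, "conversation_response_not_completed")
        self.counter += 1
        response_id = f"resp_{self.counter}"
        if store and conv is not None:
            conv.active_response_id = response_id  # store=false runs are not tracked
        if idempotency_key is not None:
            self.idempotency[idempotency_key] = (digest, response_id)
        return response_id

    def complete(self, conv_id: str, response_id: str) -> None:
        # Only a durable run that finishes advances the conversation checkpoint.
        conv = self.conversations[conv_id]
        if conv.active_response_id == response_id:
            conv.active_response_id = None
            conv.latest_completed_response_id = response_id


def code_of(call) -> str:
    """Return the ApiError code raised by call(), or "ok" if none was raised."""
    try:
        call()
    except ApiError as err:
        return err.code
    return "ok"


gate = AdmissionSketch()
body = {"input": "hi", "conversation": "c1", "stream": True, "store": True}
first = gate.admit(body, "key-1")
print(gate.admit(body, "key-1") == first)                             # True
print(code_of(lambda: gate.admit({**body, "input": "bye"}, "key-1")))  # idempotency_key_conflict
print(code_of(lambda: gate.admit({**body, "input": "next"})))          # conversation_response_not_completed
print(code_of(lambda: gate.admit({**body, "store": False}, "key-2")))  # idempotency_requires_store
gate.complete("c1", first)
ephemeral = gate.admit({**body, "store": False})
print(gate.conversations["c1"].latest_completed_response_id)           # resp_1
```

In this sketch, the ephemeral store=false request is admitted once the durable run has completed, but it never becomes the conversation's active or latest completed response. That mirrors the checkpoint behavior that step 7 checks.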
Checklist
Code
- fix(scope):, feat(scope):, etc.)
- pytest tests/ -q and all tests pass
Documentation & Housekeeping
- docs/, docstrings), or N/A
- cli-config.yaml.example if I added/changed config keys, or N/A
- CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows, or N/A