Bug Description
When using the OpenAI-compatible POST /v1/responses endpoint with:
{
"stream": true,
"store": true
}
the server currently only persists the response to ResponseStore after the streaming run completes successfully.
If the client disconnects mid-stream (for example, the app is killed, the network is interrupted, the app moves between foreground and background, or the SSE connection drops), Hermes interrupts and cancels the agent task. The partially generated response is not stored, and a subsequent:
GET /v1/responses/{response_id}
returns 404 Response not found.
This makes it impossible for clients to implement ChatGPT/Claude-like recovery after stream interruption, even though the client already received a response.created event with a valid response_id.
Steps to Reproduce
- Start a streaming response:
curl http://127.0.0.1:8642/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-d '{
"model": "hermes-agent",
"input": "Do something that takes a while or calls tools.",
"stream": true,
"store": true
}'
- Wait until the client receives:
event: response.created
data: {"type":"response.created","response":{"id":"resp_xxx", ...}}
- Optionally wait until some text deltas or tool events arrive:
response.output_text.delta
response.output_item.added
response.output_item.done
- Disconnect the client before response.completed.
- Try to retrieve the response:
curl http://127.0.0.1:8642/v1/responses/resp_xxx \
-H "Authorization: Bearer $API_KEY"
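To script the reproduction, the response id has to be read from the stream before disconnecting. The helper below is a minimal sketch of that step; `extract_response_id` is a hypothetical name, and it assumes the stream uses the `data: {...}` line format shown above.

```python
import json
from typing import Iterable, Optional


def extract_response_id(sse_lines: Iterable[str]) -> Optional[str]:
    """Return the id carried by the first response.created event, or None."""
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines, comments, and blank separators
        payload = line[len("data:"):].strip()
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore data lines that are not JSON
        if isinstance(event, dict) and event.get("type") == "response.created":
            return (event.get("response") or {}).get("id")
    return None
```

Once the id is known, the script can close the connection and issue the GET request above.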
Expected Behavior
When store=true, after the server emits response.created, the response should be queryable via:
GET /v1/responses/{response_id}
At minimum, the store should contain a recoverable snapshot with a meaningful status:
{
"id": "resp_xxx",
"object": "response",
"status": "in_progress",
"output": []
}
During streaming, the stored response should be updated as output arrives:
response.output_text.delta
response.output_text.done
response.output_item.added
response.output_item.done
response.failed
response.completed
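One possible way to keep the stored snapshot current is to fold each emitted event into it. This is only a sketch: `apply_event` is a hypothetical helper, and the payload fields (`output_index`, `content_index`, `item`, `delta`, `response`) are assumed to follow the OpenAI Responses streaming event shapes, which Hermes may or may not emit exactly. It also assumes response.output_item.added arrives before any text delta for that item.

```python
from typing import Any, Dict


def apply_event(snapshot: Dict[str, Any], event: Dict[str, Any]) -> Dict[str, Any]:
    """Fold one streaming event into the stored response snapshot (in place)."""
    kind = event.get("type")
    output = snapshot.setdefault("output", [])
    if kind in ("response.output_item.added", "response.output_item.done"):
        index = event["output_index"]
        while len(output) <= index:
            output.append(None)  # placeholder until the item at that index arrives
        output[index] = event["item"]
    elif kind == "response.output_text.delta":
        item = output[event["output_index"]]
        content = item.setdefault("content", [])
        part_index = event.get("content_index", 0)
        while len(content) <= part_index:
            content.append({"type": "output_text", "text": ""})
        content[part_index]["text"] += event["delta"]
    elif kind in ("response.completed", "response.failed"):
        # Take the final response object if one is attached, then set the status.
        snapshot.update(event.get("response") or {})
        snapshot["status"] = kind.split(".", 1)[1]
    return snapshot
```

Writing the snapshot back to ResponseStore after each call (or on a short debounce) would let GET return partial output while the run is still in progress.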
If the client disconnects before completion, the server should keep running the response in the background.
For store=true, do not cancel the agent task on SSE client disconnect. Let the response continue to completion and persist the final response to ResponseStore.
This gives ChatGPT/Claude-like recovery:
- client disconnects
- server continues generation
- client relaunches
- client calls
GET /v1/responses/{response_id}
- final response is available once completed
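On the client side, the recovery flow above could look roughly like the following. This is a sketch under stated assumptions: `wait_for_response` and its `fetch` argument are hypothetical, with `fetch` standing in for whatever HTTP call the client makes to GET /v1/responses/{response_id} and returning the parsed JSON body, or None on a 404.

```python
import time
from typing import Any, Callable, Dict, Optional

# Statuses after which the stored response will no longer change.
TERMINAL_STATUSES = {"completed", "failed", "incomplete"}


def wait_for_response(
    fetch: Callable[[str], Optional[Dict[str, Any]]],
    response_id: str,
    interval: float = 1.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Dict[str, Any]]:
    """Poll until the response reaches a terminal status or attempts run out."""
    for attempt in range(max_attempts):
        body = fetch(response_id)
        if body is not None and body.get("status") in TERMINAL_STATUSES:
            return body
        if attempt + 1 < max_attempts:
            sleep(interval)  # still queued or in_progress, or not stored yet
    return None
```

With the current behavior, `fetch` would return None on every attempt after a disconnect, so the loop would always time out.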
Actual Behavior
GET /v1/responses/{response_id} returns:
{
"error": {
"message": "Response not found: resp_xxx",
"type": "invalid_request_error",
"param": null,
"code": null
}
}
The client cannot recover:
- partial assistant text
- completed tool calls
- tool outputs
- final assistant message
The response id previously emitted by response.created is therefore not useful for recovery unless the stream reaches response.completed.
Affected Component
Gateway (Telegram/Discord/Slack/WhatsApp)
Messaging Platform (if gateway-related)
No response
Debug Report
Report https://paste.rs/veGtu
agent.log https://paste.rs/3eRBA
gateway.log https://paste.rs/BJmsi
Operating System
ubuntu 26.04
Python Version
3.11.15
Hermes Version
0.11.0
Additional Logs / Traceback (optional)
Root Cause Analysis (optional)
No response
Proposed Fix (optional)
For store=true, decouple the agent task from the SSE connection:
- SSE connection is only a live transport
- response execution continues independently
- ResponseStore is the source of truth
- GET /v1/responses/{response_id} can return:
queued
in_progress
completed
failed
incomplete
Client disconnect should not necessarily cancel response execution if the user requested store=true.
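The decoupling could be sketched with asyncio along these lines. Everything here is illustrative rather than Hermes code: `ResponseRun`, `detach`, the plain dict standing in for ResponseStore, and the simplified `output_text` field are all assumptions made for the example.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, Optional


class ResponseRun:
    """Hypothetical holder for one response run that outlives its SSE connection."""

    def __init__(self, response_id: str, store: Dict[str, Dict[str, Any]]) -> None:
        self.response_id = response_id
        self.store = store  # stand-in for ResponseStore
        self.subscriber: Optional[asyncio.Queue] = None  # live SSE transport, if any
        # Persist a snapshot as soon as the id exists, before any output arrives.
        store[response_id] = {"id": response_id, "object": "response",
                              "status": "in_progress", "output_text": ""}

    async def run(self, chunks: AsyncIterator[str]) -> None:
        snapshot = self.store[self.response_id]
        try:
            async for chunk in chunks:
                snapshot["output_text"] += chunk  # the store is the source of truth
                if self.subscriber is not None:
                    self.subscriber.put_nowait(chunk)  # best-effort live delivery
            snapshot["status"] = "completed"
        except Exception:
            snapshot["status"] = "failed"
        finally:
            if self.subscriber is not None:
                self.subscriber.put_nowait(None)  # end-of-stream marker

    def detach(self) -> None:
        """Called on SSE disconnect: drop the transport, keep the task running."""
        self.subscriber = None


async def demo() -> Dict[str, Any]:
    store: Dict[str, Dict[str, Any]] = {}

    async def fake_agent() -> AsyncIterator[str]:
        for chunk in ("Hel", "lo"):
            await asyncio.sleep(0)
            yield chunk

    run = ResponseRun("resp_demo", store)
    run.subscriber = asyncio.Queue()
    task = asyncio.create_task(run.run(fake_agent()))
    await run.subscriber.get()  # client receives the first chunk
    run.detach()                # client disconnects mid-stream
    await task                  # the run still finishes
    return store["resp_demo"]
```

In this sketch the disconnect handler calls `detach()` instead of cancelling the task, so the stored response still reaches completed with the full text.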
Are you willing to submit a PR for this?