[Bug]: Streaming /v1/responses cannot be recovered after client disconnect because partial/in-progress responses are not persisted #15026

### Bug Description


When using the OpenAI-compatible `POST /v1/responses` endpoint with:

```json
{
  "stream": true,
  "store": true
}
```

the server currently persists the response to `ResponseStore` only after the streaming run completes successfully.

If the client disconnects mid-stream (for example, the app is killed, the network is interrupted, the app moves between foreground and background, or the SSE connection is dropped), Hermes interrupts and cancels the agent task. The partially generated response is not stored, and a subsequent:

```http
GET /v1/responses/{response_id}
```

returns `404 Response not found`.

This makes it impossible for clients to implement ChatGPT/Claude-like recovery after stream interruption, even though the client already received a `response.created` event with a valid `response_id`.


### Steps to Reproduce

1. Start a streaming response:

```bash
curl http://127.0.0.1:8642/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "hermes-agent",
    "input": "Do something that takes a while or calls tools.",
    "stream": true,
    "store": true
  }'
```

2. Wait until the client receives:

```text
event: response.created
data: {"type":"response.created","response":{"id":"resp_xxx", ...}}
```

3. Optionally wait until some text deltas or tool events arrive:

```text
response.output_text.delta
response.output_item.added
response.output_item.done
```

4. Disconnect the client before `response.completed`.

5. Try to retrieve the response:

```bash
curl http://127.0.0.1:8642/v1/responses/resp_xxx \
  -H "Authorization: Bearer $API_KEY"
```
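
To script steps 2 to 4, the client needs to pull the `response_id` out of the `response.created` event before dropping the connection. A minimal sketch, assuming the `event:` / `data:` framing shown in step 2; `extract_response_id` is a hypothetical helper for illustration, not part of Hermes:

```python
import json
from typing import Iterable, Optional


def extract_response_id(sse_lines: Iterable[str]) -> Optional[str]:
    """Return the id carried by the first `response.created` event, if any."""
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip `event:` lines and blank separators
        try:
            payload = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # ignore data lines that are not JSON
        if isinstance(payload, dict) and payload.get("type") == "response.created":
            return payload.get("response", {}).get("id")
    return None
```

Once the helper returns an id, the client can close the connection and use that id in step 5.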

### Expected Behavior


When `store=true`, after the server emits `response.created`, the response should be queryable via:

```http
GET /v1/responses/{response_id}
```

At minimum, the store should contain a recoverable snapshot with a meaningful status:

```json
{
  "id": "resp_xxx",
  "object": "response",
  "status": "in_progress",
  "output": []
}
```

During streaming, the stored response should be updated as output arrives:

- `response.output_text.delta`
- `response.output_text.done`
- `response.output_item.added`
- `response.output_item.done`
- `response.failed`
- `response.completed`
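
One way to keep the stored snapshot current is to fold each event into it before the event is written to the SSE stream. The sketch below is illustrative only: `apply_event` is a hypothetical helper, and the event fields (`output_index`, `item`, `delta`, `text`) are assumed to follow the OpenAI Responses streaming event shapes rather than taken from the Hermes code. For brevity it tracks only the first text part of each output item.

```python
from typing import Any, Dict

# Assumed mapping from terminal event type to stored status.
TERMINAL = {"response.completed": "completed", "response.failed": "failed"}


def _text_part(item: Dict[str, Any]) -> Dict[str, Any]:
    """Return the first text part of an output item, creating it if missing."""
    content = item.setdefault("content", [])
    if not content:
        content.append({"type": "output_text", "text": ""})
    return content[0]


def apply_event(snapshot: Dict[str, Any], event: Dict[str, Any]) -> Dict[str, Any]:
    """Fold one streaming event into the stored snapshot (mutates and returns it)."""
    kind = event.get("type")
    output = snapshot.setdefault("output", [])
    if kind in ("response.output_item.added", "response.output_item.done"):
        index = event["output_index"]
        while len(output) <= index:
            output.append(None)  # pad so the item lands at its own index
        output[index] = event["item"]
    elif kind == "response.output_text.delta":
        # Assumes the item was announced by output_item.added first.
        _text_part(output[event["output_index"]])["text"] += event["delta"]
    elif kind == "response.output_text.done":
        _text_part(output[event["output_index"]])["text"] = event["text"]
    elif kind in TERMINAL:
        snapshot["status"] = TERMINAL[kind]
    return snapshot
```

With something like this in place, a `GET` issued mid-stream would see whatever text and tool items had arrived so far, with `status` still `in_progress`.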

If the client disconnects before completion, the server should keep running the response in the background.

For `store=true`, the agent task should not be cancelled when the SSE client disconnects. The response should continue to completion, and the final response should be persisted to `ResponseStore`.

This gives ChatGPT/Claude-like recovery:

1. client disconnects
2. server continues generation
3. client relaunches
4. client calls `GET /v1/responses/{response_id}`
5. final response is available once completed
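
On the client side, steps 3 to 5 could look roughly like the polling loop below. This is a sketch under assumptions: `recover_response` is a hypothetical helper, `fetch` stands in for whatever HTTP call the client makes to `GET /v1/responses/{response_id}` (returning the parsed JSON body, or `None` on a 404), and the terminal statuses are the ones listed in this issue.

```python
import time
from typing import Any, Callable, Dict, Optional

TERMINAL_STATUSES = {"completed", "failed", "incomplete"}


def recover_response(
    fetch: Callable[[str], Optional[Dict[str, Any]]],
    response_id: str,
    interval: float = 1.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Dict[str, Any]]:
    """Poll until the stored response reaches a terminal status.

    Returns the last snapshot seen (possibly still in progress), or None if
    the response was never found.
    """
    latest: Optional[Dict[str, Any]] = None
    for _ in range(max_attempts):
        latest = fetch(response_id)
        if latest is not None and latest.get("status") in TERMINAL_STATUSES:
            return latest
        sleep(interval)
    return latest
```

A non-terminal snapshot is still useful here: the client can render the partial output while it keeps polling.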

### Actual Behavior


`GET /v1/responses/{response_id}` returns:

```json
{
  "error": {
    "message": "Response not found: resp_xxx",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}
```

The client cannot recover:

- partial assistant text
- completed tool calls
- tool outputs
- final assistant message

The response id previously emitted by `response.created` is therefore not useful for recovery unless the stream reaches `response.completed`.


### Affected Component

Gateway (Telegram/Discord/Slack/WhatsApp)

### Messaging Platform (if gateway-related)

_No response_

### Debug Report

```shell
Report       https://paste.rs/veGtu
  agent.log    https://paste.rs/3eRBA
  gateway.log  https://paste.rs/BJmsi
```

### Operating System

Ubuntu 26.04

### Python Version

3.11.15

### Hermes Version

0.11.0

### Additional Logs / Traceback (optional)

_No response_

### Root Cause Analysis (optional)

_No response_

### Proposed Fix (optional)

For `store=true`, decouple the agent task from the SSE connection:

- SSE connection is only a live transport
- response execution continues independently
- `ResponseStore` is the source of truth
- `GET /v1/responses/{response_id}` can return:
  - `queued`
  - `in_progress`
  - `completed`
  - `failed`
  - `incomplete`

Client disconnect should not necessarily cancel response execution if the user requested `store=true`.
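
The decoupling could be sketched with plain `asyncio` as below. Everything here is an assumption for illustration: `STORE` is an in-memory stand-in for `ResponseStore`, and `start_response`, `run_response`, and `sse_handler` are hypothetical names, not existing Hermes functions. The point is only that the task is owned by the server, not by the SSE handler, so closing the stream leaves the task running.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List, Tuple

STORE: Dict[str, Dict[str, Any]] = {}  # in-memory stand-in for ResponseStore


async def run_response(response_id: str, chunks: List[str], live: asyncio.Queue) -> None:
    """Produce output independently of any client; the store is the source of truth."""
    record = STORE[response_id]
    try:
        for chunk in chunks:
            await asyncio.sleep(0)  # stand-in for model or tool latency
            record["output_text"] += chunk  # persist first
            live.put_nowait(chunk)  # then best-effort live fan-out
        record["status"] = "completed"
    except Exception:
        record["status"] = "failed"  # a real version would also record the error
    finally:
        live.put_nowait(None)  # sentinel: nothing more will be streamed


def start_response(response_id: str, chunks: List[str]) -> Tuple[asyncio.Task, asyncio.Queue]:
    """Persist an in_progress snapshot, then launch the task (call from a coroutine)."""
    STORE[response_id] = {"id": response_id, "status": "in_progress", "output_text": ""}
    live: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(run_response(response_id, chunks, live))
    return task, live


async def sse_handler(live: asyncio.Queue) -> AsyncIterator[str]:
    """Live transport only: closing this generator does not cancel the task."""
    while (chunk := await live.get()) is not None:
        yield chunk
```

The queue is unbounded in this sketch, so a real implementation would likely drop or detach the live channel once the client is gone.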


### Are you willing to submit a PR for this?

- [x] I'd like to fix this myself and submit a PR
