[Bug]: Local claude-cli custom provider timeout is reported as Empty response and fallback loops #22548

## Bug Description

When Hermes routes a selected `claude-cli` model through an OpenAI-compatible local shim/custom provider, long Claude CLI turns that exceed the shim's internal 120s per-turn timeout surface to the user as `Empty response from model` and trigger retry/fallback behavior. In the observed setup, fallback can select the same `claude-cli` path again, so retries loop against the same timeout surface rather than recovering.

This is not the same class as intentional thinking-only or group-silence empty responses. The underlying provider process is producing/continuing work, but the local OpenAI-compatible shim times out first.

## Environment

- Hermes model picker entry: `claude-cli` / `claude-opus-4-7`
- Routing path: `custom_providers.claude-cli` -> OpenAI-compatible endpoint -> local shim at `http://127.0.0.1:7891/v1`
- Shim process: persistent Claude CLI subprocess
- Claude CLI args observed in live child process include:
  - `-p`
  - `--output-format stream-json`
  - `--input-format stream-json`
  - `--include-partial-messages`
  - `--verbose`
  - `--permission-mode dontAsk`
  - `--model claude-opus-4-7`
  - `--resume <session-id>`

## Local evidence from the shim implementation

In the local shim, the hard timeout is fixed:

```python
SUBPROCESS_TIMEOUT = 120
PROCESS_IDLE_TIMEOUT = 1800
...
content, usage, finish_reason, tool_calls = self._read_response(SUBPROCESS_TIMEOUT)
...
raise RuntimeError("claude CLI turn timed out")
```

Idle eviction after 1800 seconds forces cold/resumed spawns, which are more likely to exceed the 120s per-turn budget when context is heavy or MCP/tool state reconnects.
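One way to soften this on the shim side is to give the first turn after a spawn/resume a larger read budget than warm turns. The following is a minimal sketch only: `ChildState`, `turn_timeout`, and the 300s cold budget are hypothetical names and values, not part of the existing shim.

```python
from dataclasses import dataclass

WARM_TURN_TIMEOUT_S = 120  # matches the shim's current SUBPROCESS_TIMEOUT
COLD_TURN_TIMEOUT_S = 300  # hypothetical budget for the first turn after spawn/resume


@dataclass
class ChildState:
    """Hypothetical per-child bookkeeping; reset to 0 whenever the child is (re)spawned."""
    turns_since_spawn: int = 0


def turn_timeout(state: ChildState) -> int:
    """Return the read budget in seconds for the next turn of this child."""
    if state.turns_since_spawn == 0:
        # Cold or resumed child: context reload and MCP/tool reconnects may be slow.
        return COLD_TURN_TIMEOUT_S
    return WARM_TURN_TIMEOUT_S
```

Under this assumption the shim would call something like `self._read_response(turn_timeout(state))` and increment `turns_since_spawn` after each completed turn.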

## Observed Log Pattern

The shim log shows timeout failures at roughly 120s and 127s, each followed immediately by a successful turn on the same model. This suggests the CLI itself is healthy and can continue, but the wrapper's per-turn budget is too short:

```jsonl
{"event":"spawn","model":"claude-opus-4-7","resume":true,"session_id":"38869d6c-...","has_system_prompt":true}
{"request_id":"1dcdfd75-...","model":"claude-opus-4-7","latency_s":127.411,"status":"error","error_class":"RuntimeError","error":"claude CLI turn timed out"}
{"request_id":"170feabf-...","model":"claude-opus-4-7","latency_s":3.515,"status":"ok","prompt_tokens":6,"completion_tokens":46,"has_tool_calls":false}
...
{"event":"idle_evict","session_id":"38869d6c-...","idle_s":1800}
{"event":"spawn","model":"claude-opus-4-7","resume":true,"session_id":"38869d6c-...","has_system_prompt":true}
{"request_id":"e5743941-...","model":"claude-opus-4-7","latency_s":120.014,"status":"error","error_class":"RuntimeError","error":"claude CLI turn timed out"}
{"request_id":"2d5bfaed-...","model":"claude-opus-4-7","latency_s":81.105,"status":"ok","prompt_tokens":14,"completion_tokens":2951,"has_tool_calls":true}
{"request_id":"8d56ffbf-...","model":"claude-opus-4-7","latency_s":52.101,"status":"ok","prompt_tokens":9,"completion_tokens":2916,"has_tool_calls":true}
```
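To check whether a given shim log shows the same pattern, a small script can pair each timed-out request with the request that follows it. This is a sketch that assumes the JSONL field names shown in the excerpt above (`request_id`, `status`, `error`); the function name `timeouts_followed_by_success` is made up for illustration.

```python
import json


def timeouts_followed_by_success(lines):
    """Yield (timeout_entry, next_ok_entry) pairs from shim JSONL log lines."""
    pending = None  # most recent timed-out request, if the previous request timed out
    for raw in lines:
        raw = raw.strip()
        if not raw.startswith("{"):
            continue  # skip blank lines and elisions such as "..."
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines
        if "request_id" not in entry:
            continue  # spawn / idle_evict events are not requests
        if pending is not None and entry.get("status") == "ok":
            yield pending, entry
        timed_out = (
            entry.get("status") == "error"
            and "timed out" in entry.get("error", "")
        )
        pending = entry if timed_out else None
```

Usage would be along the lines of `for bad, good in timeouts_followed_by_success(open(path)): print(bad["latency_s"], good["latency_s"])`, where `path` points at the shim log.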

## Steps to Reproduce

1. Configure a local OpenAI-compatible custom provider that wraps Claude CLI with a 120s per-turn timeout.
2. Select that provider/model via Hermes model picker.
3. Let the shim idle long enough to evict the warm child process, or use a heavy resumed context/tool-call turn.
4. Send a message that causes the Claude CLI turn to take longer than 120 seconds.
5. Observe Hermes reporting `Empty response from model` / retrying, even though the underlying failure is a provider timeout.

## Expected Behavior

Hermes should classify this as a provider timeout/failure, not as a model empty-content response. It should do one or more of the following:

- surface a clear provider timeout error with the provider/model name and elapsed timeout,
- allow custom providers to advertise/request longer per-request timeout budgets,
- avoid switching fallback to the same provider/model path that just timed out,
- optionally mark first turn after spawn/resume as eligible for a longer timeout budget.
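The classification part of this could look roughly like the sketch below. It assumes the caller has the HTTP status, any error message from the provider, and the extracted content available at the point where it currently decides "empty response"; how the shim actually surfaces the `RuntimeError` over HTTP is not confirmed here. `Outcome` and `classify_turn` are hypothetical names, not existing Hermes APIs.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    OK = "ok"
    EMPTY_RESPONSE = "empty_response"      # model genuinely returned no content
    PROVIDER_TIMEOUT = "provider_timeout"  # transport/provider budget exceeded
    PROVIDER_ERROR = "provider_error"      # any other provider-side failure


def classify_turn(
    status_code: int,
    error_message: Optional[str],
    content: Optional[str],
) -> Outcome:
    """Decide what happened on one provider call, checking errors before emptiness."""
    if status_code >= 400 or error_message:
        message = (error_message or "").lower()
        # 408/504 are the standard HTTP timeout statuses; the substring match
        # covers messages like "claude CLI turn timed out".
        if status_code in (408, 504) or "timed out" in message:
            return Outcome.PROVIDER_TIMEOUT
        return Outcome.PROVIDER_ERROR
    if not (content or "").strip():
        return Outcome.EMPTY_RESPONSE
    return Outcome.OK
```

The point of the ordering is that an error is never allowed to fall through to the empty-content branch, so `Empty response from model` is only reported when the call itself succeeded.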

## Actual Behavior

Hermes wraps the local provider timeout as an empty response, retries several times, and can fall back to the same `claude-cli` provider path. This looks like Claude produced an empty answer, but the real error is `RuntimeError: claude CLI turn timed out` from the shim.

## Proposed Fixes

1. Preserve custom-provider timeouts and errors as distinct timeout/error classes rather than normalizing them into empty-content retries.
2. Add/configure per-custom-provider request timeout metadata, for example `timeout_s` or `request_timeout_ms`, and thread it into the provider call path.
3. Detect fallback self-selection: if current provider/model and fallback provider/model resolve to the same endpoint/model, skip or pick a different fallback.
4. Consider first-turn-after-spawn / resumed-session timeout budgets separately from warm-turn budgets.
5. Improve logs/user-visible errors so `Empty response from model` is reserved for genuinely empty model output, not transport/provider timeout.
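Fixes 2 and 3 could be sketched together as follows. This is illustrative only: `Route`, `route_key`, `pick_fallback`, and the `timeout_s` field are hypothetical and do not reflect Hermes's actual config schema or fallback code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Route:
    provider: str
    base_url: str
    model: str
    timeout_s: float = 120.0  # hypothetical per-provider request budget (fix 2)


def route_key(route: Route) -> Tuple[str, Optional[int], str, str]:
    """Normalize a route to (host, port, path, model) so aliases compare equal."""
    parts = urlsplit(route.base_url)
    host = (parts.hostname or "").lower()
    return (host, parts.port, parts.path.rstrip("/"), route.model)


def pick_fallback(failed: Route, candidates: Iterable[Route]) -> Optional[Route]:
    """Return the first candidate that does not resolve to the failed endpoint/model."""
    failed_key = route_key(failed)
    for candidate in candidates:
        if route_key(candidate) != failed_key:
            return candidate
    return None  # no distinct fallback: surface the error instead of looping
```

Comparing on the resolved endpoint and model rather than on the provider name is what catches the case in this report, where a differently named fallback entry points back at the same shim.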

## Related

- Different from #13248, which covers intentional empty responses arising from group-chat/Slack addressing semantics.

