Bug Description
When Hermes routes a selected claude-cli model through an OpenAI-compatible local shim/custom provider, long Claude CLI turns that exceed the shim's internal 120s per-turn timeout surface to the user as Empty response from model and trigger retry/fallback behavior. In the observed setup, fallback can select the same claude-cli path again, so retries loop against the same timeout surface rather than recovering.
This is a different class of failure from intentional thinking-only or group-silence empty responses: the underlying provider process is still producing or continuing work, but the local OpenAI-compatible shim times out first.
Environment
- Hermes model picker entry: claude-cli / claude-opus-4-7
- Routing path: custom_providers.claude-cli -> OpenAI-compatible endpoint -> local shim at http://127.0.0.1:7891/v1
- Shim process: persistent Claude CLI subprocess
- Claude CLI args observed in live child process include:
-p
--output-format stream-json
--input-format stream-json
--include-partial-messages
--verbose
--permission-mode dontAsk
--model claude-opus-4-7
--resume <session-id>
Local evidence from the shim implementation
In the local shim, the hard timeout is fixed:
SUBPROCESS_TIMEOUT = 120
PROCESS_IDLE_TIMEOUT = 1800
...
content, usage, finish_reason, tool_calls = self._read_response(SUBPROCESS_TIMEOUT)
...
raise RuntimeError("claude CLI turn timed out")
Idle eviction after 1800 seconds forces cold/resumed spawns, which are more likely to exceed the 120s per-turn budget when context is heavy or MCP/tool state reconnects.
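As a minimal sketch of how the shim could give cold or resumed spawns more headroom, the budget can depend on how many turns the current child process has already served. The name turn_timeout, the turns_since_spawn counter, and the 300-second COLD_START_TIMEOUT value are assumptions for illustration; only SUBPROCESS_TIMEOUT = 120 comes from the shim.

```python
SUBPROCESS_TIMEOUT = 120   # warm-turn budget, as currently hard-coded in the shim
COLD_START_TIMEOUT = 300   # assumed value for the first turn after spawn/resume


def turn_timeout(turns_since_spawn: int) -> int:
    """Return the per-turn budget in seconds.

    The first turn after a spawn or resume gets the larger cold-start budget;
    later turns on the same warm child process keep the normal budget.
    """
    if turns_since_spawn == 0:
        return COLD_START_TIMEOUT
    return SUBPROCESS_TIMEOUT
```

The shim would then pass turn_timeout(...) to its read call in place of the constant, assuming it tracks a per-process turn counter that resets on spawn.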
Observed Log Pattern
The shim log shows 120s/127s timeout failures followed immediately by successful turns, which indicates the model/CLI can continue but the wrapper budget is too short:
{"event":"spawn","model":"claude-opus-4-7","resume":true,"session_id":"38869d6c-...","has_system_prompt":true}
{"request_id":"1dcdfd75-...","model":"claude-opus-4-7","latency_s":127.411,"status":"error","error_class":"RuntimeError","error":"claude CLI turn timed out"}
{"request_id":"170feabf-...","model":"claude-opus-4-7","latency_s":3.515,"status":"ok","prompt_tokens":6,"completion_tokens":46,"has_tool_calls":false}
...
{"event":"idle_evict","session_id":"38869d6c-...","idle_s":1800}
{"event":"spawn","model":"claude-opus-4-7","resume":true,"session_id":"38869d6c-...","has_system_prompt":true}
{"request_id":"e5743941-...","model":"claude-opus-4-7","latency_s":120.014,"status":"error","error_class":"RuntimeError","error":"claude CLI turn timed out"}
{"request_id":"2d5bfaed-...","model":"claude-opus-4-7","latency_s":81.105,"status":"ok","prompt_tokens":14,"completion_tokens":2951,"has_tool_calls":true}
{"request_id":"8d56ffbf-...","model":"claude-opus-4-7","latency_s":52.101,"status":"ok","prompt_tokens":9,"completion_tokens":2916,"has_tool_calls":true}
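To check whether timeouts cluster on the first request after a spawn, the log can be scanned with a short script. This is a sketch that assumes the shim writes one JSON object per line with the field names shown above; the function name timeouts_after_spawn is hypothetical.

```python
import json


def timeouts_after_spawn(lines):
    """Return request_ids of timed-out turns that were the first request after a spawn."""
    flagged = []
    fresh_spawn = False
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip elided ("...") or otherwise non-JSON lines
        record = json.loads(line)
        if record.get("event") == "spawn":
            fresh_spawn = True
            continue
        if "request_id" in record:
            timed_out = (
                record.get("status") == "error"
                and "timed out" in record.get("error", "")
            )
            if fresh_spawn and timed_out:
                flagged.append(record["request_id"])
            # Any request, failed or not, means the process is no longer fresh.
            fresh_spawn = False
    return flagged
```

Applied to the excerpt above, both timed-out requests are the first request after a spawn event, which is consistent with the cold/resumed-spawn explanation.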
Steps to Reproduce
- Configure a local OpenAI-compatible custom provider that wraps Claude CLI with a 120s per-turn timeout.
- Select that provider/model via Hermes model picker.
- Let the shim idle long enough to evict the warm child process, or use a heavy resumed context/tool-call turn.
- Send a message that causes the Claude CLI turn to take longer than 120 seconds.
- Observe Hermes reporting Empty response from model / retrying, even though the underlying failure is a provider timeout.
Expected Behavior
Hermes should classify this as a provider timeout/failure, not as a model empty-content response. It should do one or more of the following:
- surface a clear provider timeout error with the provider/model name and elapsed timeout,
- allow custom providers to advertise/request longer per-request timeout budgets,
- avoid switching fallback to the same provider/model path that just timed out,
- optionally mark first turn after spawn/resume as eligible for a longer timeout budget.
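The classification described above could look roughly like the following. This is a sketch only: it assumes the provider call path has access to both the returned content and any error message from the shim, and the function name classify_result and its return labels are hypothetical, not Hermes internals.

```python
from typing import Optional


def classify_result(content: Optional[str], error: Optional[str]) -> str:
    """Separate provider failures from genuinely empty model output."""
    if error:
        lowered = error.lower()
        if "timed out" in lowered or "timeout" in lowered:
            return "provider_timeout"
        return "provider_error"
    if content is None or not content.strip():
        return "empty_response"
    return "ok"
```

With this ordering, the shim's "claude CLI turn timed out" message is reported as a provider timeout before the empty-content check is ever reached.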
Actual Behavior
Hermes wraps the local provider timeout as an empty response, retries several times, and can fall back to the same claude-cli provider path. This looks like Claude produced an empty answer, but the real error is RuntimeError: claude CLI turn timed out from the shim.
Proposed Fixes
- Preserve custom-provider timeout/errors as timeout/error classes rather than normalizing them into empty-content retries.
- Add/configure per-custom-provider request timeout metadata, for example timeout_s or request_timeout_ms, and thread it into the provider call path.
- Detect fallback self-selection: if current provider/model and fallback provider/model resolve to the same endpoint/model, skip or pick a different fallback.
- Consider first-turn-after-spawn / resumed-session timeout budgets separately from warm-turn budgets.
- Improve logs/user-visible errors so Empty response from model is reserved for genuinely empty model output, not transport/provider timeout.
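The fallback self-selection check could be sketched as below. The Route type, its fields, and pick_fallback are hypothetical names for illustration; the idea is only that two routes count as the same path when they resolve to the same endpoint and model, whatever their provider labels are.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Route:
    provider: str   # label shown in the model picker (assumed field)
    base_url: str   # resolved OpenAI-compatible endpoint
    model: str

    def key(self) -> Tuple[str, str]:
        # Ignore a trailing slash so ".../v1" and ".../v1/" compare equal.
        return (self.base_url.rstrip("/"), self.model)


def pick_fallback(failed: Route, candidates: Iterable[Route]) -> Optional[Route]:
    """Return the first candidate that does not resolve to the failed path."""
    for candidate in candidates:
        if candidate.key() != failed.key():
            return candidate
    return None  # no distinct fallback: surface the error instead of looping
```

Returning None when every candidate resolves to the failed path lets the caller surface the timeout to the user rather than retrying against the same shim.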
Related