[Bug]: auxiliary_client (Vision auto-detect / title generation) hangs indefinitely with no timeout when provider stalls; CLI input blocks

### Summary

When the main provider experiences a transient `ReadTimeout`, the main turn reconnects cleanly (`⚠️ Connection to provider dropped (ReadTimeout). Reconnecting… 🔄 Reconnected — resuming…`), **but a concurrent `agent.auxiliary_client` call (e.g. Vision auto-detect or title generation) never completes and never times out**. The CLI timer keeps counting, the input area remains locked, and the only recovery is `kill -TERM` on the agent process. No error is written to logs — the last line is the auxiliary client's "using main provider" INFO log, followed by complete silence.

This appears distinct from #13617 (approval prompt — closed), #8340 (terminal tool setsid+disown hang — closed), and #13033 (Linux paste freeze — open).

### Environment

- OS: macOS (Apple Silicon)
- Hermes: installed via pipx / user-local (Python 3.11)
- Provider: Anthropic (`claude-opus-4-6`)
- MCP servers loaded: `github`, `atlassian`, one internal stdio MCP server, plus code-execution
- Session in progress for ~38 min, ~328K/1M context used at time of hang

### Reproduction (observed in the wild, not yet minimized)

1. Start a long-running session with multiple MCP tool calls interspersed with model turns.
2. At some point the main provider emits a `ReadTimeout`; the UI shows `Reconnecting… (attempt 2/3)` then `Reconnected — resuming…`.
3. After the reconnect, the next auxiliary call fires — in my case `Vision auto-detect` (also reproduces with `title_generation` per `auxiliary_client` logs).
4. `agent.log` shows the "using main provider" INFO line for the auxiliary call, then **nothing** for 25+ minutes.
5. The CLI `⏲` timer continues to count, the `❯` prompt is rendered but accepts no keyboard input (Ctrl+C, Ctrl+D, Enter all ignored).
6. `ps` shows the agent process alive at ~0% CPU in `S` state — blocked on an await.

### Last log evidence

`~/.hermes/logs/agent.log` (final lines before 25+ min of silence):

```
2026-XX-XX 19:32:31,460 INFO agent.auxiliary_client: Auxiliary auto-detect: using main provider anthropic (claude-opus-4-6)
2026-XX-XX 19:32:31,460 INFO agent.auxiliary_client: Auxiliary title_generation: using auto (claude-opus-4-6) at https://api.anthropic.com
2026-XX-XX 19:39:41,513 INFO agent.auxiliary_client: Vision auto-detect: using main provider anthropic (claude-opus-4-6)
<nothing for 25+ minutes>
```

No entry in `errors.log` at the hang timestamp. No broken-pipe or exception trace tied to the hang (there was an unrelated `BrokenPipeError` earlier in `mcp-stderr.log`, but it preceded the hang by hours and shouldn't be the cause).

### Suspected root cause

`agent/auxiliary_client.py` (or equivalent) appears to issue the HTTP request to the auxiliary provider **without a `timeout=` kwarg on the httpx / anthropic SDK call**, and without wrapping the call in `asyncio.wait_for(...)` or a circuit breaker. When the underlying connection stalls (possibly because the socket pool is in a half-dead state post-reconnect of the main client), the coroutine awaits forever. There is no retry-with-timeout path on this branch, unlike the main turn's `reconnect` logic.

Related candidate: the auxiliary client may be reusing the HTTPX client / pool that the main provider just reconnected, inheriting a stale connection that then silently hangs on the first auxiliary request.

### Expected behavior

1. All auxiliary calls (`Vision auto-detect`, `title_generation`, etc.) should have an explicit request timeout (e.g. 30s) applied via the SDK `timeout=` parameter **or** `asyncio.wait_for`.
2. On timeout, the auxiliary call should:
   - log `WARNING` with the auxiliary kind (vision / title / etc.) and the elapsed time,
   - fail open (e.g. skip vision hint, use default title) rather than blocking the main turn,
   - not propagate into the CLI input lock.
3. Main turn should never be gated on auxiliary completion. If the current architecture makes them coupled, add a deadline to the wrapping task group.

### Workaround (for other users hitting this)

None at runtime — once hung, the CLI cannot be recovered without `kill -TERM <pid>` on the agent process. Before restarting, back up `~/.hermes/sessions/session_<id>.json` and, if using `--resume`, archive / remove the stuck session file first to avoid a resume loop (see #7536).

### Additional notes

- Found a related-but-separate issue worth fixing: `tools.checkpoint_manager` fails repeatedly on `git add -A` with `index.lock: File exists` after a prior kill, and never self-heals. Resulted in ~6 hours of silently lost checkpoints on my end. Happy to open a separate issue for that.
- Would be useful to emit a `WARNING` when an auxiliary call exceeds some threshold even if not cancelled, so users can diagnose stalls without reading source.

### What I can provide on request

- Redacted `agent.log` window around the hang
- Process state snapshot (`ps`, `py-spy dump` if needed next time I reproduce)
- Session JSON structure (without conversation content)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: auxiliary_client (Vision auto-detect / title generation) hangs indefinitely with no timeout when provider stalls; CLI input blocks #15194

Summary

Environment

Reproduction (observed in the wild, not yet minimized)

Last log evidence

Suspected root cause

Expected behavior

Workaround (for other users hitting this)

Additional notes

What I can provide on request

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Bug]: auxiliary_client (Vision auto-detect / title generation) hangs indefinitely with no timeout when provider stalls; CLI input blocks #15194

Description

Summary

Environment

Reproduction (observed in the wild, not yet minimized)

Last log evidence

Suspected root cause

Expected behavior

Workaround (for other users hitting this)

Additional notes

What I can provide on request

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions