Bug Description
Hermes reconnects after 180 seconds of no response from the provider, even though oMLX is still actively processing the request.
Steps to Reproduce
- Run Hermes on the latest release
- Use oMLX 0.3.5 as a local provider
- Use Qwen3.5-122B-A10B-8bit as the main model
- Run a long-context session with substantial accumulated history
- Wait for a slow request / long prefill phase
- Hermes emits:
No response from provider for 180s ... Reconnecting...
- oMLX continues working and does not appear dead
Expected Behavior
Hermes should not reconnect or abandon the provider request if the provider is still actively processing a long-running request.
Ideally one of these should happen:
- The provider watchdog should be configurable
- Hermes should distinguish between “provider silent but still processing” vs “provider actually dead”
- Long local-provider workloads should be allowed to run longer without forced reconnects
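To make the first two options concrete, here is a minimal sketch of a watchdog with a configurable silence timeout that checks whether the provider is still alive before giving up. This is not Hermes code: `WatchdogConfig`, `ProviderWatchdog`, `silence_timeout_s`, `hard_limit_s`, and the `is_provider_alive` callback are all hypothetical names, and how a liveness probe would actually be implemented against oMLX is left open.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class WatchdogConfig:
    # Hypothetical settings, not actual Hermes options.
    silence_timeout_s: float = 180.0      # default mirrors the 180s in this report
    hard_limit_s: Optional[float] = None  # optional absolute cap; None disables it


class ProviderWatchdog:
    """Decides whether a silent provider request should be abandoned."""

    def __init__(
        self,
        config: WatchdogConfig,
        is_provider_alive: Callable[[], bool],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.config = config
        self.is_provider_alive = is_provider_alive  # assumed liveness probe
        self.clock = clock
        self.started_at = clock()
        self.last_activity_at = self.started_at

    def record_activity(self) -> None:
        # Call whenever the provider sends anything (token, keep-alive, ...).
        self.last_activity_at = self.clock()

    def should_reconnect(self) -> bool:
        now = self.clock()
        limit = self.config.hard_limit_s
        if limit is not None and now - self.started_at >= limit:
            return True  # absolute cap reached, regardless of liveness
        if now - self.last_activity_at < self.config.silence_timeout_s:
            return False  # still within the allowed silence window
        if self.is_provider_alive():
            # Silent but still processing: restart the silence window.
            self.last_activity_at = now
            return False
        return True  # silent and the probe failed: treat as dead
```

With a design like this, a 250-second prefill would not trigger a reconnect as long as the probe succeeds, while a provider that has actually gone away would still be detected once the silence timeout elapses.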
Actual Behavior
Hermes reconnects after 180s of no response, even though oMLX is still processing normally.
This causes:
- abandoned long-running requests
- repeated reconnect / retry behavior
- sessions failing before reaching compression
- instability in long-context workflows
Affected Component
CLI (interactive chat), Agent Core (conversation loop, context compression, memory)
Messaging Platform (if gateway-related)
N/A (CLI only)
Operating System
macOS: 26.3.1
Python Version
3.14.3
Hermes Version
0.7.0
Relevant Logs / Traceback
oMLX:
Using boundary cache snapshot ...
Client disconnected, cancelling generation task
Prefill interrupted ...
Rescheduled 1 requests for re-prefill
Chat completion: 329 tokens in 250.72s
Chat completion: 891 tokens in 676.80s
Hermes message:
No response from provider for 180s (model: Qwen3.5-122B-A10B-8bit, context: ~32,834 tokens). Reconnecting...
Root Cause Analysis (optional)
_No response_
Proposed Fix (optional)
_No response_
Are you willing to submit a PR for this?
- [ ] I'd like to fix this myself and submit a PR