fix(agent): disable stale stream timeout for local providers by teknium1 · Pull Request #6368 · NousResearch/hermes-agent

teknium1 · 2026-04-09T02:07:53Z

Summary

Local inference providers (Ollama, oMLX, llama-cpp) can take 300+ seconds for prefill on large contexts. The 180s stale stream detector was killing these connections while the provider was still actively processing, causing spurious reconnects and abandoned requests.

Inspired by PR #6123 by @Archerouyang (who identified the issue in #5889). This implementation uses the existing is_local_endpoint() from agent/model_metadata.py instead of creating a new detection method — it does proper URL parsing with localhost, RFC-1918, IPv6, and WSL support without the false positives from substring matching.
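To illustrate why host-level URL parsing avoids the false positives of substring matching, here is a minimal sketch. Both function names are hypothetical and this is not the actual is_local_endpoint() implementation; it covers only localhost, loopback, and private IP literals, and omits the WSL handling mentioned above.

```python
import ipaddress
from urllib.parse import urlparse


def looks_local_by_substring(base_url: str) -> bool:
    """Naive check: flags any URL that contains a local-looking fragment anywhere."""
    return any(s in base_url for s in ("localhost", "127.0.0.1", "192.168."))


def looks_local_by_parsing(base_url: str) -> bool:
    """Hypothetical stand-in for is_local_endpoint(): inspects only the parsed host."""
    if not base_url:
        return False
    host = urlparse(base_url).hostname
    if not host:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name other than localhost; this sketch does not classify it as local.
        return False
    # is_private covers the RFC-1918 ranges; is_loopback covers 127.0.0.0/8 and ::1.
    return addr.is_loopback or addr.is_private
```

A cloud URL such as `https://api.example.com/v1?ref=localhost` trips the substring check because the fragment appears in the query string, while the parsed check only looks at the host `api.example.com` and rejects it.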

Changes

run_agent.py — 1 file, +18/-11 lines

Before the existing token-scaled stale timeout logic, check if:

- The timeout is at the default 180s (user hasn't explicitly set HERMES_STREAM_STALE_TIMEOUT)
- A base_url is set (not the SDK default)
- is_local_endpoint() identifies it as local

If all three, disable the stale detector (float('inf')). Otherwise, fall through to the existing token-scaled logic unchanged.
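The three-condition check can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in run_agent.py: `resolve_stale_timeout`, `DEFAULT_STALE_TIMEOUT`, and the injectable `is_local` and `env` parameters are hypothetical names, and the existing token-scaled logic is left out (it would apply to whatever finite value is returned).

```python
import math
import os
from typing import Callable, Mapping, Optional

DEFAULT_STALE_TIMEOUT = 180.0  # seconds; the default described in this PR


def resolve_stale_timeout(
    base_url: Optional[str],
    is_local: Callable[[str], bool],
    env: Mapping[str, str] = os.environ,
) -> float:
    """Return the base stale timeout, or inf when the detector should be disabled."""
    raw = env.get("HERMES_STREAM_STALE_TIMEOUT")
    timeout = float(raw) if raw else DEFAULT_STALE_TIMEOUT

    # Disable only when all three conditions hold: the timeout is still at the
    # default, a base_url is set, and that base_url is classified as local.
    if timeout == DEFAULT_STALE_TIMEOUT and base_url and is_local(base_url):
        return math.inf

    # Otherwise fall through; token scaling (not shown) would be applied next.
    return timeout
```

Because the sketch compares against the default value, as the description above does, an explicit `HERMES_STREAM_STALE_TIMEOUT=180` would be indistinguishable from an unset variable. Whether the real code treats that case differently is not stated here.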

Behavior matrix

| Scenario | Stale timeout |
| --- | --- |
| Ollama on localhost, default config | Disabled (inf) |
| Ollama on localhost, explicit `HERMES_STREAM_STALE_TIMEOUT=300` | 300s (user setting honored) |
| LAN server (192.168.x.x) | Disabled (inf) |
| OpenRouter / cloud provider | 180s (with token scaling) |
| Cloud Ollama proxy (api.ollama.com) | 180s (correctly NOT detected as local) |
| Empty/None URL (SDK default) | 180s (correctly NOT detected as local) |

Test plan

- 671 run_agent tests pass, 5 skipped
- E2E verified: is_local_endpoint() correctly identifies all local patterns and rejects cloud URLs
- py_compile clean

Fixes #5889

Local inference providers (Ollama, oMLX, llama-cpp) can take 300+ seconds for prefill on large contexts. The 180s stale stream detector was killing these connections while the provider was still processing. Uses the existing is_local_endpoint() (proper URL parsing with RFC-1918, localhost, WSL detection) instead of ad-hoc substring matching. The stale timeout is only disabled when the user hasn't explicitly set HERMES_STREAM_STALE_TIMEOUT — explicit user config is always honored. Fixes #5889

) Extends is_local_endpoint() to detect local LLM proxies accessed via container DNS names (e.g. hermes-litellm, ollama), fixing the stale stream timeout (180s) firing on local providers during prefill.

Three additions:

1. Unqualified hostnames (no dots) → always local. Docker/Podman DNS, mDNS, and /etc/hosts entries are always on the local network.
2. DNS resolution fallback: resolve hostname to IP with socket.gethostbyname(), check if the resolved address is private.
3. Configurable model.local_endpoints in config.yaml: explicit list of hostnames to treat as local for edge cases where DNS resolution isn't available.

Fixes NousResearch#7905
Related: NousResearch#7069, NousResearch#6368
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
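The three additions described in this follow-up commit could look roughly like the sketch below. It is an assumption-laden illustration, not the merged code: `hostname_is_local` and its parameters are hypothetical, the configured list is passed in directly instead of being read from config.yaml, and the resolver is injectable so the logic can be exercised without real DNS. The checks also run in a different order than the commit lists them, with the explicit list first.

```python
import ipaddress
import socket
from typing import Callable, Iterable


def hostname_is_local(
    host: str,
    extra_local_hosts: Iterable[str] = (),
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """Classify a hostname as local using the three rules from the commit message."""
    host = host.strip().lower()
    if not host:
        return False

    # Rule 3: an explicit allow-list (stand-in for model.local_endpoints).
    if host in {h.lower() for h in extra_local_hosts}:
        return True

    # IP literals are judged directly, so "::1" is not mistaken for a dotless name.
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        addr = None
    if addr is not None:
        return addr.is_loopback or addr.is_private

    # Rule 1: unqualified hostnames (no dots) are treated as local.
    if "." not in host:
        return True

    # Rule 2: DNS fallback; resolve and check whether the address is private.
    try:
        resolved = ipaddress.ip_address(resolve(host))
    except (OSError, ValueError):
        return False  # resolution failed, so make no claim that the host is local
    return resolved.is_loopback or resolved.is_private
```

Note that `socket.gethostbyname()` resolves IPv4 only and blocks while it waits, so a real implementation may want caching or a timeout around the lookup.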

teknium1 merged commit ae4a884 into main Apr 9, 2026
5 of 6 checks passed

teknium1 mentioned this pull request Apr 9, 2026

fix(run_agent): disable stale timeout for local providers (#5889) #6123

Closed

3 tasks

kristianvast mentioned this pull request Apr 9, 2026

fix(agent,gateway): voice interrupts + cascading interrupt hang #6600

Closed

malaiwah mentioned this pull request Apr 11, 2026

is_local_endpoint misses Docker/Podman DNS names — stale timeout fires on local LLM proxies #7905

Closed

malaiwah mentioned this pull request Apr 11, 2026

fix: is_local_endpoint misses Docker/Podman DNS names #7906

Closed

8 tasks

tomqiaozc mentioned this pull request Apr 12, 2026

fix(agent): detect Docker/Podman DNS names in is_local_endpoint #8021

Closed

7 tasks

shagghiesuperstar mentioned this pull request Apr 30, 2026

fix(agent): suppress intermediate retry status messages in chat thread #18064

Closed

mehmetallar mentioned this pull request May 5, 2026

[Bug]: using DNS for local provider is not addressed in recent fix #20346

Open

1 task

Urfread mentioned this pull request May 30, 2026

[Reliability] Repeated tool-call errors (path/retry patterns) inflate latency; need stronger fail-fast + dedupe #35114

Closed

teknium1 merged 1 commit into main from hermes/hermes-b0a4b31e
