
fix(run_agent): disable stale timeout for local providers (#5889) #6123

Closed
Archerouyang wants to merge 4 commits into NousResearch:main from Archerouyang:fix/5889-local-provider-timeout


@Archerouyang

Contributor

Fix #5889: Local Provider Timeout Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Fix the 180s stale-stream timeout for local providers (oMLX, Ollama, etc.) so that long-running local inference isn't killed prematurely.

Architecture: Detect local providers by URL pattern (localhost/127.0.0.1) and disable or extend the stale-stream timeout for them. This is a minimal, non-breaking change that keeps the existing timeout mechanism for cloud providers while letting local models run uninterrupted.

Tech Stack: Python, existing Hermes agent streaming infrastructure


Problem Analysis

Current behavior in run_agent.py:4705:

  • HERMES_STREAM_STALE_TIMEOUT defaults to 180s
  • Dynamic scaling is based only on token count (50k/100k thresholds)
  • No distinction between cloud API and local inference

For local providers (oMLX, Ollama, llama-cpp):

  • Prefill can legitimately take 300s+ for large contexts
  • The 180s timeout causes false-positive "stale stream" detection
  • This results in abandoned requests and wasted compute

Solution Design

Detect local providers and adjust the stale timeout:

  1. Local provider detection: Check whether base_url contains localhost or 127.0.0.1, or is empty (treated as the default local setup)
  2. Timeout adjustment: Set _stream_stale_timeout to float('inf') (or a very large value) for local providers
  3. Configurability: Respect HERMES_STREAM_STALE_TIMEOUT if it is explicitly set, so users can still choose a finite timeout
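The three rules above imply a priority order: an explicit environment variable wins, then the local-provider override, then the existing default. A minimal sketch of that resolution, assuming the hypothetical names `resolve_stale_timeout` and `DEFAULT_STALE_TIMEOUT` (only the env var name and the 180s default come from the plan), might look like this; the token-count scaling is deliberately left out.

```python
import math
import os
from typing import Mapping, Optional

# Hypothetical constants; the env var name and 180s default come from the plan.
DEFAULT_STALE_TIMEOUT = 180.0
ENV_VAR = "HERMES_STREAM_STALE_TIMEOUT"


def resolve_stale_timeout(is_local: bool, env: Optional[Mapping[str, str]] = None) -> float:
    """Pick the stale-stream timeout in seconds (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR)
    if raw is not None and raw.strip():
        # Rule 3: an explicitly set, non-empty environment variable always wins.
        return float(raw)
    if is_local:
        # Rule 2: local providers get an infinite timeout (stale detection off).
        return math.inf
    # Cloud providers keep the existing default (token-count scaling omitted).
    return DEFAULT_STALE_TIMEOUT
```

For example, `resolve_stale_timeout(True, {})` yields infinity, while `resolve_stale_timeout(True, {"HERMES_STREAM_STALE_TIMEOUT": "600"})` yields `600.0`. Treating an empty string as "unset" is an assumption of this sketch.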

Task 1: Create Helper Function for Local Provider Detection

Files:

  • Modify: run_agent.py (the _stream_stale_timeout calculation section)

  • Step 1: Add local provider detection function

def _is_local_provider(self) -> bool:
    """Detect if provider is local (oMLX, Ollama, etc.) vs cloud API.
    
    Local providers may have long prefill times that shouldn't trigger
    stale stream detection.
    """
    base_url = str(self.base_url or "").lower()
    # Substring heuristic: loopback hosts, the unspecified address, Unix
    # socket paths, and "ollama" anywhere in the URL; an empty URL counts as local.
    local_patterns = [
        "localhost",
        "127.0.0.1",
        "0.0.0.0",
        "/tmp/",  # Unix sockets
        "ollama",  # Common local setups
    ]
    return any(p in base_url for p in local_patterns) or not base_url
  • Step 2: Commit the helper function
git add run_agent.py
git commit -m "feat(run_agent): add _is_local_provider() helper function

Add method to detect local inference providers (oMLX, Ollama, etc.)
for special timeout handling."
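To see how the Step 1 heuristic behaves outside the agent class, the sketch below lifts the same pattern list into a hypothetical module-level function `is_local_provider(base_url)` and runs it against a few URLs. The last two cases illustrate the limits of substring matching: a remote hostname containing "ollama" is flagged as local, while a LAN server on a private address is not.

```python
def is_local_provider(base_url) -> bool:
    """Hypothetical module-level copy of the Step 1 substring heuristic."""
    url = str(base_url or "").lower()
    local_patterns = ("localhost", "127.0.0.1", "0.0.0.0", "/tmp/", "ollama")
    # An empty or missing URL is treated as a default local setup.
    return not url or any(p in url for p in local_patterns)


examples = {
    "http://localhost:11434/v1": True,       # Ollama's default port on loopback
    "http://127.0.0.1:8000/v1": True,
    "": True,                                # empty URL counts as local
    "https://api.openai.com/v1": False,
    "https://ollama.example.com/v1": True,   # false positive: remote host
    "http://192.168.1.20:8080/v1": False,    # false negative: LAN server
}
for url, expected in examples.items():
    assert is_local_provider(url) == expected, url
```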

Task 2: Modify Stale Timeout Logic for Local Providers

Files:

  • Modify: run_agent.py:4705-4718 (stale timeout calculation)

  • Step 3: Add local provider timeout override

Find this section (around line 4705):

_stream_stale_timeout_base = float(os.getenv("HERMES_STREAM_STALE_TIMEOUT", 180.0))
# Scale the stale timeout for large contexts: slow models (like Opus)
# can legitimately think for minutes before producing the first token
# when the context is large.  Without this, the stale detector kills
# healthy connections during the model's thinking phase, producing
# spurious RemoteProtocolError ("peer closed connection").

Replace with:

_stream_stale_timeout_base = float(os.getenv("HERMES_STREAM_STALE_TIMEOUT", 180.0))
# Scale the stale timeout for large contexts: slow models (like Opus)
# can legitimately think for minutes before producing the first token
# when the context is large.  Without this, the stale detector kills
# healthy connections during the model's thinking phase, producing
# spurious RemoteProtocolError ("peer closed connection").

# Local providers (oMLX, Ollama, etc.) may take much longer for prefill
# without being "stale". Disable timeout for local providers unless
# explicitly configured via HERMES_STREAM_STALE_TIMEOUT.
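The replacement above stops at the comment block; the commit message for this change says stale detection is skipped when the timeout is infinity. A minimal sketch of that guard, assuming a hypothetical helper `is_stream_stale` and a stream loop that records each chunk's arrival with `time.monotonic()` (the real loop in run_agent.py is not shown here):

```python
import math
import time
from typing import Optional


def is_stream_stale(last_chunk_at: float, timeout: float, now: Optional[float] = None) -> bool:
    """Return True if no chunk has arrived for longer than `timeout` seconds.

    Hypothetical helper: an infinite timeout (local providers) disables the check.
    """
    if math.isinf(timeout):
        # Stale detection is skipped entirely for local providers.
        return False
    if now is None:
        now = time.monotonic()
    return (now - last_chunk_at) > timeout
```

A finite elapsed time is never greater than infinity, so the comparison alone would already return False; the explicit `math.isinf` check simply makes the intent visible at the call site.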

欧阳 added 4 commits April 8, 2026 17:40
…ct CAMOFOX_PROFILE_DIR docs

- Add missing import for get_hermes_home in hindsight plugin
- Remove incorrect CAMOFOX_PROFILE_DIR documentation (not a real Camofox env var)

Fixes NousResearch#6098, NousResearch#6087
Remove skill file uploads from Daytona and Modal environments.
Skills are loaded on the host side via skill_view(), build_skills_system_prompt(),
and _load_skill_payload(); the synced files were never read inside sandboxes.

Impact:
- Daytona: saves ~275 seconds per session start (445 files × 2 SDK calls)
- Modal: reduces sandbox creation overhead significantly

Fixes NousResearch#6035
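The Daytona figure implies an average cost per SDK call; the quick check below derives it from the commit's own numbers (the per-call value is a derived average, not a measured one).

```python
# Figures from the commit message: 445 skill files, 2 SDK calls each, ~275 s saved.
files = 445
calls_per_file = 2
seconds_saved = 275

total_calls = files * calls_per_file      # SDK calls avoided per session start
per_call = seconds_saved / total_calls    # average seconds per avoided call
print(f"{total_calls} calls, ~{per_call:.2f} s per call")  # 890 calls, ~0.31 s per call
```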
Replace Python 3.10+ union syntax (X | Y) with Optional[X] in a core module
that may be imported on Python versions older than 3.10.
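The change swaps the PEP 604 spelling for the `typing` equivalent. A minimal illustration, using a hypothetical function `find_profile`: on interpreters before 3.10, evaluating `str | None` in an annotation raises `TypeError` (unless annotations are postponed with `from __future__ import annotations`), while `Optional[str]` works everywhere.

```python
from typing import Optional, Union

# Python 3.10+ only when annotations are evaluated eagerly:
# def find_profile(name: str | None = None) -> int | None: ...


# Portable spelling that also works before Python 3.10:
def find_profile(name: Optional[str] = None) -> Optional[int]:
    """Hypothetical example: return the length of `name`, or None if absent."""
    return None if name is None else len(name)


# Optional[X] is shorthand for Union[X, None].
assert Optional[str] == Union[str, None]
```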
…ch#5889)

Local providers like oMLX and Ollama may have legitimately long
prefill times (300s+ for large contexts). Disable the 180s stale
stream timeout for detected local providers.

- Add _is_local_provider() to detect localhost/127.0.0.1/ollama URLs
- Skip stale detection when timeout is infinity
- Respect HERMES_STREAM_STALE_TIMEOUT if explicitly set

Fixes NousResearch#5889
@teknium1

teknium1 commented Apr 9, 2026

Contributor

Closed in favor of PR #6368, which fixes the same issue (#5889) using the existing is_local_endpoint() from agent/model_metadata.py — proper URL parsing with RFC-1918/localhost/WSL detection, no false positives from substring matching. Thanks for identifying the problem, @Archerouyang!
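For contrast with the substring approach, here is a minimal sketch of URL-parsing-based detection using only the standard library. It is a hypothetical `looks_like_local_endpoint` written for illustration, not the actual `is_local_endpoint()` from agent/model_metadata.py; it covers localhost, loopback, the unspecified address, and RFC 1918 IPv4 ranges, but not WSL detection. Unlike the PR's helper, it treats an empty URL as non-local.

```python
import ipaddress
from urllib.parse import urlparse

# RFC 1918 private IPv4 ranges.
_RFC1918 = (
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
)


def looks_like_local_endpoint(base_url: str) -> bool:
    """Hypothetical sketch: classify a base URL by its parsed hostname only."""
    host = urlparse(base_url).hostname  # lower-cased; IPv6 brackets stripped
    if not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # any other DNS name is treated as remote
    if addr.is_loopback or addr.is_unspecified:
        return True
    return addr.version == 4 and any(addr in net for net in _RFC1918)
```

Because only the hostname is inspected, a remote host such as `ollama.example.com` is no longer misclassified, and LAN servers on private addresses are recognized.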

teknium1 closed this Apr 9, 2026

Development

Successfully merging this pull request may close these issues.

[Bug]:Hermes reconnects after 180s of provider silence even though oMLX is still processing
