feat(agent): configurable timeouts for auxiliary LLM calls and context compression#3406

Closed
alanfwilliams wants to merge 1 commit into
NousResearch:mainfrom
alanfwilliams:feat/configurable-aux-timeouts

Conversation

@alanfwilliams

Contributor

Summary

  • Add HERMES_AUX_TIMEOUT env var (default 90s) for auxiliary call_llm/async_call_llm and title generation (was hardcoded 30s/15s)
  • Add HERMES_COMPRESSION_TIMEOUT env var (default 120s) for context compression (was hardcoded 45s)
  • Follows the existing HERMES_API_TIMEOUT and HERMES_STREAM_STALE_TIMEOUT patterns
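A minimal sketch of how such an environment-variable override might be read. The helper name `read_timeout` and its fallback behaviour for unset or malformed values are assumptions for illustration; the PR's actual code is not shown here.

```python
import os


def read_timeout(env_var: str, default: float) -> float:
    """Return the timeout in seconds from env_var, or default if unset or invalid.

    Hypothetical helper: the name and the fallback rules are illustrative only.
    """
    raw = os.environ.get(env_var)
    if raw is None or not raw.strip():
        return default
    try:
        value = float(raw)
    except ValueError:
        # Malformed value (e.g. "abc"): fall back rather than crash at import time.
        return default
    # Reject zero, negative, and NaN values, which make no sense as a timeout.
    return value if value > 0 else default


# Defaults proposed in this PR: 90s for auxiliary calls, 120s for compression.
AUX_TIMEOUT = read_timeout("HERMES_AUX_TIMEOUT", 90.0)
COMPRESSION_TIMEOUT = read_timeout("HERMES_COMPRESSION_TIMEOUT", 120.0)
```

Falling back to the default on a malformed value is one possible design choice; failing loudly would be an equally reasonable alternative.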

Motivation

On local setups (Ollama, oMLX, llama.cpp) that run a single model for both main and auxiliary tasks, auxiliary requests queue behind the main generation. The hardcoded 30s timeout fires before prefill completes, causing compression failures, title generation timeouts, and session search loops. This was previously addressed for the main client (#1010) and for vision (#2107), but not for auxiliary calls or compression.
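The queuing effect can be illustrated with a toy simulation. This is a sketch under stated assumptions, not Hermes code: an `asyncio.Lock` stands in for a local server that handles one request at a time, all function names are hypothetical, and the durations are scaled down to fractions of a second.

```python
import asyncio


async def main_generation(slot: asyncio.Lock, duration: float) -> None:
    # The main request holds the single model slot for its whole generation.
    async with slot:
        await asyncio.sleep(duration)


async def auxiliary_call(slot: asyncio.Lock, duration: float) -> str:
    # The auxiliary request cannot start until the slot is free.
    async with slot:
        await asyncio.sleep(duration)
        return "title"


async def run_scenario(aux_timeout: float) -> str:
    slot = asyncio.Lock()
    main_task = asyncio.create_task(main_generation(slot, 0.2))
    await asyncio.sleep(0)  # yield once so the main request takes the slot first
    try:
        # The timeout clock starts now, while the request is still queued.
        return await asyncio.wait_for(auxiliary_call(slot, 0.01), timeout=aux_timeout)
    except asyncio.TimeoutError:
        return "timed out"
    finally:
        await main_task


if __name__ == "__main__":
    print(asyncio.run(run_scenario(0.05)))  # timeout shorter than the queue wait
    print(asyncio.run(run_scenario(1.0)))   # timeout long enough to cover the wait
```

The auxiliary work itself is short in both runs; only the time spent waiting behind the main generation decides whether the call succeeds.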

Closes #3404

Test plan

  • Set HERMES_AUX_TIMEOUT=90 and HERMES_COMPRESSION_TIMEOUT=120, run Hermes with a local model, verify auxiliary calls no longer time out during main generation
  • Verify defaults work without env vars set (90s for aux, 120s for compression)
  • pytest tests/ -v passes

Files changed

  • agent/auxiliary_client.py — 3 timeout defaults read from HERMES_AUX_TIMEOUT
  • agent/context_compressor.py — compression timeout reads from HERMES_COMPRESSION_TIMEOUT
  • agent/title_generator.py — title timeout reads from HERMES_AUX_TIMEOUT

Tested on macOS M1 Max, oMLX with Qwen3.5-35B-A3B-4bit.

…t compression

The hardcoded 30s timeout in call_llm/async_call_llm and 45s timeout in
context_compressor are too short for local models where auxiliary requests
queue behind the main generation. This causes compression failures, title
generation timeouts, and session search loops on consumer hardware.

Add environment variable overrides following the existing HERMES_API_TIMEOUT
and HERMES_STREAM_STALE_TIMEOUT patterns:

- HERMES_AUX_TIMEOUT (default 90s): auxiliary_client.py call_llm,
  async_call_llm, _build_call_kwargs, and title_generator.py
- HERMES_COMPRESSION_TIMEOUT (default 120s): context_compressor.py

Closes NousResearch#3404
teknium1 pushed a commit that referenced this pull request Mar 28, 2026
….yaml

Add per-task timeout settings under auxiliary.{task}.timeout in config.yaml
instead of hardcoded values. Users with slow local models (Ollama, llama.cpp)
can now increase timeouts for compression, vision, session search, etc.

Defaults:
  - auxiliary.compression.timeout: 120s (was hardcoded 45s)
  - auxiliary.vision.timeout: 30s (unchanged)
  - all other aux tasks: 30s (was hardcoded 30s)
  - title_generator: 30s (was hardcoded 15s)

call_llm/async_call_llm now auto-resolve timeout from config when not
explicitly passed. Callers can still override with an explicit timeout arg.

Based on PR #3406 by alanfwilliams. Converted from env vars to config.yaml
per project conventions.
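
The resolution order described in this commit message (explicit argument first, then the per-task config value, then a default) could look roughly like the sketch below. The function name `resolve_timeout`, the `EXAMPLE_CONFIG` dict standing in for a parsed config.yaml, and the `session_search` task key are assumptions for illustration, not the merged implementation.

```python
from typing import Any, Mapping, Optional

DEFAULT_AUX_TIMEOUT = 30.0  # fallback for tasks without their own setting

# Hypothetical stand-in for a parsed config.yaml with auxiliary.{task}.timeout keys.
EXAMPLE_CONFIG: Mapping[str, Any] = {
    "auxiliary": {
        "compression": {"timeout": 120},
        "vision": {"timeout": 30},
    }
}


def resolve_timeout(
    config: Mapping[str, Any],
    task: str,
    explicit: Optional[float] = None,
) -> float:
    """Pick the timeout for an auxiliary task (illustrative helper only)."""
    if explicit is not None:
        # An explicit timeout argument from the caller always wins.
        return float(explicit)
    # Tolerate a missing or empty section at either level of the config.
    aux = config.get("auxiliary") or {}
    task_cfg = aux.get(task) or {}
    value = task_cfg.get("timeout")
    return float(value) if value is not None else DEFAULT_AUX_TIMEOUT


if __name__ == "__main__":
    print(resolve_timeout(EXAMPLE_CONFIG, "compression"))       # from config
    print(resolve_timeout(EXAMPLE_CONFIG, "session_search"))    # default
    print(resolve_timeout(EXAMPLE_CONFIG, "compression", 300))  # explicit override
```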
teknium1 added a commit that referenced this pull request Mar 28, 2026
….yaml (#3597)

Co-authored-by: alanfwilliams <alanfwilliams@users.noreply.github.com>
@teknium1

Contributor

Merged via PR #3597. Your idea was implemented using config.yaml settings (under auxiliary.{task}.timeout) instead of env vars, per project conventions. Authorship preserved. Thanks for the contribution @alanfwilliams!

@teknium1 teknium1 closed this Mar 28, 2026
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026
….yaml (NousResearch#3597)

02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
….yaml (NousResearch#3597)

olympus-terminal pushed a commit to olympus-terminal/hermes-agent that referenced this pull request May 16, 2026
….yaml (NousResearch#3597)

gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
….yaml (NousResearch#3597)

Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
….yaml (NousResearch#3597)

Development

Successfully merging this pull request may close these issues.

[Feature]: Configurable timeouts for auxiliary call_llm and context compression

2 participants