feat(agent): configurable timeouts for auxiliary LLM calls and context compression by alanfwilliams · Pull Request #3406 · NousResearch/hermes-agent

alanfwilliams · 2026-03-27T15:27:29Z

Summary

Add HERMES_AUX_TIMEOUT env var (default 90s) for auxiliary call_llm/async_call_llm calls and title generation (previously hardcoded at 30s and 15s respectively)
Add HERMES_COMPRESSION_TIMEOUT env var (default 120s) for context compression (was hardcoded 45s)
Follows the existing HERMES_API_TIMEOUT and HERMES_STREAM_STALE_TIMEOUT patterns
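
A minimal sketch of how an env-var timeout with a default might be read. This is not the PR's actual diff: the helper name `read_timeout` and its fallback behavior on invalid values are assumptions made for illustration; only the variable names and defaults come from the summary above.

```python
import os


def read_timeout(name: str, default: float) -> float:
    """Return the timeout in seconds from env var `name`, or `default`.

    Hypothetical helper: falls back to `default` when the variable is
    unset, empty, non-numeric, or not a positive number.
    """
    raw = os.environ.get(name)
    if raw is None or not raw.strip():
        return default
    try:
        value = float(raw)
    except ValueError:
        return default
    return value if value > 0 else default


# Defaults as stated in the summary: 90s for auxiliary calls, 120s for compression.
AUX_TIMEOUT = read_timeout("HERMES_AUX_TIMEOUT", 90.0)
COMPRESSION_TIMEOUT = read_timeout("HERMES_COMPRESSION_TIMEOUT", 120.0)
```

Falling back silently on a malformed value is one possible design choice; raising an error at startup would surface typos earlier.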

Motivation

On local setups (Ollama, oMLX, llama.cpp) that run a single model for both main and auxiliary tasks, auxiliary requests queue behind the main generation. The 30s default fires before prefill completes, causing compression failures, title generation timeouts, and session search loops. This was previously addressed for the main client (#1010) and vision (#2107), but not for auxiliary calls or compression.

Closes #3404

Test plan

Set HERMES_AUX_TIMEOUT=90 and HERMES_COMPRESSION_TIMEOUT=120, run Hermes with a local model, verify auxiliary calls no longer time out during main generation
Verify defaults work without env vars set (90s for aux, 120s for compression)
pytest tests/ -v passes

Files changed

agent/auxiliary_client.py — 3 timeout defaults read from HERMES_AUX_TIMEOUT
agent/context_compressor.py — compression timeout reads from HERMES_COMPRESSION_TIMEOUT
agent/title_generator.py — title timeout reads from HERMES_AUX_TIMEOUT

Tested on macOS M1 Max, oMLX with Qwen3.5-35B-A3B-4bit.

…t compression

The hardcoded 30s timeout in call_llm/async_call_llm and 45s timeout in context_compressor are too short for local models where auxiliary requests queue behind the main generation. This causes compression failures, title generation timeouts, and session search loops on consumer hardware.

Add environment variable overrides following the existing HERMES_API_TIMEOUT and HERMES_STREAM_STALE_TIMEOUT patterns:

- HERMES_AUX_TIMEOUT (default 90s): auxiliary_client.py call_llm, async_call_llm, _build_call_kwargs, and title_generator.py
- HERMES_COMPRESSION_TIMEOUT (default 120s): context_compressor.py

Closes NousResearch#3404

….yaml (#3597)

Add per-task timeout settings under auxiliary.{task}.timeout in config.yaml instead of hardcoded values. Users with slow local models (Ollama, llama.cpp) can now increase timeouts for compression, vision, session search, etc.

Defaults:

- auxiliary.compression.timeout: 120s (was hardcoded 45s)
- auxiliary.vision.timeout: 30s (unchanged)
- all other aux tasks: 30s (was hardcoded 30s)
- title_generator: 30s (was hardcoded 15s)

call_llm/async_call_llm now auto-resolve timeout from config when not explicitly passed. Callers can still override with an explicit timeout arg.

Based on PR #3406 by alanfwilliams. Converted from env vars to config.yaml per project conventions.

Co-authored-by: alanfwilliams <alanfwilliams@users.noreply.github.com>

teknium1 · 2026-03-28T21:35:37Z

Merged via PR #3597. Your idea was implemented using config.yaml settings (under auxiliary.{task}.timeout) instead of env vars, per project conventions. Authorship preserved. Thanks for the contribution @alanfwilliams!
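
The resolution order described for the merged version (explicit argument first, then auxiliary.{task}.timeout from config, then a default) could be sketched roughly as follows. The function `resolve_timeout`, the default table, and the task name `session_search` are hypothetical illustrations, not the actual code from PR #3597; the config is assumed to be already parsed from config.yaml into a plain dict.

```python
from typing import Any, Mapping, Optional

# Defaults taken from the commit message; the table itself is an assumption.
DEFAULT_TIMEOUTS = {"compression": 120.0, "vision": 30.0}
FALLBACK_TIMEOUT = 30.0  # "all other aux tasks: 30s"


def resolve_timeout(
    config: Mapping[str, Any],
    task: str,
    explicit: Optional[float] = None,
) -> float:
    """Pick a timeout in seconds for an auxiliary task (hypothetical helper)."""
    # 1. An explicit argument from the caller always wins.
    if explicit is not None:
        return float(explicit)
    # 2. Otherwise look up auxiliary.{task}.timeout in the parsed config.
    aux = config.get("auxiliary") or {}
    task_cfg = aux.get(task) or {}
    value = task_cfg.get("timeout")
    if value is not None:
        return float(value)
    # 3. Fall back to the per-task default, or 30s for anything else.
    return DEFAULT_TIMEOUTS.get(task, FALLBACK_TIMEOUT)


# Example: a user with a slow local model raises only the compression timeout.
config = {"auxiliary": {"compression": {"timeout": 300}}}
print(resolve_timeout(config, "compression"))
print(resolve_timeout(config, "session_search"))
print(resolve_timeout(config, "compression", explicit=10))
```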

teknium1 mentioned this pull request Mar 28, 2026

feat(agent): configurable timeouts for auxiliary LLM calls via config.yaml (salvage #3406) #3597

Merged

teknium1 closed this Mar 28, 2026

alanfwilliams wants to merge 1 commit into NousResearch:main from alanfwilliams:feat/configurable-aux-timeouts
