feat(agent): configurable timeouts for auxiliary LLM calls and context compression #3406
Closed
alanfwilliams wants to merge 1 commit into
…t compression
The hardcoded 30s timeout in call_llm/async_call_llm and 45s timeout in
context_compressor are too short for local models where auxiliary requests
queue behind the main generation. This causes compression failures, title
generation timeouts, and session search loops on consumer hardware.
Add environment variable overrides following the existing HERMES_API_TIMEOUT
and HERMES_STREAM_STALE_TIMEOUT patterns:
- HERMES_AUX_TIMEOUT (default 90s): auxiliary_client.py call_llm, async_call_llm, _build_call_kwargs, and title_generator.py
- HERMES_COMPRESSION_TIMEOUT (default 120s): context_compressor.py
Closes NousResearch#3404
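A minimal sketch of the environment-variable override this commit describes. Only the variable names and default values come from the commit message; the helper name `env_timeout` and its handling of malformed values are assumptions for illustration, not the PR's actual code.

```python
import os


def env_timeout(name: str, default: float) -> float:
    """Return a timeout in seconds read from the environment, or the default.

    Hypothetical helper: the PR itself may parse the value differently.
    """
    raw = os.environ.get(name)
    if raw is None or not raw.strip():
        return default
    try:
        value = float(raw)
    except ValueError:
        # Assumption: ignore malformed values instead of raising.
        return default
    # Assumption: non-positive (or NaN) timeouts fall back to the default.
    return value if value > 0 else default


# Defaults proposed in this PR (90s auxiliary, 120s compression).
AUX_TIMEOUT = env_timeout("HERMES_AUX_TIMEOUT", 90.0)
COMPRESSION_TIMEOUT = env_timeout("HERMES_COMPRESSION_TIMEOUT", 120.0)
```

Reading the value once at import time keeps call sites simple, at the cost of requiring a restart to pick up a changed variable.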
teknium1
pushed a commit
that referenced
this pull request
Mar 28, 2026
….yaml
Add per-task timeout settings under auxiliary.{task}.timeout in config.yaml
instead of hardcoded values. Users with slow local models (Ollama, llama.cpp)
can now increase timeouts for compression, vision, session search, etc.
Defaults:
- auxiliary.compression.timeout: 120s (was hardcoded 45s)
- auxiliary.vision.timeout: 30s (unchanged)
- all other aux tasks: 30s (was hardcoded 30s)
- title_generator: 30s (was hardcoded 15s)
call_llm/async_call_llm now auto-resolve timeout from config when not
explicitly passed. Callers can still override with an explicit timeout arg.
Based on PR #3406 by alanfwilliams. Converted from env vars to config.yaml
per project conventions.
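The resolution order described in this commit message (explicit argument, then auxiliary.{task}.timeout from config, then a built-in default) could be sketched roughly as follows. The function name `resolve_timeout`, the `session_search` task key, and the dict-based config are hypothetical; only the default values are taken from the commit message.

```python
from typing import Any, Mapping, Optional

# Defaults from the commit message; the lookup logic below is an assumption.
DEFAULT_TIMEOUTS = {"compression": 120.0, "vision": 30.0}
FALLBACK_TIMEOUT = 30.0  # "all other aux tasks: 30s"


def resolve_timeout(
    task: str,
    config: Mapping[str, Any],
    explicit: Optional[float] = None,
) -> float:
    """Pick a timeout: explicit argument, then config, then default."""
    if explicit is not None:
        return float(explicit)
    aux = config.get("auxiliary") or {}
    task_cfg = aux.get(task) or {}
    configured = task_cfg.get("timeout")
    if configured is not None:
        return float(configured)
    return DEFAULT_TIMEOUTS.get(task, FALLBACK_TIMEOUT)


# Parsed form of a config.yaml that sets auxiliary.compression.timeout to 300.
config = {"auxiliary": {"compression": {"timeout": 300}}}

compression_timeout = resolve_timeout("compression", config)
search_timeout = resolve_timeout("session_search", config)
overridden = resolve_timeout("compression", config, explicit=10)
```

Under these assumptions, a caller that passes an explicit timeout always wins, which matches the statement that callers can still override the configured value.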
teknium1
added a commit
that referenced
this pull request
Mar 28, 2026
….yaml (#3597)
Add per-task timeout settings under auxiliary.{task}.timeout in config.yaml
instead of hardcoded values. Users with slow local models (Ollama, llama.cpp)
can now increase timeouts for compression, vision, session search, etc.
Defaults:
- auxiliary.compression.timeout: 120s (was hardcoded 45s)
- auxiliary.vision.timeout: 30s (unchanged)
- all other aux tasks: 30s (was hardcoded 30s)
- title_generator: 30s (was hardcoded 15s)
call_llm/async_call_llm now auto-resolve timeout from config when not
explicitly passed. Callers can still override with an explicit timeout arg.
Based on PR #3406 by alanfwilliams. Converted from env vars to config.yaml
per project conventions.
Co-authored-by: alanfwilliams <alanfwilliams@users.noreply.github.com>
Contributor
Merged via PR #3597. Your idea was implemented using config.yaml settings (under |
Summary
- Add HERMES_AUX_TIMEOUT env var (default 90s) for auxiliary call_llm/async_call_llm and title generation (was hardcoded 30s/15s)
- Add HERMES_COMPRESSION_TIMEOUT env var (default 120s) for context compression (was hardcoded 45s)
- Follows the existing HERMES_API_TIMEOUT and HERMES_STREAM_STALE_TIMEOUT patterns
Motivation
On local setups (Ollama, oMLX, llama.cpp) running a single model for both main and auxiliary tasks, auxiliary requests queue behind the main generation. The 30s default fires before prefill completes, causing compression failures, title generation timeouts, and session search loops. Previously addressed for the main client (#1010) and vision (#2107) but not for auxiliary calls or compression.
Closes #3404
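The queueing effect described in the motivation can be illustrated with a small asyncio simulation. This is not Hermes code: the lock stands in for a local server that handles one request at a time, and the durations are scaled down from seconds to fractions of a second so the sketch runs quickly.

```python
import asyncio


async def generate(lock: asyncio.Lock, seconds: float) -> str:
    """Simulate a single-model server that serves one request at a time."""
    async with lock:
        await asyncio.sleep(seconds)
    return "done"


async def aux_call_with_timeout(timeout: float) -> str:
    lock = asyncio.Lock()
    # The main generation takes the only slot first.
    main = asyncio.create_task(generate(lock, 0.3))
    await asyncio.sleep(0)  # yield so the main task acquires the lock
    try:
        # The auxiliary request needs only 0.05s of work,
        # but it must wait for the main generation to release the slot.
        return await asyncio.wait_for(generate(lock, 0.05), timeout)
    except asyncio.TimeoutError:
        return "timeout"
    finally:
        await main


short = asyncio.run(aux_call_with_timeout(0.1))  # shorter than the main generation
long = asyncio.run(aux_call_with_timeout(1.0))   # covers queueing plus the work
```

With the short timeout the auxiliary call gives up while still queued, even though its own work is small; with the longer timeout it completes once the main generation finishes.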
Test plan
- Set HERMES_AUX_TIMEOUT=90 and HERMES_COMPRESSION_TIMEOUT=120, run Hermes with a local model, verify auxiliary calls no longer time out during main generation
- pytest tests/ -v passes
Files changed
- agent/auxiliary_client.py: 3 timeout defaults read from HERMES_AUX_TIMEOUT
- agent/context_compressor.py: compression timeout reads from HERMES_COMPRESSION_TIMEOUT
- agent/title_generator.py: title timeout reads from HERMES_AUX_TIMEOUT
Tested on macOS M1 Max, oMLX with Qwen3.5-35B-A3B-4bit.