[Feature]: Configurable timeouts for auxiliary call_llm and context compression #3404

### Problem or Use Case

When using local LLM providers (Ollama, oMLX, llama.cpp) on consumer hardware, the hardcoded 30s timeout in `auxiliary_client.py` and 45s timeout in `context_compressor.py` are too short. Local models need time for prefill, especially when the main model is already generating and auxiliary requests queue behind it.

This was partially addressed for the main client (#1010 → `HERMES_API_TIMEOUT`) and for vision (#2107 → `auxiliary.vision.timeout` in config.yaml), but the pattern wasn't extended to:

1. **`auxiliary_client.py`** — `call_llm()`, `async_call_llm()`, and `_build_call_kwargs()` all default to `timeout: float = 30.0`
2. **`context_compressor.py`** — hardcoded `"timeout": 45.0` at line 350
3. **`title_generator.py`** — hardcoded `timeout: float = 15.0`

### Impact

On a local setup running a single model for both main inference and auxiliary tasks (compression, session search, skills_hub, flush_memories, title generation), requests queue behind the main generation. A 30s timeout fires before prefill even completes, causing:

- Context compression failures → context grows until it exceeds the context window
- Title generation failures (15s is particularly tight)
- Session search timeout loops (auxiliary request queues, times out, retries, times out again)
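The queueing effect behind these failures can be reproduced with a toy simulation. This is only a sketch: the `asyncio.Lock` stands in for a local server that handles one request at a time, all function names are hypothetical, and the durations are made-up, scaled-down values rather than measurements from Hermes.

```python
import asyncio


async def main_generation(server: asyncio.Lock, seconds: float) -> None:
    # Stand-in for the main model occupying the single local server.
    async with server:
        await asyncio.sleep(seconds)


async def auxiliary_call(server: asyncio.Lock, timeout: float) -> str:
    # Stand-in for an auxiliary request. Time spent waiting for the
    # server counts against the timeout, not just the work itself.
    async def request() -> str:
        async with server:
            await asyncio.sleep(0.01)  # the auxiliary work itself is short
            return "ok"

    try:
        return await asyncio.wait_for(request(), timeout=timeout)
    except asyncio.TimeoutError:
        return "timeout"


async def simulate(aux_timeout: float, main_seconds: float = 0.3) -> str:
    server = asyncio.Lock()
    main = asyncio.create_task(main_generation(server, main_seconds))
    await asyncio.sleep(0)  # let the main generation grab the server first
    result = await auxiliary_call(server, aux_timeout)
    await main
    return result


if __name__ == "__main__":
    print(asyncio.run(simulate(aux_timeout=0.05)))  # shorter than the queue wait
    print(asyncio.run(simulate(aux_timeout=1.0)))   # longer than the queue wait
```

The auxiliary work is trivial in both runs; only the timeout relative to the queue wait decides whether the call succeeds.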

### Proposed Solution

Follow the existing pattern from `HERMES_API_TIMEOUT` and `auxiliary.vision.timeout`:

**Option A (env var):** `HERMES_AUX_TIMEOUT` for auxiliary calls, `HERMES_COMPRESSION_TIMEOUT` for compression — consistent with `HERMES_API_TIMEOUT` and `HERMES_STREAM_STALE_TIMEOUT`.
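A minimal sketch of what Option A could look like at a call site. The helper name `env_timeout` is hypothetical, and falling back to the default on empty, non-numeric, or non-positive values is an assumption about the desired behavior, not something Hermes does today.

```python
import os


def env_timeout(name: str, default: float) -> float:
    """Return a positive timeout from the environment, else the default."""
    raw = os.environ.get(name, "").strip()
    if not raw:
        return default
    try:
        value = float(raw)
    except ValueError:
        # Non-numeric value: keep the existing hardcoded default.
        return default
    return value if value > 0 else default


# Hypothetical call sites, keeping today's values as the fallbacks.
AUX_TIMEOUT = env_timeout("HERMES_AUX_TIMEOUT", 30.0)
COMPRESSION_TIMEOUT = env_timeout("HERMES_COMPRESSION_TIMEOUT", 45.0)
```

Keeping the current values as fallbacks means nothing changes for users who do not set the variables.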

**Option B (config.yaml):** Add timeout fields under existing config sections:
```yaml
compression:
  timeout: 120      # was hardcoded 45

auxiliary:
  default_timeout: 90  # was hardcoded 30 in call_llm/async_call_llm
```
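A rough sketch of the lookup for Option B, assuming config.yaml has already been parsed into a plain nested `dict`. The helper name `config_timeout` is hypothetical, and how Hermes actually loads its config is not shown here.

```python
def config_timeout(config: dict, section: str, key: str, default: float) -> float:
    """Read a positive numeric timeout from config[section][key], else the default."""
    section_value = config.get(section)
    if not isinstance(section_value, dict):
        return default
    raw = section_value.get(key)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(raw, bool) or not isinstance(raw, (int, float)):
        return default
    return float(raw) if raw > 0 else default


# Example with the values from the YAML snippet above.
config = {"compression": {"timeout": 120}, "auxiliary": {"default_timeout": 90}}
compression_timeout = config_timeout(config, "compression", "timeout", 45.0)
aux_timeout = config_timeout(config, "auxiliary", "default_timeout", 30.0)
```

Missing sections or keys fall back to the current hardcoded values, so existing config files keep working unchanged.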

Option B is cleaner long-term. Option A is a one-line patch per call site.

### Workaround

I currently patch the defaults in `auxiliary_client.py` and `context_compressor.py` so they read from env vars. These patches are lost on every `hermes update`.

### Environment

- macOS, M1 Max 32GB
- oMLX serving Qwen3.5-35B-A3B-4bit locally
- Single model handling both main and auxiliary tasks
- Hermes v0.3.0 (latest as of 2026-03-27)
