gateway agent timeout (HERMES_AGENT_TIMEOUT) kills legitimate long-running tasks

## Problem

Commit `970042de` introduced a hard 10-minute timeout (`HERMES_AGENT_TIMEOUT`, default 600s) on gateway agent execution via `asyncio.wait_for()`. This was designed to prevent stuck sessions, but it also kills **legitimate long-running tasks** — particularly subagent/delegate work with reasoning-heavy models.

When a user sets the agent to work autonomously (e.g., multi-step research, code implementation via `delegate_task`, or reasoning models with long chain-of-thought), hitting the 10-minute wall results in:
1. Agent is force-interrupted mid-work (`agent.interrupt()`)
2. All intermediate progress is lost
3. User gets a generic error: *"Request timed out after 10 minutes. Try again, or use /reset to start fresh."*
4. The session transcript is marked as `failed: true`, breaking conversation continuity

This significantly reduces the appeal of autonomous/unsupervised agent workflows, which is a core value proposition.

## Environment

- OS: Oracle Linux 9 (aarch64)
- Python: 3.11.15
- Hermes Agent: v0.7.0 (v2026.4.3)
- `gateway/run.py` line 6042-6070: `asyncio.wait_for(loop.run_in_executor(None, run_sync), timeout=_agent_timeout)`
- `cron/scheduler.py`: Similar `HERMES_CRON_TIMEOUT` for cron jobs
- Timeout is env-var only (`HERMES_AGENT_TIMEOUT=600`), not exposed in `config.yaml` or `DEFAULT_CONFIG`

## Steps to Reproduce

1. Start hermes gateway: `hermes gateway`
2. Send a task that requires subagent delegation with a reasoning model (e.g., via Telegram):
   - "Analyze the Hermes codebase for all timeout-related code"
   - This triggers `delegate_task`, which spawns a subagent doing multiple tool calls
3. Wait for the agent to work for 10+ minutes
4. Observe: the agent is interrupted and the user receives:
   > ⏱️ Request timed out after 10 minutes. The agent may have been stuck on a tool or API call.
   > Try again, or use /reset to start fresh.

## Suggested Improvements

### 1. Activity-based timeout instead of wall-clock timeout
Instead of a fixed wall-clock limit, track the last "active" timestamp (updated on each tool_call completion / API response). Only trigger timeout if there's been no activity for N seconds. This distinguishes "working hard on a complex task" from "hung on a dead API call."

```python
# Pseudocode: reset on each successful tool/API round-trip
self._last_activity = time.time()
# Timeout check: time.time() - self._last_activity > INACTIVITY_TIMEOUT
```

### 2. Expose timeout in config.yaml
`HERMES_AGENT_TIMEOUT` is env-var only. Users shouldn't need to edit `.env` for a behavioral setting. Add to `DEFAULT_CONFIG`:

```yaml
agent:
  gateway_timeout: 600  # seconds, 0 = unlimited
```

### 3. Timeout extension prompt
When approaching the timeout (e.g., at 80% elapsed), send a non-blocking notification to the user with an option to extend. On platforms that support it (Telegram inline keyboard), offer a "Continue" button that resets the timer.

### 4. Graceful degradation over hard kill
On timeout, instead of `agent.interrupt()` (which kills everything), consider:
- Saving the current conversation state for resumption
- Returning a partial result with what was completed so far
- Offering `/resume` or auto-retry with a fresh context window

### 5. Subagent-aware timeout accounting
When the main agent delegates to a subagent (`delegate_task`), the subagent runs in a separate process/thread. The main agent is effectively "waiting" (idle). This idle-wait time should not count against the timeout, since the agent isn't stuck — it's waiting for a child to finish.

## Workaround
Set a larger timeout via environment variable:
```bash
# In ~/.hermes/.env
HERMES_AGENT_TIMEOUT=1800  # 30 minutes
HERMES_CRON_TIMEOUT=1800   # also for cron jobs
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gateway agent timeout (HERMES_AGENT_TIMEOUT) kills legitimate long-running tasks #4815

Problem

Environment

Steps to Reproduce

Suggested Improvements

1. Activity-based timeout instead of wall-clock timeout

2. Expose timeout in config.yaml

3. Timeout extension prompt

4. Graceful degradation over hard kill

5. Subagent-aware timeout accounting

Workaround

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

gateway agent timeout (HERMES_AGENT_TIMEOUT) kills legitimate long-running tasks #4815

Description

Problem

Environment

Steps to Reproduce

Suggested Improvements

1. Activity-based timeout instead of wall-clock timeout

2. Expose timeout in config.yaml

3. Timeout extension prompt

4. Graceful degradation over hard kill

5. Subagent-aware timeout accounting

Workaround

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions