Skip to content

fix(cron): replace wall-clock timeout with inactivity-based timeout#5440

Merged
teknium1 merged 1 commit into
mainfrom
hermes/hermes-39d533b8
Apr 6, 2026
Merged

fix(cron): replace wall-clock timeout with inactivity-based timeout#5440
teknium1 merged 1 commit into
mainfrom
hermes/hermes-39d533b8

Conversation

@teknium1

@teknium1 teknium1 commented Apr 6, 2026

Copy link
Copy Markdown
Contributor

Summary

Ports the gateway's inactivity-based timeout pattern (PR #5389) to the cron scheduler. Fixes the Sunday PR scout cron jobs (openclaw, nanoclaw, ironclaw) which all timed out at the hard 600s wall-clock limit while actively working.

Problem

The cron scheduler used a flat concurrent.futures.TimeoutError after 600s wall-clock, regardless of whether the agent was actively working. PR scout jobs that load the hermes-agent-dev skill (massive context), scan GitHub repos, and analyze diffs legitimately need 15-20+ minutes of active tool-calling on Opus.

Fix

1. Inactivity-based timeout (cron/scheduler.py)

  • Replace flat future.result(timeout=N) with a polling loop that checks agent.get_activity_summary()['seconds_since_activity'] every 5s
  • Same pattern as the gateway timeout from PR fix(gateway): replace wall-clock agent timeout with inactivity-based timeout #5389
  • Jobs can run for hours if actively working — only genuine inactivity triggers timeout
  • HERMES_CRON_TIMEOUT env var still works (default 600s = 10 min inactivity), 0 = unlimited
  • Timeout error now includes diagnostics: last activity desc, idle duration, current tool, iteration count

2. Fix hermes_time ModuleNotFoundError

  • Move sys.path.insert() before repo-level imports (hermes_constants, hermes_cli.config, hermes_time)
  • Previously at line 46 (after imports at lines 29-33), causing ModuleNotFoundError when scheduler was loaded in contexts without the repo root on sys.path

Test plan

  • 9 new tests covering active/idle/unlimited/env-var/diagnostic/backward-compat scenarios
  • All 117 existing cron tests pass (0 failures)
  • E2E verification with real imports from worktree

Error logs this fixes

2026-04-05 17:10:37 ERROR cron.scheduler: Job 'openclaw-pr-scout' timed out after 600s
2026-04-05 17:20:38 ERROR cron.scheduler: Job 'nanoclaw-pr-scout' timed out after 600s
2026-04-05 17:30:39 ERROR cron.scheduler: Job 'ironclaw-pr-scout' timed out after 600s

Port the gateway's inactivity-based timeout pattern (PR #5389) to the
cron scheduler. The agent can now run for hours if it's actively calling
tools or receiving stream tokens — only genuine inactivity (no activity
for HERMES_CRON_TIMEOUT seconds, default 600s) triggers a timeout.

This fixes the Sunday PR scouts (openclaw, nanoclaw, ironclaw) which
all hit the hard 600s wall-clock limit while actively working.

Changes:
- Replace flat future.result(timeout=N) with a polling loop that checks
  agent.get_activity_summary() every 5s (same pattern as gateway)
- Timeout error now includes diagnostic info: last activity description,
  idle duration, current tool, iteration count
- HERMES_CRON_TIMEOUT=0 means unlimited (no timeout)
- Move sys.path.insert before repo-level imports to fix
  ModuleNotFoundError for hermes_time on stale gateway processes
- Add time import needed by the polling loop
- Add 9 tests covering active/idle/unlimited/env-var/diagnostic scenarios
@teknium1 teknium1 merged commit d6ef7fd into main Apr 6, 2026
3 of 4 checks passed
saxster pushed a commit to saxster/hermes-agent that referenced this pull request Apr 8, 2026
…ousResearch#5440)

Port the gateway's inactivity-based timeout pattern (PR NousResearch#5389) to the
cron scheduler. The agent can now run for hours if it's actively calling
tools or receiving stream tokens — only genuine inactivity (no activity
for HERMES_CRON_TIMEOUT seconds, default 600s) triggers a timeout.

This fixes the Sunday PR scouts (openclaw, nanoclaw, ironclaw) which
all hit the hard 600s wall-clock limit while actively working.

Changes:
- Replace flat future.result(timeout=N) with a polling loop that checks
  agent.get_activity_summary() every 5s (same pattern as gateway)
- Timeout error now includes diagnostic info: last activity description,
  idle duration, current tool, iteration count
- HERMES_CRON_TIMEOUT=0 means unlimited (no timeout)
- Move sys.path.insert before repo-level imports to fix
  ModuleNotFoundError for hermes_time on stale gateway processes
- Add time import needed by the polling loop
- Add 9 tests covering active/idle/unlimited/env-var/diagnostic scenarios
Tommyeds pushed a commit to Tommyeds/hermes-agent that referenced this pull request Apr 12, 2026
…ousResearch#5440)

Port the gateway's inactivity-based timeout pattern (PR NousResearch#5389) to the
cron scheduler. The agent can now run for hours if it's actively calling
tools or receiving stream tokens — only genuine inactivity (no activity
for HERMES_CRON_TIMEOUT seconds, default 600s) triggers a timeout.

This fixes the Sunday PR scouts (openclaw, nanoclaw, ironclaw) which
all hit the hard 600s wall-clock limit while actively working.

Changes:
- Replace flat future.result(timeout=N) with a polling loop that checks
  agent.get_activity_summary() every 5s (same pattern as gateway)
- Timeout error now includes diagnostic info: last activity description,
  idle duration, current tool, iteration count
- HERMES_CRON_TIMEOUT=0 means unlimited (no timeout)
- Move sys.path.insert before repo-level imports to fix
  ModuleNotFoundError for hermes_time on stale gateway processes
- Add time import needed by the polling loop
- Add 9 tests covering active/idle/unlimited/env-var/diagnostic scenarios
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026
…ousResearch#5440)

Port the gateway's inactivity-based timeout pattern (PR NousResearch#5389) to the
cron scheduler. The agent can now run for hours if it's actively calling
tools or receiving stream tokens — only genuine inactivity (no activity
for HERMES_CRON_TIMEOUT seconds, default 600s) triggers a timeout.

This fixes the Sunday PR scouts (openclaw, nanoclaw, ironclaw) which
all hit the hard 600s wall-clock limit while actively working.

Changes:
- Replace flat future.result(timeout=N) with a polling loop that checks
  agent.get_activity_summary() every 5s (same pattern as gateway)
- Timeout error now includes diagnostic info: last activity description,
  idle duration, current tool, iteration count
- HERMES_CRON_TIMEOUT=0 means unlimited (no timeout)
- Move sys.path.insert before repo-level imports to fix
  ModuleNotFoundError for hermes_time on stale gateway processes
- Add time import needed by the polling loop
- Add 9 tests covering active/idle/unlimited/env-var/diagnostic scenarios
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
…ousResearch#5440)

Port the gateway's inactivity-based timeout pattern (PR NousResearch#5389) to the
cron scheduler. The agent can now run for hours if it's actively calling
tools or receiving stream tokens — only genuine inactivity (no activity
for HERMES_CRON_TIMEOUT seconds, default 600s) triggers a timeout.

This fixes the Sunday PR scouts (openclaw, nanoclaw, ironclaw) which
all hit the hard 600s wall-clock limit while actively working.

Changes:
- Replace flat future.result(timeout=N) with a polling loop that checks
  agent.get_activity_summary() every 5s (same pattern as gateway)
- Timeout error now includes diagnostic info: last activity description,
  idle duration, current tool, iteration count
- HERMES_CRON_TIMEOUT=0 means unlimited (no timeout)
- Move sys.path.insert before repo-level imports to fix
  ModuleNotFoundError for hermes_time on stale gateway processes
- Add time import needed by the polling loop
- Add 9 tests covering active/idle/unlimited/env-var/diagnostic scenarios
olympus-terminal pushed a commit to olympus-terminal/hermes-agent that referenced this pull request May 16, 2026
…ousResearch#5440)

Port the gateway's inactivity-based timeout pattern (PR NousResearch#5389) to the
cron scheduler. The agent can now run for hours if it's actively calling
tools or receiving stream tokens — only genuine inactivity (no activity
for HERMES_CRON_TIMEOUT seconds, default 600s) triggers a timeout.

This fixes the Sunday PR scouts (openclaw, nanoclaw, ironclaw) which
all hit the hard 600s wall-clock limit while actively working.

Changes:
- Replace flat future.result(timeout=N) with a polling loop that checks
  agent.get_activity_summary() every 5s (same pattern as gateway)
- Timeout error now includes diagnostic info: last activity description,
  idle duration, current tool, iteration count
- HERMES_CRON_TIMEOUT=0 means unlimited (no timeout)
- Move sys.path.insert before repo-level imports to fix
  ModuleNotFoundError for hermes_time on stale gateway processes
- Add time import needed by the polling loop
- Add 9 tests covering active/idle/unlimited/env-var/diagnostic scenarios
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…ousResearch#5440)

Port the gateway's inactivity-based timeout pattern (PR NousResearch#5389) to the
cron scheduler. The agent can now run for hours if it's actively calling
tools or receiving stream tokens — only genuine inactivity (no activity
for HERMES_CRON_TIMEOUT seconds, default 600s) triggers a timeout.

This fixes the Sunday PR scouts (openclaw, nanoclaw, ironclaw) which
all hit the hard 600s wall-clock limit while actively working.

Changes:
- Replace flat future.result(timeout=N) with a polling loop that checks
  agent.get_activity_summary() every 5s (same pattern as gateway)
- Timeout error now includes diagnostic info: last activity description,
  idle duration, current tool, iteration count
- HERMES_CRON_TIMEOUT=0 means unlimited (no timeout)
- Move sys.path.insert before repo-level imports to fix
  ModuleNotFoundError for hermes_time on stale gateway processes
- Add time import needed by the polling loop
- Add 9 tests covering active/idle/unlimited/env-var/diagnostic scenarios
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…ousResearch#5440)

Port the gateway's inactivity-based timeout pattern (PR NousResearch#5389) to the
cron scheduler. The agent can now run for hours if it's actively calling
tools or receiving stream tokens — only genuine inactivity (no activity
for HERMES_CRON_TIMEOUT seconds, default 600s) triggers a timeout.

This fixes the Sunday PR scouts (openclaw, nanoclaw, ironclaw) which
all hit the hard 600s wall-clock limit while actively working.

Changes:
- Replace flat future.result(timeout=N) with a polling loop that checks
  agent.get_activity_summary() every 5s (same pattern as gateway)
- Timeout error now includes diagnostic info: last activity description,
  idle duration, current tool, iteration count
- HERMES_CRON_TIMEOUT=0 means unlimited (no timeout)
- Move sys.path.insert before repo-level imports to fix
  ModuleNotFoundError for hermes_time on stale gateway processes
- Add time import needed by the polling loop
- Add 9 tests covering active/idle/unlimited/env-var/diagnostic scenarios
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant