
fix(cron): harden ticker thread, null-safe deliver, and BSM env loader #33527

Open
zccyman wants to merge 1 commit into NousResearch:main from atyou2happy:fix/cron-hardening-ticker-watchdog-null-bsm

@zccyman zccyman commented May 27, 2026

Contributor

Summary

Three cron bugs fixed in one PR (root cause cluster: cron lifecycle fragility).

Bug 1 — Ticker thread stops silently (#32895) P1

Root cause: _start_cron_ticker() in gateway/run.py wraps cron_tick() in try/except, but the rest of the while loop body (channel directory refresh, cache cleanup, curator run) runs outside any exception handler. An unexpected error in any of these kills the thread with no error log.

Fix: Wrap the entire while loop body in a top-level try/except Exception with logger.error(..., exc_info=True). The thread now survives any single-iteration failure and retries on the next cycle.
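
The loop shape described in the fix can be sketched as below. This is a minimal illustration, not the actual code in gateway/run.py: `run_ticker`, `steps`, `stop_event`, and `interval` are hypothetical names, and `steps` stands in for cron_tick(), the channel directory refresh, cache cleanup, and the curator run.

```python
import logging
import threading

logger = logging.getLogger("cron.ticker")


def run_ticker(stop_event, steps, interval=60.0):
    """Run every step once per cycle until stop_event is set.

    `steps` is a hypothetical list of zero-argument callables; the real
    loop body in gateway/run.py is not reproduced here.
    """
    while not stop_event.is_set():
        try:
            # The whole loop body sits inside the handler, so an error in
            # any step is logged instead of terminating the thread.
            for step in steps:
                step()
        except Exception:
            logger.error(
                "Cron ticker iteration failed; retrying next cycle",
                exc_info=True,
            )
        # Event.wait doubles as an interruptible sleep between cycles.
        stop_event.wait(interval)
```

In this sketch a failing step skips the remaining steps for that cycle only; the next cycle runs all of them again. Catching `Exception` rather than `BaseException` leaves `KeyboardInterrupt` and `SystemExit` free to propagate.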

Bug 2 — hermes cron list crashes on null deliver (#32896) P2

Root cause: job.get("deliver", ["local"]) returns None when the key exists but its value is null, because the default applies only when the key is missing. ", ".join(None) then raises TypeError.

Fix: job.get("deliver") or ["local"] — handles both missing key AND null value.
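
The difference between the two forms can be seen in a small standalone sketch. `format_deliver` is a hypothetical helper, not a function from hermes_cli/cron.py, and the job dicts are made-up examples.

```python
def format_deliver(job):
    """Return the delivery targets of a job as a comma-separated string."""
    # `or` falls back for a missing key, a null value, and an empty list.
    targets = job.get("deliver") or ["local"]
    return ", ".join(targets)


def format_deliver_old(job):
    """Previous form: the default is used only when the key is absent."""
    return ", ".join(job.get("deliver", ["local"]))


null_job = {"name": "nightly", "deliver": None}

try:
    format_deliver_old(null_job)
    old_form_crashed = False
except TypeError:
    # str.join rejects None because it is not iterable.
    old_form_crashed = True

new_form_result = format_deliver(null_job)
```

One side effect of the `or` form is that an explicitly empty list is also replaced by ["local"], which may or may not be the intended behavior for jobs with no delivery targets.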

Bug 3 — Cron uses bare load_dotenv() instead of load_hermes_dotenv() (#33465) P2

Root cause: cron/scheduler.py imports load_dotenv from dotenv and calls it directly, skipping the BSM secret resolution, .env sanitization, and encoding fallback that load_hermes_dotenv() provides. Cron jobs that need BSM-managed credentials fail with HTTP 401.

Fix: Replace with from hermes_cli.env_loader import load_hermes_dotenv and call load_hermes_dotenv(hermes_home=_get_hermes_home()) to preserve profile-aware path resolution.

Files Changed

| File | Change |
| --- | --- |
| gateway/run.py | Wrap entire ticker while body in try/except |
| cron/scheduler.py | load_dotenv() → load_hermes_dotenv(hermes_home=...) |
| hermes_cli/cron.py | job.get("deliver", [...]) → job.get("deliver") or [...] |
tests/cron/test_cron_profile.py Mock load_hermes_dotenv in profile tests

Testing

  • 381 tests passed (tests/cron/, tests/hermes_cli/test_cron.py, tests/hermes_cli/test_env_loader.py)
  • Updated profile context tests to mock the new load_hermes_dotenv call

Closes #32895, Closes #32896, Closes #33465

Three cron bugs fixed in one PR (root cause cluster: cron lifecycle fragility):

1. NousResearch#32895 P1: Ticker thread dies silently on unexpected exceptions.
   The while loop body was only partially wrapped — errors from channel
   directory refresh, cache cleanup, or curator runs propagated up and
   killed the thread with no error log. Fix: wrap the ENTIRE loop body
   in a top-level try/except with ERROR-level logging.

2. NousResearch#32896 P2: `hermes cron list` crashes on null deliver field.
   job.get("deliver", ["local"]) returns None when the key exists but
   has null value. Fix: use `or` fallback instead of default arg.

3. NousResearch#33465 P2: cron/scheduler.py uses bare load_dotenv() instead of
   load_hermes_dotenv(). This skips BSM secret resolution, so any
   cron job needing BSM-managed credentials fails with HTTP 401.
   Fix: swap to load_hermes_dotenv(hermes_home=_get_hermes_home())
   to preserve profile-aware path resolution.

Tests: 381 passed (including updated profile context tests).

Labels

comp/cli: CLI entry point, hermes_cli/, setup wizard
comp/cron: Cron scheduler and job management
comp/gateway: Gateway runner, session dispatch, delivery
P1: High, major feature broken, no workaround
type/bug: Something isn't working


2 participants