
perf(cron): narrow file lock scope + heartbeat ticker #21901

Open
gengwb wants to merge 1 commit into NousResearch:main from gengwb:local-patches

gengwb commented May 8, 2026

Summary

Two small but impactful fixes for cron ticker reliability and observability.

1. cron/scheduler.py — Narrow file lock scope + fire-and-forget execution

Problem: The file lock in tick() covered the entire cycle: both scheduling and job execution. A long-running job therefore held the lock and blocked subsequent ticks, which could delay or skip job runs on high-frequency schedules.

Fix:

  • Narrow the file lock to only the scheduling phase (get_due_jobs + advance_next_run), then release it before any job execution begins.
  • Use a fire-and-forget pattern (ThreadPoolExecutor.submit + shutdown(wait=False)) for parallel jobs, so that tick() returns immediately instead of blocking until the jobs complete.
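
A minimal sketch of the narrowed tick(), assuming an in-memory job list, a POSIX fcntl.flock file lock, and a simplified advance_next_run that schedules the next run one interval from now. Job, LOCK_PATH, and the helper bodies are hypothetical stand-ins, not the code in cron/scheduler.py:

```python
import fcntl  # POSIX-only; the real scheduler's locking mechanism may differ
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical lock-file location; the PR does not show the real path.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "cron_tick.lock")


@dataclass
class Job:
    """Hypothetical job record: name, interval in seconds, next run time, callable."""
    name: str
    interval: float
    next_run: float
    fn: Callable[[], None]


def get_due_jobs(jobs: List[Job], now: float) -> List[Job]:
    return [job for job in jobs if job.next_run <= now]


def advance_next_run(job: Job, now: float) -> None:
    # Simplification: schedule the next run one interval after `now`.
    job.next_run = now + job.interval


def tick(jobs: List[Job]) -> List[Job]:
    now = time.time()
    # Scheduling phase: hold the file lock only while reading and updating schedules.
    with open(LOCK_PATH, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            due = get_due_jobs(jobs, now)
            for job in due:
                advance_next_run(job, now)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)

    # Execution phase: the lock is already released, so a slow job cannot block the next tick.
    if due:
        executor = ThreadPoolExecutor(max_workers=len(due))
        for job in due:
            executor.submit(job.fn)
        # Fire-and-forget: return without waiting; workers finish the jobs in the background.
        executor.shutdown(wait=False)
    return due


# Usage: a 0.5 s job does not delay tick()'s return.
jobs = [
    Job("slow", 60.0, time.time() - 1, lambda: time.sleep(0.5)),
    Job("later", 60.0, time.time() + 3600, lambda: None),
]
start = time.time()
due = tick(jobs)
print([job.name for job in due], time.time() - start < 0.25)  # ['slow'] True
```

shutdown(wait=False) only stops the executor from accepting new work; already-submitted jobs keep running in its worker threads, and the interpreter still waits for those threads before exiting.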

As a result, running jobs never block subsequent ticks; the lock only serializes the scheduling bookkeeping.
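
The lock still matters for correctness: because the due-job lookup and the next-run update happen together under it, two overlapping ticks cannot both pick up the same job. This sketch uses a threading.Lock as an in-process stand-in for the file lock (which also serializes across processes) and a hypothetical next_run table:

```python
import threading
import time
from collections import Counter

# In-process stand-in for the scheduler's file lock.
schedule_lock = threading.Lock()
# Hypothetical schedule table: job name -> next run time (epoch seconds).
next_run = {"report": time.time() - 1}
INTERVAL = 60.0
dispatched = Counter()


def schedule_phase(now: float) -> list:
    # get_due_jobs + advance_next_run happen atomically under the lock.
    with schedule_lock:
        due = [name for name, t in next_run.items() if t <= now]
        for name in due:
            next_run[name] = now + INTERVAL
    return due


def tick() -> None:
    now = time.time()
    for name in schedule_phase(now):
        # Real code would submit the job to an executor here; we only count dispatches.
        dispatched[name] += 1


# Eight overlapping ticks race for the same due job.
threads = [threading.Thread(target=tick) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(dispatched)  # Counter({'report': 1})
```

Only the first tick to take the lock sees "report" as due; by the time the others get the lock, its next run has already moved an interval into the future.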

2. gateway/run.py — Heartbeat + robust error logging

  • Heartbeat file: Write a timestamp to ~/.hermes/cron/.ticker_heartbeat every 5 ticks (~5 minutes), so external watchdogs can detect a dead ticker process.
  • Heartbeat log: Log an INFO-level "Cron ticker heartbeat" message at the same interval, so agent.log always shows the ticker is alive.
  • Exception → BaseException: All except Exception blocks changed to except BaseException with exc_info=True, so SystemExit, KeyboardInterrupt, and other BaseException subclasses, which previously bypassed these handlers, are now logged with a full traceback instead of escaping unlogged.
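
A minimal sketch of these ticker loop changes, assuming tick() is a zero-argument callable, the heartbeat file holds Unix epoch seconds as text, and the tick interval is 60 seconds (so 5 ticks is about 5 minutes). run_ticker, write_heartbeat, ticker_is_alive, and the "Cron tick failed" message are hypothetical, not the code in gateway/run.py. One behavior is purely an assumption of this sketch: after logging, it re-raises exceptions that are not Exception subclasses (SystemExit, KeyboardInterrupt) so the process can still shut down; the PR text does not say whether it does this.

```python
import logging
import tempfile
import time
from pathlib import Path
from typing import Callable, Optional

logger = logging.getLogger("cron.ticker")  # hypothetical logger name

# Path taken from the PR description.
DEFAULT_HEARTBEAT = Path.home() / ".hermes" / "cron" / ".ticker_heartbeat"
HEARTBEAT_EVERY = 5  # ticks; about 5 minutes at an assumed 60 s tick interval


def write_heartbeat(path: Path) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: the heartbeat stores Unix epoch seconds as text.
    path.write_text(f"{time.time()}\n")


def run_ticker(tick: Callable[[], None], heartbeat_path: Path = DEFAULT_HEARTBEAT,
               interval: float = 60.0, max_ticks: Optional[int] = None) -> None:
    count = 0
    while max_ticks is None or count < max_ticks:
        try:
            tick()
        except BaseException as exc:
            # Log everything, including SystemExit and KeyboardInterrupt, with a traceback.
            logger.error("Cron tick failed: %r", exc, exc_info=True)
            if not isinstance(exc, Exception):
                raise  # Sketch choice: let shutdown signals propagate after logging.
        count += 1
        if count % HEARTBEAT_EVERY == 0:
            write_heartbeat(heartbeat_path)
            logger.info("Cron ticker heartbeat")
        time.sleep(interval)


def ticker_is_alive(path: Path = DEFAULT_HEARTBEAT, max_age: float = 600.0) -> bool:
    """Watchdog check: alive if the last heartbeat is at most max_age seconds old."""
    try:
        last = float(path.read_text().strip())
    except (OSError, ValueError):
        return False  # a missing or unreadable heartbeat counts as dead
    return time.time() - last <= max_age


# Usage with a temporary heartbeat file and a no-op tick.
logging.basicConfig(level=logging.INFO)
hb = Path(tempfile.mkdtemp()) / ".ticker_heartbeat"
run_ticker(lambda: None, hb, interval=0.0, max_ticks=5)
print(ticker_is_alive(hb))  # True
```

The watchdog's default max_age of 600 seconds (two heartbeat intervals) is an arbitrary choice that tolerates one late heartbeat before declaring the ticker dead.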

Testing

  • tick() runs scheduling under lock, releases before execution
  • Fire-and-forget parallel execution does not block subsequent ticks
  • Heartbeat file written every 5 ticks
  • Heartbeat log emitted at correct interval
  • All existing cron functionality preserved
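
The first two Testing items can be expressed as unit tests. This sketch runs them against a hypothetical MiniScheduler that mirrors the narrowed tick(), with a threading.Lock standing in for the file lock; it is not the PR's test suite:

```python
import threading
import time
import unittest
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


class MiniScheduler:
    """Hypothetical stand-in for the cron scheduler, reduced to what the tests need."""

    def __init__(self, jobs: List[Callable[[], None]]):
        self.lock = threading.Lock()  # stands in for the file lock
        self.jobs = jobs              # every job is treated as due on each tick

    def tick(self) -> int:
        with self.lock:               # scheduling phase only
            due = list(self.jobs)
        executor = ThreadPoolExecutor(max_workers=max(1, len(due)))
        for job in due:
            executor.submit(job)
        executor.shutdown(wait=False)  # fire-and-forget
        return len(due)


class TickLockScopeTest(unittest.TestCase):
    def test_lock_released_before_jobs_run(self):
        observed = []
        done = threading.Event()

        def job():
            # If tick() still held the lock, this non-blocking acquire would fail.
            acquired = sched.lock.acquire(blocking=False)
            if acquired:
                sched.lock.release()
            observed.append(acquired)
            done.set()

        sched = MiniScheduler([job])
        sched.tick()
        self.assertTrue(done.wait(timeout=2))
        self.assertEqual(observed, [True])

    def test_running_job_does_not_block_next_tick(self):
        started = threading.Event()
        release = threading.Event()

        def blocking_job():
            started.set()
            release.wait(timeout=5)

        sched = MiniScheduler([blocking_job])
        try:
            t0 = time.monotonic()
            sched.tick()
            self.assertTrue(started.wait(timeout=2))
            self.assertEqual(sched.tick(), 1)  # second tick while the first job still waits
            self.assertLess(time.monotonic() - t0, 1.0)
        finally:
            release.set()
```

Saved as a module, these run with python -m unittest followed by the module name.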

alt-glitch added the labels type/perf (Performance improvement or optimization), P2 (Medium: degraded but workaround exists), comp/cron (Cron scheduler and job management), and comp/gateway (Gateway runner, session dispatch, delivery) on May 11, 2026