Skip to content

fix(cron): treat non-dict origin as missing instead of crashing tick#19013

Closed
Tranquil-Flow wants to merge 1 commit into
NousResearch:mainfrom
Tranquil-Flow:fix/18722-cron-null-next-run-and-non-dict-origin
Closed

fix(cron): treat non-dict origin as missing instead of crashing tick#19013
Tranquil-Flow wants to merge 1 commit into
NousResearch:mainfrom
Tranquil-Flow:fix/18722-cron-null-next-run-and-non-dict-origin

Conversation

@Tranquil-Flow

Copy link
Copy Markdown
Contributor

What does this PR do?

_resolve_origin called origin.get('platform') on whatever job.get('origin') returned. The leading if not origin: return None short-circuited the falsy cases (None, empty dict, "") but a non-empty string passed that guard and crashed with AttributeError: 'str' object has no attribute 'get' on every fire attempt. Observed in the wild after a migration script tagged jobs with free-form provenance strings (e.g. "combined-digest-replaces-x-and-y-20260503").

mark_job_run did record last_status: error, last_error: "'str' object has no attribute 'get'" once, but the next tick re-loaded the same poisoned origin and crashed identically. The job stayed enabled and accumulated cascading errors until origin was patched manually.

Replace the falsy guard with isinstance(origin, dict). Non-dict origins (string, int, list, tuple, float — anything that survived a hand-edit, JSON-script write, or migration) are now treated the same as a missing origin: the job continues with deliver falling back through its normal home-channel path instead of crashing the scheduler loop.

Scope: the non-dict-origin crash sub-bug from #18722. The next_run_at: null recurring-job recovery (the second sub-bug) is independently addressed by the in-flight #18825, which extends the never-silently-disable defense from #16265 to get_due_jobs(). Either one can land first.

Related Issue

Fixes #18722 (non-dict origin crash; recurring-job recovery covered by #18825)

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • cron/scheduler.py_resolve_origin guards isinstance(origin, dict) before .get() calls; updated docstring with the production trigger pattern.
  • tests/cron/test_scheduler.pyTestResolveOrigin.test_non_dict_origin_returns_none_instead_of_crashing parametrises over the non-dict shapes (str, int, list, tuple, float).

How to Test

  1. Edit `~/.hermes/cron/jobs.json` and set a job's origin to a string like "my-migration-tag".
  2. Restart the gateway / wait for the cron tick.
  3. Before this PR: every fire crashes with AttributeError; after: job runs with default delivery routing.
  4. `pytest tests/cron/test_scheduler.py::TestResolveOrigin -q` → 10/10 pass.

Checklist

Code

Documentation & Housekeeping

  • I've updated relevant documentation — N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture — N/A
  • I've considered cross-platform impact — N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — N/A

``_resolve_origin`` called ``origin.get('platform')`` on whatever
``job.get('origin')`` returned. The leading ``if not origin: return None``
short-circuited the falsy cases (None, empty dict, "") but a non-empty
string passed that guard and then crashed with
``AttributeError: 'str' object has no attribute 'get'`` on every fire
attempt. Observed in the wild after a migration script tagged jobs with
free-form provenance strings (e.g.
``"combined-digest-replaces-x-and-y-20260503"``).

``mark_job_run`` did record ``last_status: error,
last_error: "'str' object has no attribute 'get'"`` once, but the next
tick re-loaded the same poisoned origin and crashed identically. The
job stayed enabled, fired every tick, and accumulated cascading errors
in the log until ``origin`` was patched manually.

Replace the falsy guard with ``isinstance(origin, dict)``. Non-dict
origins (string, int, list, tuple, float — anything that survived a
hand-edit, JSON-script write, or migration) are now treated the same
as a missing origin: the job continues with ``deliver`` falling back
through its normal home-channel path instead of crashing the scheduler
loop.

Test parametrises the non-dict shapes that can appear in jobs.json
through external writers and asserts ``_resolve_origin`` returns None
for each.

Note: this fix scope is the non-dict-``origin`` crash only. The
``next_run_at: null`` recurring-job recovery (the second sub-bug in
NousResearch#18722) is independently addressed by the in-flight NousResearch#18825, which
extends the never-silently-disable defense from NousResearch#16265 to
``get_due_jobs()`` — that approach is well-aligned with the existing
recovery pattern and ships fine without a competing change here.

Fixes NousResearch#18722 (non-dict origin crash; recurring-job recovery covered by NousResearch#18825)
@alt-glitch alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/cron Cron scheduler and job management labels May 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/cron Cron scheduler and job management P1 High — major feature broken, no workaround type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cron: jobs with null next_run_at silently skipped; non-dict origin crashes ticker

2 participants