fix(curator): skip archival of skills referenced by active cron jobs (#29912)#30108
Open
friday-james wants to merge 12 commits into
Open
fix(curator): skip archival of skills referenced by active cron jobs (#29912)#30108friday-james wants to merge 12 commits into
friday-james wants to merge 12 commits into
Conversation
…ousResearch#29912) Closes NousResearch#29912. The curator could archive a skill that an enabled cron job still depended on, causing the next scheduled run to silently fail. Three coordinated layers close the gap: 1. `cron/jobs.py` — add `get_active_skill_refs()`: calls `list_jobs(include_disabled=False)` (single source of truth) and returns the set of skill names referenced by any enabled job. 2. `agent/curator.py` — add `_load_cron_protected()`: wraps the import + call behind a two-tier error boundary. `ModuleNotFoundError` → cron not installed, return `set()`. Any other exception → cron present but broken, return `None` (fail-closed signal). `apply_automatic_transitions()` and `run_curator_review()` snapshot cron-protected refs once and propagate the `None` signal: transitions are skipped entirely when the cron store is unreadable. `_render_candidate_list()` emits a visible LLM warning and adds a "Cron-protected skills" section to the prompt so the model won't archive a live dependency. 3. `tools/skill_manager_tool.py` — harden `_delete_skill()` as a last-resort guard: inside a curator background-review fork (`is_background_review()` ContextVar), reject any delete of a cron-referenced skill with a structured error before touching disk. Same two-tier import boundary; fail-closed on cron-store errors. All three layers share `get_active_skill_refs()` as their single source of truth.
The handler uses `except Exception` (to catch broken-import errors like RuntimeError during module init), but the inline comment still said "Other ImportError". Align the comment with the actual code.
…otected kwarg
The stub in test_real_run_takes_pre_snapshot used `lambda now=None: {...}`
which did not accept the new `cron_protected` keyword argument added by the
fix for NousResearch#29912. When run_curator_review called
`apply_automatic_transitions(now=start, cron_protected=cron_protected)` the
stub raised TypeError. Update to `lambda now=None, cron_protected=None: {...}`.
On Python ≤ 3.11, setting sys.modules["cron.jobs"] = None causes `from cron.jobs import ...` to raise plain ImportError instead of ModuleNotFoundError. Both _load_cron_protected() and _delete_skill() were only catching ModuleNotFoundError, so on Python 3.11 the "not installed" path would fall through to the fail-closed handler. Switch to `except ImportError as e` and check e.name to distinguish "module absent" (name in cron/cron.jobs → empty set / pass) from "dependency broken" (other name → fail-closed). Also update test comment to document the Python version behaviour difference. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When cron.jobs is installed but get_active_skill_refs is absent (e.g.
partial upgrade / old module version), Python raises plain ImportError
with e.name == "cron.jobs" — indistinguishable from a missing module
if we only check e.name. Both _load_cron_protected() and the
_delete_skill() cron guard were returning "not installed" (empty
set / pass) in this case, silently disabling the protection.
Fix: catch ModuleNotFoundError and ImportError separately. Only
ModuleNotFoundError with e.name in ("cron", "cron.jobs") means the
subsystem is absent; plain ImportError always means fail-closed.
Also fix trailing \n inside a string appended to a join-joined list
in _render_candidate_list (produces double blank line).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…uard non-iterable skills field Two bugs: 1. curator.py _run_llm_review() created the review AIAgent without setting _memory_write_origin = "background_review", so is_background_review() returned False inside all curator-LLM tool calls. The _delete_skill() cron protection guard (added in this PR) was dead code for curator- initiated deletes. Fix: set the attribute before run_conversation(). 2. cron/jobs.py _normalize_skill_list() called list(skills) in the else branch with no TypeError guard. A corrupted job record with a non- iterable skills field (e.g. integer from hand-edited JSON) raised TypeError that escaped through get_active_skill_refs() and was caught by _load_cron_protected()'s except-Exception handler, returning None and suppressing all auto-archive transitions for the entire run. Fix: wrap list(skills) in try/except TypeError, treating bad records as having no skills rather than halting the whole snapshot. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- curator.py: use BACKGROUND_REVIEW constant (from tools.skill_provenance) instead of the bare string literal when setting _memory_write_origin, so it stays in sync with is_background_review() automatically. - curator.py: add exc_info=True to all WARNING calls in _load_cron_protected so import/runtime errors include a traceback for easier diagnosis. - skill_manager_tool.py: collapse three identical except branches in the _delete_skill cron guard into one except Exception as e with an isinstance check, eliminating the duplicated log call and error-dict literal. - tests/tools/test_skill_manager_tool.py: replace manual sys.modules save/restore with monkeypatch.setitem for correct teardown semantics. - cron/jobs.py: log a WARNING when _normalize_skill_list swallows TypeError so operators know to repair a malformed job record. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…chain Add a comment clarifying that conversation_loop.py reads _memory_write_origin and calls set_current_write_origin() at the start of run_conversation(), which is what sets the ContextVar that is_background_review() checks in tool handlers such as _delete_skill(). The mechanism is correct; the comment makes the indirect path visible to reviewers. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ules monkeypatch.setitem only restores sys.modules["cron.jobs"]; if the import machinery ever sets cron.jobs on the parent package, the attribute leaks into later tests. Patch both together so monkeypatch teardown restores the complete import state. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a job record has a malformed non-iterable 'skills' field but a valid legacy 'skill' field, the TypeError handler was dropping the skill entirely, causing get_active_skill_refs() to miss the dependency and potentially allowing curator to archive an in-use skill. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…skill_list Asserts that a job with a non-iterable 'skills' field (e.g., an integer from corrupted JSON) still contributes its legacy 'skill' value to get_active_skill_refs(), and that a WARNING is logged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… docstring - test_curator.py: also patch cron.jobs on the parent package so monkeypatch teardown restores the full import state (same fix already applied to test_skill_manager_tool.py) - _load_cron_protected() docstring: empty set means "no protected skills" (cron absent OR installed with no enabled skill refs), not just "absent" Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This was referenced May 23, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The curator was archiving skills while a cron job that references them was still
enabled — the job would then silently fail on its next scheduled run with no
skill to invoke.
Bug
Reported in issue #29912. The curator's automatic transition phase
(
apply_automatic_transitions) and its LLM review prompt had no visibility intowhich skills were pinned by the cron scheduler. A skill could be stale-by-age
and get archived even though
cron listshowed an active job pointing at it.Changes
cron/jobs.py—get_active_skill_refs(): new single source of truth;iterates enabled jobs via
list_jobs(include_disabled=False)and returns theset of referenced skill names.
agent/curator.py—_load_cron_protected(): snapshot function called onceat the top of
run_curator_review(). Returns an empty set when the cronsubsystem is not installed,
Noneon store error (fail-closed: auto-archivetransitions are skipped for that run rather than proceeding unprotected).
agent/curator.py—apply_automatic_transitions(): accepts the pre-computedcron_protectedset; any skill in that set bypasses the stale/archivetransitions. Handles
None(cron unreadable) by logging and skipping.agent/curator.py—_render_candidate_list(): surfaces the protected set asa labelled section in the LLM prompt. When the cron store is unreadable, emits
a hard warning instructing the review agent not to archive anything this run.
agent/curator.py—run_curator_review(): snapshot computed once, passed toboth the automatic phase and the LLM prompt to prevent TOCTOU drift.
tools/skill_manager_tool.py—_delete_skill(): hard enforcement layerinside the curator review fork (
is_background_review()). Even if the LLMtries to delete a cron-referenced skill, the tool call is rejected with a
structured error and a remediation hint.
tests/agent/test_curator.py(+98 LOC): 4 new tests — cron-referenced skillnot archived,
_load_cron_protectedreturns empty set when cron absent, returnsNoneand warns on runtime error, transitions skip when cron store raises.tests/agent/test_curator_backup.py(+2/-2 LOC): lambda stub updated toaccept the new
cron_protectedkwarg inapply_automatic_transitions.tests/cron/test_jobs.py(+41 LOC):TestGetActiveSkillRefs— 6 testscovering empty scheduler, single-skill jobs, multi-skill jobs, disabled job
filtering, and missing skill key handling.
tests/tools/test_skill_manager_tool.py(+73 LOC):TestDeleteSkillCronGuard— 4tests covering blocked deletion inside curator fork, allowed when skill is not
referenced, pass-through outside the curator fork, and structured failure
response on cron store error.
Python compatibility
from cron.jobs import ...withsys.modules["cron.jobs"] = Noneraises plainImportErroron Python 3.11 (notModuleNotFoundError). Both guards now useexcept ImportError as eand checke.nameto distinguish "subsystem absent"from "broken dependency", covering all supported Python versions (≥ 3.11).
Validation
Targeted test files via
python3 -m pytest:252 passed.
E2E: skill aged past
archive_after_days, cron job enabled referencing it →curator run completes, skill directory intact,
run.jsonshowsarchived: 0,LLM prompt contains the "Cron-protected skills" section with the skill name.
Closes #29912