Skip to content

fix(curator): skip archival of skills referenced by active cron jobs (#29912)#30108

Open
friday-james wants to merge 12 commits into
NousResearch:mainfrom
friday-james:fix/curator-cron-protection-29912-final
Open

fix(curator): skip archival of skills referenced by active cron jobs (#29912)#30108
friday-james wants to merge 12 commits into
NousResearch:mainfrom
friday-james:fix/curator-cron-protection-29912-final

Conversation

@friday-james

@friday-james friday-james commented May 21, 2026

Copy link
Copy Markdown

The curator was archiving skills while a cron job that references them was still
enabled — the job would then silently fail on its next scheduled run with no
skill to invoke.

Bug

Reported in issue #29912. The curator's automatic transition phase
(apply_automatic_transitions) and its LLM review prompt had no visibility into
which skills were pinned by the cron scheduler. A skill could be stale-by-age
and get archived even though cron list showed an active job pointing at it.

Changes

  • cron/jobs.pyget_active_skill_refs(): new single source of truth;
    iterates enabled jobs via list_jobs(include_disabled=False) and returns the
    set of referenced skill names.

  • agent/curator.py_load_cron_protected(): snapshot function called once
    at the top of run_curator_review(). Returns an empty set when the cron
    subsystem is not installed, None on store error (fail-closed: auto-archive
    transitions are skipped for that run rather than proceeding unprotected).

  • agent/curator.pyapply_automatic_transitions(): accepts the pre-computed
    cron_protected set; any skill in that set bypasses the stale/archive
    transitions. Handles None (cron unreadable) by logging and skipping.

  • agent/curator.py_render_candidate_list(): surfaces the protected set as
    a labelled section in the LLM prompt. When the cron store is unreadable, emits
    a hard warning instructing the review agent not to archive anything this run.

  • agent/curator.pyrun_curator_review(): snapshot computed once, passed to
    both the automatic phase and the LLM prompt to prevent TOCTOU drift.

  • tools/skill_manager_tool.py_delete_skill(): hard enforcement layer
    inside the curator review fork (is_background_review()). Even if the LLM
    tries to delete a cron-referenced skill, the tool call is rejected with a
    structured error and a remediation hint.

  • tests/agent/test_curator.py (+98 LOC): 4 new tests — cron-referenced skill
    not archived, _load_cron_protected returns empty set when cron absent, returns
    None and warns on runtime error, transitions skip when cron store raises.

  • tests/agent/test_curator_backup.py (+2/-2 LOC): lambda stub updated to
    accept the new cron_protected kwarg in apply_automatic_transitions.

  • tests/cron/test_jobs.py (+41 LOC): TestGetActiveSkillRefs — 6 tests
    covering empty scheduler, single-skill jobs, multi-skill jobs, disabled job
    filtering, and missing skill key handling.

  • tests/tools/test_skill_manager_tool.py (+73 LOC): TestDeleteSkillCronGuard — 4
    tests covering blocked deletion inside curator fork, allowed when skill is not
    referenced, pass-through outside the curator fork, and structured failure
    response on cron store error.

Python compatibility

from cron.jobs import ... with sys.modules["cron.jobs"] = None raises plain
ImportError on Python 3.11 (not ModuleNotFoundError). Both guards now use
except ImportError as e and check e.name to distinguish "subsystem absent"
from "broken dependency", covering all supported Python versions (≥ 3.11).

Validation

Targeted test files via python3 -m pytest:

tests/agent/test_curator.py
tests/agent/test_curator_backup.py
tests/cron/test_jobs.py
tests/tools/test_skill_manager_tool.py

252 passed.

E2E: skill aged past archive_after_days, cron job enabled referencing it →
curator run completes, skill directory intact, run.json shows archived: 0,
LLM prompt contains the "Cron-protected skills" section with the skill name.

Closes #29912

…ousResearch#29912)

Closes NousResearch#29912.

The curator could archive a skill that an enabled cron job still depended on,
causing the next scheduled run to silently fail.  Three coordinated layers
close the gap:

1. `cron/jobs.py` — add `get_active_skill_refs()`: calls `list_jobs(include_disabled=False)`
   (single source of truth) and returns the set of skill names referenced by any
   enabled job.

2. `agent/curator.py` — add `_load_cron_protected()`: wraps the import + call behind
   a two-tier error boundary.  `ModuleNotFoundError` → cron not installed, return `set()`.
   Any other exception → cron present but broken, return `None` (fail-closed signal).
   `apply_automatic_transitions()` and `run_curator_review()` snapshot cron-protected
   refs once and propagate the `None` signal: transitions are skipped entirely when the
   cron store is unreadable.  `_render_candidate_list()` emits a visible LLM warning
   and adds a "Cron-protected skills" section to the prompt so the model won't archive
   a live dependency.

3. `tools/skill_manager_tool.py` — harden `_delete_skill()` as a last-resort guard:
   inside a curator background-review fork (`is_background_review()` ContextVar), reject
   any delete of a cron-referenced skill with a structured error before touching disk.
   Same two-tier import boundary; fail-closed on cron-store errors.

All three layers share `get_active_skill_refs()` as their single source of truth.
Copilot AI review requested due to automatic review settings May 21, 2026 23:29
The handler uses `except Exception` (to catch broken-import errors like
RuntimeError during module init), but the inline comment still said
"Other ImportError". Align the comment with the actual code.
@alt-glitch alt-glitch added type/bug Something isn't working comp/agent Core agent loop, run_agent.py, prompt builder comp/cron Cron scheduler and job management tool/skills Skills system (list, view, manage) P1 High — major feature broken, no workaround labels May 21, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

friday-james and others added 2 commits May 22, 2026 00:04
…otected kwarg

The stub in test_real_run_takes_pre_snapshot used `lambda now=None: {...}`
which did not accept the new `cron_protected` keyword argument added by the
fix for NousResearch#29912. When run_curator_review called
`apply_automatic_transitions(now=start, cron_protected=cron_protected)` the
stub raised TypeError. Update to `lambda now=None, cron_protected=None: {...}`.
On Python ≤ 3.11, setting sys.modules["cron.jobs"] = None causes
`from cron.jobs import ...` to raise plain ImportError instead of
ModuleNotFoundError. Both _load_cron_protected() and _delete_skill()
were only catching ModuleNotFoundError, so on Python 3.11 the
"not installed" path would fall through to the fail-closed handler.

Switch to `except ImportError as e` and check e.name to distinguish
"module absent" (name in cron/cron.jobs → empty set / pass) from
"dependency broken" (other name → fail-closed). Also update test
comment to document the Python version behaviour difference.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@friday-james friday-james requested a review from Copilot May 22, 2026 00:41

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Comment thread agent/curator.py Outdated
Comment thread tools/skill_manager_tool.py
Comment thread tools/skill_manager_tool.py
Comment thread agent/curator.py
friday-james and others added 2 commits May 22, 2026 00:57
When cron.jobs is installed but get_active_skill_refs is absent (e.g.
partial upgrade / old module version), Python raises plain ImportError
with e.name == "cron.jobs" — indistinguishable from a missing module
if we only check e.name.  Both _load_cron_protected() and the
_delete_skill() cron guard were returning "not installed" (empty
set / pass) in this case, silently disabling the protection.

Fix: catch ModuleNotFoundError and ImportError separately.  Only
ModuleNotFoundError with e.name in ("cron", "cron.jobs") means the
subsystem is absent; plain ImportError always means fail-closed.

Also fix trailing \n inside a string appended to a join-joined list
in _render_candidate_list (produces double blank line).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…uard non-iterable skills field

Two bugs:

1. curator.py _run_llm_review() created the review AIAgent without setting
   _memory_write_origin = "background_review", so is_background_review()
   returned False inside all curator-LLM tool calls. The _delete_skill()
   cron protection guard (added in this PR) was dead code for curator-
   initiated deletes. Fix: set the attribute before run_conversation().

2. cron/jobs.py _normalize_skill_list() called list(skills) in the else
   branch with no TypeError guard. A corrupted job record with a non-
   iterable skills field (e.g. integer from hand-edited JSON) raised
   TypeError that escaped through get_active_skill_refs() and was caught
   by _load_cron_protected()'s except-Exception handler, returning None
   and suppressing all auto-archive transitions for the entire run.
   Fix: wrap list(skills) in try/except TypeError, treating bad records
   as having no skills rather than halting the whole snapshot.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@friday-james friday-james requested a review from Copilot May 22, 2026 01:19

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Comment thread agent/curator.py Outdated
Comment thread agent/curator.py
Comment thread tests/tools/test_skill_manager_tool.py Outdated
Comment thread tools/skill_manager_tool.py
Comment thread cron/jobs.py Outdated
- curator.py: use BACKGROUND_REVIEW constant (from tools.skill_provenance)
  instead of the bare string literal when setting _memory_write_origin,
  so it stays in sync with is_background_review() automatically.
- curator.py: add exc_info=True to all WARNING calls in _load_cron_protected
  so import/runtime errors include a traceback for easier diagnosis.
- skill_manager_tool.py: collapse three identical except branches in the
  _delete_skill cron guard into one except Exception as e with an isinstance
  check, eliminating the duplicated log call and error-dict literal.
- tests/tools/test_skill_manager_tool.py: replace manual sys.modules
  save/restore with monkeypatch.setitem for correct teardown semantics.
- cron/jobs.py: log a WARNING when _normalize_skill_list swallows TypeError
  so operators know to repair a malformed job record.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@friday-james friday-james requested a review from Copilot May 22, 2026 01:27

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comment thread agent/curator.py
Comment thread tools/skill_manager_tool.py
…chain

Add a comment clarifying that conversation_loop.py reads _memory_write_origin
and calls set_current_write_origin() at the start of run_conversation(),
which is what sets the ContextVar that is_background_review() checks in
tool handlers such as _delete_skill(). The mechanism is correct; the
comment makes the indirect path visible to reviewers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@friday-james friday-james requested a review from Copilot May 22, 2026 01:31

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Comment thread tools/skill_manager_tool.py
Comment thread tools/skill_manager_tool.py
Comment thread tests/tools/test_skill_manager_tool.py Outdated
…ules

monkeypatch.setitem only restores sys.modules["cron.jobs"]; if the import
machinery ever sets cron.jobs on the parent package, the attribute leaks
into later tests. Patch both together so monkeypatch teardown restores
the complete import state.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@friday-james friday-james requested a review from Copilot May 22, 2026 01:40

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comment thread cron/jobs.py Outdated
Comment thread tools/skill_manager_tool.py
When a job record has a malformed non-iterable 'skills' field but a
valid legacy 'skill' field, the TypeError handler was dropping the
skill entirely, causing get_active_skill_refs() to miss the dependency
and potentially allowing curator to archive an in-use skill.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@friday-james friday-james requested a review from Copilot May 22, 2026 01:45

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comment thread tools/skill_manager_tool.py
Comment thread cron/jobs.py
…skill_list

Asserts that a job with a non-iterable 'skills' field (e.g., an integer
from corrupted JSON) still contributes its legacy 'skill' value to
get_active_skill_refs(), and that a WARNING is logged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@friday-james friday-james requested a review from Copilot May 22, 2026 01:49

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Comment thread tests/agent/test_curator.py
Comment thread agent/curator.py
Comment thread tools/skill_manager_tool.py
… docstring

- test_curator.py: also patch cron.jobs on the parent package so
  monkeypatch teardown restores the full import state (same fix already
  applied to test_skill_manager_tool.py)
- _load_cron_protected() docstring: empty set means "no protected skills"
  (cron absent OR installed with no enabled skill refs), not just "absent"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder comp/cron Cron scheduler and job management P1 High — major feature broken, no workaround tool/skills Skills system (list, view, manage) type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Curator may archive active skills during umbrella pass without verified consolidation (fail-open behavior)

3 participants