fix(janitor): require completion evidence before auto-closing on completed_at by Brecht-H · Pull Request #2 · Brecht-H/hermes-agent

Brecht-H · 2026-05-16T22:40:10Z

Root cause

The kanban janitor is a second phantom-done path. explicit_completion_evidence() (the completed_at_present branch) returned a CLOSE decision purely because completed_at was set on a non-terminal task — with no requirement of real result/summary/comment evidence:

if task.completed_at is not None and task.status not in TERMINAL_STATUSES:
    evidence = task.result or comments[-1].body if comments else task.result
    return "completed_at_present", shorten(evidence or "task has completed_at but non-terminal status")

build_scan() collected it and apply_closes() raw-UPDATEd status='done', stamping a tautological placeholder ("task has completed_at but non-terminal status") as the task's only "evidence". A fresh ready/triage/in_progress task that merely had completed_at stamped by a brief worker claim got rubber-stamped to done.

Live proof — `t_5dbfc384`

Task t_5dbfc384 carries a janitor_completed event closing it ~13 min after creation:

janitor_completed | {"reason": "completed_at_present", "previous_status": "in_progress",
                     "evidence": "task has completed_at but non-terminal status",
                     "source": "kanban-janitor"}

The completed event right before it had result_len: 0 — no real evidence. It had to be manually reopened and is now correctly in_review with a substantive result.

Why this is the 2nd path

This is the same pathology PR #1 fixed for complete_task() (CompletionEvidenceError). But the janitor bypasses complete_task entirely with a raw UPDATE status='done', so the PR #1 guard never fired here. This PR is the complement: it closes the janitor path.
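To illustrate why a guard inside complete_task() cannot protect a caller that writes to the table directly, here is a minimal sketch. The schema, the complete_task() signature, and the task id are assumptions made for illustration. Only the names complete_task and CompletionEvidenceError come from the PR text.

```python
from __future__ import annotations

import sqlite3


class CompletionEvidenceError(ValueError):
    """Raised when a task is completed without evidence (name taken from PR #1)."""


def complete_task(conn: sqlite3.Connection, task_id: str, result: str | None) -> None:
    # Hypothetical guarded path: refuse to mark a task done without a non-empty result.
    if not (result or "").strip():
        raise CompletionEvidenceError(f"{task_id}: no completion evidence")
    conn.execute(
        "UPDATE tasks SET status = 'done', result = ? WHERE id = ?",
        (result, task_id),
    )


# Assumed minimal schema; the real kanban schema is not shown in this PR.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, status TEXT, result TEXT)")
conn.execute("INSERT INTO tasks VALUES ('t_demo', 'in_progress', NULL)")

# The guarded path rejects the evidence-free completion...
try:
    complete_task(conn, "t_demo", None)
    guard_fired = False
except CompletionEvidenceError:
    guard_fired = True

# ...but a raw UPDATE never calls complete_task(), so nothing stops it.
conn.execute("UPDATE tasks SET status = 'done' WHERE id = 't_demo'")
status_after_raw_update = conn.execute(
    "SELECT status FROM tasks WHERE id = 't_demo'"
).fetchone()[0]
```

In this sketch the guard fires on the function call, yet the row still ends up in status 'done' after the raw UPDATE, which is the gap the janitor-side evidence gate is meant to close.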

Previously unversioned

The live janitor was a loose script at ~/.hermes/scripts/kanban_janitor.py, not in any git repo. This PR brings it under version control at scripts/kanban_janitor.py (verbatim copy of the live script as the starting point) and applies the fix to the repo copy.

The fix (`scripts/kanban_janitor.py`)

- Add has_completion_evidence(task, comments): True iff the result is non-empty (after strip) OR any comment has a non-empty stripped body. A bare completed_at timestamp is NOT evidence. Mirrors kanban_db._has_completion_evidence; the janitor uses its own Task/Comment dataclasses, so it needs a local copy of the predicate.
- explicit_completion_evidence(): the completed_at_present close decision is now gated on has_completion_evidence(...). Evidence-free completed_at tasks fall through (return None → not closed). Comment-pattern / trusted-operator checks still run normally.
- build_scan(): new phantom_completed_at anomaly category (capped at 50) for non-terminal, non-running tasks with completed_at set but no evidence.
- render_markdown(): new "Phantom completed_at" triage section, so operators see the anomaly instead of it being silently rubber-stamped.
- apply_closes(): unchanged. Once explicit_completion_evidence stops emitting evidence-free closes, apply_closes naturally never closes them.
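The predicate and the gate described above can be sketched roughly as follows. This is not the code from scripts/kanban_janitor.py: the dataclass fields, the contents of TERMINAL_STATUSES, and the way the evidence string is picked are assumptions, and the shorten() call and the comment-pattern / trusted-operator checks are left out.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed set; the real TERMINAL_STATUSES values are not shown in the PR.
TERMINAL_STATUSES = frozenset({"done", "archived"})


@dataclass(frozen=True)
class Task:
    # Hypothetical subset of the janitor's Task fields.
    id: str
    status: str
    completed_at: float | None = None
    result: str | None = None


@dataclass(frozen=True)
class Comment:
    body: str | None


def has_completion_evidence(task: Task, comments: list[Comment]) -> bool:
    """True iff the result or at least one comment body is non-empty after strip."""
    if (task.result or "").strip():
        return True
    return any((c.body or "").strip() for c in comments)


def explicit_completion_evidence(task: Task, comments: list[Comment]):
    """Return ("completed_at_present", evidence) only when real evidence exists."""
    if task.completed_at is not None and task.status not in TERMINAL_STATUSES:
        if has_completion_evidence(task, comments):
            result = (task.result or "").strip()
            # Fall back to the most recent non-empty comment body.
            evidence = result or next(
                c.body.strip() for c in reversed(comments) if (c.body or "").strip()
            )
            return "completed_at_present", evidence
    # Comment-pattern and trusted-operator checks would run here (omitted).
    return None
```

With this shape, a task whose only signal is a completed_at timestamp yields None, so a caller such as apply_closes() never receives a close decision for it.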

Tests (`tests/scripts/test_kanban_janitor_completion_evidence.py`)

10 tests, all passing:

- has_completion_evidence: result-only True, comment-only True, whitespace-only False, nothing False.
- explicit_completion_evidence: bare completed_at no-evidence → None; with result → ("completed_at_present", ...); with comment → ("completed_at_present", ...); whitespace-only → None.
- build_scan-level: a fixture DB with a bare-completed_at no-evidence task → it appears in phantom_completed_at and NOT in close_decisions; a real-evidence task → in close_decisions, not flagged.
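The build_scan-level behaviour these tests check can be approximated with a small in-memory fixture. This sketch is not the real build_scan(): the table and column names, the terminal status set, and the function name scan() are hypothetical, and only the routing rule (evidence → close decision, no evidence → phantom, capped at 50) follows the PR description.

```python
from __future__ import annotations

import sqlite3

TERMINAL = {"done", "archived"}  # assumed terminal statuses
PHANTOM_CAP = 50                 # cap stated in the PR description


def scan(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Route completed_at tasks to close decisions or the phantom anomaly list."""
    close_decisions: list[str] = []
    phantom: list[str] = []
    rows = conn.execute(
        "SELECT id, status, completed_at, result FROM tasks ORDER BY id"
    ).fetchall()
    for task_id, status, completed_at, result in rows:
        if completed_at is None or status in TERMINAL:
            continue
        bodies = [
            body
            for (body,) in conn.execute(
                "SELECT body FROM task_comments WHERE task_id = ?", (task_id,)
            )
        ]
        has_evidence = bool((result or "").strip()) or any(
            (b or "").strip() for b in bodies
        )
        if has_evidence:
            close_decisions.append(task_id)
        elif status != "running" and len(phantom) < PHANTOM_CAP:
            # Evidence-free completed_at: surface for triage, never close.
            phantom.append(task_id)
    return {"close_decisions": close_decisions, "phantom_completed_at": phantom}


# Fixture: one phantom task and one task with a real result.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tasks (id TEXT PRIMARY KEY, status TEXT, completed_at REAL, result TEXT);
    CREATE TABLE task_comments (task_id TEXT, body TEXT);
    INSERT INTO tasks VALUES ('t_phantom', 'in_progress', 1.0, NULL);
    INSERT INTO tasks VALUES ('t_real', 'in_progress', 1.0, 'merged and verified');
    """
)
report = scan(conn)
```

The phantom task lands only in the anomaly list and the task with a real result lands only in the close decisions, which mirrors the two build_scan-level assertions listed above.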

Verification

- python3 -m py_compile scripts/kanban_janitor.py → OK
- pytest tests/scripts/test_kanban_janitor_completion_evidence.py → 10 passed
- Scan-only run (no --apply) against a read-only snapshot of the live kanban → 0 evidence-free completed_at_present closes, applied: 0.
- Scan against a synthetic DB with an injected t_5dbfc384-style phantom task → task flagged in phantom_completed_at, absent from close_decisions, rendered in the markdown triage section. Pre-fix it would have been auto-closed.

Operator deploy step (Mac-gated)

After merge, sync the live janitor from the repo copy. Deployment is an operator step; this PR intentionally left the live ~/.hermes/scripts/ file untouched:

cp <repo>/scripts/kanban_janitor.py ~/.hermes/scripts/kanban_janitor.py

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added a kanban janitor tool for managing Hermes kanban boards.
- Detects stale items, redundant tasks, and phantom completion records.
- Auto-completes tasks with proper evidence.
- Generates JSON and Markdown reports with optional external copy.
- Creates database backups before applying changes.
Tests
- Added test coverage for completion evidence detection logic.

coderabbitai · 2026-05-16T22:40:18Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bf200a2b-13dc-4c83-81b7-83fb38fffb5f

📥 Commits

Reviewing files that changed from the base of the PR and between 76f020e and fe490ef.

📒 Files selected for processing (2)

scripts/kanban_janitor.py
tests/scripts/test_kanban_janitor_completion_evidence.py

📝 Walkthrough

A new kanban janitor script scans Hermes SQLite task boards to classify tasks by completion evidence, staleness, and redundancy. It detects safe auto-close candidates, optionally applies decisions within a transaction with DB backup, generates JSON and Markdown reports, and discovers cron scheduler metadata. Tests validate evidence detection and integration with the main scan function.

Changes

Kanban Janitor Task Cleanup

| Layer / File(s) | Summary |
| --- | --- |
| Data model and core utilities `scripts/kanban_janitor.py` (lines 1–191) | Status constants, file paths, regex patterns for evidence detection; immutable `Task`, `Comment`, `CloseDecision` dataclasses; utility functions for timestamps, SQLite connection/row mapping, text normalization, and primary completion-evidence predicates. |
| Evidence classification logic `scripts/kanban_janitor.py` (lines 193–244) | Functions to detect explicit completion evidence (task result, comment patterns, trusted-operator comments), identify weak completion candidates, detect redundant/superseded tasks via references, and compute latest activity timestamps for staleness filtering. |
| Database scanning and task categorization `scripts/kanban_janitor.py` (lines 245–339) | `build_scan` aggregates task status counts, generates auto-close decisions with evidence validation, categorizes weak/redundant/phantom completed_at tasks, groups duplicates by normalized title, and sorts/truncates results for reporting. |
| Safe persistence and DB mutations `scripts/kanban_janitor.py` (lines 341–441) | `backup_db` creates timestamped copies with retention pruning; `apply_closes` backs up first, starts a transaction, re-validates each task, skips ineligible statuses, updates task fields and status, inserts janitor comments and task events, and optionally updates related `task_runs`. |
| Report generation and output `scripts/kanban_janitor.py` (lines 443–553) | `render_markdown` generates consolidated Markdown reports with status counts, scheduler metadata, auto-close summaries, and needs-triage sections (redundant, phantom, stale, duplicates); `write_reports` persists JSON, Markdown, JSONL history, and optional secondary Markdown copy. |
| Scheduler detection and CLI orchestration `scripts/kanban_janitor.py` (lines 555–677) | Discovers Hermes scheduler metadata from configured `jobs.json` files; orchestrates primary scan, optional apply/rescan, legacy DB scanning; constructs final report payload; defines argument parsing and main entrypoint. |
| Evidence detection unit tests `tests/scripts/test_kanban_janitor_completion_evidence.py` (lines 1–130) | Test scaffolding with dynamic imports and helper constructors; validates `has_completion_evidence` returns `True` for non-empty result or non-whitespace comments, `False` otherwise; validates `explicit_completion_evidence` returns `None` for bare `completed_at` without evidence and `completed_at_present` decision when real evidence is present. |
| Integration tests with database scanning `tests/scripts/test_kanban_janitor_completion_evidence.py` (lines 132–236) | SQLite schema and `_make_db` helper; tests verify `build_scan` flags tasks with bare `completed_at` and no evidence as phantom (skipping close decisions), and includes tasks with real result evidence in close decisions without phantom flags. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A janitor hops through task-board dust,
Sorting the done from the must-just-adjust,
With patterns and evidence, stale task decay—
Phantom completed_at whisked clean away!
Backups in place, transactions held tight,
Reports rendered proud in Markdown light. ✨


github-actions · 2026-05-16T22:40:32Z

🚨 CRITICAL Supply Chain Risk Detected

This PR contains a pattern that has been used in real supply chain attacks. A maintainer must review the flagged code carefully before merging.

🚨 CRITICAL: Install-hook file added or modified

These files can execute code during package installation or interpreter startup.

Files:

hermes_cli/setup.py
skills/productivity/google-workspace/scripts/setup.py

Scanner only fires on high-signal indicators: .pth files, base64+exec/eval combos, subprocess with encoded commands, or install-hook files. Low-signal warnings were removed intentionally — if you're seeing this comment, the finding is worth inspecting.

github-actions · 2026-05-16T22:40:44Z

🔎 Lint report: `orion-cc/janitor-completion-evidence` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8340 on HEAD, 8340 on base (➖ 0)

🆕 New issues (3):

| Rule | Count |
| --- | --- |
| `invalid-argument-type` | 3 |

First entries

run_agent.py:7736: [invalid-argument-type] invalid-argument-type: Argument to function `build_anthropic_client` is incorrect: Expected `str`, found `str | dict[Unknown | str, Unknown | str | dict[str, str]] | Any | ... omitted 3 union elements`
run_agent.py:14031: [invalid-argument-type] invalid-argument-type: Argument to function `_is_oauth_token` is incorrect: Expected `str`, found `str | dict[Unknown | str, Unknown | str | dict[str, str]] | Any | ... omitted 3 union elements`
run_agent.py:14034: [invalid-argument-type] invalid-argument-type: Argument to function `len` is incorrect: Expected `Sized`, found `(str & ~AlwaysFalsy) | (dict[Unknown | str, Unknown | str | dict[str, str]] & ~AlwaysFalsy) | (Any & ~AlwaysFalsy) | ... omitted 3 union elements`

✅ Fixed issues (3):

| Rule | Count |
| --- | --- |
| `invalid-argument-type` | 3 |

First entries

run_agent.py:14031: [invalid-argument-type] invalid-argument-type: Argument to function `_is_oauth_token` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 3 union elements`
run_agent.py:14034: [invalid-argument-type] invalid-argument-type: Argument to function `len` is incorrect: Expected `Sized`, found `(str & ~AlwaysFalsy) | (dict[Unknown, Unknown] & ~AlwaysFalsy) | (Any & ~AlwaysFalsy) | ... omitted 3 union elements`
run_agent.py:7736: [invalid-argument-type] invalid-argument-type: Argument to function `build_anthropic_client` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 3 union elements`

Unchanged: 4352 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

Brecht-H · 2026-05-16T22:41:19Z

⚠️ CI bypassed — GitHub Actions billing failure (org-wide)

Same account-level billing block affecting every Brecht-H PR. Local verification:

- py_compile scripts/kanban_janitor.py → ✅ OK
- pytest tests/scripts/test_kanban_janitor_completion_evidence.py → ✅ 10 passed
- Scan-only run vs a read-only snapshot of the live kanban (411 tasks) → 0 evidence-free completed_at_present closes.
- Scan vs a synthetic DB with an injected t_5dbfc384-style phantom (in_progress, completed_at set, result=NULL, 0 comments) → correctly flagged in phantom_completed_at, absent from close_decisions. Pre-fix it would have been auto-closed to done.

The fix logic is ~15 lines: the PR is a verbatim copy of the previously unversioned janitor plus the evidence gate. The repo copy was diffed against the live ~/.hermes/scripts/kanban_janitor.py to confirm that only the intended change is present.

Workaround by orion-cc. Deploy after merge = Mac-gated sync to ~/.hermes/scripts/kanban_janitor.py.

…leted_at

The kanban janitor is a SECOND phantom-done path. Its explicit_completion_evidence() completed_at_present branch returned a CLOSE decision purely because completed_at was set, with NO requirement of a real result/summary/comment. build_scan() collected it and apply_closes() raw-UPDATEd status='done', stamping a tautological placeholder ("task has completed_at but non-terminal status") as the task's only "evidence". A brief worker claim that stamps completed_at on a fresh ready/triage/in_progress task got rubber-stamped to done.

Live proof: task t_5dbfc384 carries a janitor_completed event closing it ~13 min after creation on completed_at_present with result_len=0; it had to be manually reopened and is now correctly in_review with a real result.

This is the same pathology hermes-agent PR #1 fixed for complete_task() (CompletionEvidenceError), but the janitor bypasses complete_task entirely with a raw UPDATE.

Changes (scripts/kanban_janitor.py):
- Add has_completion_evidence(task, comments): True iff non-empty result OR any non-empty comment body. A bare completed_at is NOT evidence. Mirrors kanban_db._has_completion_evidence (janitor has its own dataclasses so it needs a local copy of the predicate).
- explicit_completion_evidence(): the completed_at_present close decision is now gated on has_completion_evidence; evidence-free completed_at tasks fall through (return None -> not closed).
- build_scan(): new phantom_completed_at anomaly category (capped 50) for non-terminal, non-running tasks with completed_at but no evidence.
- render_markdown(): new "Phantom completed_at" triage section so operators see the anomaly instead of it being silently rubber-stamped.
- apply_closes() unchanged: once evidence-free closes stop being emitted, it naturally never closes them.

The janitor was previously an unversioned loose script at ~/.hermes/scripts/kanban_janitor.py; this commit brings it under version control.

Operator deploy step (Mac-gated): after merge, sync ~/.hermes/scripts/kanban_janitor.py from the repo copy.

Confidence: high
Scope-risk: moderate
Machine: orion-terminal
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Brecht-H marked this pull request as ready for review May 16, 2026 22:48

Brecht-H force-pushed the orion-cc/janitor-completion-evidence branch from b22a106 to fe490ef Compare May 16, 2026 22:53

Brecht-H merged commit 6a38004 into main May 16, 2026
13 of 15 checks passed
