v2.31.0

brew update && brew upgrade agentops · bash <(curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install.sh) · checksums · verify provenance

Highlights

Nine new lifecycle skills let the agent handle bootstrapping, dependency audits, design reviews, knowledge harvesting, performance analysis, refactoring, code review, scaffolding, and testing without manual invocation. A new ao harvest command pulls learnings from sibling workspaces so knowledge compounds across your entire multi-agent fleet, not just one repo. Context debugging is easier with ao context packet, and the hook system now formally supports both Claude Code and Codex runtimes.

What's New

9 lifecycle skills — bootstrap, deps, design, harvest, perf, refactor, review, scaffold, and test are now part of the RPI workflow with automatic invocation and mechanical gates
Cross-rig knowledge harvesting — ao harvest extracts and catalogs learnings from sibling crew workspaces so insights travel between agents
Context packet inspector — ao context packet lets you debug what inter-session handoff state the agent actually sees
Dual-runtime hook support — Hooks now have a formal runtime contract covering Claude Code, Codex, and manual execution modes

All Changes

Added

Nine lifecycle skills wired into the RPI workflow with auto-invocation
Cross-rig knowledge consolidation via ao harvest
Context packet inspection via ao context packet
Hook runtime contract with Claude/Codex/manual event mapping
Research provenance tracking on pending learnings
Context declarations for inject, provenance, and rpi skills
Evidence-backed output templates for goals and product commands

Changed

Documentation reframed around three-gap context lifecycle model
Hook docs updated with runtime modes table for dual-runtime support

Fixed

Four pre-existing CI failures resolved
Lookup retrieval gaps that caused empty results
Embedded file sync on first session start
Closure integrity with 24h grace window for evidence timing
Skill lint compliance across vibe, post-mortem, crank, and plan
Codex tool naming rule and five Claude-era tool references
ASCII diagram consistency across 23 documentation files
Fork exhaustion in validation script replaced with lightweight parser

Full changelog

Added

9 lifecycle skills — bootstrap, deps, design, harvest, perf, refactor, review, scaffold, and test skills wired into RPI with auto-invocation and mechanical gates
ao harvest — cross-rig knowledge consolidation extracts and catalogs learnings from sibling crew workspaces
ao context packet — inspect stigmergic context packets for debugging inter-session handoff state
Hook runtime contract — formal Claude/Codex/manual event mapping with runtime-aware hook tooling
Evidence-driven skill enrichment — production meta-knowledge, anti-patterns, flywheel metrics, and normalization defect detection baked into 9 skill reference files
Research provenance — pending learnings now carry full research provenance for discoverability and citation tracking
Context declarations — inject, provenance, and rpi skills declare their context requirements explicitly
Goals and product output templates — /goals and /product produce evidence-backed structured output

Changed

Three-gap context lifecycle contract — README, PRODUCT.md, positioning docs, and operational guides reframed around the context lifecycle model
Dual-runtime hook documentation — runtime modes table and troubleshooting updated for Claude + Codex hook coexistence

Fixed

CI reliability — resolved 4 pre-existing CI failures, restored headless runtime preflight, repaired codex parity drift checks
ao lookup retrieval — fixed retrieval gaps that caused lookup to return no results
Embedded sync — using-agentops SKILL.md and .agents/.gitignore now written correctly on first session start
Closure integrity — 24h grace window for close-before-commit evidence, normalized file parsing
Skill lint compliance — vibe, post-mortem, crank, and plan skills trimmed or restructured to stay under 800-line limit
Codex tool naming — added CLAUDE_TOOL_NAMING rule and fixed 5 Claude-era tool references in codex skills
ASCII diagram consistency — aligned box-drawing characters across 23 documentation files
Fork exhaustion prevention — replaced jq with awk in validate-go-fast to prevent fork bombs on large repos
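
The closure-integrity fix above accepts evidence for work that was closed shortly before its commit landed. A minimal sketch of what such a check could look like, assuming the window is measured between the close time and the commit time. The function name and the comparison direction are illustrative assumptions, not AgentOps internals; only the 24h figure comes from the notes.

```python
from datetime import datetime, timedelta, timezone

# 24h grace window, per the release note; everything else here is hypothetical.
GRACE_WINDOW = timedelta(hours=24)


def evidence_within_grace(closed_at: datetime, committed_at: datetime) -> bool:
    """Return True if an item closed before its commit is still acceptable.

    A close at or after the commit is always fine. A close before the commit
    is accepted only if the commit follows within the grace window.
    """
    if closed_at >= committed_at:
        return True
    return committed_at - closed_at <= GRACE_WINDOW


closed = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(evidence_within_grace(closed, closed + timedelta(hours=3)))   # True
print(evidence_within_grace(closed, closed + timedelta(hours=30)))  # False
```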

Full Changelog: v2.30.0...v2.31.0

v2.30.0 — Codex hookless lifecycle, PROGRAM.md workflows, and stronger long-running RPI runs

Highlights

AgentOps now handles Codex hookless sessions more cleanly, gives autonomous workflows a clearer PROGRAM.md contract, and makes long-running RPI runs much easier to inspect. This release also hardens the local release and validation path itself, so the same gate stack you rely on for shipping is more trustworthy under headless and generated-artifact-heavy workflows.

What's New

Hookless Codex lifecycle support — Codex sessions can now run through startup, follow-up, validation, and closeout without depending on legacy hook assumptions.
PROGRAM.md for autonomous work — Autodev and evolve flows now share a concrete program contract instead of relying on looser ad hoc context.
Artifact-aware long RPI runs — Mission control now shows run artifacts and evaluator output so you can inspect what happened during multi-phase autonomous runs.
More reliable release validation — Headless runtime checks, reverse-engineer hygiene, and release-gate coverage are more deterministic.

All Changes

Added

Hookless Codex lifecycle support across CLI commands and skill orchestration
A first-class PROGRAM.md contract for autodev and evolve-driven workflows
Artifact and evaluator visibility for long-running RPI sessions

Changed

Codex bundle maintenance, lifecycle guidance, and release validation coverage around the expanded Codex execution path

Fixed

Codex RPI scope and closeout issues that caused follow-up and validation drift
Release-gate regressions in headless runtime validation and learning coherence
Reverse-engineer repo scans so generated or temporary trees no longer contaminate detected CLI surfaces

Full changelog

Added

Codex hookless lifecycle support — ao codex runtime commands, lifecycle fallback, and Codex skill orchestration now cover hookless sessions end to end
PROGRAM.md autodev contract — Added a first-class PROGRAM.md contract for autodev flows and taught /evolve and related RPI paths to use it
Long-running RPI artifact visibility — Mission control now exposes run artifacts and evaluator output so long-running RPI sessions are replayable and easier to inspect

Changed

Codex runtime maintenance flow — Refreshed Codex bundle hashes, lifecycle guards, runtime docs, and release validation coverage around the expanded Codex execution path

Fixed

Codex RPI scoping and closeout — Tightened objective scope, epic scope, closeout ownership, and validation gaps in the Codex RPI lifecycle
Release gate reliability — Restored headless runtime coverage, runtime-aware Claude inventory checks, and release-gate coherence validation
Reverse-engineer repo hygiene — Repo-mode reverse engineer now ignores generated and temp trees when identifying CLI and module surfaces

Full Changelog: v2.29.0...v2.30.0

v2.29.0 — Config control, broader search, and stronger flywheel proof

Highlights

AgentOps now gives you more control over model spend, a broader default search path, and a tighter proof path for the knowledge flywheel. You can assign agent models by cost tier through config, ao search now pulls from both repo-local knowledge and upstream session history, and the flywheel claim is backed by deterministic proof fixtures instead of manual spot checks.

What's New

Per-agent model routing — ao config now supports model cost tiers and direct config writes, so teams can tune quality and spend without manual file edits.
Broader default search — ao search now brokers across upstream cass history and repo-local AgentOps artifacts instead of making you choose one surface up front.
Stronger flywheel evidence — Close-loop validation now preserves research provenance and uses executable proof fixtures plus artifact-specific citation feedback.
Richer review guidance — Council, research, swarm, vibe, athena, and post-mortem picked up new reference packs for reviewer routing, retrieval patterns, and write-time quality checks.

All Changes

Added

Model cost tiers and direct config writes for per-agent routing
Search brokerage across session history and repo-local knowledge
New reference packs for reviewer routing, iterative retrieval, confidence scoring, conflict recovery, and write-time quality

Changed

Comparison docs, command docs, and release smoke coverage around the expanded search and config surface

Fixed

Flywheel proof, citation feedback, and closure reporting now agree on actual state
Search stays aligned with forged session history and fallback behavior
Pre-push and release validation is more deterministic under hook-launched git environments
Council profile docs are synced between source and checked-in Codex artifacts

Full changelog

Added

Model cost tiers and config writes — ao config can now assign per-agent models by cost tier and persist repo configuration changes directly
Search brokerage over session history and repo knowledge — ao search now wraps upstream cass results with repo-local AgentOps artifacts by default
Reviewer and post-mortem reference packs — Added model-routing, iterative-retrieval, confidence-scoring, write-time-quality, and conflict-recovery guidance across council, research, swarm, vibe, athena, and related skills

Changed

Competitive comparison and CLI docs — Refreshed comparison docs, release smoke coverage, and command documentation around the expanded search/config surface

Fixed

Flywheel proof and citation loop — Added deterministic proof fixtures, preserved exact research provenance, and made citation feedback artifact-specific so flywheel health reflects real closure state
Search alignment with forged session history — Search now stays aligned with forged session artifacts and fallback behavior
Hook-launched validation — Pre-push and release gates now isolate inherited git env/stdin correctly and cover newer hook scripts in integration tests
Codex council profile parity — Source and checked-in Codex council docs are back in sync for the shared profile contract
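
Git exports variables such as GIT_DIR and GIT_INDEX_FILE to hook processes, and a pre-push hook receives ref data on stdin, so a validation script launched from a hook can end up pointed at the wrong repository or waiting on input. The hook-launched validation fix above addresses that class of problem. A minimal sketch of the isolation idea follows; the helper names are hypothetical and this is not the AgentOps implementation.

```python
import os
import subprocess


def clean_git_env(env: dict[str, str]) -> dict[str, str]:
    """Drop inherited GIT_* variables so child git commands rediscover the repo.

    Assumption: stripping every GIT_* variable is acceptable here. A real
    script may want to keep some of them (for example GIT_SSH_COMMAND).
    """
    return {k: v for k, v in env.items() if not k.startswith("GIT_")}


def run_isolated(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run cmd with a scrubbed environment and without the hook's stdin."""
    return subprocess.run(
        cmd,
        env=clean_git_env(dict(os.environ)),
        stdin=subprocess.DEVNULL,  # do not consume or block on inherited stdin
        capture_output=True,
        text=True,
        check=False,
    )


print(clean_git_env({"GIT_DIR": ".git", "PATH": "/usr/bin"}))  # {'PATH': '/usr/bin'}
```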

Full Changelog: v2.28.0...v2.29.0

v2.28.0 — Competitive Feature Integration

Five features adopted from reverse-engineering GSD v1.27 and Compound Engineer v2.47:

Highlights

Smarter failure recovery — Crank now classifies failures and auto-recovers (retry, decompose, or escalate) instead of blindly retrying
Knowledge stays clean — Athena defrag runs at every session end, pruning stale artifacts automatically
Per-project review config — Drop a .agents/reviewer-config.md to control which council judges run
Right-sized plans — Plans auto-scale detail level (minimal/standard/deep) based on complexity
Red-team your ideas — Brainstorm now stress-tests every approach before you choose

All Changes

See CHANGELOG.md for the complete list.

Full changelog

Added

Node repair operator — Crank now classifies task failures as RETRY (transient), DECOMPOSE (too complex), or PRUNE (blocked) with budget-controlled recovery
Knowledge refresh auto-trigger — Lightweight athena defrag runs automatically at session end via new SessionEnd hook
Configurable review agents — Project-level .agents/reviewer-config.md controls which judge perspectives council and vibe spawn
Three-tier plan detail scaling — Plan auto-selects Minimal, Standard, or Deep templates based on issue count and complexity
Adversarial ideation — Brainstorm Phase 3b stress-tests each approach with four red-team questions before user selection
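
The node repair operator listed above picks one of three recovery actions under a budget. The sketch below illustrates that shape only. The input signals, the budget counter, and the fallback to DECOMPOSE when the retry budget runs out are all assumptions made for the example; the notes do not describe how Crank actually classifies failures.

```python
from enum import Enum


class Repair(Enum):
    RETRY = "retry"          # transient failure
    DECOMPOSE = "decompose"  # task too complex
    PRUNE = "prune"          # task blocked


def choose_repair(transient: bool, blocked: bool, retries_left: int) -> Repair:
    """Pick a recovery action from hypothetical failure signals."""
    if blocked:
        return Repair.PRUNE
    if transient and retries_left > 0:
        return Repair.RETRY
    # Not blocked, and either non-transient or out of retry budget:
    # split the task instead of retrying again (an assumed policy).
    return Repair.DECOMPOSE


print(choose_repair(transient=True, blocked=False, retries_left=2))  # Repair.RETRY
print(choose_repair(transient=True, blocked=False, retries_left=0))  # Repair.DECOMPOSE
print(choose_repair(transient=False, blocked=True, retries_left=2))  # Repair.PRUNE
```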

Fixed

Crank SKILL.md line limit — Consolidated duplicate References sections to stay under 800-line skill lint limit
Codex skill parity — Synced all five competitive features to skills-codex with reference file copies

Full Changelog: v2.27.1...v2.28.0

v2.27.1 — Hotfix: Flywheel golden signals now visible by default

The flywheel status was telling you everything was fine while hiding the full picture behind an opt-in flag. ao flywheel status said "COMPOUNDING" but the golden signals analysis (hidden behind --golden) said "accumulating." Now golden signals always compute and display — no more misleading status.

What changed

Golden signals always shown — ao flywheel status now includes the four golden signals (velocity trend, citation pipeline, research closure, reuse concentration) and the overall verdict in every output format (table, JSON, YAML).
--golden flag deprecated — Kept for backward compatibility but now a no-op (hidden from help).
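
Keeping a deprecated flag as a hidden no-op is a common way to avoid breaking existing scripts. As an illustration of the pattern only (the ao CLI is not written this way, and the parser name and --format option below are assumptions), Python's argparse can accept a flag while omitting it from help:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="flywheel-status")
    # Hypothetical option mirroring the table/JSON/YAML formats named above.
    parser.add_argument("--format", choices=["table", "json", "yaml"], default="table")
    # Deprecated: still accepted so old invocations keep working, but nothing
    # reads the value, and argparse.SUPPRESS keeps it out of the help text.
    parser.add_argument("--golden", action="store_true", help=argparse.SUPPRESS)
    return parser


args = build_parser().parse_args(["--golden", "--format", "json"])
print(args.format)  # json
```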

Full changelog

See CHANGELOG.md for complete details.

Full changelog

Fixed

Flywheel golden signals always shown — Golden signals were gated behind --golden flag, causing ao flywheel status to report "COMPOUNDING" while the hidden golden signals analysis showed "accumulating". Golden signals now compute and display by default.

Full Changelog: v2.27.0...v2.27.1

v2.27.0

Highlights

The knowledge flywheel now tells you whether it's actually working. Four golden signals answer the question every agent operator asks: is my knowledge compounding, or just collecting dust?

ao flywheel status --golden

What's New

Golden Signals for Flywheel Health

Four health indicators that go beyond escape velocity (σρ > δ):

Signal	Question It Answers
Velocity Trend	Is σρ−δ increasing over time, or sliding back?
Citation Pipeline	Are citations actually delivering value, or just noise?
Research Closure	Is research being mined into learnings, or hoarded?
Reuse Concentration	Is the whole knowledge pool active, or just a few items?

Each signal produces a verdict. Three or more healthy signals = compounding. Three or more critical = decaying. Mixed = accumulating — you know what to fix.
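
The verdict rule above can be sketched as a small function. The signal keys and the per-signal state labels other than "healthy" and "critical" are assumptions for illustration; the notes do not name the intermediate state.

```python
from collections import Counter

# Keys are hypothetical identifiers for the four signals in the table above.
SIGNALS = (
    "velocity_trend",
    "citation_pipeline",
    "research_closure",
    "reuse_concentration",
)


def overall_verdict(signal_states: dict[str, str]) -> str:
    """Combine four per-signal states into one overall verdict."""
    counts = Counter(signal_states[name] for name in SIGNALS)
    if counts["healthy"] >= 3:
        return "compounding"
    if counts["critical"] >= 3:
        return "decaying"
    return "accumulating"


print(overall_verdict({
    "velocity_trend": "healthy",
    "citation_pipeline": "healthy",
    "research_closure": "critical",
    "reuse_concentration": "warning",  # assumed intermediate label
}))  # accumulating
```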

Forge-to-Pool Bridge

Forge now auto-writes pending learnings to .agents/knowledge/pending/ — closing the last manual gap in the flywheel loop. Knowledge flows from session → forge → pool → learnings → inject without intervention.

Session-Start Citation Priming

ao lookup runs at session start, surfacing relevant knowledge and creating the citation events that drive the feedback loop.

All Changes

Added

Flywheel golden signals (ao flywheel status --golden)
Forge-to-pool bridge for close-loop knowledge ingestion
SessionStart citation priming via ao lookup
Skill catalog quality improvements (descriptions, extraction, references)

Fixed

.agents/.gitignore scope — replaced broad !*/ with explicit subdirectory list
Codex runtime skill parity hardening
Codex install smoke test assertions

Changed

CLI reference docs regenerated

Full Changelog: v2.26.1...v2.27.0

v2.26.1 — DAG-ify orchestrator skills

Hotfix: /rpi was stopping after implementation (Phase 2) without running validation (Phase 3). The execution steps were spread across prose sections with ### headings that created natural LLM stopping points.

Highlights

RPI now runs all three phases reliably. The execution sequence for /rpi, /discovery, and /validation is encoded as a compact DAG code block — no section breaks between steps, no natural stopping points for the LLM.
-577 lines across 6 skill files (3 source + 3 codex variants). Less prose, more program.

What's New

Fixed

/rpi stops after Phase 2 — restructured as compact DAG
/discovery and /validation restructured to match
Test patterns updated for new heading format

Changed

GOALS.md rebuilt from first principles
README leads with moats, progressive disclosure
CLI reference docs regenerated
Doctor + findings helper test coverage added

Full changelog

See CHANGELOG.md for the complete v2.26.1 entry.

Full changelog

Fixed

RPI stops after Phase 2 — Restructured rpi, discovery, and validation orchestrator skills as compact DAGs with execution sequence in a single code block; eliminates LLM stopping between phases due to ### section headings acting as natural breakpoints
Test grep patterns for DAG headings — Updated test-tuning-defaults.sh to match new complexity-scaled gate headings after DAG restructure

Changed

Goals reimagined — GOALS.md rebuilt from first principles with fitness gate fixes
README progressive disclosure — Lead with moats, collapse detail into expandable sections
CLI reference docs — Regenerated with updated date stamps
Doctor + findings helpers — Added CLI test coverage for extracted helpers

Full Changelog: v2.26.0...v2.26.1

v2.26.0 Release Notes

Highlights

Test pyramid expanded to BF1–BF9 — Four new bug-finding levels cover regression replay, performance benchmarks, backward compatibility, and security-in-test patterns
Language-specific test patterns — Go and Python standards now include concrete examples for every new BF level
Codex audit: 60+ fixes — Orphaned references removed, lint warnings resolved, manifest hashes regenerated across all 54 Codex skills

What's New

BF6 (Regression): Bug-specific replay tests with ID-based naming (TestBug_AG_XYZ_... / test_bug_ag_xyz_...)
BF7 (Performance): Benchmark patterns using Go testing.B and Python pytest-benchmark
BF8 (Backward Compatibility): Fixture corpus approach with testdata/compat/ (Go) and tests/fixtures/compat/ (Python)
BF9 (Security): In-test secrets redaction and path traversal rejection patterns
Decision tree extended with 4 new routing questions
RPI phase mapping updated: bug fix mandates BF6, hot-path mandates BF7, format changes mandate BF8, secrets mandate BF9
regen-codex-hashes.sh script for Codex manifest maintenance
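
To make the BF6 naming convention and the BF9 path traversal pattern above concrete, here is a minimal sketch in Python. The helper resolve_inside, the base directory, and the bug ID ag_123 are made-up placeholders, not taken from the AgentOps standards.

```python
from pathlib import Path

BASE = Path("/tmp/agentops-example")  # hypothetical base directory


def resolve_inside(base: Path, user_path: str) -> Path:
    """Resolve user_path under base, rejecting anything that escapes it."""
    base = base.resolve()
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):  # Python 3.9+
        raise ValueError(f"path escapes base directory: {user_path}")
    return candidate


# BF6-style name (test_bug_<id>_...); BF9-style assertion (traversal rejected).
def test_bug_ag_123_rejects_path_traversal() -> None:
    for bad in ("../secrets.txt", "a/../../etc/passwd", "/etc/passwd"):
        try:
            resolve_inside(BASE, bad)
        except ValueError:
            continue
        raise AssertionError(f"traversal not rejected: {bad}")
    assert resolve_inside(BASE, "fixtures/v1.json").name == "v1.json"


test_bug_ag_123_rejects_path_traversal()
```

Under pytest the function would be collected by name; the direct call at the end only makes the sketch runnable on its own.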

All Changes

Full changelog

Added

BF6–BF9 test pyramid levels with language-specific Go and Python patterns
Test pyramid decision tree expansion (4 new routing questions)
RPI phase mapping for BF6–BF9
regen-codex-hashes.sh manifest hash regeneration script

Changed

Go standards: benchmark, backward compat, regression, security test patterns
Python standards: Hypothesis, pytest-benchmark, compat fixtures, regression, security patterns
Coverage assessment template extended from BF1–BF5 to BF1–BF9

Fixed

Codex skill audit: 60+ findings across 54 skills
Skill lint warnings in crank, rpi, recover
README skill references and orphaned templates
Skill linter refs in reverse-engineer-rpi

Full Changelog: See CHANGELOG.md

Full changelog

Added

BF6–BF9 test pyramid levels — Regression (bug-specific replay), Performance/Benchmark, Backward Compatibility, and Security (in-test) bug-finding levels with language-specific patterns for Go and Python
Test pyramid decision tree expansion — 4 new routing questions for BF6–BF9 in the "When to Use" guide
RPI phase mapping for BF6–BF9 — Bug fix → BF6 mandatory, hot-path → BF7 benchmark, format change → BF8 compat fixture, secrets → BF9 redaction tests
regen-codex-hashes.sh — Manifest hash regeneration script for Codex skill maintenance
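
The RPI phase mapping above is essentially a lookup from the kind of change to the test level it mandates. A minimal sketch, where the trait labels are hypothetical and only the BF levels come from the notes:

```python
# Trait names are illustrative labels; BF levels are from the release notes.
MANDATED_LEVEL = {
    "bug_fix": "BF6",        # regression replay test
    "hot_path": "BF7",       # benchmark
    "format_change": "BF8",  # backward-compat fixture
    "secrets": "BF9",        # redaction test
}


def required_levels(traits: set[str]) -> list[str]:
    """Return the mandated BF levels for a change, sorted by level."""
    return sorted(MANDATED_LEVEL[t] for t in traits if t in MANDATED_LEVEL)


print(required_levels({"bug_fix", "secrets"}))  # ['BF6', 'BF9']
```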

Changed

Go standards — Added benchmark tests (BF7), backward compat with testdata/compat/ (BF8), regression test naming convention (BF6), security tests for path traversal (BF9)
Python standards — Added Hypothesis property-based testing (BF1), pytest-benchmark patterns (BF7), backward compat with parametrized fixtures (BF8), regression test naming (BF6), secrets redaction tests (BF9)
Coverage assessment template — Extended BF pyramid table from BF1–BF5 to BF1–BF9

Fixed

Codex skill audit — 60+ findings fixed across all 54 Codex skills; removed orphaned claude-code-latest-features.md and claude-cli-verified-commands.md references
Skill lint warnings — Resolved all warnings in crank, rpi, recover skills
README skill references — Corrected broken references and linked orphaned templates
Skill linter refs — Fixed directory reference and backtick formatting in reverse-engineer-rpi

Full Changelog: v2.25.1...v2.26.0

v2.25.1

Fixed

Codex BF pyramid parity — Synced BF1/BF2/BF4 bug-finding level selection into skills-codex implement, post-mortem, and validation skills
Codex Claude backend cross-contamination — Removed orphaned backend-claude-teams.md files (Claude primitives: TeamCreate, SendMessage) from 4 Codex skills (council, research, shared, swarm)
Dead converter rule — Removed stale sed substitution for backend-claude-teams.md rename in converter script
Swarm reference integrity — Added Reference Documents section to swarm SKILL.md; updated validate.sh to check only Codex-native backend references

Full Changelog: v2.25.0...v2.25.1

v2.25.0 — Test Pyramid + Autonomous Execution

Highlights

Test pyramid baked into every RPI phase. L0–L7 levels flow from discovery through post-mortem. Agents own L0–L3 autonomously; L4+ requires human input. Plans include test level metadata, pre-mortem validates coverage, post-mortem reports gaps.
RPI and Evolve now run fully autonomous by default. No human questions between phases. Three-Phase Rule enforces discovery → implementation → validation as a single uninterrupted flow. Anti-pattern tables catch 13 common failure modes.
Codex skill infrastructure matures. New API contract, DAG-based smoke test for 54 skills, durable overrides for crank/swarm/council, and a complete standards reference for Codex skill authoring.

What's New

Test Pyramid Standard

A shared reference (test-pyramid.md) defines 8 test levels with clear agent autonomy boundaries. Every RPI phase now knows which test levels to scope, plan, write, and validate.
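
The autonomy boundary described in this release (L0 to L3 agent-owned, L4 and above human-guided) can be sketched as a simple partition of planned test levels. The function and constant names are assumptions for illustration, not the contents of test-pyramid.md.

```python
AGENT_MAX_LEVEL = 3  # L0 to L3 are agent-owned
HIGHEST_LEVEL = 7    # L0 to L7, eight levels in total


def split_by_ownership(levels: list[int]) -> tuple[list[int], list[int]]:
    """Partition planned test levels into (agent-owned, human-guided)."""
    for level in levels:
        if not 0 <= level <= HIGHEST_LEVEL:
            raise ValueError(f"unknown test level: L{level}")
    agent = sorted(lv for lv in levels if lv <= AGENT_MAX_LEVEL)
    human = sorted(lv for lv in levels if lv > AGENT_MAX_LEVEL)
    return agent, human


print(split_by_ownership([0, 2, 5]))  # ([0, 2], [5])
```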

Autonomous Execution

/rpi and /evolve enforce hands-free execution. No pausing to ask, no stopping after implementation, no narrating plans. The human touchpoint is the final report after all three phases complete.

Codex Platform

Output contracts on verdict skills, a conformance validator, and converter improvements that properly strip Claude primitives instead of mapping to non-existent tools.

All Changes

See CHANGELOG.md for the complete list of 24 commits.

Full changelog

Added

L0–L7 test pyramid standard — Shared reference doc (standards/references/test-pyramid.md) defining 8 test levels, agent autonomy boundaries (L0–L3 autonomous, L4+ human-guided), and RPI phase mapping
Test pyramid integration across RPI lifecycle — Discovery identifies test levels, plan classifies tests by level, pre-mortem validates coverage, implement selects TDD level, crank carries test_levels metadata, validation audits coverage, post-mortem reports gaps
RPI autonomous execution enforcement — Three-Phase Rule mandates discovery → implementation → validation without human interruption; anti-patterns table documents 7 failure modes
Evolve autonomous execution enforcement — Each cycle runs a complete 3-phase /rpi --auto; anti-patterns table documents 6 failure modes; large work decomposed into sub-RPI cycles
Codex skill standard — New standards/references/codex-skill.md with tool mapping, prohibited primitives, two-phase validation, DAG-first traversal, and prompt constraint boundaries
Codex-native overrides — Durable overrides for crank, swarm, council that survive regeneration
DAG-based Codex smoke test — scripts/smoke-test-codex-skills.sh validates 54 skills with dependency-ordered traversal
Codex skill API contract — docs/contracts/codex-skill-api.md with conformance validator
Output contract declarations — output_contract field on council, vibe, pre-mortem, research skills with canonical finding-item schema
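
Dependency-ordered traversal, as used by the DAG-based smoke test above, means each skill is checked only after the skills it depends on. A minimal sketch using Python's standard graphlib follows. The dependency map is invented for the example (only the skill names appear in these notes), and the real script is a shell script, not this code.

```python
from graphlib import TopologicalSorter

# Hypothetical map: skill -> skills it depends on.
SKILL_DEPS = {
    "rpi": {"discovery", "crank", "validation"},
    "discovery": {"research", "plan"},
    "validation": {"vibe", "post-mortem"},
    "vibe": {"council"},
}


def smoke_order(deps: dict[str, set[str]]) -> list[str]:
    """Return skills so that every dependency precedes its dependents.

    graphlib raises CycleError if the dependency graph is not a DAG.
    """
    return list(TopologicalSorter(deps).static_order())


order = smoke_order(SKILL_DEPS)
print(order.index("council") < order.index("vibe") < order.index("rpi"))  # True
```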

Changed

Codex converter rewrite — Strips Claude primitives instead of mapping to unavailable tools; rewrites reference files through codex_rewrite_text
CI pipeline — Removed codex skill parity check (skills-codex/ now manually maintained); fixed shellcheck and embedded sync issues

Fixed

Converter primitive stripping — Task primitives (TaskCreate, TeamCreate, SendMessage) properly stripped instead of mapped to non-existent Codex equivalents
Embedded hook sync — Added missing test-pyramid.md and codex-skill.md to CLI embedded references
ShellCheck SC1125 — Fixed em-dash in shellcheck disable directive in smoke test script
Skill line limits — Moved verbose autonomy rules to reference files to stay under tier-specific line budgets

Full Changelog: v2.24.0...v2.25.0

Releases: boshu2/agentops
