Releases: boshu2/agentops
v2.31.0
brew update && brew upgrade agentops · bash <(curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install.sh) · checksums · verify provenance
Highlights
Nine new lifecycle skills let the agent handle bootstrapping, dependency audits, design reviews, knowledge harvesting, performance analysis, refactoring, code review, scaffolding, and testing without manual invocation. A new ao harvest command pulls learnings from sibling workspaces so knowledge compounds across your entire multi-agent fleet, not just one repo. Context debugging is easier with ao context packet, and the hook system now formally supports both Claude Code and Codex runtimes.
What's New
- 9 lifecycle skills — bootstrap, deps, design, harvest, perf, refactor, review, scaffold, and test are now part of the RPI workflow with automatic invocation and mechanical gates
- Cross-rig knowledge harvesting — `ao harvest` extracts and catalogs learnings from sibling crew workspaces so insights travel between agents
- Context packet inspector — `ao context packet` lets you debug what inter-session handoff state the agent actually sees
- Dual-runtime hook support — Hooks now have a formal runtime contract covering Claude Code, Codex, and manual execution modes
All Changes
Added
- Nine lifecycle skills wired into the RPI workflow with auto-invocation
- Cross-rig knowledge consolidation via `ao harvest`
- Context packet inspection via `ao context packet`
- Hook runtime contract with Claude/Codex/manual event mapping
- Research provenance tracking on pending learnings
- Context declarations for inject, provenance, and rpi skills
- Evidence-backed output templates for goals and product commands
Changed
- Documentation reframed around three-gap context lifecycle model
- Hook docs updated with runtime modes table for dual-runtime support
Fixed
- Four pre-existing CI failures resolved
- Lookup retrieval gaps that caused empty results
- Embedded file sync on first session start
- Closure integrity with 24h grace window for evidence timing
- Skill lint compliance across vibe, post-mortem, crank, and plan
- Codex tool naming rule and five Claude-era tool references
- ASCII diagram consistency across 23 documentation files
- Fork exhaustion in validation script replaced with lightweight parser
Full changelog
Added
- 9 lifecycle skills — bootstrap, deps, design, harvest, perf, refactor, review, scaffold, and test skills wired into RPI with auto-invocation and mechanical gates
- `ao harvest` — cross-rig knowledge consolidation extracts and catalogs learnings from sibling crew workspaces
- `ao context packet` — inspect stigmergic context packets for debugging inter-session handoff state
- Hook runtime contract — formal Claude/Codex/manual event mapping with runtime-aware hook tooling
- Evidence-driven skill enrichment — production meta-knowledge, anti-patterns, flywheel metrics, and normalization defect detection baked into 9 skill reference files
- Research provenance — pending learnings now carry full research provenance for discoverability and citation tracking
- Context declarations — inject, provenance, and rpi skills declare their context requirements explicitly
- Goals and product output templates — `/goals` and `/product` produce evidence-backed structured output
Changed
- Three-gap context lifecycle contract — README, PRODUCT.md, positioning docs, and operational guides reframed around the context lifecycle model
- Dual-runtime hook documentation — runtime modes table and troubleshooting updated for Claude + Codex hook coexistence
Fixed
- CI reliability — resolved 4 pre-existing CI failures, restored headless runtime preflight, repaired codex parity drift checks
- `ao lookup` retrieval — fixed retrieval gaps that caused lookup to return no results
- Embedded sync — using-agentops SKILL.md and `.agents/.gitignore` now written correctly on first session start
- Closure integrity — 24h grace window for close-before-commit evidence, normalized file parsing
- Skill lint compliance — vibe, post-mortem, crank, and plan skills trimmed or restructured to stay under 800-line limit
- Codex tool naming — added CLAUDE_TOOL_NAMING rule and fixed 5 Claude-era tool references in codex skills
- ASCII diagram consistency — aligned box-drawing characters across 23 documentation files
- Fork exhaustion prevention — replaced jq with awk in validate-go-fast to prevent fork bombs on large repos
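The closure-integrity fix above can be pictured as a simple timestamp comparison. This is a minimal sketch, assuming the grace window means "evidence that lands up to 24 hours after a close still counts"; only the 24h figure comes from the release note, and `evidence_is_timely` is a hypothetical name, not part of the `ao` CLI.

```python
from datetime import datetime, timedelta, timezone

# The 24h figure comes from the release note; the rest is illustrative.
GRACE_WINDOW = timedelta(hours=24)


def evidence_is_timely(closed_at: datetime, evidence_at: datetime) -> bool:
    """Return True if evidence arrived before the close or within the grace window after it."""
    return evidence_at <= closed_at + GRACE_WINDOW


closed = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(evidence_is_timely(closed, closed + timedelta(hours=3)))   # True
print(evidence_is_timely(closed, closed + timedelta(hours=30)))  # False
```

Treating the boundary as inclusive is an arbitrary choice in this sketch; the real check may differ.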
Full Changelog: v2.30.0...v2.31.0
v2.30.0
v2.30.0 — Codex hookless lifecycle, PROGRAM.md workflows, and stronger long-running RPI runs
Highlights
AgentOps now handles Codex hookless sessions more cleanly, gives autonomous workflows a clearer PROGRAM.md contract, and makes long-running RPI runs much easier to inspect. This release also hardens the local release and validation path itself, so the same gate stack you rely on for shipping is more trustworthy under headless and generated-artifact-heavy workflows.
What's New
- Hookless Codex lifecycle support — Codex sessions can now run through startup, follow-up, validation, and closeout without depending on legacy hook assumptions.
- `PROGRAM.md` for autonomous work — Autodev and evolve flows now share a concrete program contract instead of relying on looser ad hoc context.
- Artifact-aware long RPI runs — Mission control now shows run artifacts and evaluator output so you can inspect what happened during multi-phase autonomous runs.
- More reliable release validation — Headless runtime checks, reverse-engineer hygiene, and release-gate coverage are more deterministic.
All Changes
Added
- Hookless Codex lifecycle support across CLI commands and skill orchestration
- A first-class `PROGRAM.md` contract for autodev and evolve-driven workflows
- Artifact and evaluator visibility for long-running RPI sessions
Changed
- Codex bundle maintenance, lifecycle guidance, and release validation coverage around the expanded Codex execution path
Fixed
- Codex RPI scope and closeout issues that caused follow-up and validation drift
- Release-gate regressions in headless runtime validation and learning coherence
- Reverse-engineer repo scans so generated or temporary trees no longer contaminate detected CLI surfaces
Full changelog
Added
- Codex hookless lifecycle support — `ao codex` runtime commands, lifecycle fallback, and Codex skill orchestration now cover hookless sessions end to end
- PROGRAM.md autodev contract — Added a first-class `PROGRAM.md` contract for autodev flows and taught `/evolve` and related RPI paths to use it
- Long-running RPI artifact visibility — Mission control now exposes run artifacts and evaluator output so long-running RPI sessions are replayable and easier to inspect
Changed
- Codex runtime maintenance flow — Refreshed Codex bundle hashes, lifecycle guards, runtime docs, and release validation coverage around the expanded Codex execution path
Fixed
- Codex RPI scoping and closeout — Tightened objective scope, epic scope, closeout ownership, and validation gaps in the Codex RPI lifecycle
- Release gate reliability — Restored headless runtime coverage, runtime-aware Claude inventory checks, and release-gate coherence validation
- Reverse-engineer repo hygiene — Repo-mode reverse engineer now ignores generated and temp trees when identifying CLI and module surfaces
Full Changelog: v2.29.0...v2.30.0
v2.29.0
v2.29.0 — Config control, broader search, and stronger flywheel proof
Highlights
AgentOps now gives you more control over model spend, a broader default search path, and a tighter proof path for the knowledge flywheel. You can assign agent models by cost tier through config, ao search now pulls from both repo-local knowledge and upstream session history, and the flywheel claim is backed by deterministic proof fixtures instead of manual spot checks.
What's New
- Per-agent model routing — `ao config` now supports model cost tiers and direct config writes, so teams can tune quality and spend without manual file edits.
- Broader default search — `ao search` now brokers across upstream `cass` history and repo-local AgentOps artifacts instead of making you choose one surface up front.
- Stronger flywheel evidence — Close-loop validation now preserves research provenance and uses executable proof fixtures plus artifact-specific citation feedback.
- Richer review guidance — Council, research, swarm, vibe, athena, and post-mortem picked up new reference packs for reviewer routing, retrieval patterns, and write-time quality checks.
All Changes
Added
- Model cost tiers and direct config writes for per-agent routing
- Search brokerage across session history and repo-local knowledge
- New reference packs for reviewer routing, iterative retrieval, confidence scoring, conflict recovery, and write-time quality
Changed
- Comparison docs, command docs, and release smoke coverage around the expanded search and config surface
Fixed
- Flywheel proof, citation feedback, and closure reporting now agree on actual state
- Search stays aligned with forged session history and fallback behavior
- Pre-push and release validation is more deterministic under hook-launched git environments
- Council profile docs are synced between source and checked-in Codex artifacts
Full changelog
Added
- Model cost tiers and config writes — `ao config` can now assign per-agent models by cost tier and persist repo configuration changes directly
- Search brokerage over session history and repo knowledge — `ao search` now wraps upstream `cass` results with repo-local AgentOps artifacts by default
- Reviewer and post-mortem reference packs — Added model-routing, iterative-retrieval, confidence-scoring, write-time-quality, and conflict-recovery guidance across council, research, swarm, vibe, athena, and related skills
Changed
- Competitive comparison and CLI docs — Refreshed comparison docs, release smoke coverage, and command documentation around the expanded search/config surface
Fixed
- Flywheel proof and citation loop — Added deterministic proof fixtures, preserved exact research provenance, and made citation feedback artifact-specific so flywheel health reflects real closure state
- Search alignment with forged session history — Search now stays aligned with forged session artifacts and fallback behavior
- Hook-launched validation — Pre-push and release gates now isolate inherited git env/stdin correctly and cover newer hook scripts in integration tests
- Codex council profile parity — Source and checked-in Codex council docs are back in sync for the shared profile contract
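The hook-launched validation fix addresses a general problem: Git exports variables such as `GIT_DIR` and `GIT_INDEX_FILE` to hooks, and a pre-push hook receives ref information on stdin, so a gate started from a hook can inherit both. The sketch below shows one blunt way to isolate a child process. It is not the AgentOps implementation, and `isolated_git_env` and `run_isolated` are hypothetical names.

```python
import os
import subprocess


def isolated_git_env(env: dict) -> dict:
    """Return a copy of env without the GIT_* variables a hook may have exported."""
    return {key: value for key, value in env.items() if not key.startswith("GIT_")}


def run_isolated(cmd: list, cwd: str = ".") -> subprocess.CompletedProcess:
    """Run cmd with a scrubbed environment and no inherited stdin."""
    return subprocess.run(
        cmd,
        cwd=cwd,
        env=isolated_git_env(dict(os.environ)),
        stdin=subprocess.DEVNULL,  # the child cannot consume the hook's stdin
        capture_output=True,
        text=True,
        check=False,
    )
```

Dropping every `GIT_*` variable also removes settings such as `GIT_SSH_COMMAND`; a real gate would likely strip a narrower allowlist.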
Full Changelog: v2.28.0...v2.29.0
v2.28.0
v2.28.0 — Competitive Feature Integration
Five features adopted from reverse-engineering GSD v1.27 and Compound Engineer v2.47:
Highlights
- Smarter failure recovery — Crank now classifies failures and auto-recovers (retry, decompose, or escalate) instead of blindly retrying
- Knowledge stays clean — Athena defrag runs at every session end, pruning stale artifacts automatically
- Per-project review config — Drop a `.agents/reviewer-config.md` to control which council judges run
- Right-sized plans — Plans auto-scale detail level (minimal/standard/deep) based on complexity
- Red-team your ideas — Brainstorm now stress-tests every approach before you choose
All Changes
See CHANGELOG.md for the complete list.
Full changelog
Added
- Node repair operator — Crank now classifies task failures as RETRY (transient), DECOMPOSE (too complex), or PRUNE (blocked) with budget-controlled recovery
- Knowledge refresh auto-trigger — Lightweight athena defrag runs automatically at session end via new SessionEnd hook
- Configurable review agents — Project-level `.agents/reviewer-config.md` controls which judge perspectives council and vibe spawn
- Three-tier plan detail scaling — Plan auto-selects Minimal, Standard, or Deep templates based on issue count and complexity
- Adversarial ideation — Brainstorm Phase 3b stress-tests each approach with four red-team questions before user selection
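The node repair operator described above can be sketched as a small classifier. The three outcomes (RETRY, DECOMPOSE, PRUNE) and the idea of a recovery budget come from the changelog; the `Failure` fields, the `retries_left` parameter, and the decision order are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Repair(Enum):
    RETRY = "retry"          # transient failure
    DECOMPOSE = "decompose"  # task too complex
    PRUNE = "prune"          # task blocked


@dataclass
class Failure:
    transient: bool  # hypothetical signal, e.g. a timeout
    blocked: bool    # hypothetical signal, e.g. a missing dependency


def classify(failure: Failure, retries_left: int) -> Repair:
    """Pick a recovery action; retries_left stands in for the recovery budget."""
    if failure.blocked:
        return Repair.PRUNE
    if failure.transient and retries_left > 0:
        return Repair.RETRY
    # Non-transient failures, or transient ones with no budget left, get split up.
    return Repair.DECOMPOSE


print(classify(Failure(transient=True, blocked=False), retries_left=2))  # Repair.RETRY
```

Sending an out-of-budget transient failure to DECOMPOSE rather than PRUNE is a choice made for this sketch; the changelog does not say which Crank does.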
Fixed
- Crank SKILL.md line limit — Consolidated duplicate References sections to stay under 800-line skill lint limit
- Codex skill parity — Synced all five competitive features to skills-codex with reference file copies
Full Changelog: v2.27.1...v2.28.0
v2.27.1
v2.27.1 — Hotfix: Flywheel golden signals now visible by default
The flywheel status was telling you everything was fine while hiding the full picture behind an opt-in flag. ao flywheel status said "COMPOUNDING" but the golden signals analysis (hidden behind --golden) said "accumulating." Now golden signals always compute and display — no more misleading status.
What changed
- Golden signals always shown — `ao flywheel status` now includes the four golden signals (velocity trend, citation pipeline, research closure, reuse concentration) and the overall verdict in every output format (table, JSON, YAML).
- `--golden` flag deprecated — Kept for backward compatibility but now a no-op (hidden from help).
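The deprecation pattern above (keep the flag, ignore its value, hide it from help) is common in CLI design. Here is an analogous sketch using Python's `argparse`; the real `ao` CLI is not assumed to work this way, and `status-demo` is a hypothetical program name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="status-demo")
    # Deprecated: still accepted so existing scripts keep working,
    # but hidden from --help and from the usage line.
    parser.add_argument("--golden", action="store_true", help=argparse.SUPPRESS)
    return parser


def golden_signals_enabled(args: argparse.Namespace) -> bool:
    """Golden signals are always on; the parsed flag is deliberately ignored."""
    return True


args = build_parser().parse_args(["--golden"])
print(golden_signals_enabled(args))  # True
```

Old invocations that pass `--golden` keep working, while new users never see the flag in the help output.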
Full changelog
See CHANGELOG.md for complete details.
Fixed
- Flywheel golden signals always shown — Golden signals were gated behind `--golden` flag, causing `ao flywheel status` to report "COMPOUNDING" while the hidden golden signals analysis showed "accumulating". Golden signals now compute and display by default.
Full Changelog: v2.27.0...v2.27.1
v2.27.0
Highlights
The knowledge flywheel now tells you whether it's actually working. Four golden signals answer the question every agent operator asks: is my knowledge compounding, or just collecting dust?
`ao flywheel status --golden`
What's New
Golden Signals for Flywheel Health
Four health indicators that go beyond escape velocity (σρ > δ):
| Signal | Question It Answers |
|---|---|
| Velocity Trend | Is σρ−δ increasing over time, or sliding back? |
| Citation Pipeline | Are citations actually delivering value, or just noise? |
| Research Closure | Is research being mined into learnings, or hoarded? |
| Reuse Concentration | Is the whole knowledge pool active, or just a few items? |
Each signal produces a verdict. Three or more healthy signals = compounding. Three or more critical = decaying. Mixed = accumulating — you know what to fix.
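The verdict rule above is easy to express in code. This is a minimal sketch, not the `ao` implementation: the `healthy` and `critical` states and the three verdicts come from the release note, while the function names, the dictionary keys, and the `warning` middle state are assumptions.

```python
from collections import Counter


def escape_velocity(sigma: float, rho: float, delta: float) -> bool:
    """The baseline check quoted in the release note: sigma * rho > delta."""
    return sigma * rho > delta


def flywheel_verdict(signals: dict) -> str:
    """Roll the four per-signal states up into one overall verdict."""
    counts = Counter(signals.values())
    if counts["healthy"] >= 3:
        return "compounding"
    if counts["critical"] >= 3:
        return "decaying"
    return "accumulating"


example = {
    "velocity_trend": "healthy",
    "citation_pipeline": "critical",
    "research_closure": "healthy",
    "reuse_concentration": "warning",  # hypothetical middle state
}
print(flywheel_verdict(example))  # accumulating
```

With only four signals the two thresholds cannot both be met, so the order of the checks does not matter.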
Forge-to-Pool Bridge
Forge now auto-writes pending learnings to .agents/knowledge/pending/ — closing the last manual gap in the flywheel loop. Knowledge flows from session → forge → pool → learnings → inject without intervention.
Session-Start Citation Priming
ao lookup runs at session start, surfacing relevant knowledge and creating the citation events that drive the feedback loop.
All Changes
Added
- Flywheel golden signals (`ao flywheel status --golden`)
- Forge-to-pool bridge for close-loop knowledge ingestion
- SessionStart citation priming via `ao lookup`
- Skill catalog quality improvements (descriptions, extraction, references)
Fixed
- `.agents/.gitignore` scope — replaced broad `!*/` with explicit subdirectory list
- Codex runtime skill parity hardening
- Codex install smoke test assertions
Changed
- CLI reference docs regenerated
Full Changelog: v2.26.1...v2.27.0
v2.26.1
v2.26.1 — DAG-ify orchestrator skills
Hotfix: /rpi was stopping after implementation (Phase 2) without running validation (Phase 3). The execution steps were spread across prose sections with ### headings that created natural LLM stopping points.
Highlights
- RPI now runs all three phases reliably. The execution sequence for `/rpi`, `/discovery`, and `/validation` is encoded as a compact DAG code block — no section breaks between steps, no natural stopping points for the LLM.
- −577 lines across 6 skill files (3 source + 3 codex variants). Less prose, more program.
What's New
Fixed
- `/rpi` stops after Phase 2 — restructured as compact DAG
- `/discovery` and `/validation` restructured to match
- Test patterns updated for new heading format
Changed
- GOALS.md rebuilt from first principles
- README leads with moats, progressive disclosure
- CLI reference docs regenerated
- Doctor + findings helper test coverage added
Full changelog
See CHANGELOG.md for the complete v2.26.1 entry.
Full changelog
Fixed
- RPI stops after Phase 2 — Restructured rpi, discovery, and validation orchestrator skills as compact DAGs with execution sequence in a single code block; eliminates LLM stopping between phases due to `###` section headings acting as natural breakpoints
- Test grep patterns for DAG headings — Updated `test-tuning-defaults.sh` to match new complexity-scaled gate headings after DAG restructure
Changed
- Goals reimagined — GOALS.md rebuilt from first principles with fitness gate fixes
- README progressive disclosure — Lead with moats, collapse detail into expandable sections
- CLI reference docs — Regenerated with updated date stamps
- Doctor + findings helpers — Added CLI test coverage for extracted helpers
Full Changelog: v2.26.0...v2.26.1
v2.26.0
v2.26.0 Release Notes
Highlights
- Test pyramid expanded to BF1–BF9 — Four new bug-finding levels cover regression replay, performance benchmarks, backward compatibility, and security-in-test patterns
- Language-specific test patterns — Go and Python standards now include concrete examples for every new BF level
- Codex audit: 60+ fixes — Orphaned references removed, lint warnings resolved, manifest hashes regenerated across all 54 Codex skills
What's New
- BF6 (Regression): Bug-specific replay tests with ID-based naming (`TestBug_AG_XYZ_...` / `test_bug_ag_xyz_...`)
- BF7 (Performance): Benchmark patterns using Go `testing.B` and Python `pytest-benchmark`
- BF8 (Backward Compatibility): Fixture corpus approach with `testdata/compat/` (Go) and `tests/fixtures/compat/` (Python)
- BF9 (Security): In-test secrets redaction and path traversal rejection patterns
- Decision tree extended with 4 new routing questions
- RPI phase mapping updated: bug fix mandates BF6, hot-path mandates BF7, format changes mandate BF8, secrets mandate BF9
- `regen-codex-hashes.sh` script for Codex manifest maintenance
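As an illustration of what a BF9-style path traversal rejection test might look like in Python, here is a minimal sketch. The helper `resolve_inside` and the test name are hypothetical and are not taken from the AgentOps standards; the sketch needs Python 3.9+ for `Path.is_relative_to`.

```python
import tempfile
from pathlib import Path


def resolve_inside(base: Path, user_path: str) -> Path:
    """Resolve user_path under base, rejecting anything that escapes it."""
    base = base.resolve()
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes base directory: {user_path}")
    return candidate


def test_rejects_path_traversal():
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        # A well-behaved relative path stays inside the base directory.
        assert resolve_inside(base, "notes/a.md") == base.resolve() / "notes" / "a.md"
        # Traversal attempts and absolute paths must be rejected.
        for bad in ("../secrets.txt", "notes/../../etc/passwd", "/etc/passwd"):
            try:
                resolve_inside(base, bad)
            except ValueError:
                continue
            raise AssertionError(f"accepted {bad!r}")
```

Under pytest the test function is collected automatically; it can also be called directly.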
All Changes
Full changelog
Added
- BF6–BF9 test pyramid levels with language-specific Go and Python patterns
- Test pyramid decision tree expansion (4 new routing questions)
- RPI phase mapping for BF6–BF9
- `regen-codex-hashes.sh` manifest hash regeneration script
Changed
- Go standards: benchmark, backward compat, regression, security test patterns
- Python standards: Hypothesis, pytest-benchmark, compat fixtures, regression, security patterns
- Coverage assessment template extended from BF1–BF5 to BF1–BF9
Fixed
- Codex skill audit: 60+ findings across 54 skills
- Skill lint warnings in crank, rpi, recover
- README skill references and orphaned templates
- Skill linter refs in reverse-engineer-rpi
Full Changelog: See CHANGELOG.md
Full changelog
Added
- BF6–BF9 test pyramid levels — Regression (bug-specific replay), Performance/Benchmark, Backward Compatibility, and Security (in-test) bug-finding levels with language-specific patterns for Go and Python
- Test pyramid decision tree expansion — 4 new routing questions for BF6–BF9 in the "When to Use" guide
- RPI phase mapping for BF6–BF9 — Bug fix → BF6 mandatory, hot-path → BF7 benchmark, format change → BF8 compat fixture, secrets → BF9 redaction tests
- `regen-codex-hashes.sh` — Manifest hash regeneration script for Codex skill maintenance
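The RPI phase mapping listed above amounts to a lookup from the kind of change to a mandatory bug-finding level. A minimal sketch, assuming the change is already tagged with traits; the trait keys and `required_levels` are hypothetical names, and only the four pairings come from the changelog.

```python
# Pairings from the changelog; the trait keys themselves are illustrative.
MANDATORY_LEVEL = {
    "bug_fix": "BF6",        # regression replay test
    "hot_path": "BF7",       # benchmark
    "format_change": "BF8",  # backward-compat fixture
    "secrets": "BF9",        # redaction tests
}


def required_levels(change_traits: set) -> list:
    """Return the mandatory BF levels for a change, ignoring unknown traits."""
    return sorted(MANDATORY_LEVEL[t] for t in change_traits if t in MANDATORY_LEVEL)


print(required_levels({"bug_fix", "secrets", "docs"}))  # ['BF6', 'BF9']
```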
Changed
- Go standards — Added benchmark tests (BF7), backward compat with `testdata/compat/` (BF8), regression test naming convention (BF6), security tests for path traversal (BF9)
- Python standards — Added Hypothesis property-based testing (BF1), `pytest-benchmark` patterns (BF7), backward compat with parametrized fixtures (BF8), regression test naming (BF6), secrets redaction tests (BF9)
- Coverage assessment template — Extended BF pyramid table from BF1–BF5 to BF1–BF9
Fixed
- Codex skill audit — 60+ findings fixed across all 54 Codex skills; removed orphaned `claude-code-latest-features.md` and `claude-cli-verified-commands.md` references
- Skill lint warnings — Resolved all warnings in crank, rpi, recover skills
- README skill references — Corrected broken references and linked orphaned templates
- Skill linter refs — Fixed directory reference and backtick formatting in reverse-engineer-rpi
Full Changelog: v2.25.1...v2.26.0
v2.25.1
Fixed
- Codex BF pyramid parity — Synced BF1/BF2/BF4 bug-finding level selection into skills-codex implement, post-mortem, and validation skills
- Codex Claude backend cross-contamination — Removed orphaned `backend-claude-teams.md` files (Claude primitives: TeamCreate, SendMessage) from 4 Codex skills (council, research, shared, swarm)
- Dead converter rule — Removed stale sed substitution for `backend-claude-teams.md` rename in converter script
- Swarm reference integrity — Added Reference Documents section to swarm SKILL.md; updated validate.sh to check only Codex-native backend references
Full Changelog: v2.25.0...v2.25.1
v2.25.0
v2.25.0 — Test Pyramid + Autonomous Execution
Highlights
- Test pyramid baked into every RPI phase. L0–L7 levels flow from discovery through post-mortem. Agents own L0–L3 autonomously; L4+ requires human input. Plans include test level metadata, pre-mortem validates coverage, post-mortem reports gaps.
- RPI and Evolve now run fully autonomous by default. No human questions between phases. Three-Phase Rule enforces discovery → implementation → validation as a single uninterrupted flow. Anti-pattern tables catch 13 common failure modes.
- Codex skill infrastructure matures. New API contract, DAG-based smoke test for 54 skills, durable overrides for crank/swarm/council, and a complete standards reference for Codex skill authoring.
What's New
Test Pyramid Standard
A shared reference (test-pyramid.md) defines 8 test levels with clear agent autonomy boundaries. Every RPI phase now knows which test levels to scope, plan, write, and validate.
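The autonomy boundary can be sketched as a simple split over planned test levels. The L0–L7 range and the L0–L3 versus L4+ boundary come from the release note; `needs_human` and `split_plan` are hypothetical names, and the contents of `test-pyramid.md` are not assumed here.

```python
AUTONOMOUS_MAX = 3  # L0-L3 are agent-owned, per the release note
HIGHEST_LEVEL = 7   # L0-L7: eight levels in total


def needs_human(level: int) -> bool:
    """Return True for levels that require human input (L4 and above)."""
    if not 0 <= level <= HIGHEST_LEVEL:
        raise ValueError(f"unknown test level: L{level}")
    return level > AUTONOMOUS_MAX


def split_plan(test_levels: list) -> tuple:
    """Partition planned levels into (agent-owned, human-guided), de-duplicated and sorted."""
    auto = sorted({lvl for lvl in test_levels if not needs_human(lvl)})
    human = sorted({lvl for lvl in test_levels if needs_human(lvl)})
    return auto, human


print(split_plan([0, 2, 5, 3, 7]))  # ([0, 2, 3], [5, 7])
```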
Autonomous Execution
/rpi and /evolve enforce hands-free execution. No pausing to ask, no stopping after implementation, no narrating plans. The human touchpoint is the final report after all three phases complete.
Codex Platform
Output contracts on verdict skills, a conformance validator, and converter improvements that properly strip Claude primitives instead of mapping to non-existent tools.
All Changes
See CHANGELOG.md for the complete list of 24 commits.
Full changelog
Added
- L0–L7 test pyramid standard — Shared reference doc (`standards/references/test-pyramid.md`) defining 8 test levels, agent autonomy boundaries (L0–L3 autonomous, L4+ human-guided), and RPI phase mapping
- Test pyramid integration across RPI lifecycle — Discovery identifies test levels, plan classifies tests by level, pre-mortem validates coverage, implement selects TDD level, crank carries `test_levels` metadata, validation audits coverage, post-mortem reports gaps
- RPI autonomous execution enforcement — Three-Phase Rule mandates discovery → implementation → validation without human interruption; anti-patterns table documents 7 failure modes
- Evolve autonomous execution enforcement — Each cycle runs a complete 3-phase `/rpi --auto`; anti-patterns table documents 6 failure modes; large work decomposed into sub-RPI cycles
- Codex skill standard — New `standards/references/codex-skill.md` with tool mapping, prohibited primitives, two-phase validation, DAG-first traversal, and prompt constraint boundaries
- Codex-native overrides — Durable overrides for crank, swarm, council that survive regeneration
- DAG-based Codex smoke test — `scripts/smoke-test-codex-skills.sh` validates 54 skills with dependency-ordered traversal
- Codex skill API contract — `docs/contracts/codex-skill-api.md` with conformance validator
- Output contract declarations — `output_contract` field on council, vibe, pre-mortem, research skills with canonical finding-item schema
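Dependency-ordered traversal, as used by the DAG-based smoke test above, is a topological sort: each skill is checked only after the skills it depends on. The sketch below uses Python's standard `graphlib` (3.9+) rather than the actual shell script. The skill names appear in these release notes, but the dependency edges between them are invented for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def smoke_order(deps: dict) -> list:
    """Order skills so every dependency comes before the skill that needs it."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # args[1] holds the nodes that form the detected cycle.
        raise ValueError(f"skill dependency cycle: {err.args[1]}") from err


# Hypothetical edges: each key depends on the skills in its set.
deps = {
    "rpi": {"discovery", "crank", "validation"},
    "discovery": {"research"},
    "validation": {"council"},
}
for skill in smoke_order(deps):
    print("smoke-test", skill)
```

A cycle in the dependency map surfaces as an error before any skill is tested, which keeps a broken manifest from producing a misleading partial run.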
Changed
- Codex converter rewrite — Strips Claude primitives instead of mapping to unavailable tools; rewrites reference files through `codex_rewrite_text`
- CI pipeline — Removed codex skill parity check (skills-codex/ now manually maintained); fixed shellcheck and embedded sync issues
Fixed
- Converter primitive stripping — Task primitives (TaskCreate, TeamCreate, SendMessage) properly stripped instead of mapped to non-existent Codex equivalents
- Embedded hook sync — Added missing `test-pyramid.md` and `codex-skill.md` to CLI embedded references
- ShellCheck SC1125 — Fixed em-dash in shellcheck disable directive in smoke test script
- Skill line limits — Moved verbose autonomy rules to reference files to stay under tier-specific line budgets
Full Changelog: v2.24.0...v2.25.0