Releases: boshu2/agentops

v2.31.0

31 Mar 14:29

brew update && brew upgrade agentops · bash <(curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install.sh) · checksums · verify provenance


Highlights

Nine new lifecycle skills let the agent handle bootstrapping, dependency audits, design reviews, performance analysis, refactoring, code review, scaffolding, and testing without manual invocation. A new ao harvest command pulls learnings from sibling workspaces so knowledge compounds across your entire multi-agent fleet, not just one repo. Context debugging is easier with ao context packet, and the hook system now formally supports both Claude Code and Codex runtimes.

What's New

  • 9 lifecycle skills — bootstrap, deps, design, harvest, perf, refactor, review, scaffold, and test are now part of the RPI workflow with automatic invocation and mechanical gates
  • Cross-rig knowledge harvesting — ao harvest extracts and catalogs learnings from sibling crew workspaces so insights travel between agents
  • Context packet inspector — ao context packet lets you debug what inter-session handoff state the agent actually sees
  • Dual-runtime hook support — Hooks now have a formal runtime contract covering Claude Code, Codex, and manual execution modes

All Changes

Added

  • Nine lifecycle skills wired into the RPI workflow with auto-invocation
  • Cross-rig knowledge consolidation via ao harvest
  • Context packet inspection via ao context packet
  • Hook runtime contract with Claude/Codex/manual event mapping
  • Research provenance tracking on pending learnings
  • Context declarations for inject, provenance, and rpi skills
  • Evidence-backed output templates for goals and product commands

Changed

  • Documentation reframed around three-gap context lifecycle model
  • Hook docs updated with runtime modes table for dual-runtime support

Fixed

  • Four pre-existing CI failures resolved
  • Lookup retrieval gaps that caused empty results
  • Embedded file sync on first session start
  • Closure integrity with 24h grace window for evidence timing
  • Skill lint compliance across vibe, post-mortem, crank, and plan
  • Codex tool naming rule and five Claude-era tool references
  • ASCII diagram consistency across 23 documentation files
  • Fork exhaustion in validation script replaced with lightweight parser

Full changelog

Added

  • 9 lifecycle skills — bootstrap, deps, design, harvest, perf, refactor, review, scaffold, and test skills wired into RPI with auto-invocation and mechanical gates
  • ao harvest — cross-rig knowledge consolidation extracts and catalogs learnings from sibling crew workspaces
  • ao context packet — inspect stigmergic context packets for debugging inter-session handoff state
  • Hook runtime contract — formal Claude/Codex/manual event mapping with runtime-aware hook tooling
  • Evidence-driven skill enrichment — production meta-knowledge, anti-patterns, flywheel metrics, and normalization defect detection baked into 9 skill reference files
  • Research provenance — pending learnings now carry full research provenance for discoverability and citation tracking
  • Context declarations — inject, provenance, and rpi skills declare their context requirements explicitly
  • Goals and product output templates — /goals and /product produce evidence-backed structured output

Changed

  • Three-gap context lifecycle contract — README, PRODUCT.md, positioning docs, and operational guides reframed around the context lifecycle model
  • Dual-runtime hook documentation — runtime modes table and troubleshooting updated for Claude + Codex hook coexistence

Fixed

  • CI reliability — resolved 4 pre-existing CI failures, restored headless runtime preflight, repaired codex parity drift checks
  • ao lookup retrieval — fixed retrieval gaps that caused lookup to return no results
  • Embedded sync — using-agentops SKILL.md and .agents/.gitignore now written correctly on first session start
  • Closure integrity — 24h grace window for close-before-commit evidence, normalized file parsing
  • Skill lint compliance — vibe, post-mortem, crank, and plan skills trimmed or restructured to stay under 800-line limit
  • Codex tool naming — added CLAUDE_TOOL_NAMING rule and fixed 5 Claude-era tool references in codex skills
  • ASCII diagram consistency — aligned box-drawing characters across 23 documentation files
  • Fork exhaustion prevention — replaced jq with awk in validate-go-fast to prevent fork exhaustion on large repos
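
The 24h grace window in the closure-integrity fix can be pictured as a timestamp comparison. The following is a minimal sketch, assuming that commit evidence counts when it lands no later than 24 hours after the close. The function name `evidence_within_grace` and the inclusive boundary are illustrative assumptions, not the project's implementation.

```python
from datetime import datetime, timedelta, timezone

# The 24h figure comes from the release note; everything else is hypothetical.
GRACE_WINDOW = timedelta(hours=24)


def evidence_within_grace(closed_at: datetime, committed_at: datetime) -> bool:
    """Return True if commit evidence is acceptable for a closure.

    Evidence that predates the close is always accepted. Evidence that
    arrives after the close (close-before-commit) is accepted only if it
    lands within the grace window.
    """
    if committed_at <= closed_at:
        return True
    return committed_at - closed_at <= GRACE_WINDOW


# Arbitrary example timestamp, not taken from the release notes.
closed = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(evidence_within_grace(closed, closed + timedelta(hours=3)))   # True
print(evidence_within_grace(closed, closed + timedelta(hours=30)))  # False
```

Using timezone-aware timestamps avoids off-by-hours errors when close and commit events are recorded on different machines.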

Full Changelog: v2.30.0...v2.31.0

v2.30.0

25 Mar 03:47

brew update && brew upgrade agentops · bash <(curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install.sh) · checksums · verify provenance


v2.30.0 — Codex hookless lifecycle, PROGRAM.md workflows, and stronger long-running RPI runs

Highlights

AgentOps now handles Codex hookless sessions more cleanly, gives autonomous workflows a clearer PROGRAM.md contract, and makes long-running RPI runs much easier to inspect. This release also hardens the local release and validation path itself, so the same gate stack you rely on for shipping is more trustworthy under headless and generated-artifact-heavy workflows.

What's New

  • Hookless Codex lifecycle support — Codex sessions can now run through startup, follow-up, validation, and closeout without depending on legacy hook assumptions.
  • PROGRAM.md for autonomous work — Autodev and evolve flows now share a concrete program contract instead of relying on looser ad hoc context.
  • Artifact-aware long RPI runs — Mission control now shows run artifacts and evaluator output so you can inspect what happened during multi-phase autonomous runs.
  • More reliable release validation — Headless runtime checks, reverse-engineer hygiene, and release-gate coverage are more deterministic.

All Changes

Added

  • Hookless Codex lifecycle support across CLI commands and skill orchestration
  • A first-class PROGRAM.md contract for autodev and evolve-driven workflows
  • Artifact and evaluator visibility for long-running RPI sessions

Changed

  • Codex bundle maintenance, lifecycle guidance, and release validation coverage around the expanded Codex execution path

Fixed

  • Codex RPI scope and closeout issues that caused follow-up and validation drift
  • Release-gate regressions in headless runtime validation and learning coherence
  • Reverse-engineer repo scans so generated or temporary trees no longer contaminate detected CLI surfaces

Full changelog

Added

  • Codex hookless lifecycle support — ao codex runtime commands, lifecycle fallback, and Codex skill orchestration now cover hookless sessions end to end
  • PROGRAM.md autodev contract — Added a first-class PROGRAM.md contract for autodev flows and taught /evolve and related RPI paths to use it
  • Long-running RPI artifact visibility — Mission control now exposes run artifacts and evaluator output so long-running RPI sessions are replayable and easier to inspect

Changed

  • Codex runtime maintenance flow — Refreshed Codex bundle hashes, lifecycle guards, runtime docs, and release validation coverage around the expanded Codex execution path

Fixed

  • Codex RPI scoping and closeout — Tightened objective scope, epic scope, closeout ownership, and validation gaps in the Codex RPI lifecycle
  • Release gate reliability — Restored headless runtime coverage, runtime-aware Claude inventory checks, and release-gate coherence validation
  • Reverse-engineer repo hygiene — Repo-mode reverse engineer now ignores generated and temp trees when identifying CLI and module surfaces
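
The repo-hygiene fix amounts to pruning directory trees during a scan. Here is a minimal sketch of that idea, assuming a simple name-based ignore list. The names in `IGNORED_DIRS` and the helper `source_files` are hypothetical; the notes do not say which trees the real scanner skips.

```python
import os
import tempfile

# Hypothetical ignore list; the real tool's list is not given in the notes.
IGNORED_DIRS = {"node_modules", "dist", "build", "tmp", ".git"}


def source_files(root: str) -> list[str]:
    """Collect file paths under root, skipping ignored directory trees."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into ignored trees.
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)


with tempfile.TemporaryDirectory() as root:
    for rel in ("cmd/main.go", "dist/main.go", "tmp/scratch/cli.go"):
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    print(source_files(root))  # ['cmd/main.go']
```

Pruning `dirnames` in place is what keeps generated files from ever being visited, which is cheaper than walking everything and filtering afterwards.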

Full Changelog: v2.29.0...v2.30.0

v2.29.0

23 Mar 00:20

brew update && brew upgrade agentops · bash <(curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install.sh) · checksums · verify provenance


v2.29.0 — Config control, broader search, and stronger flywheel proof

Highlights

AgentOps now gives you more control over model spend, a broader default search path, and a tighter proof path for the knowledge flywheel. You can assign agent models by cost tier through config, ao search now pulls from both repo-local knowledge and upstream session history, and the flywheel claim is backed by deterministic proof fixtures instead of manual spot checks.

What's New

  • Per-agent model routing — ao config now supports model cost tiers and direct config writes, so teams can tune quality and spend without manual file edits.
  • Broader default search — ao search now brokers across upstream cass history and repo-local AgentOps artifacts instead of making you choose one surface up front.
  • Stronger flywheel evidence — Close-loop validation now preserves research provenance and uses executable proof fixtures plus artifact-specific citation feedback.
  • Richer review guidance — Council, research, swarm, vibe, athena, and post-mortem picked up new reference packs for reviewer routing, retrieval patterns, and write-time quality checks.

All Changes

Added

  • Model cost tiers and direct config writes for per-agent routing
  • Search brokerage across session history and repo-local knowledge
  • New reference packs for reviewer routing, iterative retrieval, confidence scoring, conflict recovery, and write-time quality

Changed

  • Comparison docs, command docs, and release smoke coverage around the expanded search and config surface

Fixed

  • Flywheel proof, citation feedback, and closure reporting now agree on actual state
  • Search stays aligned with forged session history and fallback behavior
  • Pre-push and release validation is more deterministic under hook-launched git environments
  • Council profile docs are synced between source and checked-in Codex artifacts

Full changelog

Added

  • Model cost tiers and config writes — ao config can now assign per-agent models by cost tier and persist repo configuration changes directly
  • Search brokerage over session history and repo knowledge — ao search now wraps upstream cass results with repo-local AgentOps artifacts by default
  • Reviewer and post-mortem reference packs — Added model-routing, iterative-retrieval, confidence-scoring, write-time-quality, and conflict-recovery guidance across council, research, swarm, vibe, athena, and related skills

Changed

  • Competitive comparison and CLI docs — Refreshed comparison docs, release smoke coverage, and command documentation around the expanded search/config surface

Fixed

  • Flywheel proof and citation loop — Added deterministic proof fixtures, preserved exact research provenance, and made citation feedback artifact-specific so flywheel health reflects real closure state
  • Search alignment with forged session history — Search now stays aligned with forged session artifacts and fallback behavior
  • Hook-launched validation — Pre-push and release gates now isolate inherited git env/stdin correctly and cover newer hook scripts in integration tests
  • Codex council profile parity — Source and checked-in Codex council docs are back in sync for the shared profile contract
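
The hook-launched validation fix concerns processes started from a git hook, which inherit git-specific environment variables (such as GIT_DIR) and, for pre-push, ref data on stdin. The following is a minimal sketch of isolating a child process from both. The helper name `run_isolated` is hypothetical, and dropping every `GIT_*` variable is a deliberately blunt assumption rather than what the project's gates actually do.

```python
import os
import subprocess
import sys


def run_isolated(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run cmd without inherited GIT_* variables or the hook's stdin."""
    # Drop every GIT_* variable so a child git command resolves the
    # repository from its working directory, not from the hook's context.
    env = {k: v for k, v in os.environ.items() if not k.startswith("GIT_")}
    return subprocess.run(
        cmd,
        env=env,
        stdin=subprocess.DEVNULL,  # pre-push hooks receive ref data on stdin
        capture_output=True,
        text=True,
        check=False,
    )


os.environ["GIT_DIR"] = "/tmp/elsewhere/.git"  # simulate a hook-launched process
probe = [sys.executable, "-c", "import os; print('GIT_DIR' in os.environ)"]
print(run_isolated(probe).stdout.strip())  # False
```

A real gate would probably keep some variables (author identity, for example); the blanket filter here only illustrates the isolation idea.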

Full Changelog: v2.28.0...v2.29.0

v2.28.0

21 Mar 20:30

brew update && brew upgrade agentops · bash <(curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install.sh) · checksums · verify provenance


v2.28.0 — Competitive Feature Integration

Five features adopted from reverse-engineering GSD v1.27 and Compound Engineer v2.47:

Highlights

  • Smarter failure recovery — Crank now classifies failures and picks a recovery path (retry, decompose, or prune) instead of blindly retrying
  • Knowledge stays clean — Athena defrag runs at every session end, pruning stale artifacts automatically
  • Per-project review config — Drop a .agents/reviewer-config.md to control which council judges run
  • Right-sized plans — Plans auto-scale detail level (minimal/standard/deep) based on complexity
  • Red-team your ideas — Brainstorm now stress-tests every approach before you choose

All Changes

See CHANGELOG.md for the complete list.


Full changelog

Added

  • Node repair operator — Crank now classifies task failures as RETRY (transient), DECOMPOSE (too complex), or PRUNE (blocked) with budget-controlled recovery
  • Knowledge refresh auto-trigger — Lightweight athena defrag runs automatically at session end via new SessionEnd hook
  • Configurable review agents — Project-level .agents/reviewer-config.md controls which judge perspectives council and vibe spawn
  • Three-tier plan detail scaling — Plan auto-selects Minimal, Standard, or Deep templates based on issue count and complexity
  • Adversarial ideation — Brainstorm Phase 3b stress-tests each approach with four red-team questions before user selection
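
The node repair operator described above can be sketched as a classifier plus a retry budget. This is an illustration only: the three action names come from the changelog, but the marker strings, the default of decomposing unknown failures, and the budget value are assumptions invented for the example.

```python
from enum import Enum


class Repair(Enum):
    RETRY = "retry"          # transient failure
    DECOMPOSE = "decompose"  # task too complex
    PRUNE = "prune"          # task blocked


# Hypothetical marker strings; the real classifier's signals are not documented here.
TRANSIENT_MARKERS = ("timeout", "connection reset", "rate limit")
BLOCKED_MARKERS = ("permission denied", "missing dependency")


def classify(error: str) -> Repair:
    """Map an error message to a repair action using simple substring rules."""
    text = error.lower()
    if any(m in text for m in TRANSIENT_MARKERS):
        return Repair.RETRY
    if any(m in text for m in BLOCKED_MARKERS):
        return Repair.PRUNE
    # Assumption: anything unrecognized is treated as too complex.
    return Repair.DECOMPOSE


def next_action(error: str, retries_used: int, retry_budget: int = 2) -> Repair:
    """Apply the budget: once retries are spent, stop retrying and decompose."""
    action = classify(error)
    if action is Repair.RETRY and retries_used >= retry_budget:
        return Repair.DECOMPOSE
    return action


print(next_action("request timeout", retries_used=0))          # Repair.RETRY
print(next_action("request timeout", retries_used=2))          # Repair.DECOMPOSE
print(next_action("permission denied: /etc", retries_used=0))  # Repair.PRUNE
```

The budget check is what separates this from blind retrying: a failure that keeps looking transient still gets a different treatment once the allowance is used up.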

Fixed

  • Crank SKILL.md line limit — Consolidated duplicate References sections to stay under 800-line skill lint limit
  • Codex skill parity — Synced all five competitive features to skills-codex with reference file copies

Full Changelog: v2.27.1...v2.28.0

v2.27.1

21 Mar 00:26

brew update && brew upgrade agentops · bash <(curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install.sh) · checksums · verify provenance


v2.27.1 — Hotfix: Flywheel golden signals now visible by default

The flywheel status was telling you everything was fine while hiding the full picture behind an opt-in flag. ao flywheel status said "COMPOUNDING" but the golden signals analysis (hidden behind --golden) said "accumulating." Now golden signals always compute and display — no more misleading status.

What changed

  • Golden signals always shown — ao flywheel status now includes the four golden signals (velocity trend, citation pipeline, research closure, reuse concentration) and the overall verdict in every output format (table, JSON, YAML).
  • --golden flag deprecated — Kept for backward compatibility but now a no-op (hidden from help).
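
Keeping a deprecated flag as a hidden no-op is a common compatibility pattern. The sketch below shows it with Python's argparse purely as an illustration; the program name `flywheel-status` is hypothetical and this is not how the ao CLI itself is implemented.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that still accepts --golden but no longer advertises it."""
    parser = argparse.ArgumentParser(prog="flywheel-status")  # hypothetical name
    # Accepted so old scripts keep working, ignored by the program,
    # and hidden from --help via argparse.SUPPRESS.
    parser.add_argument("--golden", action="store_true", help=argparse.SUPPRESS)
    return parser


args = build_parser().parse_args(["--golden"])       # old invocations still parse
print("--golden" in build_parser().format_help())    # False
```

Old invocations keep working because the flag still parses, while new users never see it in the help text.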

Full changelog

See CHANGELOG.md for complete details.


Full changelog

Fixed

  • Flywheel golden signals always shown — Golden signals were gated behind --golden flag, causing ao flywheel status to report "COMPOUNDING" while the hidden golden signals analysis showed "accumulating". Golden signals now compute and display by default.

Full Changelog: v2.27.0...v2.27.1

v2.27.0

20 Mar 21:29

brew update && brew upgrade agentops · bash <(curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install.sh) · checksums · verify provenance


Highlights

The knowledge flywheel now tells you whether it's actually working. Four golden signals answer the question every agent operator asks: is my knowledge compounding, or just collecting dust?

ao flywheel status --golden

What's New

Golden Signals for Flywheel Health

Four health indicators that go beyond escape velocity (σρ > δ):

| Signal | Question It Answers |
|---|---|
| Velocity Trend | Is σρ−δ increasing over time, or sliding back? |
| Citation Pipeline | Are citations actually delivering value, or just noise? |
| Research Closure | Is research being mined into learnings, or hoarded? |
| Reuse Concentration | Is the whole knowledge pool active, or just a few items? |

Each signal produces a verdict. Three or more healthy signals = compounding. Three or more critical = decaying. Mixed = accumulating — you know what to fix.
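
The aggregation rule above can be sketched in a few lines. This is a minimal illustration, assuming each signal reports a string verdict; the labels "healthy" and "critical" follow the wording above, while the "warning" label and the function name `overall_verdict` are hypothetical.

```python
from collections import Counter


def overall_verdict(signal_verdicts: dict[str, str]) -> str:
    """Combine per-signal verdicts into one flywheel verdict."""
    counts = Counter(signal_verdicts.values())
    if counts["healthy"] >= 3:
        return "compounding"
    if counts["critical"] >= 3:
        return "decaying"
    # Anything else is a mixed picture.
    return "accumulating"


signals = {
    "velocity_trend": "healthy",
    "citation_pipeline": "healthy",
    "research_closure": "critical",
    "reuse_concentration": "warning",  # hypothetical intermediate state
}
print(overall_verdict(signals))  # accumulating
```

With four signals, the two thresholds cannot both be met at once, so the order of the checks does not change the result.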

Forge-to-Pool Bridge

Forge now auto-writes pending learnings to .agents/knowledge/pending/ — closing the last manual gap in the flywheel loop. Knowledge flows from session → forge → pool → learnings → inject without intervention.

Session-Start Citation Priming

ao lookup runs at session start, surfacing relevant knowledge and creating the citation events that drive the feedback loop.

All Changes

Added

  • Flywheel golden signals (ao flywheel status --golden)
  • Forge-to-pool bridge for close-loop knowledge ingestion
  • SessionStart citation priming via ao lookup
  • Skill catalog quality improvements (descriptions, extraction, references)

Fixed

  • .agents/.gitignore scope — replaced broad !*/ with explicit subdirectory list
  • Codex runtime skill parity hardening
  • Codex install smoke test assertions

Changed

  • CLI reference docs regenerated

Full Changelog: v2.26.1...v2.27.0

v2.26.1

16 Mar 18:34

brew update && brew upgrade agentops · bash <(curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install.sh) · checksums · verify provenance


v2.26.1 — DAG-ify orchestrator skills

Hotfix: /rpi was stopping after implementation (Phase 2) without running validation (Phase 3). The execution steps were spread across prose sections with ### headings that created natural LLM stopping points.

Highlights

  • RPI now runs all three phases reliably. The execution sequence for /rpi, /discovery, and /validation is encoded as a compact DAG code block — no section breaks between steps, no natural stopping points for the LLM.
  • -577 lines across 6 skill files (3 source + 3 codex variants). Less prose, more program.

What's New

Fixed

  • /rpi stops after Phase 2 — restructured as compact DAG
  • /discovery and /validation restructured to match
  • Test patterns updated for new heading format

Changed

  • GOALS.md rebuilt from first principles
  • README leads with moats, progressive disclosure
  • CLI reference docs regenerated
  • Doctor + findings helper test coverage added

Full changelog

See CHANGELOG.md for the complete v2.26.1 entry.


Full changelog

Fixed

  • RPI stops after Phase 2 — Restructured rpi, discovery, and validation orchestrator skills as compact DAGs with execution sequence in a single code block; eliminates LLM stopping between phases due to ### section headings acting as natural breakpoints
  • Test grep patterns for DAG headings — Updated test-tuning-defaults.sh to match new complexity-scaled gate headings after DAG restructure

Changed

  • Goals reimagined — GOALS.md rebuilt from first principles with fitness gate fixes
  • README progressive disclosure — Lead with moats, collapse detail into expandable sections
  • CLI reference docs — Regenerated with updated date stamps
  • Doctor + findings helpers — Added CLI test coverage for extracted helpers

Full Changelog: v2.26.0...v2.26.1

v2.26.0

15 Mar 18:51

brew update && brew upgrade agentops · bash <(curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install.sh) · checksums · verify provenance


v2.26.0 Release Notes

Highlights

  • Test pyramid expanded to BF1–BF9 — Four new bug-finding levels cover regression replay, performance benchmarks, backward compatibility, and security-in-test patterns
  • Language-specific test patterns — Go and Python standards now include concrete examples for every new BF level
  • Codex audit: 60+ fixes — Orphaned references removed, lint warnings resolved, manifest hashes regenerated across all 54 Codex skills

What's New

  • BF6 (Regression): Bug-specific replay tests with ID-based naming (TestBug_AG_XYZ_... / test_bug_ag_xyz_...)
  • BF7 (Performance): Benchmark patterns using Go testing.B and Python pytest-benchmark
  • BF8 (Backward Compatibility): Fixture corpus approach with testdata/compat/ (Go) and tests/fixtures/compat/ (Python)
  • BF9 (Security): In-test secrets redaction and path traversal rejection patterns
  • Decision tree extended with 4 new routing questions
  • RPI phase mapping updated: bug fix mandates BF6, hot-path mandates BF7, format changes mandate BF8, secrets mandate BF9
  • regen-codex-hashes.sh script for Codex manifest maintenance
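
To make the BF6 naming convention and the BF9 path traversal pattern concrete, here is a minimal Python sketch. The helper `resolve_inside` and the directory `/srv/data` are hypothetical, and `ag_xyz` is the placeholder bug ID from the naming pattern above rather than a real issue. It needs Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def resolve_inside(base: Path, user_path: str) -> Path:
    """Resolve user_path under base, rejecting anything that escapes it."""
    base = base.resolve()
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes base directory: {user_path}")
    return candidate


def test_bug_ag_xyz_rejects_parent_traversal():
    # BF6-style name: the bug ID is part of the test name, so the
    # regression stays traceable to the original report.
    try:
        resolve_inside(Path("/srv/data"), "../../etc/passwd")
    except ValueError:
        return
    raise AssertionError("traversal was not rejected")


def test_accepts_path_inside_base():
    resolved = resolve_inside(Path("/srv/data"), "reports/q1.txt")
    assert resolved.name == "q1.txt"


# Run directly here; under pytest these would be collected automatically.
test_bug_ag_xyz_rejects_parent_traversal()
test_accepts_path_inside_base()
```

Resolving both paths before comparing them is what catches `..` segments and absolute paths alike.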

All Changes

Full changelog

Added

  • BF6–BF9 test pyramid levels with language-specific Go and Python patterns
  • Test pyramid decision tree expansion (4 new routing questions)
  • RPI phase mapping for BF6–BF9
  • regen-codex-hashes.sh manifest hash regeneration script

Changed

  • Go standards: benchmark, backward compat, regression, security test patterns
  • Python standards: Hypothesis, pytest-benchmark, compat fixtures, regression, security patterns
  • Coverage assessment template extended from BF1–BF5 to BF1–BF9

Fixed

  • Codex skill audit: 60+ findings across 54 skills
  • Skill lint warnings in crank, rpi, recover
  • README skill references and orphaned templates
  • Skill linter refs in reverse-engineer-rpi

Full Changelog: See CHANGELOG.md


Full changelog

Added

  • BF6–BF9 test pyramid levels — Regression (bug-specific replay), Performance/Benchmark, Backward Compatibility, and Security (in-test) bug-finding levels with language-specific patterns for Go and Python
  • Test pyramid decision tree expansion — 4 new routing questions for BF6–BF9 in the "When to Use" guide
  • RPI phase mapping for BF6–BF9 — Bug fix → BF6 mandatory, hot-path → BF7 benchmark, format change → BF8 compat fixture, secrets → BF9 redaction tests
  • regen-codex-hashes.sh — Manifest hash regeneration script for Codex skill maintenance

Changed

  • Go standards — Added benchmark tests (BF7), backward compat with testdata/compat/ (BF8), regression test naming convention (BF6), security tests for path traversal (BF9)
  • Python standards — Added Hypothesis property-based testing (BF1), pytest-benchmark patterns (BF7), backward compat with parametrized fixtures (BF8), regression test naming (BF6), secrets redaction tests (BF9)
  • Coverage assessment template — Extended BF pyramid table from BF1–BF5 to BF1–BF9
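
The BF8 fixture corpus approach keeps files written by older versions and checks that the current parser still reads them. The sketch below illustrates the idea under stated assumptions: the JSON format, the `name` and `title` keys, and both function names are invented, and a temporary directory stands in for a checked-in tests/fixtures/compat/ directory.

```python
import json
import tempfile
from pathlib import Path


def load_record(text: str) -> dict:
    """Hypothetical current parser: accepts the old 'name' key and the newer 'title'."""
    raw = json.loads(text)
    return {"title": raw.get("title", raw.get("name"))}


def check_compat_corpus(fixture_dir: Path) -> list[str]:
    """Parse every fixture; return the names of files the parser no longer reads."""
    failures = []
    for fixture in sorted(fixture_dir.glob("*.json")):
        try:
            record = load_record(fixture.read_text())
            if not record["title"]:
                failures.append(fixture.name)
        except ValueError:  # json.JSONDecodeError is a ValueError subclass
            failures.append(fixture.name)
    return failures


# Stand-in for a checked-in fixture corpus.
with tempfile.TemporaryDirectory() as tmp:
    corpus = Path(tmp)
    (corpus / "v1.json").write_text('{"name": "old format"}')
    (corpus / "v2.json").write_text('{"title": "new format"}')
    print(check_compat_corpus(corpus))  # []
```

In a pytest suite, each fixture file would typically become one parametrized test case so a failure names the exact file that broke.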

Fixed

  • Codex skill audit — 60+ findings fixed across all 54 Codex skills; removed orphaned claude-code-latest-features.md and claude-cli-verified-commands.md references
  • Skill lint warnings — Resolved all warnings in crank, rpi, recover skills
  • README skill references — Corrected broken references and linked orphaned templates
  • Skill linter refs — Fixed directory reference and backtick formatting in reverse-engineer-rpi

Full Changelog: v2.25.1...v2.26.0

v2.25.1

15 Mar 14:16

brew update && brew upgrade agentops · bash <(curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install.sh) · checksums · verify provenance


Fixed

  • Codex BF pyramid parity — Synced BF1/BF2/BF4 bug-finding level selection into skills-codex implement, post-mortem, and validation skills
  • Codex Claude backend cross-contamination — Removed orphaned backend-claude-teams.md files (Claude primitives: TeamCreate, SendMessage) from 4 Codex skills (council, research, shared, swarm)
  • Dead converter rule — Removed stale sed substitution for backend-claude-teams.md rename in converter script
  • Swarm reference integrity — Added Reference Documents section to swarm SKILL.md; updated validate.sh to check only Codex-native backend references

Full Changelog: v2.25.0...v2.25.1

v2.25.0

14 Mar 18:46

brew update && brew upgrade agentops · bash <(curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install.sh) · checksums · verify provenance


v2.25.0 — Test Pyramid + Autonomous Execution

Highlights

  • Test pyramid baked into every RPI phase. L0–L7 levels flow from discovery through post-mortem. Agents own L0–L3 autonomously; L4+ requires human input. Plans include test level metadata, pre-mortem validates coverage, post-mortem reports gaps.

  • RPI and Evolve now run fully autonomous by default. No human questions between phases. Three-Phase Rule enforces discovery → implementation → validation as a single uninterrupted flow. Anti-pattern tables catch 13 common failure modes.

  • Codex skill infrastructure matures. New API contract, DAG-based smoke test for 54 skills, durable overrides for crank/swarm/council, and a complete standards reference for Codex skill authoring.

What's New

Test Pyramid Standard

A shared reference (test-pyramid.md) defines 8 test levels with clear agent autonomy boundaries. Every RPI phase now knows which test levels to scope, plan, write, and validate.

Autonomous Execution

/rpi and /evolve enforce hands-free execution. No pausing to ask, no stopping after implementation, no narrating plans. The human touchpoint is the final report after all three phases complete.

Codex Platform

Output contracts on verdict skills, a conformance validator, and converter improvements that properly strip Claude primitives instead of mapping to non-existent tools.

All Changes

See CHANGELOG.md for the complete list of 24 commits.


Full changelog

Added

  • L0–L7 test pyramid standard — Shared reference doc (standards/references/test-pyramid.md) defining 8 test levels, agent autonomy boundaries (L0–L3 autonomous, L4+ human-guided), and RPI phase mapping
  • Test pyramid integration across RPI lifecycle — Discovery identifies test levels, plan classifies tests by level, pre-mortem validates coverage, implement selects TDD level, crank carries test_levels metadata, validation audits coverage, post-mortem reports gaps
  • RPI autonomous execution enforcement — Three-Phase Rule mandates discovery → implementation → validation without human interruption; anti-patterns table documents 7 failure modes
  • Evolve autonomous execution enforcement — Each cycle runs a complete 3-phase /rpi --auto; anti-patterns table documents 6 failure modes; large work decomposed into sub-RPI cycles
  • Codex skill standard — New standards/references/codex-skill.md with tool mapping, prohibited primitives, two-phase validation, DAG-first traversal, and prompt constraint boundaries
  • Codex-native overrides — Durable overrides for crank, swarm, council that survive regeneration
  • DAG-based Codex smoke test — scripts/smoke-test-codex-skills.sh validates 54 skills with dependency-ordered traversal
  • Codex skill API contract — docs/contracts/codex-skill-api.md with conformance validator
  • Output contract declarations — output_contract field on council, vibe, pre-mortem, research skills with canonical finding-item schema
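
Dependency-ordered traversal, as used by the DAG-based smoke test, is a topological sort. The following sketch uses Python's standard `graphlib` (3.9+) to show the idea. The skill names appear in these notes, but the dependency edges in `SKILL_DEPS` are invented for illustration, and the real smoke test is a shell script.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: skill -> skills it depends on.
SKILL_DEPS = {
    "research": set(),
    "council": {"research"},
    "vibe": {"council"},
    "swarm": {"research"},
    "crank": {"swarm", "vibe"},
}


def smoke_order(deps: dict[str, set[str]]) -> list[str]:
    """Return skills in an order where every dependency comes first."""
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(deps).static_order())


order = smoke_order(SKILL_DEPS)
print(order)
```

Several valid orders can exist for the same graph, so a check should assert on relative positions (for example, research before council) and not on one exact list.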

Changed

  • Codex converter rewrite — Strips Claude primitives instead of mapping to unavailable tools; rewrites reference files through codex_rewrite_text
  • CI pipeline — Removed codex skill parity check (skills-codex/ now manually maintained); fixed shellcheck and embedded sync issues

Fixed

  • Converter primitive stripping — Task primitives (TaskCreate, TeamCreate, SendMessage) properly stripped instead of mapped to non-existent Codex equivalents
  • Embedded hook sync — Added missing test-pyramid.md and codex-skill.md to CLI embedded references
  • ShellCheck SC1125 — Fixed em-dash in shellcheck disable directive in smoke test script
  • Skill line limits — Moved verbose autonomy rules to reference files to stay under tier-specific line budgets

Full Changelog: v2.24.0...v2.25.0