Releases · yimwoo/codex-agenteam

Highlights

Added agenteam-rt evidence, which generates versioned evidence bundles for governed runs.
Captures run outcome, metrics, stage and role exits, final verification, stale state, events, artifacts, and optional governance data.
Carries pipeline_mode through traces for clearer audit context.
Updated CLI and pipeline documentation for evidence generation.

Verification

GitHub PR checks passed for Python 3.11, Python 3.12, and scan.
Local release-candidate verification passed: runtime tests, skill/smoke tests, ruff format/check, bats smoke, git isolation, verify-stage tests, and whitespace checks.

Highlights

Added agenteam-rt trace --run-id <id> for diagnostic JSON across failed, blocked, stale, resumable, and final-verify-failed runs.
Updated status --progress to derive from trace while preserving compact output and legacy verify aliases.
Recorded runner role artifact provenance so trace can point to prompt, stdout, stderr, and exec audit files.
Documented trace usage in CLI and pipeline docs.

Verification

python3 -m pytest test/test_runtime.py -q -> 330 passed
python3 -m pytest test/test_skill_contracts.py test/test_smoke_playground.py -q -> 13 passed
python3 -m ruff format --check runtime/ test/ -> clean
bats test/smoke.bats -> 26 passed
bats test/test_git_isolate.bats test/test_verify_stage.bats -> 22 passed

Highlights

Hardened agenteam-rt run so non-zero role exits fail stages and runs authoritatively.
Added verify retry, conservative rework_to repair dispatch, blocked gate handling, gated-run resume, and final verification retry support.
Kept runner stdout as a pure JSONL event stream while persisting history quietly.
Added durable runner events for role execution, retries, rework, gates, and final verification.
Documented the runtime/executor boundary for the non-interactive runner.
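The retry behaviour listed above can be pictured as a bounded loop. This is a minimal sketch, not the runner's actual code: `run_with_retry`, `attempt`, and the default of three attempts are hypothetical, and the only behaviour taken from the notes is that a non-zero exit fails the stage.

```python
from typing import Callable


def run_with_retry(attempt: Callable[[int], int], max_attempts: int = 3) -> dict:
    """Call attempt(n) until it returns exit code 0 or attempts run out.

    `attempt` is a hypothetical callable that runs one role or verify
    step and returns its process exit code.
    """
    exit_codes = []
    for n in range(1, max_attempts + 1):
        code = attempt(n)
        exit_codes.append(code)
        if code == 0:
            return {"status": "passed", "attempts": n, "exit_codes": exit_codes}
    # Every attempt exited non-zero, so the stage is treated as failed.
    return {"status": "failed", "attempts": max_attempts, "exit_codes": exit_codes}
```

Keeping the exit codes of every attempt makes it possible to record each retry as its own event rather than only the final result.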

Verification

GitHub Actions passed for Python 3.11 and 3.12.
Plugin scanner passed.
Local release checks passed: runtime pytest, skill/smoke playground pytest, ruff format check, smoke bats, git-isolate bats, and verify-stage bats.

Highlights

Added governed-delivery foundations with agenteam-rt governed-bootstrap to scaffold local governance assets.
Added structured decision logging via agenteam-rt decision append|list|render-log.
Added config-driven tripwire evaluation via agenteam-rt tripwire check.
Added optional governance metadata on runs and surfaced it in status and standup output.
Updated public docs for the new optional workflow while keeping internal planning docs private.

Verification

python3 -m pytest -q
bats test/smoke.bats
python3 -m ruff check runtime/agenteam/cli.py runtime/agenteam/governance.py runtime/agenteam/standup.py runtime/agenteam/state.py test/test_runtime.py
python3 -m black --check runtime/agenteam/governance.py runtime/agenteam/state.py test/test_runtime.py

What's New

`agenteam-rt run` — Full Pipeline via codex exec

Drive the complete AgenTeam pipeline non-interactively — for CI, experiments, cron, and remote execution.

# Run a task through the full pipeline
agenteam-rt run --task "add user auth" --auto-approve-gates

# From a seed file with a specific profile
agenteam-rt run --task-file seed.md --profile standard --output-dir ./out

# Resume a crashed/interrupted run
agenteam-rt run --run-id <id>

Key features:

Invokes codex exec via documented stdin path (--json --full-auto -)
Fail-fast gates: human gates stop by default; --auto-approve-gates for autonomous mode
Dual output: per-stage/role directory structure + JSONL event stream to stdout
Prompt audit: exact prompt and structured prompt-build output saved before each dispatch
Resume: --run-id picks up from the last incomplete stage
History on failure: persists run history for both successful and failed runs

Output structure:

.agenteam/runs/<run-id>/
  run.json                 # Run summary
  events.jsonl             # All events
  implement/dev/           # Per-stage/role
    prompt.txt             # Exact prompt sent
    prompt-build.json      # Structured components
    stdout.txt / stderr.txt / exec.json

Built on existing primitives: prompt-build, init, transition, record-verify, record-gate, event append, history append. No new runtime primitives beyond runner.py.

Numbers

7 files changed, 533 lines added
5 new tests (302 total, 0 regressions)
New module: runtime/agenteam/runner.py

Roadmap synced through v3.2. Added prompt-build contract test. 297 tests passing.

Next up: v3.3 Non-Interactive Runner (agenteam-rt run).

What's New

`prompt-build` — Drive AgenTeam from `codex exec`, CI, and experiment harnesses

Closes #23. New command that returns the fully composed prompt for a role dispatch as structured JSON:

agenteam-rt prompt-build --run-id <id> --stage implement --role dev

Output includes:

schema_version: "1" — for future compatibility
agent.developer_instructions — from resolved role config (deep-merged defaults + overrides)
task.raw / task.effective — original task + deterministic prior-run context
artifacts.search_paths — always populated, mode-aware (standalone vs HOTL)
artifacts.selected — best-effort files from current run
handoff_contract, verification, dispatch_context — structured objects
hotl — eligibility check (graceful without HOTL)
prompt_sections — ordered list of prompt components for experiment harnesses
prompt — fully composed string, ready for codex exec --prompt
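A consumer of this output might pull out just the fields it needs. The sketch below assumes the JSON nesting implied by the dotted names in the list above; `summarize_prompt_build` is a hypothetical helper, and treating `artifacts.selected` as optional is an assumption based on its best-effort description.

```python
import json


def summarize_prompt_build(raw: str) -> dict:
    """Extract commonly used fields from prompt-build JSON output."""
    data = json.loads(raw)
    # Refuse payloads from a schema this sketch was not written for.
    if data.get("schema_version") != "1":
        raise ValueError(f"unsupported schema_version: {data.get('schema_version')!r}")
    artifacts = data["artifacts"]
    return {
        "instructions": data["agent"]["developer_instructions"],
        "task": data["task"]["effective"],
        "search_paths": artifacts["search_paths"],
        # Assumption: `selected` is best-effort, so tolerate its absence.
        "selected": artifacts.get("selected", []),
        "prompt": data["prompt"],
    }
```

Checking `schema_version` first means a future format change fails loudly instead of producing a silently wrong prompt.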

Key design decisions:

Deterministic: Prior-run context via build_visible_memory(), not LLM judgment
Structured first: Machine-shaped subfields for programmatic access; prompt as convenience
Best-effort artifacts: search_paths is the reliable contract; selected uses mtime heuristic
Extracted helper: build_developer_instructions() shared between TOML generation and prompt-build

Use cases unlocked

Experiment harnesses (agento): run baseline-vs-agenteam comparisons programmatically
CI/scheduled runs: trigger AgenTeam pipelines from GitHub Actions or cron
Custom runners: build your own stage loop with prompt-build + codex exec + record-verify
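One step of a custom stage loop could look roughly like this. It is a sketch under stated assumptions: the prompt-build flags come from the command shown in these notes, the `codex exec` arguments follow the stdin invocation (`--json --full-auto -`) described for agenteam-rt run, and `dispatch_role` itself is hypothetical. The record-verify call is left out because its arguments are not documented here.

```python
import json
import subprocess


def prompt_build_argv(run_id: str, stage: str, role: str) -> list[str]:
    """Command line for composing the prompt of one role dispatch."""
    return ["agenteam-rt", "prompt-build",
            "--run-id", run_id, "--stage", stage, "--role", role]


# The trailing "-" tells codex exec to read the prompt from stdin.
CODEX_EXEC_ARGV = ["codex", "exec", "--json", "--full-auto", "-"]


def dispatch_role(run_id: str, stage: str, role: str, run=subprocess.run) -> int:
    """Build the prompt, pipe it to codex exec, and return the exit code.

    `run` is injectable so the flow can be exercised without the real CLIs.
    """
    built = run(prompt_build_argv(run_id, stage, role),
                capture_output=True, text=True, check=True)
    prompt = json.loads(built.stdout)["prompt"]
    result = run(CODEX_EXEC_ARGV, input=prompt, capture_output=True, text=True)
    return result.returncode
```

A caller would treat any non-zero return value as a failed role before deciding whether to record verification or retry.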

Numbers

8 files changed, 463 lines added
7 new tests (296 total, 0 regressions)

What's New

CI Repair Skill

New skill $ateam:ci-repair — fix CI failures without re-running the full pipeline.

$ateam:ci-repair #42
$ateam:ci-repair feature/add-auth

How it works:

Resolves the PR and finds the latest CI run for the current head commit
Fetches failure context: which job/step failed + truncated logs (max 400 lines)
Preflights git state (clean worktree, correct branch)
Dispatches dev with bounded failure context and debugging guidance
Runs local verification before pushing
Pushes only repair changes (baseline diff, never git add -A)

Safety features:

Commit-keyed: Only repairs if the latest run for the PR's HEAD is failing (won't repair old failures when CI already passed)
Git preflight: Rejects dirty worktree or detached HEAD
Verify before push: Local verification must pass; unverified fixes require explicit user confirmation
Bounded context: Max 3 failed jobs, 100 lines per step, 400 total — keeps the repair prompt focused
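The bounded-context limits can be illustrated with a short truncation routine. This is a sketch only: the three limits come from the list above, while the input shape, the name `bound_failure_context`, and the choice to keep the tail of each step's log are assumptions.

```python
MAX_JOBS = 3              # failed jobs included in the repair prompt
MAX_LINES_PER_STEP = 100  # log lines kept per failed step
MAX_TOTAL_LINES = 400     # overall budget across all jobs and steps


def bound_failure_context(failed_jobs: dict) -> dict:
    """Trim CI failure logs to the documented limits.

    `failed_jobs` maps job name -> step name -> list of log lines
    (a hypothetical shape chosen for this example).
    """
    bounded = {}
    remaining = MAX_TOTAL_LINES
    for job in list(failed_jobs)[:MAX_JOBS]:
        bounded[job] = {}
        for step, lines in failed_jobs[job].items():
            if remaining <= 0:
                break
            # Assumption: the end of a log is the most useful part.
            kept = lines[-min(MAX_LINES_PER_STEP, remaining):]
            bounded[job][step] = kept
            remaining -= len(kept)
    return bounded
```

Applying the total budget after the per-step cap means early jobs can exhaust it, in which case later jobs appear with no log lines.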

GitHub Actions only for v3.1. One repair cycle per invocation. Manual trigger (no auto-polling).

Numbers

7 files changed, 255 lines added
1 new contract test (289 total, 0 regressions)

What's New

Run Observability

Users now get visibility into pipeline progress during execution — the #1 competitive gap across all multi-agent tools.

status --progress — compact progress view:

AgenTeam Run: 20260410T120000Z
Task: add user authentication
Status: running
Elapsed: 4m 32s

Stages:
  research   ✓ completed  (0m 45s)
  strategy   ✓ completed  (0m 30s)
  design     ✓ completed  (1m 02s)
  implement  → verifying   (1m 15s)  [verify attempt 2/3]
  test       · pending
  review     · pending

Active Lock: dev
Last Event: stage_verified (implement) — fail, attempt 2

event tail --run-id <id> — stream events from the JSONL log in real time. Exits on run_finished or Ctrl-C. Power-user tool for monitoring from another terminal.
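The stop condition of event tail can be sketched as a generator over JSONL lines. The key name `"event"` is an assumption about the log format; only the `run_finished` event name comes from these notes, and real-time following of a growing file is omitted.

```python
import json
from typing import Iterable, Iterator


def tail_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed events and stop once run_finished has been seen."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the stream
        event = json.loads(line)
        yield event
        # Assumption: the event type is stored under the "event" key.
        if event.get("event") == "run_finished":
            return
```

Because the input is any iterable of strings, the same function works on an open file, on a captured stdout buffer, or on a list in a test.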

In-thread progress — the run skill now emits a progress snapshot after every stage transition (dispatch, verify, gate, completion). Users see pipeline state in the conversation thread without a separate terminal.

Stage timestamps — transition() now writes started_at on dispatch and completed_at on completion/skip, enabling elapsed time computation.
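With both timestamps recorded, elapsed time is a simple subtraction. A minimal sketch, assuming the timestamps are ISO 8601 strings; `format_elapsed` is a hypothetical helper that mirrors the `1m 02s` style of the progress view.

```python
from datetime import datetime


def format_elapsed(started_at: str, completed_at: str) -> str:
    """Render the time between two ISO 8601 timestamps as 'Xm YYs'."""
    start = datetime.fromisoformat(started_at)
    end = datetime.fromisoformat(completed_at)
    total_seconds = int((end - start).total_seconds())
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}m {seconds:02d}s"
```

For a stage that is still running, the same function can be called with the current time in place of `completed_at`.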

Cross-Model Review Guidance

Using the same model for dev and reviewer can produce sycophantic reviews. New documentation recommends different models:

roles:
  dev:
    model: gpt-5.3-codex       # writes code
  reviewer:
    model: o3-pro              # reviews with different reasoning
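A project could guard against the same-model case with a small check over the loaded config. This sketch assumes the `roles` mapping has already been parsed into a dict; `same_model_roles` is a hypothetical helper, not an AgenTeam API.

```python
def same_model_roles(roles: dict, a: str = "dev", b: str = "reviewer") -> bool:
    """Return True when two roles resolve to the same model setting.

    Roles with no explicit model compare equal to each other, on the
    assumption that both would fall back to the same default.
    """
    return roles.get(a, {}).get("model") == roles.get(b, {}).get("model")
```

With the configuration shown above the check returns False, so no warning would be needed.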

Backward Compatible

status without --progress still returns the raw JSON state dump. No breaking changes.

Numbers

13 files changed, 299 lines added
11 new tests (288 total, 0 regressions)

v2.11.2 makes AgenTeam's carry-forward memory visible in the runtime surfaces people already use.

Highlights:

status now includes a memory block with concise carry-forward lessons from compatible prior runs.
standup now surfaces the same visible memory alongside current health and dispatch context.
Memory is built from existing run history and filtered by the same compatibility rules added in v2.11.1, so stale or legacy runs do not pollute the output.
Docs and tests now cover visible memory, empty-memory behavior, and incompatible-history filtering.

Releases: yimwoo/codex-agenteam

v3.7.0 - Governed Run Evidence

Highlights

Verification

v3.6.0 - Visible Run Control

Highlights

Verification

v3.5.0 - Runner Trust

Highlights

Verification

v3.4.0

Highlights

Verification

v3.3.0 — Non-Interactive Runner

What's New

agenteam-rt run — Full Pipeline via codex exec

Numbers

v3.2.1: Stabilization

v3.2.0 — Prompt Build: Non-Interactive Pipeline Primitive

What's New

prompt-build — Drive AgenTeam from codex exec, CI, and experiment harnesses

Use cases unlocked

Numbers

v3.1.0 — CI Repair Loop

What's New

CI Repair Skill

Numbers

v3.0.0 — Run Observability + Cross-Model Review

What's New

Run Observability

Cross-Model Review Guidance

Backward Compatible

Numbers

v2.11.2
