Releases: yimwoo/codex-agenteam

v3.7.0 - Governed Run Evidence

26 Apr 22:01
458662e

Highlights

  • Added agenteam-rt evidence for versioned governed run evidence bundles.
  • Captures run outcome, metrics, stage and role exits, final verification, stale state, events, artifacts, and optional governance data.
  • Carries pipeline_mode through traces for clearer audit context.
  • Updated CLI and pipeline documentation for evidence generation.

Verification

  • GitHub PR checks passed for Python 3.11, Python 3.12, and scan.
  • Local release-candidate verification passed: runtime tests, skill/smoke tests, ruff format/check, bats smoke, git isolation, verify-stage tests, and whitespace checks.

v3.6.0 - Visible Run Control

25 Apr 16:29
a56a173

Highlights

  • Added agenteam-rt trace --run-id <id> for diagnostic JSON across failed, blocked, stale, resumable, and final-verify-failed runs.
  • Updated status --progress to derive from trace while preserving compact output and legacy verify aliases.
  • Recorded runner role artifact provenance so trace can point to prompt, stdout, stderr, and exec audit files.
  • Documented trace usage in CLI and pipeline docs.
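A minimal sketch of calling the trace command from a script. Only the `agenteam-rt trace --run-id <id>` invocation comes from the notes above. The helper names are hypothetical, and the sketch assumes the diagnostic JSON arrives on stdout as a single document.

```python
import json
import subprocess
from typing import Any, Callable


def build_trace_command(run_id: str) -> list[str]:
    """Return the argv for the trace command named in the release notes."""
    return ["agenteam-rt", "trace", "--run-id", run_id]


def load_trace(run_id: str, run: Callable[..., Any] = subprocess.run) -> Any:
    """Run the trace command and parse its stdout as JSON.

    Assumption: the diagnostic JSON is written to stdout as one document.
    The `run` parameter is injectable so the helper can be exercised
    without the CLI installed.
    """
    completed = run(
        build_trace_command(run_id),
        capture_output=True,
        text=True,
        check=False,
    )
    return json.loads(completed.stdout)
```

The structure of the returned object is not described in these notes, so callers should inspect it rather than rely on specific keys.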

Verification

  • python3 -m pytest test/test_runtime.py -q -> 330 passed
  • python3 -m pytest test/test_skill_contracts.py test/test_smoke_playground.py -q -> 13 passed
  • python3 -m ruff format --check runtime/ test/ -> clean
  • bats test/smoke.bats -> 26 passed
  • bats test/test_git_isolate.bats test/test_verify_stage.bats -> 22 passed

v3.5.0 - Runner Trust

25 Apr 05:43
8621b1e

Highlights

  • Hardened agenteam-rt run so non-zero role exits fail stages and runs authoritatively.
  • Added verify retry, conservative rework_to repair dispatch, blocked gate handling, gated-run resume, and final verification retry support.
  • Kept runner stdout as a pure JSONL event stream while persisting history quietly.
  • Added durable runner events for role execution, retries, rework, gates, and final verification.
  • Documented the runtime/executor boundary for the non-interactive runner.
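The first highlight, non-zero role exits failing stages and runs, can be illustrated with a short sketch. This is not the runner's actual code; the function names and the shape of the input mapping are hypothetical and only model the stated rule.

```python
from typing import Optional


def stage_passed(role_exit_codes: dict[str, int]) -> bool:
    """A stage passes only if every role in it exited with code 0."""
    return all(code == 0 for code in role_exit_codes.values())


def first_failed_stage(stages: dict[str, dict[str, int]]) -> Optional[str]:
    """Return the first stage (in insertion order) with a non-zero role exit.

    A non-None result means the run as a whole has failed.
    """
    for stage, exits in stages.items():
        if not stage_passed(exits):
            return stage
    return None
```

For example, `first_failed_stage({"implement": {"dev": 1}})` returns `"implement"`, so the run would be reported as failed no matter what later stages do.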

Verification

  • GitHub Actions passed for Python 3.11 and 3.12.
  • Plugin scanner passed.
  • Local release checks passed: runtime pytest, skill/smoke playground pytest, ruff format check, smoke bats, git-isolate bats, and verify-stage bats.

v3.4.0

14 Apr 18:47
771137b

Highlights

  • add governed-delivery foundations with agenteam-rt governed-bootstrap to scaffold local governance assets
  • add structured decision logging via agenteam-rt decision append|list|render-log
  • add config-driven tripwire evaluation via agenteam-rt tripwire check
  • add optional governance metadata on runs and surface it in status and standup output
  • update public docs for the new optional workflow while keeping internal planning docs private

Verification

  • python3 -m pytest -q
  • bats test/smoke.bats
  • python3 -m ruff check runtime/agenteam/cli.py runtime/agenteam/governance.py runtime/agenteam/standup.py runtime/agenteam/state.py test/test_runtime.py
  • python3 -m black --check runtime/agenteam/governance.py runtime/agenteam/state.py test/test_runtime.py

v3.3.0 — Non-Interactive Runner

11 Apr 22:35
ed160bb

What's New

agenteam-rt run — Full Pipeline via codex exec

Drive the complete AgenTeam pipeline non-interactively — for CI, experiments, cron, and remote execution.

# Run a task through the full pipeline
agenteam-rt run --task "add user auth" --auto-approve-gates

# From a seed file with a specific profile
agenteam-rt run --task-file seed.md --profile standard --output-dir ./out

# Resume a crashed/interrupted run
agenteam-rt run --run-id <id>

Key features:

  • Invokes codex exec via documented stdin path (--json --full-auto -)
  • Fail-fast gates: human gates stop by default; --auto-approve-gates for autonomous mode
  • Dual output: per-stage/role directory structure + JSONL event stream to stdout
  • Prompt audit: exact prompt and structured prompt-build output saved before each dispatch
  • Resume: --run-id picks up from the last incomplete stage
  • History on failure: persists run history for both successful and failed runs
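As a rough illustration of the stdin dispatch path listed above, the sketch below feeds a prompt to `codex exec --json --full-auto -` and captures the result. The flags are taken from the feature list; the helper name, the return shape, and the injectable `run` parameter are assumptions made for this example, not part of runner.py.

```python
import subprocess
from typing import Any, Callable

# Flags as listed in the release notes; "-" tells the command to read stdin.
CODEX_EXEC_ARGV = ["codex", "exec", "--json", "--full-auto", "-"]


def dispatch_prompt(
    prompt: str, run: Callable[..., Any] = subprocess.run
) -> tuple[int, str, str]:
    """Send `prompt` on stdin and return (exit_code, stdout, stderr)."""
    completed = run(
        CODEX_EXEC_ARGV,
        input=prompt,          # prompt goes to the child's stdin
        capture_output=True,   # keep stdout/stderr for the audit files
        text=True,
        check=False,           # the caller decides what a non-zero exit means
    )
    return completed.returncode, completed.stdout, completed.stderr
```

Leaving `check=False` keeps the exit code in the caller's hands, which suits a runner that records failures instead of raising on them.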

Output structure:

.agenteam/runs/<run-id>/
  run.json                 # Run summary
  events.jsonl             # All events
  implement/dev/           # Per-stage/role
    prompt.txt             # Exact prompt sent
    prompt-build.json      # Structured components
    stdout.txt / stderr.txt / exec.json
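
A small sketch of locating these files from a script, assuming the layout shown above. The file names come from the listing; the helper names and the default root argument are hypothetical.

```python
from pathlib import PurePosixPath

# Per-role files named in the output structure above.
ROLE_FILES = ("prompt.txt", "prompt-build.json", "stdout.txt", "stderr.txt", "exec.json")


def run_files(run_id: str, root: str = ".agenteam/runs") -> dict[str, PurePosixPath]:
    """Paths of the run-level summary and event log."""
    base = PurePosixPath(root) / run_id
    return {"run.json": base / "run.json", "events.jsonl": base / "events.jsonl"}


def role_artifacts(
    run_id: str, stage: str, role: str, root: str = ".agenteam/runs"
) -> dict[str, PurePosixPath]:
    """Paths of the per-stage/role audit files, keyed by file name."""
    base = PurePosixPath(root) / run_id / stage / role
    return {name: base / name for name in ROLE_FILES}
```

`PurePosixPath` only builds paths and does not touch the filesystem, so the helpers work before a run directory exists.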

Built on existing primitives: prompt-build, init, transition, record-verify, record-gate, event append, history append. No new runtime primitives beyond runner.py.

Numbers

  • 7 files changed, 533 lines added
  • 5 new tests (302 total, 0 regressions)
  • New module: runtime/agenteam/runner.py

v3.2.1: Stabilization

11 Apr 22:12
584cc94

Roadmap synced through v3.2. Added prompt-build contract test. 297 tests passing.

Next up: v3.3 Non-Interactive Runner (agenteam-rt run).

v3.2.0 — Prompt Build: Non-Interactive Pipeline Primitive

11 Apr 21:55
63227e5

What's New

prompt-build — Drive AgenTeam from codex exec, CI, and experiment harnesses

Closes #23. New command that returns the fully composed prompt for a role dispatch as structured JSON:

agenteam-rt prompt-build --run-id <id> --stage implement --role dev

Output includes:

  • schema_version: "1" — for future compatibility
  • agent.developer_instructions — from resolved role config (deep-merged defaults + overrides)
  • task.raw / task.effective — original task + deterministic prior-run context
  • artifacts.search_paths — always populated, mode-aware (standalone vs HOTL)
  • artifacts.selected — best-effort files from current run
  • handoff_contract, verification, dispatch_context — structured objects
  • hotl — eligibility check (graceful without HOTL)
  • prompt_sections — ordered list of prompt components for experiment harnesses
  • prompt: fully composed string, ready to pass to codex exec
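
A hedged sketch of consuming this output from a harness. The field names are the ones listed above. The sketch assumes the dotted names (for example artifacts.search_paths) map to nested JSON objects; the helper name and the returned dictionary shape are hypothetical.

```python
import json
from typing import Any

SUPPORTED_SCHEMA_VERSION = "1"


def read_prompt_build(raw: str) -> dict[str, Any]:
    """Parse prompt-build JSON and pull out the fields a runner needs.

    Assumption: dotted field names in the notes are nested objects.
    """
    data = json.loads(raw)
    version = data.get("schema_version")
    if version != SUPPORTED_SCHEMA_VERSION:
        # Refuse unknown versions instead of guessing at their layout.
        raise ValueError(f"unsupported schema_version: {version!r}")
    return {
        "prompt": data["prompt"],
        "search_paths": data.get("artifacts", {}).get("search_paths", []),
        "effective_task": data.get("task", {}).get("effective"),
    }
```

Checking schema_version first means a future format change fails loudly rather than producing a silently malformed prompt.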

Key design decisions:

  • Deterministic: Prior-run context via build_visible_memory(), not LLM judgment
  • Structured first: Machine-shaped subfields for programmatic access; prompt as convenience
  • Best-effort artifacts: search_paths is the reliable contract; selected uses mtime heuristic
  • Extracted helper: build_developer_instructions() shared between TOML generation and prompt-build

Use cases unlocked

  • Experiment harnesses (agento): run baseline-vs-agenteam comparisons programmatically
  • CI/scheduled runs: trigger AgenTeam pipelines from GitHub Actions or cron
  • Custom runners: build your own stage loop with prompt-build + codex exec + record-verify

Numbers

  • 8 files changed, 463 lines added
  • 7 new tests (296 total, 0 regressions)

v3.1.0 — CI Repair Loop

10 Apr 21:23
4d78b30

What's New

CI Repair Skill

New skill $ateam:ci-repair — fix CI failures without re-running the full pipeline.

$ateam:ci-repair #42
$ateam:ci-repair feature/add-auth

How it works:

  1. Resolves the PR and finds the latest CI run for the current head commit
  2. Fetches failure context: which job/step failed + truncated logs (max 400 lines)
  3. Preflights git state (clean worktree, correct branch)
  4. Dispatches dev with bounded failure context and debugging guidance
  5. Runs local verification before pushing
  6. Pushes only repair changes (baseline diff, never git add -A)
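
Step 1, and the commit-keyed safety rule, can be sketched as follows. This is an illustration, not the skill's implementation. It assumes each run is a dictionary with the `head_sha`, `created_at`, and `conclusion` fields that the GitHub REST API returns for workflow runs; the helper names are hypothetical.

```python
from typing import Optional


def latest_run_for_head(runs: list[dict], head_sha: str) -> Optional[dict]:
    """Pick the most recent run whose head_sha matches the PR's HEAD."""
    matching = [run for run in runs if run.get("head_sha") == head_sha]
    if not matching:
        return None
    # ISO 8601 UTC timestamps in one format sort correctly as strings.
    return max(matching, key=lambda run: run["created_at"])


def needs_repair(runs: list[dict], head_sha: str) -> bool:
    """Repair only when the latest run for the current HEAD has failed."""
    latest = latest_run_for_head(runs, head_sha)
    return latest is not None and latest.get("conclusion") == "failure"
```

Keying on the head commit means a failure from an earlier push is ignored once a newer run for the same HEAD has passed.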

Safety features:

  • Commit-keyed: Only repairs if the latest run for the PR's HEAD is failing (won't repair old failures when CI already passed)
  • Git preflight: Rejects dirty worktree or detached HEAD
  • Verify before push: Local verification must pass; unverified fixes require explicit user confirmation
  • Bounded context: Max 3 failed jobs, 100 lines per step, 400 total — keeps the repair prompt focused
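
The bounded-context limits can be sketched as a small truncation routine. The three limits come from the bullet above. The input shape, the function name, and the choice to keep the tail of each step's log (on the assumption that failures usually surface near the end) are illustrative and may differ from what the skill does.

```python
MAX_JOBS = 3
MAX_LINES_PER_STEP = 100
MAX_TOTAL_LINES = 400

# Each entry is (job name, step name, log lines) for one failed step.
FailedStep = tuple[str, str, list[str]]


def bound_failure_context(failed_steps: list[FailedStep]) -> list[FailedStep]:
    """Apply the per-job, per-step, and total line limits in order."""
    kept_jobs: list[str] = []
    bounded: list[FailedStep] = []
    remaining = MAX_TOTAL_LINES
    for job, step, lines in failed_steps:
        if job not in kept_jobs:
            if len(kept_jobs) == MAX_JOBS:
                continue  # a fourth or later job is dropped entirely
            kept_jobs.append(job)
        if remaining <= 0:
            break  # total budget spent
        budget = min(MAX_LINES_PER_STEP, remaining)
        tail = lines[-budget:]  # assumption: keep the end of the log
        bounded.append((job, step, tail))
        remaining -= len(tail)
    return bounded
```

With five failed jobs of 150 log lines each, the sketch keeps the first three jobs at 100 lines apiece, for 300 lines in total.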

GitHub Actions only for v3.1. One repair cycle per invocation. Manual trigger (no auto-polling).

Numbers

  • 7 files changed, 255 lines added
  • 1 new contract test (289 total, 0 regressions)

v3.0.0 — Run Observability + Cross-Model Review

10 Apr 19:30
70275d1

What's New

Run Observability

Users now get visibility into pipeline progress during execution — the #1 competitive gap across all multi-agent tools.

status --progress — compact progress view:

AgenTeam Run: 20260410T120000Z
Task: add user authentication
Status: running
Elapsed: 4m 32s

Stages:
  research   ✓ completed  (0m 45s)
  strategy   ✓ completed  (0m 30s)
  design     ✓ completed  (1m 02s)
  implement  → verifying   (1m 15s)  [verify attempt 2/3]
  test       · pending
  review     · pending

Active Lock: dev
Last Event: stage_verified (implement) — fail, attempt 2

event tail --run-id <id> — stream events from the JSONL log in real time. Exits on run_finished or Ctrl-C. Power-user tool for monitoring from another terminal.
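
A minimal sketch of the same stop condition for scripts that read the JSONL log themselves. The `run_finished` event name comes from the description above; the key that holds the event name (`"event"` here) and the function name are assumptions.

```python
import json
from typing import Iterable, Iterator


def events_until_finished(lines: Iterable[str], key: str = "event") -> Iterator[dict]:
    """Yield parsed events and stop after the run_finished event.

    `key` is the field assumed to hold the event name; adjust it to
    match the real log.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        event = json.loads(line)
        yield event
        if event.get(key) == "run_finished":
            return
```

Passing an open file object as `lines` reads the log as it currently stands; following a file that is still growing would need a polling loop around this generator.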

In-thread progress — the run skill now emits a progress snapshot after every stage transition (dispatch, verify, gate, completion). Users see pipeline state in the conversation thread without a separate terminal.

Stage timestamps: transition() now writes started_at on dispatch and completed_at on completion/skip, enabling elapsed time computation.
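
The elapsed-time computation these timestamps enable might look like the sketch below. It assumes started_at and completed_at are ISO 8601 strings, which the notes do not confirm, and it mirrors the "1m 02s" style shown in the progress view. The function names are hypothetical.

```python
from datetime import datetime


def _parse(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def format_elapsed(started_at: str, completed_at: str) -> str:
    """Format the stage duration as '<minutes>m <seconds>s'."""
    seconds = int((_parse(completed_at) - _parse(started_at)).total_seconds())
    minutes, secs = divmod(max(seconds, 0), 60)  # clamp clock skew to zero
    return f"{minutes}m {secs:02d}s"
```

For example, `format_elapsed("2026-04-10T12:00:00Z", "2026-04-10T12:01:02Z")` returns `"1m 02s"`.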

Cross-Model Review Guidance

Using the same model for dev and reviewer can produce sycophantic reviews. New documentation recommends different models:

roles:
  dev:
    model: gpt-5.3-codex       # writes code
  reviewer:
    model: o3-pro              # reviews with different reasoning
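
A quick check for this recommendation, sketched under the assumption that the roles section has already been loaded into a dictionary shaped like the YAML above. The function name is hypothetical and the check is not part of AgenTeam.

```python
def shares_model(
    roles: dict[str, dict[str, str]], first: str = "dev", second: str = "reviewer"
) -> bool:
    """Return True when both roles are configured with the same model."""
    model_a = roles.get(first, {}).get("model")
    model_b = roles.get(second, {}).get("model")
    # A missing model is treated as "not shared" rather than as a match.
    return model_a is not None and model_a == model_b
```

With the configuration above, `shares_model` returns False because dev and reviewer name different models.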

Backward Compatible

status without --progress still returns the raw JSON state dump. No breaking changes.

Numbers

  • 13 files changed, 299 lines added
  • 11 new tests (288 total, 0 regressions)

v2.11.2

10 Apr 18:01

v2.11.2 makes AgenTeam's carry-forward memory visible in the runtime surfaces people already use.

Highlights:

  • status now includes a memory block with concise carry-forward lessons from compatible prior runs
  • standup now surfaces the same visible memory alongside current health and dispatch context
  • memory is built from existing run history and filtered by the same compatibility rules added in v2.11.1, so stale or legacy runs do not pollute the output
  • docs and tests now cover visible memory, empty-memory behavior, and incompatible-history filtering