Releases: yimwoo/codex-agenteam
v3.7.0 - Governed Run Evidence
Highlights
- Added agenteam-rt evidence for versioned governed run evidence bundles.
- Captures run outcome, metrics, stage and role exits, final verification, stale state, events, artifacts, and optional governance data.
- Carries pipeline_mode through traces for clearer audit context.
- Updated CLI and pipeline documentation for evidence generation.
Verification
- GitHub PR checks passed for Python 3.11, Python 3.12, and scan.
- Local release-candidate verification passed: runtime tests, skill/smoke tests, ruff format/check, bats smoke, git isolation, verify-stage tests, and whitespace checks.
v3.6.0 - Visible Run Control
Highlights
- Added agenteam-rt trace --run-id <id> for diagnostic JSON across failed, blocked, stale, resumable, and final-verify-failed runs.
- Updated status --progress to derive from trace while preserving compact output and legacy verify aliases.
- Recorded runner role artifact provenance so trace can point to prompt, stdout, stderr, and exec audit files.
- Documented trace usage in CLI and pipeline docs.
Verification
- python3 -m pytest test/test_runtime.py -q -> 330 passed
- python3 -m pytest test/test_skill_contracts.py test/test_smoke_playground.py -q -> 13 passed
- python3 -m ruff format --check runtime/ test/ -> clean
- bats test/smoke.bats -> 26 passed
- bats test/test_git_isolate.bats test/test_verify_stage.bats -> 22 passed
v3.5.0 - Runner Trust
Highlights
- Hardened agenteam-rt run so non-zero role exits fail stages and runs authoritatively.
- Added verify retry, conservative rework_to repair dispatch, blocked gate handling, gated-run resume, and final verification retry support.
- Kept runner stdout as a pure JSONL event stream while persisting history quietly.
- Added durable runner events for role execution, retries, rework, gates, and final verification.
- Documented the runtime/executor boundary for the non-interactive runner.
Verification
- GitHub Actions passed for Python 3.11 and 3.12.
- Plugin scanner passed.
- Local release checks passed: runtime pytest, skill/smoke playground pytest, ruff format check, smoke bats, git-isolate bats, and verify-stage bats.
v3.4.0
Highlights
- add governed-delivery foundations with agenteam-rt governed-bootstrap to scaffold local governance assets
- add structured decision logging via agenteam-rt decision append|list|render-log
- add config-driven tripwire evaluation via agenteam-rt tripwire check
- add optional governance metadata on runs and surface it in status and standup output
- update public docs for the new optional workflow while keeping internal planning docs private
Verification
- python3 -m pytest -q
- bats test/smoke.bats
- python3 -m ruff check runtime/agenteam/cli.py runtime/agenteam/governance.py runtime/agenteam/standup.py runtime/agenteam/state.py test/test_runtime.py
- python3 -m black --check runtime/agenteam/governance.py runtime/agenteam/state.py test/test_runtime.py
v3.3.0 — Non-Interactive Runner
What's New
agenteam-rt run — Full Pipeline via codex exec
Drive the complete AgenTeam pipeline non-interactively — for CI, experiments, cron, and remote execution.
# Run a task through the full pipeline
agenteam-rt run --task "add user auth" --auto-approve-gates
# From a seed file with a specific profile
agenteam-rt run --task-file seed.md --profile standard --output-dir ./out
# Resume a crashed/interrupted run
agenteam-rt run --run-id <id>
Key features:
- Invokes codex exec via documented stdin path (--json --full-auto -)
- Fail-fast gates: human gates stop by default; --auto-approve-gates for autonomous mode
- Dual output: per-stage/role directory structure + JSONL event stream to stdout
- Prompt audit: exact prompt and structured prompt-build output saved before each dispatch
- Resume: --run-id picks up from the last incomplete stage
- History on failure: persists run history for both successful and failed runs
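A minimal sketch of how a CI wrapper might launch the runner and read its JSONL stdout, assuming each stdout line is one JSON object. The flags are the ones shown in the examples above; the helper names `build_run_command` and `parse_event_stream` are hypothetical and do not come from the project.

```python
import json
from typing import Iterable, Iterator


def build_run_command(task: str, auto_approve_gates: bool = False) -> list[str]:
    """Build argv for a non-interactive run, using only flags shown in these notes."""
    argv = ["agenteam-rt", "run", "--task", task]
    if auto_approve_gates:
        argv.append("--auto-approve-gates")
    return argv


def parse_event_stream(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line (assumes one JSON object per line)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)
```

In practice the lines would likely come from the `stdout` of a `subprocess.Popen(build_run_command(...), stdout=subprocess.PIPE, text=True)` call, read incrementally so events can be handled as they arrive.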
Output structure:
.agenteam/runs/<run-id>/
  run.json # Run summary
  events.jsonl # All events
  implement/dev/ # Per-stage/role
    prompt.txt # Exact prompt sent
    prompt-build.json # Structured components
    stdout.txt / stderr.txt / exec.json
Built on existing primitives: prompt-build, init, transition, record-verify, record-gate, event append, history append. No new runtime primitives beyond runner.py.
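To inspect a finished run, the per-stage/role layout above can be walked with a few lines of Python. This is a sketch under the assumption that every stage directory contains one subdirectory per role; `collect_role_artifacts` is a hypothetical helper, and only the file names are taken from the output structure.

```python
from pathlib import Path

# File names taken from the output structure listed in these notes.
AUDIT_FILES = ("prompt.txt", "prompt-build.json", "stdout.txt", "stderr.txt", "exec.json")


def collect_role_artifacts(run_dir: Path) -> dict[str, list[str]]:
    """Map "stage/role" to the audit files actually present in that directory."""
    found: dict[str, list[str]] = {}
    for stage_dir in sorted(p for p in run_dir.iterdir() if p.is_dir()):
        for role_dir in sorted(p for p in stage_dir.iterdir() if p.is_dir()):
            present = [name for name in AUDIT_FILES if (role_dir / name).is_file()]
            found[f"{stage_dir.name}/{role_dir.name}"] = present
    return found
```

Top-level files such as run.json and events.jsonl are skipped because only directories are traversed.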
Numbers
- 7 files changed, 533 lines added
- 5 new tests (302 total, 0 regressions)
- New module: runtime/agenteam/runner.py
v3.2.1: Stabilization
Roadmap synced through v3.2. Added prompt-build contract test. 297 tests passing.
Next up: v3.3 Non-Interactive Runner (agenteam-rt run).
v3.2.0 — Prompt Build: Non-Interactive Pipeline Primitive
What's New
prompt-build — Drive AgenTeam from codex exec, CI, and experiment harnesses
Closes #23. New command that returns the fully composed prompt for a role dispatch as structured JSON:
agenteam-rt prompt-build --run-id <id> --stage implement --role dev
Output includes:
- schema_version: "1", for future compatibility
- agent.developer_instructions: from resolved role config (deep-merged defaults + overrides)
- task.raw / task.effective: original task + deterministic prior-run context
- artifacts.search_paths: always populated, mode-aware (standalone vs HOTL)
- artifacts.selected: best-effort files from current run
- handoff_contract, verification, dispatch_context: structured objects
- hotl: eligibility check (graceful without HOTL)
- prompt_sections: ordered list of prompt components for experiment harnesses
- prompt: fully composed string, ready for codex exec --prompt
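A harness reading this output might check the schema version before touching anything else. The sketch below assumes the dotted names in the list denote nested JSON objects (for example `artifacts.search_paths` living under an `artifacts` key) and that `schema_version` is the string "1"; `read_prompt_build` is a hypothetical helper, not part of the CLI.

```python
import json

SUPPORTED_SCHEMA = "1"  # value quoted in these release notes


def read_prompt_build(payload: str) -> dict:
    """Parse prompt-build JSON and return the fields a harness is likely to need."""
    data = json.loads(payload)
    version = data.get("schema_version")
    if version != SUPPORTED_SCHEMA:
        raise ValueError(f"unsupported schema_version: {version!r}")
    artifacts = data.get("artifacts", {})
    return {
        "prompt": data["prompt"],
        "search_paths": artifacts.get("search_paths", []),
        # "selected" is described as best-effort, so treat it as optional.
        "selected": artifacts.get("selected", []),
    }
```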
Key design decisions:
- Deterministic: Prior-run context via build_visible_memory(), not LLM judgment
- Structured first: Machine-shaped subfields for programmatic access; prompt as convenience
- Best-effort artifacts: search_paths is the reliable contract; selected uses mtime heuristic
- Extracted helper: build_developer_instructions() shared between TOML generation and prompt-build
Use cases unlocked
- Experiment harnesses (agento): run baseline-vs-agenteam comparisons programmatically
- CI/scheduled runs: trigger AgenTeam pipelines from GitHub Actions or cron
- Custom runners: build your own stage loop with prompt-build + codex exec + record-verify
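One step of such a custom stage loop could look like the sketch below. It uses the prompt-build flags shown earlier in this release and the codex exec stdin invocation (--json --full-auto -) cited in the v3.3.0 notes. `dispatch_role` is a hypothetical helper, and the record-verify step is left out because its arguments are not shown in these notes.

```python
import json
import subprocess
from typing import Callable

# Injectable runner so the function can be exercised without the real CLIs.
Runner = Callable[..., subprocess.CompletedProcess]


def dispatch_role(run_id: str, stage: str, role: str, run: Runner = subprocess.run) -> int:
    """Compose the prompt with prompt-build, then pipe it to codex exec via stdin."""
    built = run(
        ["agenteam-rt", "prompt-build", "--run-id", run_id, "--stage", stage, "--role", role],
        capture_output=True,
        text=True,
        check=True,
    )
    prompt = json.loads(built.stdout)["prompt"]
    # The trailing "-" asks codex exec to read the prompt from stdin.
    result = run(["codex", "exec", "--json", "--full-auto", "-"], input=prompt, text=True)
    return result.returncode
```

Returning the exit code leaves the caller free to decide whether a non-zero role exit should fail the stage.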
Numbers
- 8 files changed, 463 lines added
- 7 new tests (296 total, 0 regressions)
v3.1.0 — CI Repair Loop
What's New
CI Repair Skill
New skill $ateam:ci-repair — fix CI failures without re-running the full pipeline.
$ateam:ci-repair #42
$ateam:ci-repair feature/add-auth
How it works:
- Resolves the PR and finds the latest CI run for the current head commit
- Fetches failure context: which job/step failed + truncated logs (max 400 lines)
- Preflights git state (clean worktree, correct branch)
- Dispatches dev with bounded failure context and debugging guidance
- Runs local verification before pushing
- Pushes only repair changes (baseline diff, never git add -A)
Safety features:
- Commit-keyed: Only repairs if the latest run for the PR's HEAD is failing (won't repair old failures when CI already passed)
- Git preflight: Rejects dirty worktree or detached HEAD
- Verify before push: Local verification must pass; unverified fixes require explicit user confirmation
- Bounded context: Max 3 failed jobs, 100 lines per step, 400 total — keeps the repair prompt focused
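The bounded-context limits can be illustrated with a short sketch. The limits (3 jobs, 100 lines per step, 400 total) come from the list above; the input shape, the `bound_failure_context` name, and the choice to keep the tail of each log are assumptions made for illustration, not the skill's actual behavior.

```python
MAX_JOBS = 3
MAX_LINES_PER_STEP = 100
MAX_TOTAL_LINES = 400


def bound_failure_context(jobs: list[dict]) -> list[dict]:
    """Trim failed-job logs to the stated limits.

    Each job is assumed to look like
    {"name": str, "steps": [{"name": str, "log": [str, ...]}]}.
    Keeping the tail of each log is an assumption: errors usually appear last.
    """
    remaining = MAX_TOTAL_LINES
    bounded = []
    for job in jobs[:MAX_JOBS]:
        steps = []
        for step in job["steps"]:
            if remaining <= 0:
                break  # total budget spent; drop the remaining steps
            keep = min(MAX_LINES_PER_STEP, remaining)
            lines = step["log"][-keep:]
            remaining -= len(lines)
            steps.append({"name": step["name"], "log": lines})
        bounded.append({"name": job["name"], "steps": steps})
    return bounded
```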
GitHub Actions only for v3.1. One repair cycle per invocation. Manual trigger (no auto-polling).
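The commit-keyed safety rule can be sketched as a pure function over a list of CI runs. The dict keys mirror the `head_sha`, `created_at`, and `conclusion` fields of GitHub Actions workflow runs, but the function name and the input shape are hypothetical; the skill's real lookup is not shown in these notes.

```python
def should_repair(runs: list[dict], head_sha: str) -> bool:
    """Return True only if the newest CI run for head_sha concluded in failure.

    Each run is assumed to carry "head_sha", "created_at", and "conclusion".
    "created_at" values are assumed to be ISO 8601 UTC strings in one format,
    so plain string comparison orders them chronologically.
    """
    for_head = [r for r in runs if r["head_sha"] == head_sha]
    if not for_head:
        return False  # no run for this commit, nothing to repair
    latest = max(for_head, key=lambda r: r["created_at"])
    return latest["conclusion"] == "failure"
```

An older failing run is ignored once a newer run for the same commit has passed, which matches the rule that old failures are not repaired when CI already passed.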
Numbers
- 7 files changed, 255 lines added
- 1 new contract test (289 total, 0 regressions)
v3.0.0 — Run Observability + Cross-Model Review
What's New
Run Observability
Users now get visibility into pipeline progress during execution — the #1 competitive gap across all multi-agent tools.
status --progress — compact progress view:
AgenTeam Run: 20260410T120000Z
Task: add user authentication
Status: running
Elapsed: 4m 32s
Stages:
research ✓ completed (0m 45s)
strategy ✓ completed (0m 30s)
design ✓ completed (1m 02s)
implement → verifying (1m 15s) [verify attempt 2/3]
test · pending
review · pending
Active Lock: dev
Last Event: stage_verified (implement) — fail, attempt 2
event tail --run-id <id> — stream events from the JSONL log in real time. Exits on run_finished or Ctrl-C. Power-user tool for monitoring from another terminal.
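The tailing behavior can be approximated in a few lines of Python for scripts that want to consume the same log. This is a sketch, not the command's implementation: `tail_events` is a hypothetical name, and the key that holds the event name (`event` here) is an assumption about the log format.

```python
import json
import time
from typing import Iterator


def tail_events(path: str, poll_seconds: float = 0.5, type_key: str = "event") -> Iterator[dict]:
    """Follow a JSONL log and yield events until one named run_finished arrives."""
    with open(path, "r", encoding="utf-8") as fh:
        buffer = ""
        while True:
            chunk = fh.readline()
            if not chunk:
                time.sleep(poll_seconds)  # nothing new yet; wait for the writer
                continue
            buffer += chunk
            if not buffer.endswith("\n"):
                continue  # partial line; wait for the rest
            line, buffer = buffer.strip(), ""
            if not line:
                continue
            record = json.loads(line)
            yield record
            if record.get(type_key) == "run_finished":
                return
```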
In-thread progress — the run skill now emits a progress snapshot after every stage transition (dispatch, verify, gate, completion). Users see pipeline state in the conversation thread without a separate terminal.
Stage timestamps — transition() now writes started_at on dispatch and completed_at on completion/skip, enabling elapsed time computation.
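With both timestamps recorded, elapsed time in the style of the progress view (for example 1m 02s) reduces to a subtraction. The sketch below assumes the timestamps are ISO 8601 strings with an explicit UTC offset; the stored format is not specified in these notes, and `format_elapsed` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Optional


def format_elapsed(started_at: str, completed_at: Optional[str] = None) -> str:
    """Render elapsed time as "<minutes>m <seconds>s" with zero-padded seconds.

    A stage without completed_at is measured against the current UTC time.
    """
    start = datetime.fromisoformat(started_at)
    end = datetime.fromisoformat(completed_at) if completed_at else datetime.now(timezone.utc)
    total = max(0, int((end - start).total_seconds()))
    minutes, seconds = divmod(total, 60)
    return f"{minutes}m {seconds:02d}s"
```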
Cross-Model Review Guidance
Using the same model for dev and reviewer can produce sycophantic reviews. New documentation recommends different models:
roles:
  dev:
    model: gpt-5.3-codex # writes code
  reviewer:
    model: o3-pro # reviews with different reasoning
Backward Compatible
status without --progress still returns the raw JSON state dump. No breaking changes.
Numbers
- 13 files changed, 299 lines added
- 11 new tests (288 total, 0 regressions)
v2.11.2
v2.11.2 makes AgenTeam's carry-forward memory visible in the runtime surfaces people already use.
Highlights:
- status now includes a memory block with concise carry-forward lessons from compatible prior runs
- standup now surfaces the same visible memory alongside current health and dispatch context
- memory is built from existing run history and filtered by the same compatibility rules added in v2.11.1, so stale or legacy runs do not pollute the output
- docs and tests now cover visible memory, empty-memory behavior, and incompatible-history filtering