Add multi-agent workflow review stages

## Problem

Workflows currently let one agent perform the main implementation path, but there is no first-class way to run a separate model as a post-code, pre-PR judge. That makes it hard to enforce an independent review pass before a workflow pushes, opens a PR, or marks the run done.

We want workflows to support multi-agent stages where one stage can implement and another stage can review the result using a different model/provider.

## Goal

Add workflow review stages that can run after code changes and tests, before PR creation. A review stage should be able to inspect constrained inputs, produce a structured verdict, and optionally block or send findings back to the implementing agent.

Example use cases:

- Post-code review with a different model family than the implementer.
- Security or regression judge before PR creation.
- Test adequacy review after unit/E2E output is available.
- Policy enforcement before workflows mark themselves complete.

## Proposed workflow shape

```yaml
stages:
  - name: implement
    agent:
      model: gpt-5
      instructions: |
        Implement the issue. Commit changes.

  - name: test
    run:
      command: go test ./...

  - name: review
    judge:
      model: claude-sonnet-4
      inputs:
        - issue
        - git_diff
        - test_output
      require:
        verdict: pass
      instructions: |
        Review the diff for correctness, security, regressions, and missing tests.
        Return pass/fail with specific required fixes.

  - name: pr
    run:
      command: gh pr create ...
      if: stages.review.verdict == "pass"
```

The exact YAML can change, but the concept should be explicit: a judge/review stage is different from the main implementing agent stage.
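As an illustration of how a condition such as `stages.review.verdict == "pass"` might be resolved against prior stage outputs, here is a minimal Python sketch. The `evaluate_condition` function, the `StageOutputs` shape, and the restricted grammar (only `==` and `!=` against a quoted string) are assumptions made for this example, not an existing API or a settled design.

```python
import re
from typing import Any

# Hypothetical shape: outputs recorded per stage name by the workflow runner.
StageOutputs = dict[str, dict[str, Any]]

# Deliberately tiny grammar: stages.<name>.<field> (==|!=) "<literal>"
_CONDITION = re.compile(
    r'^\s*stages\.(?P<stage>[A-Za-z_][\w-]*)\.(?P<field>[A-Za-z_]\w*)\s*'
    r'(?P<op>==|!=)\s*"(?P<value>[^"]*)"\s*$'
)


def evaluate_condition(expr: str, outputs: StageOutputs) -> bool:
    """Evaluate one condition; unknown syntax is an error, missing data is False."""
    match = _CONDITION.match(expr)
    if match is None:
        raise ValueError(f"unsupported condition: {expr!r}")
    stage = outputs.get(match["stage"])
    if stage is None or match["field"] not in stage:
        # Fail closed: a skipped or crashed review stage never unlocks the PR stage.
        return False
    equal = str(stage[match["field"]]) == match["value"]
    return equal if match["op"] == "==" else not equal
```

Failing closed on a missing stage or field is one possible choice; it means a gated stage is skipped rather than run when the review never produced a verdict.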

## Behavioral requirements

- A judge stage can choose a different model/provider from the implementer.
- Judge inputs should be constrained and explicit, such as issue text, git diff, changed files, test output, CI output, or prior stage artifacts.
- Judge output should be structured, not just free-form prose. At minimum:
  - `verdict`: `pass` or `fail`
  - `summary`
  - `findings`: a list of entries, each with file, line, comment, and severity when available
  - `required_fixes`
- Workflows should be able to decide whether a failed judge blocks PR creation, opens a PR with a warning, or sends findings back to the implementer.
- Judge output should be visible in the claw chat and persisted as a workflow/stage artifact.
- The workflow must have hard bounds: max attempts, timeout, and token/cost limits.
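The structured judge output listed above could be modeled and validated roughly as follows. This is a sketch only: the `Finding` and `JudgeResult` classes, the `parse_judge_output` helper, and the use of JSON as the judge's reply format are all assumptions for illustration. Only the four field names come from the requirements.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Finding:
    comment: str
    file: Optional[str] = None      # optional: "when available"
    line: Optional[int] = None
    severity: Optional[str] = None


@dataclass
class JudgeResult:
    verdict: str                    # "pass" or "fail"
    summary: str
    findings: list[Finding] = field(default_factory=list)
    required_fixes: list[str] = field(default_factory=list)


def parse_judge_output(raw: str) -> JudgeResult:
    """Parse the judge's reply (assumed JSON); malformed output counts as a fail."""
    try:
        data = json.loads(raw)
        verdict = data["verdict"]
        if verdict not in ("pass", "fail"):
            raise ValueError(f"unknown verdict: {verdict!r}")
        findings = [
            Finding(
                comment=item["comment"],
                file=item.get("file"),
                line=item.get("line"),
                severity=item.get("severity"),
            )
            for item in data.get("findings", [])
        ]
        fixes = [str(fix) for fix in data.get("required_fixes", [])]
        return JudgeResult(verdict, str(data.get("summary", "")), findings, fixes)
    except (KeyError, TypeError, ValueError, AttributeError) as exc:
        # json.JSONDecodeError is a ValueError subclass, so bad JSON lands here too.
        return JudgeResult(verdict="fail", summary=f"unparseable judge output: {exc}")
```

Treating unparseable output as `fail` keeps a confused or truncated judge reply from silently passing the gate; whether that failure should block, warn, or trigger a retry would still be up to the workflow's configured behavior.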

## MVP proposal

Start with a bounded review loop:

1. Implementer runs.
2. Tests run.
3. Judge reviews issue + diff + test output.
4. If judge passes, continue to PR creation.
5. If judge fails, send findings back to the same implementer once.
6. Re-run tests.
7. Run one final judge pass.
8. If still failing, stop or follow the workflow's configured failure behavior.

Avoid unbounded agent debate in the first version.
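The bounded loop in the steps above can be sketched in Python as follows. The `run_review_loop` function and its three callables are hypothetical stand-ins for the real stages, and the judge here receives only test output for brevity, whereas the proposal also passes the issue and diff. Timeouts and token/cost limits are left out of this sketch.

```python
from typing import Callable, Optional


def run_review_loop(
    implement: Callable[[Optional[list[str]]], None],
    run_tests: Callable[[], str],
    judge: Callable[[str], tuple[str, list[str]]],
    max_retries: int = 1,
) -> str:
    """Return "pass" or "fail" after at most max_retries feedback rounds."""
    fixes: Optional[list[str]] = None
    for _ in range(max_retries + 1):
        implement(fixes)                      # steps 1 and 5: first run has no findings
        test_output = run_tests()             # steps 2 and 6
        verdict, fixes = judge(test_output)   # steps 3 and 7
        if verdict == "pass":
            return "pass"                     # step 4: continue to PR creation
    return "fail"                             # step 8: caller applies failure behavior
```

With the default `max_retries=1`, the judge runs at most twice and the implementer sees findings at most once, which matches the one-retry limit of the MVP.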

## Open questions

- Should judge stages be allowed to edit files, or should they be read-only by default?
- Should judge findings become normal chat messages, stage artifacts, PR comments, or all three?
- Should the implementer receive the judge's full reasoning or only structured findings?
- How should workflows reference prior stage outputs in conditions?
- Do we need built-in judge presets such as `code_review`, `security_review`, and `test_review`?

## Acceptance criteria

- A workflow can define a model-backed review/judge stage after implementation and tests.
- The judge can use a different model from the implementer.
- The judge receives explicit bounded inputs, including at least issue context, git diff, and test output.
- The judge returns a structured verdict that can block PR creation.
- The review result is visible in the UI/chat and stored as a stage artifact.
- The implementation supports a bounded one-retry feedback loop from judge to implementer.
- Tests cover pass, fail, retry-once, and block-PR behavior.

