feat: add multi-agent workflow review stages (judge) by elasticclaw-factory[bot] · Pull Request #368 · elasticclaw/elasticclaw

elasticclaw-factory · 2026-06-06T19:08:28Z

Closes #353

Summary

Adds first-class support for model-backed review/judge stages in workflow pipelines. A judge stage can run after implementation and tests, using a different model/provider to review the work before PR creation.

Changes

Pipeline Schema (pkg/hub/pipeline/pipeline.go)

- New JudgeAction in OnEnter:
  - model -- LLM model for the judge, which can differ from the implementer's
  - inputs -- bounded inputs: issue, git_diff, test_output, files
  - require.verdict -- verdict the stage requires (pass or fail)
  - instructions -- system prompt for the judge
  - output -- pipeline output name under which the structured verdict is stored
  - continue_on_error -- allow the pipeline to continue on judge failure
  - max_tokens / timeout -- response limits
- New judge_verdict trigger type -- enables auto-transition after the judge runs:
  - judge_verdict: pass -> transition to PR stage
  - judge_verdict: fail -> transition to fix stage
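To make the schema above concrete, here is a minimal sketch of how the judge block might map onto Go types. The struct names, field names, YAML tags, and the Validate helper are assumptions inferred from the option list, not the actual definitions in pkg/hub/pipeline/pipeline.go.

```go
package main

import (
	"fmt"
	"strings"
)

// JudgeRequire is a hypothetical mirror of the require block.
type JudgeRequire struct {
	Verdict string `yaml:"verdict"`
}

// JudgeAction is a hypothetical mirror of the judge block under on_enter.
// Field names and tags are assumptions drawn from the option list above.
type JudgeAction struct {
	Model           string       `yaml:"model"`
	Inputs          []string     `yaml:"inputs"`
	Require         JudgeRequire `yaml:"require"`
	Instructions    string       `yaml:"instructions"`
	Output          string       `yaml:"output"`
	ContinueOnError bool         `yaml:"continue_on_error"`
	MaxTokens       int          `yaml:"max_tokens"`
	Timeout         string       `yaml:"timeout"`
}

// allowedInputs lists the four bounded input kinds named in the PR.
var allowedInputs = map[string]bool{
	"issue":       true,
	"git_diff":    true,
	"test_output": true,
	"files":       true,
}

// Validate rejects unknown inputs and verdicts other than pass or fail.
// An empty require.verdict is accepted here; that is an assumption.
func (j JudgeAction) Validate() error {
	for _, in := range j.Inputs {
		if !allowedInputs[in] {
			return fmt.Errorf("judge: unknown input %q", in)
		}
	}
	switch strings.ToLower(j.Require.Verdict) {
	case "", "pass", "fail":
		return nil
	default:
		return fmt.Errorf("judge: unsupported require.verdict %q", j.Require.Verdict)
	}
}

func main() {
	j := JudgeAction{
		Model:   "anthropic/claude-sonnet-4-6",
		Inputs:  []string{"issue", "git_diff", "test_output"},
		Require: JudgeRequire{Verdict: "pass"},
		Output:  "review_result",
		Timeout: "2m",
	}
	fmt.Println("validation error:", j.Validate())
}
```

Keeping the input kinds in a closed set is one way to keep the judge's context bounded, which is the property the schema description emphasizes.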

Execution Engine (pkg/hub/pipeline_runner.go)

executeJudgeAction() -- collects bounded inputs, calls LLM, parses structured JSON
parseJudgeResponse() -- extracts JSON from markdown fences, validates schema
autoTransitionAfterJudge() -- finds next stage by verdict and transitions
Judge results persisted as pipeline outputs for later stage reference
Judge results injected into claw chat as structured messages
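The response-parsing step can be sketched roughly as follows. This is an illustration only: the JudgeResult type, the []string shape of findings and required_fixes, and the lowercase normalization of the verdict are assumptions, and the real parseJudgeResponse() may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// fence is three backticks, written with escapes so this example
// can itself live inside a fenced block.
const fence = "\x60\x60\x60"

// JudgeResult is a hypothetical shape for the structured verdict.
// The field list follows the PR text; the slice types are assumed.
type JudgeResult struct {
	Verdict       string   `json:"verdict"`
	Summary       string   `json:"summary"`
	Findings      []string `json:"findings"`
	RequiredFixes []string `json:"required_fixes"`
}

// parseJudgeResponse strips an optional surrounding markdown fence,
// decodes the JSON body, and checks that verdict is pass or fail.
func parseJudgeResponse(raw string) (JudgeResult, error) {
	body := strings.TrimSpace(raw)
	if strings.HasPrefix(body, fence) {
		if nl := strings.Index(body, "\n"); nl >= 0 {
			body = body[nl+1:] // drop the opening fence line, e.g. a "json" tag
		} else {
			body = strings.TrimPrefix(body, fence)
		}
		body = strings.TrimSuffix(strings.TrimSpace(body), fence)
	}

	var res JudgeResult
	if err := json.Unmarshal([]byte(strings.TrimSpace(body)), &res); err != nil {
		return JudgeResult{}, fmt.Errorf("judge response is not valid JSON: %w", err)
	}

	res.Verdict = strings.ToLower(strings.TrimSpace(res.Verdict))
	switch res.Verdict {
	case "pass", "fail":
		return res, nil
	case "":
		return JudgeResult{}, fmt.Errorf("judge response is missing a verdict")
	default:
		return JudgeResult{}, fmt.Errorf("judge response has invalid verdict %q", res.Verdict)
	}
}

func main() {
	raw := fence + "json\n" +
		`{"verdict":"fail","summary":"missing test","required_fixes":["add a test"]}` +
		"\n" + fence
	res, err := parseJudgeResponse(raw)
	fmt.Println(res.Verdict, res.RequiredFixes, err)
}
```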

Tests

TestParseJudgeAction -- full judge action parsing
TestParseJudgeActionMinimal -- minimal config parsing
TestJudgeInputConstants -- input type constants
TestStageForJudgeVerdictPass/NoMatch/CaseInsensitive -- trigger matching
TestParseJudgeResponseValid/Fail/WithMarkdownFences/MissingVerdict/InvalidVerdict -- response parsing
TestJudgeTimeoutDefault/Custom/Invalid -- timeout handling
TestRunOnEnterJudgeBlocksOnFail -- blocking behavior
TestRunOnEnterJudgeContinueOnError -- continue_on_error behavior
TestAutoTransitionAfterJudge -- auto-transition logic
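The timeout cases covered by TestJudgeTimeoutDefault/Custom/Invalid could be handled along these lines. This is a sketch under assumptions: the helper name, the two-minute default, and the choice to fall back to the default on an invalid value are all hypothetical and not confirmed by the PR.

```go
package main

import (
	"fmt"
	"time"
)

// defaultJudgeTimeout is an assumed default; the PR does not state the real one.
const defaultJudgeTimeout = 2 * time.Minute

// judgeTimeout parses the timeout string from the judge config.
// Empty, unparsable, or non-positive values fall back to the default
// (an assumption; the real code might return an error instead).
func judgeTimeout(raw string) time.Duration {
	if raw == "" {
		return defaultJudgeTimeout
	}
	d, err := time.ParseDuration(raw)
	if err != nil || d <= 0 {
		return defaultJudgeTimeout
	}
	return d
}

func main() {
	for _, raw := range []string{"", "90s", "soon"} {
		fmt.Printf("%q -> %s\n", raw, judgeTimeout(raw))
	}
}
```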

Verification

go build ./... passes
go test ./pkg/hub/... passes
go test ./pkg/hub/pipeline/... passes

Example Workflow

stages:
  - id: implement
    label: "Implement"
    entry: true
    on_enter:
      inject: |
        Implement the issue. Commit changes. Run tests. When done, say "[DONE]".

  - id: test
    label: "Test"
    triggers:
      - message_contains: "[DONE]"
    on_enter:
      run:
        command: go test ./...
        output: test_output

  - id: review
    label: "Review"
    triggers:
      - message_contains: "[DONE]"
    on_enter:
      judge:
        model: anthropic/claude-sonnet-4-6
        inputs:
          - issue
          - git_diff
          - test_output
        require:
          verdict: pass
        instructions: |
          Review the diff for correctness, security, regressions, and missing tests.
          Return pass/fail with specific required fixes.
        output: review_result
        timeout: 2m

  - id: pr
    label: "Create PR"
    triggers:
      - judge_verdict: pass
    on_enter:
      inject: |
        Review passed. Create a PR with: [DONE] https://github.com/org/repo/pull/...

  - id: fix
    label: "Fix Issues"
    triggers:
      - judge_verdict: fail
    on_enter:
      inject: |
        Review failed. Please fix these issues and re-run tests:

        {{ .Outputs.review_result.required_fixes }}

        Then say "[DONE]" again.

  - id: merged
    label: "Merged"
    triggers:
      - pr_merged:
    terminal: true

  - id: stopped
    label: "Stopped"
    triggers:
      - message_contains: "stop"
    terminal: true

This workflow:

- Implement stage -- agent implements the issue
- Test stage -- runs go test ./... on [DONE], stores output
- Review stage -- judge (different model) reviews issue + diff + test output
  - If pass → auto-transition to PR stage
  - If fail → auto-transition to Fix stage with findings injected
- Agent fixes, re-runs tests, hits review again (bounded retry)
- Merged or Stopped terminal stages
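The pass/fail routing described above comes down to finding the stage whose judge_verdict trigger matches the verdict. A minimal sketch follows, assuming simplified Stage and Trigger types and a hypothetical stageForJudgeVerdict helper; case-insensitive matching is suggested by the CaseInsensitive test name, but the real lookup may be implemented differently.

```go
package main

import (
	"fmt"
	"strings"
)

// Trigger and Stage are simplified, hypothetical stand-ins for the
// pipeline types; only the fields needed for verdict routing are shown.
type Trigger struct {
	MessageContains string
	JudgeVerdict    string
}

type Stage struct {
	ID       string
	Triggers []Trigger
}

// stageForJudgeVerdict returns the first stage that has a judge_verdict
// trigger equal to the given verdict, compared case-insensitively.
func stageForJudgeVerdict(stages []Stage, verdict string) (Stage, bool) {
	for _, s := range stages {
		for _, t := range s.Triggers {
			if t.JudgeVerdict != "" && strings.EqualFold(t.JudgeVerdict, verdict) {
				return s, true
			}
		}
	}
	return Stage{}, false
}

func main() {
	stages := []Stage{
		{ID: "review", Triggers: []Trigger{{MessageContains: "[DONE]"}}},
		{ID: "pr", Triggers: []Trigger{{JudgeVerdict: "pass"}}},
		{ID: "fix", Triggers: []Trigger{{JudgeVerdict: "fail"}}},
	}
	for _, v := range []string{"pass", "FAIL", "unknown"} {
		s, ok := stageForJudgeVerdict(stages, v)
		fmt.Printf("%s -> %q (found=%v)\n", v, s.ID, ok)
	}
}
```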

- Add JudgeAction to pipeline OnEnter with model, inputs, require.verdict, instructions, output, continue_on_error, max_tokens, timeout
- Add JudgeInput constants: issue, git_diff, test_output, files
- Add executeJudgeAction() that calls LLM with constrained inputs and parses structured JSON response (verdict, summary, findings, required_fixes)
- Add judge_verdict trigger type for auto-transition after judge (pass/fail)
- Add autoTransitionAfterJudge() for bounded retry loops
- Persist judge output as pipeline artifact for later stage reference
- Inject judge results into claw chat (pass/fail with findings)
- Block or continue based on required verdict and continue_on_error
- Add comprehensive tests for parsing, execution, and transitions

greptile-apps · 2026-06-06T19:13:26Z

Reviews (1): Last reviewed commit: "feat: add multi-agent workflow review st..."

- Remove unused stageID parameter from executeJudgeAction
- Fix fragile JSON extraction: use brace counting instead of strings.LastIndex to properly handle nested braces, escaped quotes, and trailing text
- Fix P1 bug: move autoTransitionAfterJudge before require.verdict check so judge_verdict triggers always fire even when verdict doesn't match required value (enables fail->fix retry loops)
- Add tests for trailing text, nested braces, and escaped quotes
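The brace-counting extraction mentioned in the fix can be sketched as below. This is an illustrative version, not the code from pkg/hub/pipeline_runner.go; the function name extractJSONObject is hypothetical. It tracks string and escape state so that braces inside string values are not counted.

```go
package main

import (
	"fmt"
	"strings"
)

// extractJSONObject returns the first balanced {...} object found in s.
// Braces inside JSON strings are ignored, and a backslash escapes the
// next character, so an escaped quote does not end the string.
func extractJSONObject(s string) (string, bool) {
	start := strings.IndexByte(s, '{')
	if start < 0 {
		return "", false
	}
	depth := 0
	inString := false
	escaped := false
	for i := start; i < len(s); i++ {
		c := s[i]
		if inString {
			switch {
			case escaped:
				escaped = false // this character was escaped; skip it
			case c == '\\':
				escaped = true
			case c == '"':
				inString = false
			}
			continue
		}
		switch c {
		case '"':
			inString = true
		case '{':
			depth++
		case '}':
			depth--
			if depth == 0 {
				return s[start : i+1], true
			}
		}
	}
	return "", false // unbalanced input: no complete object
}

func main() {
	raw := `Verdict follows: {"verdict":"fail","meta":{"note":"use } and \" carefully"}} Thanks! }`
	obj, ok := extractJSONObject(raw)
	fmt.Println(ok, obj)
}
```

Unlike searching for the last closing brace, this stops at the brace that balances the first opening one, so trailing text that happens to contain a brace does not end up in the extracted JSON.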

greptile-apps · 2026-06-06T19:18:50Z

Reviews (2): Last reviewed commit: "fix: address greptile review comments on..."

greptile-apps Bot reviewed Jun 6, 2026

marccampbell merged commit 1e1a861 into main Jun 6, 2026
11 checks passed

marccampbell mentioned this pull request Jun 6, 2026

Add declarative tool review gates for workflow run outputs #369

Closed

marccampbell merged 2 commits into main from feat/353-judge-stage
