
feat: add multi-agent workflow review stages (judge) #368

Merged: marccampbell merged 2 commits into main from feat/353-judge-stage on Jun 6, 2026

elasticclaw-factory (bot) commented Jun 6, 2026

Closes #353

Summary

Adds first-class support for model-backed review/judge stages in workflow pipelines. A judge stage can run after implementation and tests, using a different model/provider to review the work before PR creation.

Changes

Pipeline Schema (pkg/hub/pipeline/pipeline.go)

  • New JudgeAction in OnEnter:

    • model -- LLM used by the judge, typically different from the implementer's
    • inputs -- bounded inputs: issue, git_diff, test_output, files
    • require.verdict -- verdict the stage requires (pass or fail)
    • instructions -- system prompt for the judge
    • output -- name of the pipeline output that stores the structured verdict
    • continue_on_error -- let the pipeline continue when the judge fails
    • max_tokens / timeout -- response limits
  • New judge_verdict trigger type -- enables auto-transition after judge:

    • judge_verdict: pass -> transition to PR stage
    • judge_verdict: fail -> transition to fix stage
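A minimal Go sketch of how the judge block and the judge_verdict trigger could map onto Go types follows. JudgeAction and the JudgeInput constants are named in this PR, and the YAML keys come from the list above; the field types, JudgeRequire, Trigger, and the Validate method are assumptions for illustration, not the contents of pkg/hub/pipeline/pipeline.go.

```go
package main

import "fmt"

// JudgeInput names one bounded input the judge may receive.
// Assumed type; the four values come from the inputs listed above.
type JudgeInput string

const (
	JudgeInputIssue      JudgeInput = "issue"
	JudgeInputGitDiff    JudgeInput = "git_diff"
	JudgeInputTestOutput JudgeInput = "test_output"
	JudgeInputFiles      JudgeInput = "files"
)

// JudgeRequire models the require block; only verdict is documented.
type JudgeRequire struct {
	Verdict string `yaml:"verdict"`
}

// JudgeAction is an assumed Go shape for on_enter.judge.
type JudgeAction struct {
	Model           string       `yaml:"model"`
	Inputs          []JudgeInput `yaml:"inputs"`
	Require         JudgeRequire `yaml:"require"`
	Instructions    string       `yaml:"instructions"`
	Output          string       `yaml:"output"`
	ContinueOnError bool         `yaml:"continue_on_error"`
	MaxTokens       int          `yaml:"max_tokens"`
	Timeout         string       `yaml:"timeout"` // e.g. "2m"
}

// Trigger (hypothetical) shows judge_verdict next to an existing trigger type.
type Trigger struct {
	MessageContains string `yaml:"message_contains,omitempty"`
	JudgeVerdict    string `yaml:"judge_verdict,omitempty"` // "pass" or "fail"
}

// Validate rejects unknown inputs and any required verdict other than
// pass or fail. Treating an empty verdict as "no requirement" is an assumption.
func (j JudgeAction) Validate() error {
	for _, in := range j.Inputs {
		switch in {
		case JudgeInputIssue, JudgeInputGitDiff, JudgeInputTestOutput, JudgeInputFiles:
		default:
			return fmt.Errorf("judge: unknown input %q", in)
		}
	}
	switch j.Require.Verdict {
	case "", "pass", "fail":
		return nil
	default:
		return fmt.Errorf("judge: require.verdict must be pass or fail, got %q", j.Require.Verdict)
	}
}

func main() {
	review := JudgeAction{
		Model:   "anthropic/claude-sonnet-4-6",
		Inputs:  []JudgeInput{JudgeInputIssue, JudgeInputGitDiff, JudgeInputTestOutput},
		Require: JudgeRequire{Verdict: "pass"},
		Output:  "review_result",
		Timeout: "2m",
	}
	fmt.Println(review.Validate()) // <nil>
}
```

Keeping the inputs as a closed set of constants is what makes them "bounded": anything outside the four known values is rejected when the pipeline is loaded, not at review time.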

Execution Engine (pkg/hub/pipeline_runner.go)

  • executeJudgeAction() -- collects bounded inputs, calls LLM, parses structured JSON
  • parseJudgeResponse() -- extracts JSON from markdown fences, validates schema
  • autoTransitionAfterJudge() -- finds next stage by verdict and transitions
  • Judge results persisted as pipeline outputs for later stage reference
  • Judge results injected into claw chat as structured messages
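The verdict-to-stage lookup behind autoTransitionAfterJudge() might look roughly like this. The helper name stageForJudgeVerdict is inferred from the TestStageForJudgeVerdict* test names in this PR; the trimmed Stage type, the transition callback, and the return values are hypothetical. Matching uses strings.EqualFold on the assumption that the CaseInsensitive test means "PASS" and "pass" route the same way.

```go
package main

import (
	"fmt"
	"strings"
)

// Stage is a hypothetical, trimmed-down stage: an ID plus the
// judge_verdict values its triggers listen for.
type Stage struct {
	ID            string
	JudgeVerdicts []string
}

// stageForJudgeVerdict returns the first stage whose judge_verdict trigger
// matches verdict, ignoring case and surrounding whitespace.
func stageForJudgeVerdict(stages []Stage, verdict string) (string, bool) {
	for _, s := range stages {
		for _, v := range s.JudgeVerdicts {
			if strings.EqualFold(strings.TrimSpace(v), strings.TrimSpace(verdict)) {
				return s.ID, true
			}
		}
	}
	return "", false
}

// autoTransitionAfterJudge moves to the matching stage, if any.
// transition stands in for whatever the runner uses to change stages.
func autoTransitionAfterJudge(stages []Stage, verdict string, transition func(stageID string) error) (bool, error) {
	id, ok := stageForJudgeVerdict(stages, verdict)
	if !ok {
		return false, nil // no judge_verdict trigger: stay in the current stage
	}
	return true, transition(id)
}

func main() {
	stages := []Stage{
		{ID: "pr", JudgeVerdicts: []string{"pass"}},
		{ID: "fix", JudgeVerdicts: []string{"fail"}},
	}
	moved, err := autoTransitionAfterJudge(stages, "FAIL", func(id string) error {
		fmt.Println("transition to", id)
		return nil
	})
	fmt.Println(moved, err)
}
```

Returning false when nothing matches lets the caller leave the pipeline where it is instead of treating a missing trigger as an error.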

Tests

  • TestParseJudgeAction -- full judge action parsing
  • TestParseJudgeActionMinimal -- minimal config parsing
  • TestJudgeInputConstants -- input type constants
  • TestStageForJudgeVerdictPass/NoMatch/CaseInsensitive -- trigger matching
  • TestParseJudgeResponseValid/Fail/WithMarkdownFences/MissingVerdict/InvalidVerdict -- response parsing
  • TestJudgeTimeoutDefault/Custom/Invalid -- timeout handling
  • TestRunOnEnterJudgeBlocksOnFail -- blocking behavior
  • TestRunOnEnterJudgeContinueOnError -- continue_on_error behavior
  • TestAutoTransitionAfterJudge -- auto-transition logic

Verification

  • go build ./... passes
  • go test ./pkg/hub/... passes
  • go test ./pkg/hub/pipeline/... passes

Example Workflow

```yaml
stages:
  - id: implement
    label: "Implement"
    entry: true
    on_enter:
      inject: |
        Implement the issue. Commit changes. Run tests. When done, say "[DONE]".

  - id: test
    label: "Test"
    triggers:
      - message_contains: "[DONE]"
    on_enter:
      run:
        command: go test ./...
        output: test_output

  - id: review
    label: "Review"
    triggers:
      - message_contains: "[DONE]"
    on_enter:
      judge:
        model: anthropic/claude-sonnet-4-6
        inputs:
          - issue
          - git_diff
          - test_output
        require:
          verdict: pass
        instructions: |
          Review the diff for correctness, security, regressions, and missing tests.
          Return pass/fail with specific required fixes.
        output: review_result
        timeout: 2m

  - id: pr
    label: "Create PR"
    triggers:
      - judge_verdict: pass
    on_enter:
      inject: |
        Review passed. Create a PR with: [DONE] https://github.com/org/repo/pull/...

  - id: fix
    label: "Fix Issues"
    triggers:
      - judge_verdict: fail
    on_enter:
      inject: |
        Review failed. Please fix these issues and re-run tests:

        {{ .Outputs.review_result.required_fixes }}

        Then say "[DONE]" again.

  - id: merged
    label: "Merged"
    triggers:
      - pr_merged:
    terminal: true

  - id: stopped
    label: "Stopped"
    triggers:
      - message_contains: "stop"
    terminal: true
```

This workflow proceeds as follows:

  1. Implement stage: the agent implements the issue.
  2. Test stage: on "[DONE]", runs go test ./... and stores the output.
  3. Review stage: the judge (a different model) reviews the issue, diff, and test output.
  4. On pass, the pipeline auto-transitions to the PR stage.
  5. On fail, it auto-transitions to the Fix stage with the findings injected.
  6. The agent fixes the issues, re-runs tests, and goes through review again (bounded retry).
  7. Merged and Stopped are the terminal stages.

- Add JudgeAction to pipeline OnEnter with model, inputs, require.verdict,
  instructions, output, continue_on_error, max_tokens, timeout
- Add JudgeInput constants: issue, git_diff, test_output, files
- Add executeJudgeAction() that calls LLM with constrained inputs and
  parses structured JSON response (verdict, summary, findings, required_fixes)
- Add judge_verdict trigger type for auto-transition after judge (pass/fail)
- Add autoTransitionAfterJudge() for bounded retry loops
- Persist judge output as pipeline artifact for later stage reference
- Inject judge results into claw chat (pass/fail with findings)
- Block or continue based on required verdict and continue_on_error
- Add tests for parsing, execution, and transitions
greptile-apps (bot) commented Jun 6, 2026

Reviews (1): Last reviewed commit: "feat: add multi-agent workflow review st..."

Comment threads on pkg/hub/pipeline_runner.go (3, two marked outdated)
- Remove unused stageID parameter from executeJudgeAction
- Fix fragile JSON extraction: use brace counting instead of strings.LastIndex
  to properly handle nested braces, escaped quotes, and trailing text
- Fix P1 bug: move autoTransitionAfterJudge before require.verdict check
  so judge_verdict triggers always fire even when verdict doesn't match
  required value (enables fail->fix retry loops)
- Add tests for trailing text, nested braces, and escaped quotes
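The brace-counting extraction can be sketched as follows. It scans from the first `{`, counts braces only outside JSON string literals, and honours backslash escapes, so nested objects, braces or quotes inside strings, and trailing prose (including a stray `}`) are all handled; the strings.LastIndex approach would instead cut at the last `}` anywhere in the reply. Starting at the first `{` also skips a leading ```json fence. extractJSONObject, JudgeResult, and the string-slice field types are assumptions; only the field names come from the first commit message.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// JudgeResult mirrors the response fields named in the commit message;
// the Go types are assumptions.
type JudgeResult struct {
	Verdict       string   `json:"verdict"`
	Summary       string   `json:"summary"`
	Findings      []string `json:"findings"`
	RequiredFixes []string `json:"required_fixes"`
}

// extractJSONObject returns the first balanced {...} in s, counting braces
// only outside string literals and skipping escaped characters.
func extractJSONObject(s string) (string, error) {
	start := strings.IndexByte(s, '{')
	if start < 0 {
		return "", errors.New("no JSON object found")
	}
	depth, inString, escaped := 0, false, false
	for i := start; i < len(s); i++ {
		c := s[i]
		switch {
		case escaped:
			escaped = false // byte after a backslash is taken literally
		case inString && c == '\\':
			escaped = true
		case c == '"':
			inString = !inString
		case inString:
			// any other byte inside a string cannot affect depth
		case c == '{':
			depth++
		case c == '}':
			depth--
			if depth == 0 {
				return s[start : i+1], nil
			}
		}
	}
	return "", errors.New("unterminated JSON object")
}

// parseJudgeResponse extracts, decodes, and validates a judge reply.
func parseJudgeResponse(raw string) (JudgeResult, error) {
	var r JudgeResult
	obj, err := extractJSONObject(raw)
	if err != nil {
		return r, err
	}
	if err := json.Unmarshal([]byte(obj), &r); err != nil {
		return r, fmt.Errorf("decode judge response: %w", err)
	}
	r.Verdict = strings.ToLower(strings.TrimSpace(r.Verdict))
	if r.Verdict != "pass" && r.Verdict != "fail" {
		return r, fmt.Errorf("invalid or missing verdict %q", r.Verdict)
	}
	return r, nil
}

func main() {
	raw := "Here is my review:\n```json\n{\"verdict\": \"fail\", \"summary\": \"handles \\\"{\\\" badly\", \"required_fixes\": [\"escape braces\"]}\n```\nTrailing note with a stray } brace."
	r, err := parseJudgeResponse(raw)
	fmt.Println(r.Verdict, r.RequiredFixes, err) // fail [escape braces] <nil>
}
```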
greptile-apps (bot) commented Jun 6, 2026

Reviews (2): Last reviewed commit: "fix: address greptile review comments on..."

marccampbell merged commit 1e1a861 into main on Jun 6, 2026
11 checks passed

Linked issue: Add multi-agent workflow review stages