Skip to content

Loop node reports max_iterations_reached despite completion signal being present in output #1126

@SteveGJones

Description

@SteveGJones

Environment

  • Archon v0.3.5 running from source
  • Docker on Apple Silicon (ARM64 Linux containers)
  • Workflow with loop: node using until: ALL_CLEAN and max_iterations: 3

Problem

A loop node with until: ALL_CLEAN and max_iterations: 3 runs all 3 iterations successfully. The Claude agent outputs the completion signal matching the until value in its iteration output. Despite the signal being present, Archon reports max_iterations_reached and exits the loop with code 1.

This causes dependent nodes (e.g., a final-report node with depends_on: [fix-and-review]) to be skipped, because the loop node is treated as failed.

Expected Behaviour

When the completion signal (ALL_CLEAN) is present in the loop iteration output, the loop should terminate with success (exit 0) regardless of which iteration number it was found in. Reaching max_iterations should only be a failure when the signal was NOT found.

Workflow Definition

nodes:
  - id: fix-and-review
    model: claude-sonnet-4-6
    context: fresh
    loop:
      prompt: |
        # Fix-Review Iteration
        ...
        If BOTH reviews pass with zero issues found, output:
        <COMPLETE>ALL_CLEAN</COMPLETE>
      until: ALL_CLEAN
      max_iterations: 3
      fresh_context: true

  - id: final-report
    depends_on: [fix-and-review]
    prompt: |
      Generate a final report...

Observed Behaviour

Tested twice with identical results:

Run 1:

  • Iterations: 3 (all completed, fixes applied in iteration 1-2)
  • Signal: ALL_CLEAN emitted in output
  • Result: max_iterations_reached, exit code 1
  • final-report: skipped

Run 2:

  • Iterations: 3 (all completed, fixes applied in iteration 1)
  • Signal: ALL_CLEAN emitted in output
  • Result: max_iterations_reached, exit code 1
  • final-report: skipped

In both runs, the actual work completed correctly — code was fixed, tests were added, commits were made. Only the loop termination status was wrong.

Possible Causes

  1. Signal detection may run before the full iteration output is captured
  2. The signal may need a specific format (bare string vs wrapped in tags)
  3. Max iterations check may take precedence over signal detection on the final iteration
  4. The signal may be present in stdout but not in the captured node_output field that Archon checks

Impact

Any workflow using loop: with until: for iterative fix-review cycles will report failure even when the work completes successfully. Dependent nodes (reports, PR creation, notifications) are skipped.

Workaround

None currently identified. The underlying work completes correctly — all code fixes are applied and committed. Only the loop termination status and dependent node execution are affected. We're investigating using max_iterations without until: and having the agent track completion state via filesystem as an alternative pattern.

Metadata

Metadata

Assignees

No one assigned

    Labels

    P1High priority - Address soon, next in queuearea: workflowsWorkflow enginebugSomething is brokeneffort/mediumFew files, one domain or module, some coordination needed

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions