
bug(workflows): loop node model: directive silently ignored → loop body runs on global default #1082

@mhooooo

Description


Summary

Workflow YAML can declare model: on a loop node to control which model the loop body uses for its AI iterations. The loader emits a warning (loop_node_ai_fields_ignored) but accepts the YAML without error, and the loop body falls back to the global default model from ~/.archon/config.yaml. This affects archon-ralph-dag.yaml — its implement node declares model: claude-opus-4-6[1m] but runs on whatever assistants.claude.model is set to.

Expected

Either:

  1. (Loader fix) Loop nodes honor per-node model: directives — the directive is forwarded to each iteration's AI call, so claude-opus-4-6[1m] is actually used.
  2. (Cosmetic fix) archon-ralph-dag.yaml doesn't declare a model: field that will be silently ignored. Users who see model: claude-opus-4-6[1m] in the YAML reasonably assume it'll be honored.

Actual

  • Loader emits a warning at load time: {module: "workflow.loader", id: "implement", fields: ["model"], msg: "loop_node_ai_fields_ignored"}
  • Loop body runs on global default from ~/.archon/config.yaml → assistants.claude.model
  • No error surfaced to the user at dispatch time — the workflow runs "successfully" but not on the expected model

Reproduction

Config:

# ~/.archon/config.yaml
assistants:
  claude:
    model: sonnet   # the global default

Dispatch:

archon workflow run archon-ralph-dag --branch feat/test ".archon/ralph/{some-prd}"

During execution, inspect the running claude-agent-sdk subprocess:

ps aux | grep claude-agent-sdk

Expected: --model opus or --model claude-opus-4-6[1m] (per the workflow YAML's model: directive on the implement loop node).
Actual: --model sonnet (the global default).
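The `ps aux` check is the only reliable signal today, so it may help to make it scriptable. Below is a minimal sketch of a helper that pulls the `--model` value out of a captured command line. The function name `extractModelFlag` and the sample command line are hypothetical. It assumes the model is passed as a single whitespace-free token, which holds for both `sonnet` and `claude-opus-4-6[1m]`. It also accepts a `--model=<value>` form defensively; the sketch does not assume the SDK actually emits that form.

```typescript
// Hypothetical helper: extract the value of a `--model` flag from a process
// command line as printed by `ps aux`. Returns undefined if the flag is absent
// or has no value.
function extractModelFlag(commandLine: string): string | undefined {
  const tokens = commandLine.trim().split(/\s+/);
  for (let i = 0; i < tokens.length; i++) {
    const tok = tokens[i] ?? "";
    if (tok === "--model") {
      // `--model <value>`: the value is the next token (undefined if missing)
      return tokens[i + 1];
    }
    if (tok.startsWith("--model=")) {
      // `--model=<value>` form, handled defensively
      return tok.slice("--model=".length);
    }
  }
  return undefined;
}

// Made-up sample line, for illustration only.
const sample = "node /path/to/claude-agent-sdk/cli.js --model sonnet";
console.log(extractModelFlag(sample)); // prints: sonnet
```

Comparing the result against the `model:` declared on the loop node would turn the manual check into a one-line assertion.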

Source references

packages/workflows/src/loader.ts:60-82 — the strip-AI-fields logic:

const isNonAiNode =
  ('bash' in node && typeof node.bash === 'string') ||
  isScriptNode(node) ||
  isLoopNode(node) ||          // ← loop nodes classified as non-AI
  isApprovalNode(node) ||
  isCancelNode(node);
if (isNonAiNode) {
  // ...
  const aiFields = isScriptNode(node) ? SCRIPT_NODE_AI_FIELDS : BASH_NODE_AI_FIELDS;
  const presentAiFields = aiFields.filter(f => (raw as Record<string, unknown>)[f] !== undefined);
  if (presentAiFields.length > 0) {
    getLog().warn({ id: node.id, fields: presentAiFields }, `${nodeType}_node_ai_fields_ignored`);
  }
}

Loop nodes fall into the isNonAiNode branch, and because isScriptNode(node) is false, aiFields resolves to BASH_NODE_AI_FIELDS. The warning fires, but downstream consumers (dag-executor.ts) never read these fields from the loop node at all, so the directive is effectively dropped.
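One possible shape for the loader side of a fix is to stop classifying loop nodes as non-AI, since their body issues an AI call on every iteration. The sketch below is self-contained and deliberately simplified. `RawNode`, `AI_FIELDS`, `isNonAiNode` and `ignoredAiFields` are hypothetical stand-ins, not the real loader types. `AI_FIELDS` is an illustrative subset, not the actual contents of BASH_NODE_AI_FIELDS, and the script/approval/cancel checks are omitted.

```typescript
// Hypothetical, simplified raw node; the real loader types are richer.
type RawNode = { id: string; bash?: string; loop?: { prompt: string } } & Record<string, unknown>;

// Illustrative subset of AI-only fields; the real BASH_NODE_AI_FIELDS may list more.
const AI_FIELDS: readonly string[] = ["model"];

// Proposed classification: a loop node is NOT non-AI, because its body
// calls the model on every iteration.
function isNonAiNode(n: RawNode): boolean {
  if (n.loop !== undefined) return false;
  // Script/approval/cancel checks omitted in this sketch.
  return typeof n.bash === "string";
}

// AI fields the loader would warn about and drop for this node.
function ignoredAiFields(n: RawNode): string[] {
  if (!isNonAiNode(n)) return [];
  return AI_FIELDS.filter((f) => n[f] !== undefined);
}

const implementNode: RawNode = { id: "implement", model: "claude-opus-4-6[1m]", loop: { prompt: "..." } };
console.log(ignoredAiFields(implementNode)); // prints: []
```

The loader change alone is not sufficient. As noted above, dag-executor.ts would also have to read `model` from the loop node when dispatching each iteration.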

.archon/workflows/defaults/archon-ralph-dag.yaml (line ~192):

- id: implement
  depends_on: [validate-prd]
  idle_timeout: 600000
  model: claude-opus-4-6[1m]   # ← silently dropped
  loop:
    prompt: |
      # Ralph Agent — Autonomous Story Implementation
      ...

Why this matters

Ralph implementation loops are the highest-value AI calls in Archon — they're the ones actually writing multi-file feature code from PRD stories. Users who read model: claude-opus-4-6[1m] in archon-ralph-dag.yaml reasonably assume the implementation will use Opus 4.6 with 1M context. When that directive is silently dropped, users get Sonnet instead (or whatever their global default is), which is a real quality and cost difference that's hard to catch without looking at the subprocess args.

In my case, I only noticed because I specifically asked "is this running on opus?" and traced the subprocess args. Had I not asked, I would have finished a 23-commit ralph run believing it was opus output.

Open questions (for upstream guidance)

  1. Is the loop-node AI-field strip intentional? My guess is no — the model: field on a loop node is clearly meant to apply to iterations, and the loader explicitly warns about it (suggesting awareness of the limitation). But I want to confirm before submitting a fix.
  2. If the loader should honor model: on loop nodes, how should the value be plumbed through? dag-executor.ts already has a resolveNodeProvider function that handles node.model ?? workflowModel ?? config.assistants[provider]?.model, so it looks like the loop node just needs to be allowed to reach that code path. I'm happy to submit the PR if you confirm the intent.
  3. Should workflow-level model: (declared at the top of the YAML, not per-node) be honored by loop nodes as a workaround? I haven't tested this yet — if yes, it's a temporary mitigation users can apply until the per-node fix lands.
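Questions 2 and 3 both hinge on the precedence chain quoted above. The sketch below applies that chain to a loop iteration. `resolveIterationModel`, `LoopNodeSpec`, `ArchonConfig` and `AssistantConfig` are hypothetical names; the real resolveNodeProvider in dag-executor.ts has its own signature. It assumes each iteration would be resolved through the same `node.model ?? workflowModel ?? config.assistants[provider]?.model` expression. Under that assumption, a workflow-level `model:` would reach the loop body only when the node declares none. Whether the current code path already does this is exactly the untested question 3.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface AssistantConfig { model?: string }
interface ArchonConfig { assistants: Record<string, AssistantConfig | undefined> }
interface LoopNodeSpec { id: string; model?: string; loop: { prompt: string } }

// Same precedence as quoted in the issue:
// per-node model, then workflow-level model, then the global default.
function resolveIterationModel(
  node: LoopNodeSpec,
  workflowModel: string | undefined,
  config: ArchonConfig,
  provider: string,
): string | undefined {
  return node.model ?? workflowModel ?? config.assistants[provider]?.model;
}

// Mirrors the reproduction: global default is sonnet.
const config: ArchonConfig = { assistants: { claude: { model: "sonnet" } } };
const implement: LoopNodeSpec = { id: "implement", model: "claude-opus-4-6[1m]", loop: { prompt: "..." } };
const bare: LoopNodeSpec = { id: "implement", loop: { prompt: "..." } };

console.log(resolveIterationModel(implement, undefined, config, "claude")); // prints: claude-opus-4-6[1m]
console.log(resolveIterationModel(bare, "opus", config, "claude"));         // prints: opus
console.log(resolveIterationModel(bare, undefined, config, "claude"));      // prints: sonnet
```

The third case is the behavior the issue observes today for every loop node, regardless of what the YAML declares.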

Workaround (current)

Update ~/.archon/config.yaml globally:

assistants:
  claude:
    model: claude-opus-4-6[1m]

This affects every Claude call Archon makes (Slack bot chat, other workflows, classifiers), so it's not a precision tool. But it's the only way to force ralph-dag's implement loop onto opus today.

Environment

  • Archon CLI v0.3.5 (source, git commit 14790d7)
  • macOS Darwin 25.3.0 (Mac Mini M4)
  • Bun v1.3.11
  • @anthropic-ai/claude-agent-sdk@0.2.89

Metadata


Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
