archon-refactor-safely: analysis node silently drops plan when execute node fails — workflow reports "completed" with zero changes made #1477

@hlyl

Summary
When running archon-refactor-safely (v0.3.9), the analysis phase successfully produces a detailed refactoring plan and impact analysis, but cannot persist them to disk because the node runs with denied_tools: [Write, Edit, Bash]. The downstream execute-refactor node then fails with repeated 401 Unauthorized errors against api.openai.com before it can receive the plan from context. The workflow reports completed with no code changed, no PR created, and no error surfaced to the user.
Steps to Reproduce

  1. Register a Python project in Archon
  2. Run the archon-refactor-safely workflow against a large monolithic file (e.g. app.py, ~2000 lines)
  3. Observe the workflow run to completion (~20 minutes)
  4. Check the branch: zero commits, no new files, no PR

Observed Behaviour
The analysis node (explore / plan) runs correctly and produces thorough output — a full impact analysis and a 9-task ordered refactoring plan. However, when the agent attempts to write these as artifact files, it cannot find Write, Edit, or Bash tools (correctly blocked by denied_tools). The logs show the agent trying dozens of ToolSearch calls looking for any writable tool:
ToolSearch select:Write 2.7s
ToolSearch Write 3.2s
ToolSearch select:Write 2.5s
ToolSearch file write create 3.6s
ToolSearch Write file create new 4.3s
ToolSearch select:Bash,Edit,Write 3.1s
ToolSearch Bash shell execute command 4.2s
...
ToolSearch Bash shell terminal subprocess 96.0s ← gave up here
After exhausting tool search, the plan is printed to the log and silently dropped. The execute-refactor node then fires but immediately hits a series of 401 Unauthorized errors on api.openai.com:
⚠️ Reconnecting... 2/5 (401 Unauthorized ... url: wss://api.openai.com/v1/responses)
⚠️ Reconnecting... 3/5 (401 Unauthorized ...)
⚠️ Reconnecting... 4/5 (401 Unauthorized ...)
⚠️ Reconnecting... 5/5 (401 Unauthorized ...)
The node also emits: Warning: Node 'execute-refactor' uses hooks but codex doesn't support it — this will be ignored.
After reconnect exhaustion the node falls back to basic git status / git log checks, confirms the branch is empty, and exits. The workflow is marked completed.
Expected Behaviour
Either:

  1. The workflow should fail with a clear error explaining that the plan could not be persisted and execution was skipped, or
  2. A dedicated bash node between analysis and execution should write the plan to $ARTIFACTS_DIR/refactor-plan.md, so the plan survives even if the execute node crashes
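To illustrate the first option, here is a minimal sketch of a guard that could run before execution starts. It is not Archon's actual code: the function name, the exception class, and the idea of calling it from the workflow engine are assumptions for illustration. Only the file name refactor-plan.md and the artifacts directory come from this report.

```python
from pathlib import Path


class PlanNotPersistedError(RuntimeError):
    """Hypothetical error: the analysis plan is missing, so execution must not start."""


def require_plan_artifact(artifacts_dir: str, name: str = "refactor-plan.md") -> str:
    """Return the plan text, or raise if the plan file is missing or empty."""
    plan_path = Path(artifacts_dir) / name
    if not plan_path.is_file():
        raise PlanNotPersistedError(f"plan artifact not found: {plan_path}")
    text = plan_path.read_text(encoding="utf-8")
    if not text.strip():
        raise PlanNotPersistedError(f"plan artifact is empty: {plan_path}")
    return text
```

If the engine called something like this before starting execute-refactor and marked the run as failed when it raised, the missing plan would surface as an error instead of a completed run.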

Why This Is Worse Than a Normal Failure
The workflow completes with status completed and the run took ~20 minutes. Nothing in the UI indicates that zero work was done. A user has to manually check the branch to discover the failure. This is a silent data loss scenario — significant planning work (tokens, time) is consumed and discarded.
Two Distinct Problems
These can be fixed independently but both need addressing:

  1. Plan artifact lost at context boundary (the core bug)
    The archon-refactor-safely workflow design assumes the analysis node can write refactor-plan.md to disk, but denied_tools: [Write, Edit, Bash] prevents this. The plan is never persisted. Even if the execute node worked perfectly, it would have nothing to consume. A bash node should write the plan after the analysis phase — bash nodes are not subject to the AI tool restrictions.
  2. execute-refactor node silently fails on missing OpenAI key
    The node appears to use a Codex/OpenAI provider but no key is configured (or it has expired). Instead of failing fast with a clear error, it retries 5 times over ~30 seconds and then quietly falls back to read-only git commands. The "hooks... will be ignored" warning suggests the node is misconfigured for this environment. A 401 from the provider should be a hard failure with a clear message.
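The fail-fast behaviour asked for in problem 2 could look roughly like the sketch below. Everything here is hypothetical: `connect` stands in for whatever call the provider client makes and is assumed to return an HTTP status code, and the error class is invented for the example. The point is only that a 401 is treated as non-retryable, while transient failures still get the 5 attempts seen in the logs.

```python
from typing import Callable

# Statuses that retrying cannot fix: the credentials or permissions are wrong.
NON_RETRYABLE_STATUSES = {401, 403}


class ProviderAuthError(RuntimeError):
    """Hypothetical error raised as soon as the provider rejects the credentials."""


def connect_with_retries(connect: Callable[[], int], max_attempts: int = 5) -> int:
    """Call connect() until it returns 200 and return the number of attempts used.

    Raises ProviderAuthError on the first 401/403 instead of retrying, and
    ConnectionError once max_attempts transient failures have been seen.
    """
    for attempt in range(1, max_attempts + 1):
        status = connect()
        if status == 200:
            return attempt
        if status in NON_RETRYABLE_STATUSES:
            raise ProviderAuthError(
                f"provider returned {status} on attempt {attempt}; check the API key"
            )
    raise ConnectionError(f"gave up after {max_attempts} attempts")
```

With this shape, the node would stop on the first 401 and the workflow could report the auth failure rather than falling back to read-only git commands.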
Environment

Archon version: v0.3.9
Workflow: archon-refactor-safely (bundled default)
Project type: Python (no package.json)
Platform: Web UI

Secondary Issue Observed
The execute-refactor node's validation step runs a Node.js/bun validation harness (uv run validate, bun run validate) against a Python project. All checks fail with "Script not found" because no package.json exists. The node incorrectly concludes "all checks passed — false negatives from a Node.js harness" and does not surface this mismatch. The validation step should detect project type before choosing a harness.
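As a sketch of what project-type detection might involve, the function below checks for common marker files before a harness is chosen. The function name and the marker lists are assumptions, not Archon's implementation. package.json is the marker mentioned in this report; pyproject.toml, requirements.txt, and setup.py are conventional Python markers added for illustration.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# Marker files per project type. These lists are illustrative assumptions.
PROJECT_MARKERS: Dict[str, Tuple[str, ...]] = {
    "node": ("package.json",),
    "python": ("pyproject.toml", "requirements.txt", "setup.py"),
}


def detect_project_types(project_root: str) -> List[str]:
    """Return the sorted project types whose marker files exist in project_root.

    An empty list means the type is unknown, which a validation step should
    report explicitly instead of treating failed checks as a pass.
    """
    root = Path(project_root)
    found = [
        project_type
        for project_type, markers in PROJECT_MARKERS.items()
        if any((root / marker).is_file() for marker in markers)
    ]
    return sorted(found)
```

A validation step could then skip the Node.js/bun harness when "node" is not in the result, and surface an explicit "no matching harness" outcome rather than reporting that all checks passed.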
Suggested Fix
In the archon-refactor-safely workflow YAML, add a bash node between the analysis and execute phases:
- id: persist-plan
  bash: |
    cat > "$ARTIFACTS_DIR/refactor-plan.md" << 'PLAN'
    $analyze.output
    PLAN
  depends_on: [analyze]
This ensures the plan survives regardless of what happens to the execute node, and makes it available for DAG resume if the run needs to be retried.

Metadata

Labels

P1 (High priority - Address soon, next in queue), area: workflows (Workflow engine), bug (Something is broken), effort/medium (Few files, one domain or module, some coordination needed)
