Skip to content

fix(providers,workflows): treat Claude SDK stop_sequence success as success (#1425)#1662

Merged
Wirasm merged 2 commits into
devfrom
fix/1425-claude-success-stop-sequence
May 13, 2026
Merged

fix(providers,workflows): treat Claude SDK stop_sequence success as success (#1425)#1662
Wirasm merged 2 commits into
devfrom
fix/1425-claude-success-stop-sequence

Conversation

@Wirasm

@Wirasm Wirasm commented May 13, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Problem: Claude Agent SDK's SDKResultSuccess declares is_error: boolean (not literal false). Stop_sequence terminations carry is_error: true + subtype: 'success' — the SDK's encoding of "non-default termination, but not a failure". The claude provider forwarded this to MessageChunk.isError verbatim, so three downstream consumers (dag-executor main, dag-executor loop, orchestrator-agent direct chat) misclassified clean terminations as node failures with the contradictory user-hostile error "Node '<X>' failed: SDK returned success" — even though the AI had completed correctly and written its output.
  • Why it matters: Three reporters across v0.3.9 and v0.3.11 (latest 2026-05-13). Review-classify and synthesis pipelines, plus loop-back wrapper workflows, hard-fail and skip downstream trigger-rule joins (synthesize, self-fix, simplify, report). Users blocked unless they fork+patch+rebuild; archon's workflow schema does not expose stop_sequences and disabling output_format would break downstream when: conditions.
  • What changed: Provider boundary normalises is_error: true + subtype: 'success' as a clean success — no isError propagation downstream; debug-level claude.result_success_stop_sequence log preserves observability. Defense-in-depth errorSubtype !== 'success' guards added to dag-executor.ts main path and loop branch so a future third-party IAgentProvider forwarding the SDK pair raw can't reintroduce the bug. Regression tests for provider, DAG main, DAG loop, and orchestrator direct chat; guard test ensures real error subtypes (error_max_turns) still propagate.
  • What did NOT change (scope boundary): No change to MessageChunk type shape in packages/providers/src/types.ts (contract semantics tighten, type signature unchanged). No change to Codex turn.failed path (Bug A in the issue is already resolved by commit 4c6ddd99 + the current Codex turn.failed handler at codex/provider.ts:235-247). No change to orchestrator-agent.ts code (the provider-level fix covers direct chat transparently). No retry/auth-precheck behavioural improvements. No touch on related but separate bugs fix: interactive loop resume crashes with error_during_execution (stale session) #1208, bug(codex): retry loop reuses caller AbortSignal; crash on attempt N poisons attempt N+1 #1266.

UX Journey

Before

User                          Archon                                Claude SDK
────                          ──────                                ──────────
runs workflow with        ──▶  classify node executes
output_format: json_schema     SDK returns {is_error: true,
(implies stop_sequence)        subtype: 'success',
                               stop_reason: 'stop_sequence'}  ◀──── completes via stop sequence
                               provider forwards isError verbatim
                               dag-executor.ts:921 throws:
                                 "Node 'classify' failed:
                                  SDK returned success"
                               workflow exits with failure
                               trigger-rule joins skip
                               (synthesize/self-fix/simplify/report)
sees ❌ on a run whose  ◀──── output was written to disk;
classify output is valid       review pipeline produces nothing

After

User                          Archon                                Claude SDK
────                          ──────                                ──────────
runs workflow with        ──▶  classify node executes
output_format: json_schema     SDK returns {is_error: true,
                               subtype: 'success',
                               stop_reason: 'stop_sequence'}  ◀──── completes via stop sequence
                               [+] provider normalises:
                                   is_error + 'success' subtype
                                   ⇒ clean result, no isError
                                   [+] debug log: claude.result_success_stop_sequence
                               dag-executor.ts:921 guard:
                                   `isError && errorSubtype !== 'success'` ⇒ skip throw
                               node completes normally
                               trigger-rule joins fire
                               downstream nodes run on real output
sees ✅ classify output      ◀── review pipeline writes findings

Architecture Diagram

Before

@archon/providers
└── claude/provider.ts
    └── result handler: forwards SDK is_error verbatim
        ──▶ MessageChunk { isError: true, errorSubtype: 'success' }
                                            │
                                            ▼
@archon/workflows                @archon/core
├── dag-executor.ts              orchestrator-agent.ts
│   ├── main path:921             ├── handleStreamMode:1142
│   │   throws unconditionally    │   sends classifyAndFormatError("success")
│   │   on isError                │
│   └── loop branch:1947          └── handleBatchMode:1298
│       same throw                    same surface

After

@archon/providers
└── claude/provider.ts
    [~] result handler: normalises is_error + subtype: 'success'
        ──▶ MessageChunk { /* no isError when subtype === 'success' */ }
            [+] debug log: claude.result_success_stop_sequence
                                            │
                                            ▼
@archon/workflows                @archon/core
├── dag-executor.ts              orchestrator-agent.ts
│   ├── main path:921             ├── handleStreamMode:1142
│   │   [~] && errorSubtype       │   transparently fixed (chunk clean)
│   │       !== 'success'         │
│   └── loop branch:1947          └── handleBatchMode:1298
│       [~] same carve-out            transparently fixed (chunk clean)

Connection inventory:

From To Status Notes
claude/provider.ts result handler MessageChunk.isError modified No longer emits isError: true when subtype === 'success'
claude/provider.ts result handler logger modified New debug-level claude.result_success_stop_sequence event
dag-executor.ts:907-922 (main) throw new Error(...) modified New errorSubtype !== 'success' carve-out
dag-executor.ts:1932-1948 (loop) throw new Error(...) modified Same carve-out mirrored
orchestrator-agent.ts:1127, 1298 MessageChunk unchanged Transparently fixed via provider normalisation
Existing error_max_budget_usd carve-out at dag-executor.ts:894-903 unchanged Pattern mirrored for new carve-out

Label Snapshot

  • Risk: risk: low
  • Size: size: S
  • Scope: workflows, core (tests)
  • Module: providers:claude, workflows:executor, core:orchestrator (test)

Change Metadata

  • Change type: bug
  • Primary scope: multi (providers + workflows + core tests)

Linked Issue

Validation Evidence (required)

bun run validate   # ✅ exit 0

All six gates green on first invocation (after one prettier auto-format pass on the new provider test):

  • check:bundledbundled-defaults.generated.ts is up to date (36 commands, 20 workflows)
  • check:bundled-skillbundled-skill.ts is up to date (21 files)
  • type-check — all 10 packages exit 0
  • lint — 0 errors, 0 warnings
  • format:check — all files Prettier-clean
  • test — all per-package suites pass; new tests:
    • @archon/providers claude/provider.test.ts: +1 test (success-via-stop-sequence) → 26 → 27 in sendQuery decomposition behaviors
    • @archon/workflows dag-executor.test.ts: +2 tests (DAG main + loop branch) → 218 in the test file
    • @archon/core orchestrator-agent.test.ts: +1 test (direct chat contract) → 116 → 117

Skipped commands: none.

Security Impact (required)

  • New permissions/capabilities? No — only changes the conditional emission of an existing field on an existing chunk type, and a log level for one event.
  • New external network calls? No.
  • Secrets/tokens handling changed? No.
  • File system access scope changed? No.
  • Risk and mitigation: n/a.

Compatibility / Migration

  • Backward compatible? Yes — workflows that previously failed with "SDK returned success" will now complete cleanly. No workflow YAML changes required. Workflows that produced real errors (e.g. error_max_turns, error_during_execution, error_max_budget_usd) continue to fail loudly; the guard is subtype-scoped to 'success' only.
  • Config/env changes? No.
  • Database migration needed? No.
  • Upgrade steps: none.

Human Verification (required)

  • Verified scenarios:
    • bun run validate exits 0 on a clean checkout of this branch (after one prettier auto-format on the new test).
    • Individual per-package test runs: provider 74 pass, workflows dag-executor 218 pass, core orchestrator-agent 117 pass.
    • SDK type signature personally inspected at @anthropic-ai/claude-agent-sdk@0.2.121/sdk.d.ts:3117-3158SDKResultSuccess.is_error: boolean (not literal false), confirming the bug mechanism described in the investigation artifact.
    • Provider's else if (resultMsg.is_error === true && resultMsg.subtype === 'success') branch verified to fire on the new test case (asserted via not.toHaveBeenCalledWith(..., 'claude.result_is_error')).
  • Edge cases checked:
    • Real error subtype (error_max_turns) still propagates isError: true and fires the claude.result_is_error error log — guarded by the existing test at provider.test.ts:1223 which continues to pass.
    • Codex provider's turn.failed path (codex/provider.ts:235-247) audited — emits errorSubtype: 'codex_turn_failed', never 'success'. Unaffected.
    • Pi provider's event-bridge.ts:160,164,173 audited — emits 'missing_assistant_message' or last.stopReason ∈ {'error','aborted'}, never 'success'. Unaffected.
    • Empty-output guard at dag-executor.ts:1126 audited — still triggers if any node produces zero output AND no structured output. Defence-in-depth for Bug A (issue bug(workflow): two silent-failure paths in DAG executor — Codex turn.failed exits stream, Claude success-with-stop-sequence thrown as failure #1425) remains intact.
  • What was not verified:
    • End-to-end live run against archon-fix-github-issue on a real Claude output_format: json_schema node. The unit tests synthesise the exact (is_error, subtype, stop_reason) triple the SDK emits in this case, so behavioural coverage is equivalent.
    • Behaviour on Claude Agent SDK versions older than 0.2.121. The SDKResultSuccess.is_error: boolean type signature has been stable across the 0.2.x line (verified in 0.2.74 and 0.2.89 lockfile entries).

Side Effects / Blast Radius (required)

  • Affected subsystems:
    • @archon/providers — Claude provider only. Codex/Pi unaffected.
    • @archon/workflows — DAG executor's main and loop branches only.
    • @archon/core — orchestrator direct chat transparently fixed; no code change to orchestrator-agent.ts.
  • Potential unintended effects:
    • Stop_sequence-terminated runs that previously failed loudly with "SDK returned success" will now complete. This is the intended behaviour, but operators who monitored those failures as alerts will see them disappear. Logs preserve a debug-level claude.result_success_stop_sequence event for ops dashboards.
    • The claude.result_is_error error-level log no longer fires for subtype === 'success'. The new debug event is a clean replacement.
  • Guardrails / monitoring for early detection:
    • claude.result_success_stop_sequence log (debug) shows where the new code path fires.
    • claude.result_is_error log (error) continues to fire for genuine error subtypes — alert on this if a real regression slips through.
    • Type-check + lint + format catch any author misuse of the guard pattern at change time.

Rollback Plan (required)

Risks and Mitigations

  • Risk: The provider's success normalisation hides a future SDK regression where subtype: 'success' actually does indicate a failure.
    • Mitigation: Very unlikely given the SDK's explicit discriminated union (SDKResultMessage = SDKResultSuccess | SDKResultError) and the error_* subtype enum. If the SDK ever changes semantics, the executor's defense-in-depth guard would still emit a less-confusing error (the node would complete rather than throw with a contradictory message). The guard scope is subtype-equality on the string literal 'success', not a broader pattern.
  • Risk: A third-party IAgentProvider implementation copies the SDK pattern and starts emitting isError: true, errorSubtype: 'success'.
    • Mitigation: Defense-in-depth guards at dag-executor.ts:915 and :1940 already cover this — the executor will not throw on such a chunk regardless of which provider emitted it.

Investigation artifact: .claude/PRPs/issues/issue-1425.md (local-only, gitignored per repo policy)

Summary by CodeRabbit

  • Bug Fixes

    • Claude stop-sequence terminations that indicate clean completion are no longer misclassified as errors; sessions persist and no false error messages are shown.
  • Tests

    • Added regression tests across provider, orchestrator, and workflow layers to prevent reintroduction of the misclassification and ensure workflows complete correctly.

Review Change Stack

…uccess (#1425)

The Claude Agent SDK's SDKResultSuccess type declares is_error as boolean
(not literal false). When a model terminates via a configured stop sequence
the SDK sets is_error: true while keeping subtype: 'success' — its encoding
of "non-default termination, but not a failure".

The claude provider forwarded is_error verbatim into MessageChunk.isError,
so three downstream consumers (dag-executor main path, dag-executor loop
branch, orchestrator-agent direct chat) misclassified clean stop_sequence
terminations as node failures and produced the contradictory user-hostile
error "Node '<X>' failed: SDK returned success" — even though the AI had
already completed correctly and written its output. Multi-user impact on
v0.3.9 / v0.3.11 across review-classify and synthesis pipelines.

Changes:
- claude/provider.ts: normalise is_error + subtype: 'success' as a clean
  result. Don't propagate isError downstream; log a debug-level
  claude.result_success_stop_sequence event for observability instead.
- dag-executor.ts (main path + loop branch): defense-in-depth guard so
  third-party IAgentProvider implementations that forward the SDK pair raw
  can't reintroduce the same false-failure.
- provider/dag-executor/orchestrator-agent tests: regression tests covering
  the stop_sequence success path; guard test ensures real error subtypes
  (error_max_turns) still propagate.

Fixes #1425
@Wirasm

Wirasm commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author

🔍 Automated Code Review

Independent review by prp-core:code-reviewer (didn't see the implementation conversation).

Result: PASS — Ready for merge

Findings

✅ Strengths

  • Provider boundary is the right layer. is_error: true + subtype: 'success' is translated to a clean MessageChunk before it leaves the provider. SDK-specific quirks belong here, not at three downstream consumers.
  • Tight predicate. isRealError = is_error === true && subtype !== 'success' is exact string-equality, not a loose pattern. A future subtype: 'success_partial' or any other value still routes to the error path.
  • Defense-in-depth guards are consistent. Both dag-executor.ts carve-outs (~915 main, ~1940 loop) use the same msg.isError && msg.errorSubtype !== 'success' predicate and sit correctly after the existing error_max_budget_usd carve-out at ~894.
  • Tests exercise all three layers. The orchestrator test is not vacuous — it runs the full handleMessage path including classifyAndFormatError, so it validates the end-to-end non-regression for direct chat.

⚠️ Suggestions (non-blocking)

  • One reasonable missing scenario: a third-party provider that emits isError: true, errorSubtype: 'success' with content in msg.errors would have those errors silently swallowed by the executor guard. This is an intentional design choice (acknowledged by the comment), and such a chunk is contractually malformed per the SDK's union — but worth keeping in mind if a future provider drifts.

🔒 Security

  • No security concerns identified. No new permissions, network calls, secrets handling, or filesystem scope. The change only modifies conditional emission of an existing chunk field and downgrades one log level.

Checklist

  • Fix addresses root cause from investigation
  • Code follows codebase patterns (mirrors error_max_budget_usd carve-out)
  • Tests cover the change at all three call sites + a guard test for real errors
  • CLAUDE.md compliance — typed imports, no any, Pino {domain}.{action}_{state} event naming
  • bun run validate exit 0 (six gates: check:bundled, check:bundled-skill, type-check, lint, format:check, test)

Self-reviewed by Claude • Ready for human review

@Wirasm

Wirasm commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author

PR Review Summary (multi-agent)

Five specialist agents ran against the diff: code-reviewer, docs-impact, pr-test-analyzer, comment-analyzer, silent-failure-hunter. Aggregated findings below.

Critical Issues

None.

Important Issues

Agent Issue Location
silent-failure-hunter orchestrator-agent.ts was not updated with the same errorSubtype !== 'success' carve-out that was added to dag-executor.ts. PR description acknowledges this as deliberate ("transparently fixed via provider normalisation") since ClaudeProvider now strips isError before yielding. The gap matters only if a third-party IAgentProvider forwards the raw SDK pair — same scenario the dag-executor guards already defend against. Inconsistent defense-in-depth posture between the two consumers. packages/core/src/orchestrator/orchestrator-agent.ts:1127, :1298
docs-impact Missing CHANGELOG.md entry under [Unreleased]### Fixed. User-observable bug fix. CHANGELOG.md
comment-analyzer Three (#1425) issue-number references in production comments. Per project CLAUDE.md, issue refs in inline comments rot once closed. (Pre-existing (#1208) refs compound this.) packages/providers/src/claude/provider.ts:805, packages/workflows/src/dag-executor.ts:911, :1939
comment-analyzer Loop guard comment leans on "see the corresponding guard in the main node path above" — the main path is ~1025 lines away. Should restate the SDK invariant inline like the main-path comment does. packages/workflows/src/dag-executor.ts:1938

Suggestions

Agent Suggestion Location
pr-test-analyzer Optional regression test (criticality 5/10): loop with fresh_context: true, max_iterations: 2, second iteration yields is_error: true, subtype: 'success'. Same guard code path, so completeness-only. packages/workflows/src/dag-executor.test.ts

Strengths

  • Provider normalization is tightisRealError = is_error === true && subtype !== 'success' is exact-equality, no loose match. Real error subtypes (error_max_turns, error_during_execution, error_max_budget_usd) continue to propagate correctly.
  • dag-executor guards are exact — both use errorSubtype !== 'success' string-equality.
  • Comment at provider normalization site is exemplary: explains the SDK type defect (is_error: boolean, not literal false), the trigger, the SDK's encoding convention, and the downstream consequence. Exactly the "why" comment the codebase should aspire to.
  • Behavioral test coverage — all four new tests assert on observable side-effects (no node_failed events, completeWorkflowRun called, no error message sent to user, claude.result_is_error NOT logged), not just "no throw".
  • Empty-output guard at dag-executor.ts:1131 still fires for stop_sequence-success nodes that produce no content — correct fail-loudly behavior preserved.
  • Log level is appropriatedebug for claude.result_success_stop_sequence, not info.

Verdict

NEEDS MINOR FIXES — no blockers. Recommend addressing the two Important items before merge: (1) the orchestrator-agent defense-in-depth gap (or explicitly document why it was scoped out) and (2) the missing CHANGELOG entry. The (#1425) comment refs and loop comment cross-reference are quality nits worth fixing while touching the file but not merge-blocking.

Recommended Actions

  1. Apply the same && msg.errorSubtype !== 'success' carve-out at orchestrator-agent.ts:1127 and :1298 for consistency with the dag-executor guards, OR keep current behavior and add a brief comment explaining why the orchestrator deliberately relies on provider-level normalization alone.
  2. Add a ### Fixed entry under [Unreleased] in CHANGELOG.md.
  3. Drop the three (#1425) issue refs from production comments.
  4. Make the loop guard comment self-contained.

@coderabbitai

coderabbitai Bot commented May 13, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e9714dd7-fee0-4d77-8580-7ef870212340

📥 Commits

Reviewing files that changed from the base of the PR and between 2af935c and b8af40f.

📒 Files selected for processing (6)
  • CHANGELOG.md
  • packages/core/src/orchestrator/orchestrator-agent.test.ts
  • packages/core/src/orchestrator/orchestrator-agent.ts
  • packages/providers/src/claude/provider.test.ts
  • packages/providers/src/claude/provider.ts
  • packages/workflows/src/dag-executor.ts
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/providers/src/claude/provider.ts

📝 Walkthrough

Walkthrough

Fixes Claude SDK result misclassification where SDK emits is_error:true with subtype:'success' and stop_reason:'stop_sequence', normalizing it to a non-error at the provider boundary and adding executor and orchestrator guards plus regression tests and a changelog entry.

Changes

SDK Success-Marker Classification and Propagation

Layer / File(s) Summary
Claude Provider Error Classification
packages/providers/src/claude/provider.ts, packages/providers/src/claude/provider.test.ts
Provider's streamClaudeMessages computes isRealError = is_error === true AND subtype !== 'success', debug logs the clean stop_sequence case, and emits result chunks omitting isError/errorSubtype/errors when isRealError is false. Test validates shape and logging.
DAG Executor Node and Loop Error Handling
packages/workflows/src/dag-executor.ts, packages/workflows/src/dag-executor.test.ts
executeNodeInternal and executeLoopNode only treat msg.isError as failure when msg.errorSubtype !== 'success'. Regression tests ensure no node_failed or loop_iteration_failed events and that workflow completion proceeds.
Orchestrator End-to-End Behavior
packages/core/src/orchestrator/orchestrator-agent.ts, packages/core/src/orchestrator/orchestrator-agent.test.ts, CHANGELOG.md
Orchestrator-agent narrows error handling to msg.isError && msg.errorSubtype !== 'success'. Adds two tests confirming no user-visible error and session ID persistence. CHANGELOG updated with a fix note.

Sequence Diagram

sequenceDiagram
  participant Client
  participant Orchestrator
  participant Provider
  participant ClaudeSDK
  Client->>Orchestrator: request workflow execution
  Orchestrator->>Provider: sendQuery
  Provider->>ClaudeSDK: stream request
  ClaudeSDK-->>Provider: result is_error:true subtype:success stop_reason:stop_sequence
  Provider->>Provider: compute isRealError
  Provider-->>Orchestrator: emit clean result stopReason:stop_sequence no isError
  Orchestrator->>Client: report completion no user-visible error
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Possibly related PRs

  • coleam00/Archon#1291: Related changes to DAG executor and provider classification for Claude SDK result handling.
  • coleam00/Archon#1089: Also touches orchestrator handling of msg.isError result chunks; may overlap in orchestrator behavior changes.
  • coleam00/Archon#1162: Related provider adjustments around Claude result event normalization.

Poem

🐰 I hopped through logs and tests today,
Claude said stop but feared dismay,
is_error true yet subtype sings success,
Provider and executor now handle the rest.
Sessions stay steady and users rest.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix(providers,workflows): treat Claude SDK stop_sequence success as success (#1425)' is fully related to the main changeset, clearly summarizing the primary fix of preventing stop-sequence terminations from being misclassified as failures.
Description check ✅ Passed The PR description is comprehensive, covering all required template sections including summary, UX journey, architecture diagrams, validation evidence, security impact, compatibility, human verification, side effects, and rollback plan. All required content is present and detailed.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/1425-claude-success-stop-sequence

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/providers/src/claude/provider.test.ts (1)

1249-1281: ⚡ Quick win

Assert the success-marker debug log in this regression test.

You already verify no error log; adding a mockLogger.debug assertion for the success-marker path will lock in the observability contract too.

Proposed test assertion
     expect(chunks[0]).not.toHaveProperty('isError');
     expect(chunks[0]).not.toHaveProperty('errorSubtype');
     expect(chunks[0]).not.toHaveProperty('errors');
     expect(mockLogger.error).not.toHaveBeenCalledWith(expect.anything(), 'claude.result_is_error');
+    expect(mockLogger.debug).toHaveBeenCalledWith(
+      expect.objectContaining({
+        sessionId: 'sid-stop-seq',
+        stopReason: 'stop_sequence',
+      }),
+      'claude.result_success_stop_sequence'
+    );
   });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/providers/src/claude/provider.test.ts` around lines 1249 - 1281, Add
an assertion that the success-marker debug log was emitted in this regression
test: after collecting chunks and asserting no error logs, assert that
mockLogger.debug was called with the success marker key (e.g.
expect(mockLogger.debug).toHaveBeenCalledWith(expect.anything(),
'claude.result_success_marker')); place this check alongside the existing
mockLogger.error assertion to lock in the observability contract for the
stop_sequence success path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/providers/src/claude/provider.ts`:
- Around line 817-824: The log event name 'claude.result_success_stop_sequence'
should follow the `{domain}.{action}_{state}` convention with an allowed state
suffix; update the getLog().debug call in the block handling resultMsg (where
resultMsg.is_error === true && resultMsg.subtype === 'success') to use a
compliant state such as 'claude.result_success_validated' (retain the same
structured payload: sessionId and stopReason) so the event name aligns with the
pino logging standard; leave the surrounding logic in provider.ts and the
resultMsg checks unchanged.

---

Nitpick comments:
In `@packages/providers/src/claude/provider.test.ts`:
- Around line 1249-1281: Add an assertion that the success-marker debug log was
emitted in this regression test: after collecting chunks and asserting no error
logs, assert that mockLogger.debug was called with the success marker key (e.g.
expect(mockLogger.debug).toHaveBeenCalledWith(expect.anything(),
'claude.result_success_marker')); place this check alongside the existing
mockLogger.error assertion to lock in the observability contract for the
stop_sequence success path.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6e3a1284-23a4-487e-9a4e-08601fdbf509

📥 Commits

Reviewing files that changed from the base of the PR and between d9d2bc6 and 2af935c.

📒 Files selected for processing (5)
  • packages/core/src/orchestrator/orchestrator-agent.test.ts
  • packages/providers/src/claude/provider.test.ts
  • packages/providers/src/claude/provider.ts
  • packages/workflows/src/dag-executor.test.ts
  • packages/workflows/src/dag-executor.ts

Comment thread packages/providers/src/claude/provider.ts


- Apply errorSubtype !== 'success' guard at orchestrator-agent.ts handleStreamMode
  and handleBatchMode result branches (defense-in-depth, mirrors dag-executor).
  Without this, a third-party IAgentProvider that forwards the SDK pair raw
  would surface a spurious error on direct chat and drop conversation output.
  Adds matching regression test.
- Rename log event claude.result_success_stop_sequence -> claude.result_success_validated
  per CodeRabbit; aligns with {domain}.{action}_{state} pino convention.
- Lock the new debug log into the provider regression test.
- Drop (#1425) issue refs from production comments (rot risk per CLAUDE.md).
- Rewrite loop-branch guard comment to be self-contained instead of cross-
  referencing the main-path guard 1000+ lines away.
- Add CHANGELOG entry under [Unreleased] -> Fixed.
@Wirasm

Wirasm commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author

Review feedback addressed (commit b8af40f)

Addressed all Important items from the multi-agent review and the actionable CodeRabbit findings.

Fixed

Source Item Resolution
silent-failure-hunter orchestrator-agent.ts:1127 and :1298 defense-in-depth gap Applied the && msg.errorSubtype !== 'success' carve-out at both sites with a comment explaining the third-party-provider scenario. Added a matching regression test that yields the raw SDK pair (no provider normalisation) and asserts the orchestrator does not surface an error to the user.
CodeRabbit (Major) Log event claude.result_success_stop_sequence doesn't follow {domain}.{action}_{state} Renamed to claude.result_success_validated.
CodeRabbit (Nitpick) Provider regression test doesn't lock in the debug log Added expect(mockLogger.debug).toHaveBeenCalledWith(...) assertion alongside the existing mockLogger.error check.
comment-analyzer Three (#1425) issue refs in production comments Removed from provider.ts:805, dag-executor.ts:911, dag-executor.ts:1939.
comment-analyzer Loop guard comment cross-references main-path guard 1000+ lines away Rewrote comment to restate the SDK invariant inline.
docs-impact Missing CHANGELOG entry Added ### Fixed entry under [Unreleased] describing the user-observable behaviour change.

Skipped (with rationale)

  • pr-test-analyzer's optional fresh_context: true multi-iteration loop test (rated 5/10): same guard code path as the existing loop test, so completeness-only. The agent itself flagged it as "low-risk in practice" and not a likely regression vector.
  • Pre-existing (#1208) refs in the same comment blocks: out of scope for this PR — touching them would balloon the diff for unrelated cleanup.

Validation

bun run validate   # ✅ exit 0

All six gates green:

  • check:bundled / check:bundled-skill
  • type-check (10 packages)
  • lint (0 errors, 0 warnings)
  • format:check
  • test — including the two new regression tests:
    • @archon/providers claude/provider.test.ts: debug log assertion
    • @archon/core orchestrator-agent.test.ts: defense-in-depth (raw SDK pair)

@Wirasm Wirasm merged commit 936134f into dev May 13, 2026
4 checks passed
@Wirasm Wirasm deleted the fix/1425-claude-success-stop-sequence branch May 13, 2026 11:10
@coleam00 coleam00 mentioned this pull request May 14, 2026
cropse pushed a commit to cropse/Archon that referenced this pull request May 19, 2026
…uccess (coleam00#1425) (coleam00#1662)

* fix(providers,workflows): treat Claude SDK stop_sequence success as success (coleam00#1425)

The Claude Agent SDK's SDKResultSuccess type declares is_error as boolean
(not literal false). When a model terminates via a configured stop sequence
the SDK sets is_error: true while keeping subtype: 'success' — its encoding
of "non-default termination, but not a failure".

The claude provider forwarded is_error verbatim into MessageChunk.isError,
so three downstream consumers (dag-executor main path, dag-executor loop
branch, orchestrator-agent direct chat) misclassified clean stop_sequence
terminations as node failures and produced the contradictory user-hostile
error "Node '<X>' failed: SDK returned success" — even though the AI had
already completed correctly and written its output. Multi-user impact on
v0.3.9 / v0.3.11 across review-classify and synthesis pipelines.

Changes:
- claude/provider.ts: normalise is_error + subtype: 'success' as a clean
  result. Don't propagate isError downstream; log a debug-level
  claude.result_success_stop_sequence event for observability instead.
- dag-executor.ts (main path + loop branch): defense-in-depth guard so
  third-party IAgentProvider implementations that forward the SDK pair raw
  can't reintroduce the same false-failure.
- provider/dag-executor/orchestrator-agent tests: regression tests covering
  the stop_sequence success path; guard test ensures real error subtypes
  (error_max_turns) still propagate.

Fixes coleam00#1425

* fix(orchestrator,providers,workflows): address review feedback for coleam00#1425

- Apply errorSubtype !== 'success' guard at orchestrator-agent.ts handleStreamMode
  and handleBatchMode result branches (defense-in-depth, mirrors dag-executor).
  Without this, a third-party IAgentProvider that forwards the SDK pair raw
  would surface a spurious error on direct chat and drop conversation output.
  Adds matching regression test.
- Rename log event claude.result_success_stop_sequence -> claude.result_success_validated
  per CodeRabbit; aligns with {domain}.{action}_{state} pino convention.
- Lock the new debug log into the provider regression test.
- Drop (coleam00#1425) issue refs from production comments (rot risk per CLAUDE.md).
- Rewrite loop-branch guard comment to be self-contained instead of cross-
  referencing the main-path guard 1000+ lines away.
- Add CHANGELOG entry under [Unreleased] -> Fixed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug(workflow): two silent-failure paths in DAG executor — Codex turn.failed exits stream, Claude success-with-stop-sequence thrown as failure

1 participant