fix(providers,workflows): treat Claude SDK stop_sequence success as success (#1425)#1662
Conversation
…uccess (#1425) The Claude Agent SDK's SDKResultSuccess type declares is_error as boolean (not literal false). When a model terminates via a configured stop sequence the SDK sets is_error: true while keeping subtype: 'success' — its encoding of "non-default termination, but not a failure". The claude provider forwarded is_error verbatim into MessageChunk.isError, so three downstream consumers (dag-executor main path, dag-executor loop branch, orchestrator-agent direct chat) misclassified clean stop_sequence terminations as node failures and produced the contradictory user-hostile error "Node '<X>' failed: SDK returned success" — even though the AI had already completed correctly and written its output. Multi-user impact on v0.3.9 / v0.3.11 across review-classify and synthesis pipelines. Changes: - claude/provider.ts: normalise is_error + subtype: 'success' as a clean result. Don't propagate isError downstream; log a debug-level claude.result_success_stop_sequence event for observability instead. - dag-executor.ts (main path + loop branch): defense-in-depth guard so third-party IAgentProvider implementations that forward the SDK pair raw can't reintroduce the same false-failure. - provider/dag-executor/orchestrator-agent tests: regression tests covering the stop_sequence success path; guard test ensures real error subtypes (error_max_turns) still propagate. Fixes #1425
🔍 Automated Code ReviewIndependent review by Result: PASS — Ready for mergeFindings✅ Strengths
|
PR Review Summary (multi-agent)Five specialist agents ran against the diff: code-reviewer, docs-impact, pr-test-analyzer, comment-analyzer, silent-failure-hunter. Aggregated findings below. Critical IssuesNone. Important Issues
Suggestions
Strengths
VerdictNEEDS MINOR FIXES — no blockers. Recommend addressing the two Important items before merge: (1) the orchestrator-agent defense-in-depth gap (or explicitly document why it was scoped out) and (2) the missing CHANGELOG entry. The Recommended Actions
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (6)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughFixes Claude SDK result misclassification where SDK emits is_error:true with subtype:'success' and stop_reason:'stop_sequence', normalizing it to a non-error at the provider boundary and adding executor and orchestrator guards plus regression tests and a changelog entry. ChangesSDK Success-Marker Classification and Propagation
Sequence DiagramsequenceDiagram
participant Client
participant Orchestrator
participant Provider
participant ClaudeSDK
Client->>Orchestrator: request workflow execution
Orchestrator->>Provider: sendQuery
Provider->>ClaudeSDK: stream request
ClaudeSDK-->>Provider: result is_error:true subtype:success stop_reason:stop_sequence
Provider->>Provider: compute isRealError
Provider-->>Orchestrator: emit clean result stopReason:stop_sequence no isError
Orchestrator->>Client: report completion no user-visible error
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/providers/src/claude/provider.test.ts (1)
1249-1281: ⚡ Quick winAssert the success-marker debug log in this regression test.
You already verify no error log; adding a
mockLogger.debugassertion for the success-marker path will lock in the observability contract too.Proposed test assertion
expect(chunks[0]).not.toHaveProperty('isError'); expect(chunks[0]).not.toHaveProperty('errorSubtype'); expect(chunks[0]).not.toHaveProperty('errors'); expect(mockLogger.error).not.toHaveBeenCalledWith(expect.anything(), 'claude.result_is_error'); + expect(mockLogger.debug).toHaveBeenCalledWith( + expect.objectContaining({ + sessionId: 'sid-stop-seq', + stopReason: 'stop_sequence', + }), + 'claude.result_success_stop_sequence' + ); });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/providers/src/claude/provider.test.ts` around lines 1249 - 1281, Add an assertion that the success-marker debug log was emitted in this regression test: after collecting chunks and asserting no error logs, assert that mockLogger.debug was called with the success marker key (e.g. expect(mockLogger.debug).toHaveBeenCalledWith(expect.anything(), 'claude.result_success_marker')); place this check alongside the existing mockLogger.error assertion to lock in the observability contract for the stop_sequence success path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/providers/src/claude/provider.ts`:
- Around line 817-824: The log event name 'claude.result_success_stop_sequence'
should follow the `{domain}.{action}_{state}` convention with an allowed state
suffix; update the getLog().debug call in the block handling resultMsg (where
resultMsg.is_error === true && resultMsg.subtype === 'success') to use a
compliant state such as 'claude.result_success_validated' (retain the same
structured payload: sessionId and stopReason) so the event name aligns with the
pino logging standard; leave the surrounding logic in provider.ts and the
resultMsg checks unchanged.
---
Nitpick comments:
In `@packages/providers/src/claude/provider.test.ts`:
- Around line 1249-1281: Add an assertion that the success-marker debug log was
emitted in this regression test: after collecting chunks and asserting no error
logs, assert that mockLogger.debug was called with the success marker key (e.g.
expect(mockLogger.debug).toHaveBeenCalledWith(expect.anything(),
'claude.result_success_marker')); place this check alongside the existing
mockLogger.error assertion to lock in the observability contract for the
stop_sequence success path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6e3a1284-23a4-487e-9a4e-08601fdbf509
📒 Files selected for processing (5)
packages/core/src/orchestrator/orchestrator-agent.test.tspackages/providers/src/claude/provider.test.tspackages/providers/src/claude/provider.tspackages/workflows/src/dag-executor.test.tspackages/workflows/src/dag-executor.ts
- Apply errorSubtype !== 'success' guard at orchestrator-agent.ts handleStreamMode and handleBatchMode result branches (defense-in-depth, mirrors dag-executor). Without this, a third-party IAgentProvider that forwards the SDK pair raw would surface a spurious error on direct chat and drop conversation output. Adds matching regression test. - Rename log event claude.result_success_stop_sequence -> claude.result_success_validated per CodeRabbit; aligns with {domain}.{action}_{state} pino convention. - Lock the new debug log into the provider regression test. - Drop (#1425) issue refs from production comments (rot risk per CLAUDE.md). - Rewrite loop-branch guard comment to be self-contained instead of cross- referencing the main-path guard 1000+ lines away. - Add CHANGELOG entry under [Unreleased] -> Fixed.
Review feedback addressed (commit b8af40f)Addressed all Important items from the multi-agent review and the actionable CodeRabbit findings. Fixed
Skipped (with rationale)
Validationbun run validate # ✅ exit 0All six gates green:
|
…uccess (coleam00#1425) (coleam00#1662) * fix(providers,workflows): treat Claude SDK stop_sequence success as success (coleam00#1425) The Claude Agent SDK's SDKResultSuccess type declares is_error as boolean (not literal false). When a model terminates via a configured stop sequence the SDK sets is_error: true while keeping subtype: 'success' — its encoding of "non-default termination, but not a failure". The claude provider forwarded is_error verbatim into MessageChunk.isError, so three downstream consumers (dag-executor main path, dag-executor loop branch, orchestrator-agent direct chat) misclassified clean stop_sequence terminations as node failures and produced the contradictory user-hostile error "Node '<X>' failed: SDK returned success" — even though the AI had already completed correctly and written its output. Multi-user impact on v0.3.9 / v0.3.11 across review-classify and synthesis pipelines. Changes: - claude/provider.ts: normalise is_error + subtype: 'success' as a clean result. Don't propagate isError downstream; log a debug-level claude.result_success_stop_sequence event for observability instead. - dag-executor.ts (main path + loop branch): defense-in-depth guard so third-party IAgentProvider implementations that forward the SDK pair raw can't reintroduce the same false-failure. - provider/dag-executor/orchestrator-agent tests: regression tests covering the stop_sequence success path; guard test ensures real error subtypes (error_max_turns) still propagate. Fixes coleam00#1425 * fix(orchestrator,providers,workflows): address review feedback for coleam00#1425 - Apply errorSubtype !== 'success' guard at orchestrator-agent.ts handleStreamMode and handleBatchMode result branches (defense-in-depth, mirrors dag-executor). Without this, a third-party IAgentProvider that forwards the SDK pair raw would surface a spurious error on direct chat and drop conversation output. Adds matching regression test. - Rename log event claude.result_success_stop_sequence -> claude.result_success_validated per CodeRabbit; aligns with {domain}.{action}_{state} pino convention. - Lock the new debug log into the provider regression test. - Drop (coleam00#1425) issue refs from production comments (rot risk per CLAUDE.md). - Rewrite loop-branch guard comment to be self-contained instead of cross- referencing the main-path guard 1000+ lines away. - Add CHANGELOG entry under [Unreleased] -> Fixed.
Summary
SDKResultSuccessdeclaresis_error: boolean(not literal false). Stop_sequence terminations carryis_error: true+subtype: 'success'— the SDK's encoding of "non-default termination, but not a failure". The claude provider forwarded this toMessageChunk.isErrorverbatim, so three downstream consumers (dag-executor main, dag-executor loop, orchestrator-agent direct chat) misclassified clean terminations as node failures with the contradictory user-hostile error"Node '<X>' failed: SDK returned success"— even though the AI had completed correctly and written its output.synthesize,self-fix,simplify,report). Users blocked unless they fork+patch+rebuild; archon's workflow schema does not exposestop_sequencesand disablingoutput_formatwould break downstreamwhen:conditions.is_error: true+subtype: 'success'as a clean success — noisErrorpropagation downstream; debug-levelclaude.result_success_stop_sequencelog preserves observability. Defense-in-deptherrorSubtype !== 'success'guards added todag-executor.tsmain path and loop branch so a future third-partyIAgentProviderforwarding the SDK pair raw can't reintroduce the bug. Regression tests for provider, DAG main, DAG loop, and orchestrator direct chat; guard test ensures real error subtypes (error_max_turns) still propagate.MessageChunktype shape inpackages/providers/src/types.ts(contract semantics tighten, type signature unchanged). No change to Codexturn.failedpath (Bug A in the issue is already resolved by commit4c6ddd99+ the current Codexturn.failedhandler atcodex/provider.ts:235-247). No change toorchestrator-agent.tscode (the provider-level fix covers direct chat transparently). No retry/auth-precheck behavioural improvements. No touch on related but separate bugs fix: interactive loop resume crashes with error_during_execution (stale session) #1208, bug(codex): retry loop reuses caller AbortSignal; crash on attempt N poisons attempt N+1 #1266.UX Journey
Before
After
Architecture Diagram
Before
After
Connection inventory:
claude/provider.tsresult handlerMessageChunk.isErrorisError: truewhensubtype === 'success'claude/provider.tsresult handlerclaude.result_success_stop_sequenceeventdag-executor.ts:907-922(main)throw new Error(...)errorSubtype !== 'success'carve-outdag-executor.ts:1932-1948(loop)throw new Error(...)orchestrator-agent.ts:1127, 1298MessageChunkerror_max_budget_usdcarve-out atdag-executor.ts:894-903Label Snapshot
risk: lowsize: Sworkflows,core(tests)providers:claude,workflows:executor,core:orchestrator(test)Change Metadata
bugmulti(providers + workflows + core tests)Linked Issue
Validation Evidence (required)
bun run validate # ✅ exit 0All six gates green on first invocation (after one prettier auto-format pass on the new provider test):
check:bundled—bundled-defaults.generated.ts is up to date (36 commands, 20 workflows)check:bundled-skill—bundled-skill.ts is up to date (21 files)type-check— all 10 packages exit 0lint— 0 errors, 0 warningsformat:check— all files Prettier-cleantest— all per-package suites pass; new tests:@archon/providers claude/provider.test.ts: +1 test (success-via-stop-sequence) → 26 → 27 insendQuery decomposition behaviors@archon/workflows dag-executor.test.ts: +2 tests (DAG main + loop branch) → 218 in the test file@archon/core orchestrator-agent.test.ts: +1 test (direct chat contract) → 116 → 117Skipped commands: none.
Security Impact (required)
Compatibility / Migration
"SDK returned success"will now complete cleanly. No workflow YAML changes required. Workflows that produced real errors (e.g.error_max_turns,error_during_execution,error_max_budget_usd) continue to fail loudly; the guard is subtype-scoped to'success'only.Human Verification (required)
bun run validateexits 0 on a clean checkout of this branch (after one prettier auto-format on the new test).@anthropic-ai/claude-agent-sdk@0.2.121/sdk.d.ts:3117-3158—SDKResultSuccess.is_error: boolean(not literal false), confirming the bug mechanism described in the investigation artifact.else if (resultMsg.is_error === true && resultMsg.subtype === 'success')branch verified to fire on the new test case (asserted vianot.toHaveBeenCalledWith(..., 'claude.result_is_error')).error_max_turns) still propagatesisError: trueand fires theclaude.result_is_errorerror log — guarded by the existing test atprovider.test.ts:1223which continues to pass.turn.failedpath (codex/provider.ts:235-247) audited — emitserrorSubtype: 'codex_turn_failed', never'success'. Unaffected.event-bridge.ts:160,164,173audited — emits'missing_assistant_message'orlast.stopReason ∈ {'error','aborted'}, never'success'. Unaffected.dag-executor.ts:1126audited — still triggers if any node produces zero output AND no structured output. Defence-in-depth for Bug A (issue bug(workflow): two silent-failure paths in DAG executor — Codex turn.failed exits stream, Claude success-with-stop-sequence thrown as failure #1425) remains intact.archon-fix-github-issueon a real Claudeoutput_format: json_schemanode. The unit tests synthesise the exact(is_error, subtype, stop_reason)triple the SDK emits in this case, so behavioural coverage is equivalent.SDKResultSuccess.is_error: booleantype signature has been stable across the 0.2.x line (verified in 0.2.74 and 0.2.89 lockfile entries).Side Effects / Blast Radius (required)
@archon/providers— Claude provider only. Codex/Pi unaffected.@archon/workflows— DAG executor's main and loop branches only.@archon/core— orchestrator direct chat transparently fixed; no code change to orchestrator-agent.ts."SDK returned success"will now complete. This is the intended behaviour, but operators who monitored those failures as alerts will see them disappear. Logs preserve a debug-levelclaude.result_success_stop_sequenceevent for ops dashboards.claude.result_is_errorerror-level log no longer fires forsubtype === 'success'. The new debug event is a clean replacement.claude.result_success_stop_sequencelog (debug) shows where the new code path fires.claude.result_is_errorlog (error) continues to fire for genuine error subtypes — alert on this if a real regression slips through.Rollback Plan (required)
git revert <merge-commit-sha> && git push origin dev— single commit, additive only. Reverting restores the pre-PR throw behaviour on stop_sequence terminations (re-introducing bug(workflow): two silent-failure paths in DAG executor — Codex turn.failed exits stream, Claude success-with-stop-sequence thrown as failure #1425's bug for affected users).subtype === 'success'as a failure is correct.claude.result_is_errorlogs — should fire for real errors.&& msg.errorSubtype !== 'success'guards indag-executor.tslines ~915 and ~1940 are still present; checkclaude/provider.ts:807-826isRealErrorderivation is intact.Risks and Mitigations
subtype: 'success'actually does indicate a failure.SDKResultMessage = SDKResultSuccess | SDKResultError) and theerror_*subtype enum. If the SDK ever changes semantics, the executor's defense-in-depth guard would still emit a less-confusing error (the node would complete rather than throw with a contradictory message). The guard scope is subtype-equality on the string literal'success', not a broader pattern.IAgentProviderimplementation copies the SDK pattern and starts emittingisError: true, errorSubtype: 'success'.dag-executor.ts:915and:1940already cover this — the executor will not throw on such a chunk regardless of which provider emitted it.Investigation artifact:
.claude/PRPs/issues/issue-1425.md(local-only, gitignored per repo policy)Summary by CodeRabbit
Bug Fixes
Tests