refactor(workflows): consolidate duplicated safeSendMessage into executor-shared#1496
Conversation
📝 WalkthroughWalkthroughCentralizes platform message-sending safety by moving ChangesShared message-safety centralization
Sequence Diagram(s)sequenceDiagram
participant Executor
participant DagExecutor
participant SafeSend as executor-shared.safeSendMessage
participant Platform
participant Logger
Executor->>SafeSend: safeSendMessage(platform, convId, msg, context, meta, tracker)
DagExecutor->>SafeSend: safeSendMessage(platform, convId, msg, context, meta, tracker)
SafeSend->>Platform: platform.sendMessage(convId, msg, meta)
alt success
Platform-->>SafeSend: ack
SafeSend-->>Executor: true
SafeSend-->>DagExecutor: true
else failure
Platform-->>SafeSend: throws error
SafeSend->>SafeSend: classifyError -> (FATAL / TRANSIENT / UNKNOWN)
SafeSend->>Logger: log failure diagnostics (platformType, errorType, length, context, stack)
alt FATAL
SafeSend-->>Executor: throw Error (rethrow)
SafeSend-->>DagExecutor: throw Error (rethrow)
else UNKNOWN below-threshold
SafeSend-->>Executor: false
SafeSend-->>DagExecutor: false
else UNKNOWN hit-threshold
SafeSend-->>Executor: throw Error
SafeSend-->>DagExecutor: throw Error
else TRANSIENT
SafeSend-->>Executor: false
SafeSend-->>DagExecutor: false
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
packages/workflows/src/executor.ts (1)
53-65: Use a structured event name here.
'Critical message send failed'breaks the repo’sdomain.action_statelog naming pattern, which makes this path harder to query alongside the rest of the workflow logs.As per coding guidelines, "Structured logging with Pino: use event naming format
{domain}.{action}_{state}."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/workflows/src/executor.ts` around lines 53 - 65, Replace the freeform log message string in the getLog().error call inside executor.ts with a structured event name following the {domain}.{action}_{state} pattern (for example, "workflow.message_send_failed"), keeping the existing metadata object (err, conversationId, messageLength, errorType, platform.getPlatformType(), context, attempt, maxRetries) so the human-readable message is preserved in the payload if needed; locate the getLog().error invocation in the executor (the block referencing err, conversationId, message.length, errorType, platform.getPlatformType(), attempt, maxRetries) and update the second argument to the structured event name.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/workflows/src/dag-executor.ts`:
- Around line 77-79: The DAG execution path is still not enabling the UNKNOWN
abort guard because dag-executor.ts never creates or threads an
UnknownErrorTracker into its send sites; importing SendMessageContext and
safeSendMessage alone doesn't change behavior. Fix by instantiating an
UnknownErrorTracker (or accept one from the caller) in the relevant DAG flow in
dag-executor.ts and pass it into every safeSendMessage call (and into the
SendMessageContext if your shared API expects it), ensuring the UNKNOWN abort
guard is activated during DAG execution; update function signatures or call
sites (e.g., the functions that call safeSendMessage in this file) to accept and
forward the tracker where needed.
In `@packages/workflows/src/executor-shared.ts`:
- Around line 567-599: The UNKNOWN error counter (unknownErrorTracker.count) is
only reset on success so non-UNKNOWN failures (e.g., TRANSIENT or FATAL) do not
break "consecutive" streaks; update the catch block in the function handling
platform sends (use symbols classifyError, errorType, unknownErrorTracker,
UNKNOWN_ERROR_THRESHOLD) to reset unknownErrorTracker.count = 0 whenever
errorType !== 'UNKNOWN' (ensure this happens before threshold logic and before
throwing for FATAL) so only actual consecutive UNKNOWN errors increment and
trigger the threshold.
---
Nitpick comments:
In `@packages/workflows/src/executor.ts`:
- Around line 53-65: Replace the freeform log message string in the
getLog().error call inside executor.ts with a structured event name following
the {domain}.{action}_{state} pattern (for example,
"workflow.message_send_failed"), keeping the existing metadata object (err,
conversationId, messageLength, errorType, platform.getPlatformType(), context,
attempt, maxRetries) so the human-readable message is preserved in the payload
if needed; locate the getLog().error invocation in the executor (the block
referencing err, conversationId, message.length, errorType,
platform.getPlatformType(), attempt, maxRetries) and update the second argument
to the structured event name.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 426e215a-8059-4a91-98b2-bc947f4ffcaf
📒 Files selected for processing (3)
packages/workflows/src/dag-executor.tspackages/workflows/src/executor-shared.tspackages/workflows/src/executor.ts
| safeSendMessage, | ||
| type SendMessageContext, | ||
| } from './executor-shared'; |
There was a problem hiding this comment.
The DAG path still never enables the UNKNOWN abort guard.
Importing the shared helper alone does not change behavior here: this file never creates or passes an UnknownErrorTracker, so every safeSendMessage(...) call still suppresses UNKNOWN failures indefinitely. If the goal of this refactor is to stop silent looping in DAG execution, the tracker needs to be threaded through these send sites.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/workflows/src/dag-executor.ts` around lines 77 - 79, The DAG
execution path is still not enabling the UNKNOWN abort guard because
dag-executor.ts never creates or threads an UnknownErrorTracker into its send
sites; importing SendMessageContext and safeSendMessage alone doesn't change
behavior. Fix by instantiating an UnknownErrorTracker (or accept one from the
caller) in the relevant DAG flow in dag-executor.ts and pass it into every
safeSendMessage call (and into the SendMessageContext if your shared API expects
it), ensuring the UNKNOWN abort guard is activated during DAG execution; update
function signatures or call sites (e.g., the functions that call safeSendMessage
in this file) to accept and forward the tracker where needed.
There was a problem hiding this comment.
Good catch — you're right that importing the shared helper alone doesn't activate the guard, and my original commit message overclaimed. The PR description and commit message have been updated to honestly describe this as deduplication-only; the UnknownErrorTracker remains dormant infrastructure on both sides (same as upstream/dev today, where executor.ts declares the interface but never instantiates one either).
Note for maintainer (cc'd in the PR description): I deliberately scoped this PR to extraction so the refactor is review-friendly and easy to revert. If you'd prefer this PR also wire up the tracker in dag-executor.ts (and possibly executor.ts's callsites) to actually deliver the abort behavior in production, happy to add that on top — just let me know which scope you'd prefer. Otherwise I'd suggest activation as a follow-up PR so the design choices (per-conversation tracker? per-DAG-run? where does it live?) get their own discussion.
There was a problem hiding this comment.
@ericsoriano, thanks for the clear explanation and for correcting the PR description — that kind of transparency makes review much easier.
The deduplication-only scope is a perfectly reasonable call. The design questions you raised (tracker lifetime: per-conversation vs. per-DAG-run, ownership, where it lives) are genuinely non-trivial, and bundling those decisions into the same PR as the extraction would make both harder to review or revert independently.
The remaining observation — that executor.ts also never actually instantiates an UnknownErrorTracker despite defining the interface — reinforces that activation is a distinct piece of work that warrants its own PR and discussion.
Would you like me to open a follow-up GitHub issue to track the activation work (wiring UnknownErrorTracker into both executor.ts and dag-executor.ts call sites, with a design note on tracker lifetime/ownership)? That way the decision doesn't get lost after this PR merges.
(^_^)
✏️ Learnings added
Learnt from: ericsoriano
Repo: coleam00/Archon PR: 1496
File: packages/workflows/src/dag-executor.ts:77-79
Timestamp: 2026-04-29T18:07:33.180Z
Learning: In `coleam00/Archon` `packages/workflows/src/dag-executor.ts` and `packages/workflows/src/executor.ts`: `UnknownErrorTracker` is intentionally dormant/never instantiated on both sides as of PR `#1496`. PR `#1496` was scoped to extraction/deduplication only (moving `safeSendMessage`, `SendMessageContext`, `UnknownErrorTracker` into `executor-shared.ts`). Activating the abort guard (actually wiring `UnknownErrorTracker` into call sites) is deliberately deferred to a follow-up PR; open design questions include tracker lifetime (per-conversation vs. per-DAG-run) and ownership location.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: atlas-architect
Repo: coleam00/Archon PR: 0
File: :0-0
Timestamp: 2026-04-28T21:00:22.937Z
Learning: In `packages/workflows/src/dag-executor.test.ts` (Bun test suite): always wrap `expect(...)` assertions and `spy.mockRestore()` in a `try/finally` block so the spy is restored even when the assertion throws. A leaked `execFileAsync` spy bleeds into downstream unrelated tests and produces misleading outputs (e.g. `'Use: ok'`). Pattern: `try { expect(spy).toHaveBeenCalledWith(...) } finally { spy.mockRestore() }`.
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-28T11:00:25.790Z
Learning: Applies to **/*.{ts,tsx} : Use `execFileAsync` (not `exec`) when calling git or external commands directly from TypeScript; handle expected failures gracefully (missing directories during cleanup); surface git errors to users for actionable issues (conflicts, uncommitted changes)
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-28T11:00:25.790Z
Learning: Applies to packages/workflows/src/**/*.ts : archon/workflows: workflow engine (loader, router, executor, DAG, logger, bundled defaults); depends only on archon/git + archon/paths + archon/providers/types + hono/zod-openapi + zod; DB/AI/config injected via `WorkflowDeps`
Learnt from: Wirasm
Repo: coleam00/Archon PR: 0
File: :0-0
Timestamp: 2026-04-28T10:42:58.632Z
Learning: In the `coleam00/Archon` repo, the `archon/providers` import boundary documented in `CLAUDE.md` is aspirational, not enforced. As of PR `#1463`, `archon/workflows` already directly imports registry functions (`getProviderCapabilities`, `getRegisteredProviders`, `isRegisteredProvider`) from `archon/providers` (not `/types`) in `packages/workflows/src/dag-executor.ts`, `packages/workflows/src/executor.ts`, and `packages/workflows/src/validator.ts`. Registry queries from `archon/providers` in workflow-package files are acceptable consistent with this established precedent.
There was a problem hiding this comment.
Please open a follow-up GitHub issue to track the activation work (wiring UnknownErrorTracker into both executor.ts and dag-executor.ts
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
gh issue create \
--repo coleam00/Archon \
--assignee ericsoriano \
--title "feat(workflows): activate UnknownErrorTracker abort guard in executor.ts and dag-executor.ts" \
--body "## Summary
This issue tracks the follow-up activation work deferred from `#1496`.
PR `#1496` extracted \`safeSendMessage\`, \`SendMessageContext\`, and \`UnknownErrorTracker\` into \`executor-shared.ts\` (deduplication only). As of that PR, \`UnknownErrorTracker\` remains dormant/never instantiated on **both** sides — \`executor.ts\` and \`dag-executor.ts\` — meaning the consecutive-UNKNOWN abort guard is available as infrastructure but not active in production.
## Work required
- Wire \`UnknownErrorTracker\` into the \`safeSendMessage\` call sites in \`packages/workflows/src/executor.ts\`
- Wire \`UnknownErrorTracker\` into the \`safeSendMessage\` call sites in \`packages/workflows/src/dag-executor.ts\`
## Open design questions to resolve before implementing
1. **Tracker lifetime**: should the tracker be scoped per-conversation or per-DAG-run?
2. **Ownership location**: where should the tracker be instantiated and held — at the top-level executor, per-node, or passed in via \`WorkflowDeps\`?
3. **Behavior on abort**: what should happen when the threshold is hit in DAG execution — fail the current node, fail the entire DAG run, or surface to the user for a retry decision?
## References
- PR: https://github.com/coleam00/Archon/pull/1496
- Comment thread: https://github.com/coleam00/Archon/pull/1496#discussion_r3163035187
- Requested by: `@ericsoriano`
"
Length of output: 77
🏁 Script executed:
gh issue create \
--repo coleam00/Archon \
--title "feat(workflows): activate UnknownErrorTracker abort guard in executor.ts and dag-executor.ts" \
--body "## Summary
This issue tracks the follow-up activation work deferred from `#1496`.
PR `#1496` extracted \`safeSendMessage\`, \`SendMessageContext\`, and \`UnknownErrorTracker\` into \`executor-shared.ts\` (deduplication only). As of that PR, \`UnknownErrorTracker\` remains dormant/never instantiated on **both** sides — \`executor.ts\` and \`dag-executor.ts\` — meaning the consecutive-UNKNOWN abort guard is available as infrastructure but not active in production.
## Work required
- Wire \`UnknownErrorTracker\` into the \`safeSendMessage\` call sites in \`packages/workflows/src/executor.ts\`
- Wire \`UnknownErrorTracker\` into the \`safeSendMessage\` call sites in \`packages/workflows/src/dag-executor.ts\`
## Open design questions to resolve before implementing
1. **Tracker lifetime**: should the tracker be scoped per-conversation or per-DAG-run?
2. **Ownership location**: where should the tracker be instantiated and held — at the top-level executor, per-node, or passed in via \`WorkflowDeps\`?
3. **Behavior on abort**: what should happen when the threshold is hit in DAG execution — fail the current node, fail the entire DAG run, or surface to the user for a retry decision?
## References
- PR: https://github.com/coleam00/Archon/pull/1496
- Comment thread: https://github.com/coleam00/Archon/pull/1496#discussion_r3163035187
- Requested by: `@ericsoriano`
"
Length of output: 47
@ericsoriano, done! The follow-up issue has been opened here: #1512
It captures:
- The two files that need wiring (
executor.tsanddag-executor.ts) - The three open design questions (tracker lifetime, ownership location, abort behavior)
- Backlinks to PR
#1496and this comment thread
Note: I wasn't able to auto-assign it due to integration permissions — you may want to assign it to yourself directly on GitHub.
🧠 Learnings used
Learnt from: ericsoriano
Repo: coleam00/Archon PR: 1496
File: packages/workflows/src/dag-executor.ts:77-79
Timestamp: 2026-04-29T18:07:33.180Z
Learning: In `coleam00/Archon` `packages/workflows/src/dag-executor.ts` and `packages/workflows/src/executor.ts`: `UnknownErrorTracker` is intentionally dormant/never instantiated on both sides as of PR `#1496`. PR `#1496` was scoped to extraction/deduplication only (moving `safeSendMessage`, `SendMessageContext`, `UnknownErrorTracker` into `executor-shared.ts`). Activating the abort guard (actually wiring `UnknownErrorTracker` into call sites) is deliberately deferred to a follow-up PR; open design questions include tracker lifetime (per-conversation vs. per-DAG-run) and ownership location.
Learnt from: atlas-architect
Repo: coleam00/Archon PR: 0
File: :0-0
Timestamp: 2026-04-28T21:00:22.937Z
Learning: In `packages/workflows/src/dag-executor.test.ts` (Bun test suite): always wrap `expect(...)` assertions and `spy.mockRestore()` in a `try/finally` block so the spy is restored even when the assertion throws. A leaked `execFileAsync` spy bleeds into downstream unrelated tests and produces misleading outputs (e.g. `'Use: ok'`). Pattern: `try { expect(spy).toHaveBeenCalledWith(...) } finally { spy.mockRestore() }`.
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-28T11:00:25.790Z
Learning: Applies to **/*.{ts,tsx} : Use `execFileAsync` (not `exec`) when calling git or external commands directly from TypeScript; handle expected failures gracefully (missing directories during cleanup); surface git errors to users for actionable issues (conflicts, uncommitted changes)
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-28T11:00:25.790Z
Learning: Applies to packages/workflows/src/**/*.ts : archon/workflows: workflow engine (loader, router, executor, DAG, logger, bundled defaults); depends only on archon/git + archon/paths + archon/providers/types + hono/zod-openapi + zod; DB/AI/config injected via `WorkflowDeps`
Learnt from: Wirasm
Repo: coleam00/Archon PR: 0
File: :0-0
Timestamp: 2026-04-28T10:42:58.632Z
Learning: In the `coleam00/Archon` repo, the `archon/providers` import boundary documented in `CLAUDE.md` is aspirational, not enforced. As of PR `#1463`, `archon/workflows` already directly imports registry functions (`getProviderCapabilities`, `getRegisteredProviders`, `isRegisteredProvider`) from `archon/providers` (not `/types`) in `packages/workflows/src/dag-executor.ts`, `packages/workflows/src/executor.ts`, and `packages/workflows/src/validator.ts`. Registry queries from `archon/providers` in workflow-package files are acceptable consistent with this established precedent.
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-28T11:00:25.790Z
Learning: When a process cannot reliably distinguish 'actively running elsewhere' from 'orphaned by a crash' (started by a different process or input source), it must not autonomously mark work as failed/cancelled/abandoned based on a timer or staleness guess; surface the ambiguous state to the user and provide a one-click action (No Autonomous Lifecycle Mutation Across Process Boundaries)
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-28T11:00:25.790Z
Learning: Applies to packages/providers/src/**/*.ts : archon/providers: AI agent providers (Claude, Codex, Pi community) — owns SDK deps; `IAgentProvider` interface and `sendQuery()` contract; `archon/providers/types` is the contract subpath (zero SDK deps) that archon/workflows imports; providers receive raw `nodeConfig` + `assistantConfig` and translate to SDK-specific options internally
executor.ts and dag-executor.ts each defined a near-identical copy of safeSendMessage and SendMessageContext. Move both into executor-shared.ts as exports and import from there. The UnknownErrorTracker guard is also relocated and exported, but remains dormant infrastructure: no callsite in either file instantiates a tracker today, so this PR delivers pure deduplication with no behavioral change. Activating the guard in DAG execution is appropriate follow-up work. Also addresses two issues surfaced during code review (coleam00#1496): - Reset UnknownErrorTracker.count on any non-UNKNOWN error (was only resetting on success), so a sequence like UNKNOWN -> TRANSIENT -> UNKNOWN -> UNKNOWN no longer trips the threshold incorrectly. - Use the structured 'platform.critical_message_send_failed' event name in the inlined log call instead of the prose 'Critical message send failed' string. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
7e3bcd6 to
fec4987
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/workflows/src/executor-shared.ts (1)
573-584: Align shared log event name with project logging convention.Use
{domain}.{action}_{state}format for this event key to stay consistent with logging standards.♻️ Suggested diff
- 'platform_message_send_failed' + 'platform.message_send_failed'As per coding guidelines, structured logging with Pino must use event naming format
{domain}.{action}_{state}.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/workflows/src/executor-shared.ts` around lines 573 - 584, The log event key 'platform_message_send_failed' does not follow the {domain}.{action}_{state} convention; update the getLog().error call (the call site using getLog(), platform.getPlatformType(), err, conversationId, message.length, etc.) to use a dotted domain format such as 'platform.message_send_failed' (or the correct domain token for your project) and update any other matching occurrences to the same convention; run tests/lint to ensure consistency.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@packages/workflows/src/executor-shared.ts`:
- Around line 573-584: The log event key 'platform_message_send_failed' does not
follow the {domain}.{action}_{state} convention; update the getLog().error call
(the call site using getLog(), platform.getPlatformType(), err, conversationId,
message.length, etc.) to use a dotted domain format such as
'platform.message_send_failed' (or the correct domain token for your project)
and update any other matching occurrences to the same convention; run tests/lint
to ensure consistency.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fd0e40cb-c111-4a4b-b306-bcd5d971863e
📒 Files selected for processing (3)
packages/workflows/src/dag-executor.tspackages/workflows/src/executor-shared.tspackages/workflows/src/executor.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/workflows/src/dag-executor.ts
Review SummaryVerdict: minor-fixes-needed This PR deduplicates Blocking issues
Suggested fixes
Minor / nice-to-have
Compliments
Reviewed via maintainer-review-pr workflow (Pi/Minimax). Aspects run: code-review, error-handling, test-coverage, comment-quality. |
executor.ts and dag-executor.ts each defined a near-identical copy of safeSendMessage and SendMessageContext. Move both into executor-shared.ts as exports and import from there. The UnknownErrorTracker guard is also relocated and exported, but remains dormant infrastructure: no callsite in either file instantiates a tracker today, so this PR delivers pure deduplication with no behavioral change. Activating the guard in DAG execution is appropriate follow-up work. Also addresses two issues surfaced during code review (coleam00#1496): - Reset UnknownErrorTracker.count on any non-UNKNOWN error (was only resetting on success), so a sequence like UNKNOWN -> TRANSIENT -> UNKNOWN -> UNKNOWN no longer trips the threshold incorrectly. - Use the structured 'platform.critical_message_send_failed' event name in the inlined log call instead of the prose 'Critical message send failed' string. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Covers the six cases requested in the PR review: - Success resets UnknownErrorTracker to 0 and returns true - TRANSIENT error returns false without throwing - FATAL error rethrows with 'Platform authentication/permission error' - UNKNOWN error increments tracker and returns false below threshold - Three consecutive UNKNOWN errors trip the threshold and throw - TRANSIENT resets tracker so a subsequent UNKNOWN does not trip threshold - Calls without unknownErrorTracker (DAG executor path) never throw Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
e549687 to
f1873a1
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/workflows/src/executor.ts (1)
45-69:⚠️ Potential issue | 🟠 Major | ⚡ Quick winKeep critical sends on the shared escalation policy.
This retry path now bypasses
safeSendMessage, so it suppresses cases the shared helper escalates:FATALonly breaks the loop here, and consecutiveUNKNOWNfailures never trip the shared abort threshold. Because several callers continue after afalsereturn, platform auth/config issues can degrade into silent notification loss instead of surfacing immediately. Please route this throughsafeSendMessage, or explicitly document why downgrading those failures is safe for critical notifications.As per coding guidelines, "Document fallback behavior with a comment when a fallback is intentional and safe; otherwise throw".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/workflows/src/executor.ts` around lines 45 - 69, The retry loop currently calls platform.sendMessage directly and bypasses the shared escalation logic in safeSendMessage, which suppresses escalation for FATAL/UNKNOWN errors; update the loop to call the shared helper (safeSendMessage) instead of platform.sendMessage so the common abort/escalation behavior is preserved (or if you intentionally want different behavior, add a clear comment documenting why downgrading is safe). Concretely, replace the platform.sendMessage(conversationId, message, metadata) call inside the for-loop with safeSendMessage(...) (passing the same conversationId, message, metadata and context), remove or adapt local classifyError-based suppression, and ensure FATAL errors are propagated or handled by the shared helper rather than silently breaking the loop.
🧹 Nitpick comments (1)
packages/workflows/src/executor.ts (1)
64-64: Update log consumers for the renamed failure event.This label change will break any dashboards or alerts still querying the previous send-failure event name. Make sure monitoring includes
platform.critical_message_send_failedbefore this lands.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/workflows/src/executor.ts` at line 64, The renamed failure event label has changed to 'platform.critical_message_send_failed'; update any monitoring, dashboards, alerts and code that query or emit the old event name to use 'platform.critical_message_send_failed' instead. Search for occurrences of the old label in logging/telemetry consumers and in your emitter/handler functions (the code paths that call the log/emit functions in executor.ts) and replace them with 'platform.critical_message_send_failed', then run smoke tests on dashboards/alerts to confirm they pick up events under the new name.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@packages/workflows/src/executor.ts`:
- Around line 45-69: The retry loop currently calls platform.sendMessage
directly and bypasses the shared escalation logic in safeSendMessage, which
suppresses escalation for FATAL/UNKNOWN errors; update the loop to call the
shared helper (safeSendMessage) instead of platform.sendMessage so the common
abort/escalation behavior is preserved (or if you intentionally want different
behavior, add a clear comment documenting why downgrading is safe). Concretely,
replace the platform.sendMessage(conversationId, message, metadata) call inside
the for-loop with safeSendMessage(...) (passing the same conversationId,
message, metadata and context), remove or adapt local classifyError-based
suppression, and ensure FATAL errors are propagated or handled by the shared
helper rather than silently breaking the loop.
---
Nitpick comments:
In `@packages/workflows/src/executor.ts`:
- Line 64: The renamed failure event label has changed to
'platform.critical_message_send_failed'; update any monitoring, dashboards,
alerts and code that query or emit the old event name to use
'platform.critical_message_send_failed' instead. Search for occurrences of the
old label in logging/telemetry consumers and in your emitter/handler functions
(the code paths that call the log/emit functions in executor.ts) and replace
them with 'platform.critical_message_send_failed', then run smoke tests on
dashboards/alerts to confirm they pick up events under the new name.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d0395a0e-3689-4bbd-af51-7b9caf6cb38f
📒 Files selected for processing (4)
packages/workflows/src/dag-executor.tspackages/workflows/src/executor-shared.test.tspackages/workflows/src/executor-shared.tspackages/workflows/src/executor.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/workflows/src/executor-shared.test.ts
- packages/workflows/src/executor-shared.ts
Tests added for
|
| Test case | What it verifies |
|---|---|
| Success path | Returns true, resets UnknownErrorTracker.count to 0 |
| TRANSIENT error | Returns false without throwing |
| FATAL error | Rethrows with 'Platform authentication/permission error' |
| UNKNOWN below threshold | Increments tracker, returns false |
| 3 consecutive UNKNOWNs | Throws '3 consecutive unrecognized errors' |
[UNKNOWN→TRANSIENT→UNKNOWN] sequence |
Count resets on TRANSIENT — no throw on final UNKNOWN |
| No tracker (DAG executor path) | UNKNOWN errors never throw regardless of call count |
Validation
```
bun run validate → exit 0
```
| Check | Result |
|---|---|
| `type-check` (all 10 packages) | ✅ |
| `lint` (0 warnings) | ✅ |
| `format:check` | ✅ |
| `test` (all packages) | ✅ |
…utor-shared (coleam00#1496) * refactor(workflows): consolidate safeSendMessage into executor-shared executor.ts and dag-executor.ts each defined a near-identical copy of safeSendMessage and SendMessageContext. Move both into executor-shared.ts as exports and import from there. The UnknownErrorTracker guard is also relocated and exported, but remains dormant infrastructure: no callsite in either file instantiates a tracker today, so this PR delivers pure deduplication with no behavioral change. Activating the guard in DAG execution is appropriate follow-up work. Also addresses two issues surfaced during code review (coleam00#1496): - Reset UnknownErrorTracker.count on any non-UNKNOWN error (was only resetting on success), so a sequence like UNKNOWN -> TRANSIENT -> UNKNOWN -> UNKNOWN no longer trips the threshold incorrectly. - Use the structured 'platform.critical_message_send_failed' event name in the inlined log call instead of the prose 'Critical message send failed' string. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(workflows): add unit tests for safeSendMessage in executor-shared Covers the six cases requested in the PR review: - Success resets UnknownErrorTracker to 0 and returns true - TRANSIENT error returns false without throwing - FATAL error rethrows with 'Platform authentication/permission error' - UNKNOWN error increments tracker and returns false below threshold - Three consecutive UNKNOWN errors trip the threshold and throw - TRANSIENT resets tracker so a subsequent UNKNOWN does not trip threshold - Calls without unknownErrorTracker (DAG executor path) never throw Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
executor.tsanddag-executor.tseach define a near-identical copy ofsafeSendMessageandSendMessageContext. TheUnknownErrorTrackerguard was declared inexecutor.tsbut is dormant — no callsite in either file instantiates a tracker today.SendMessageContext,UnknownErrorTracker, andsafeSendMessageintoexecutor-shared.tsas exports. Bothexecutor.tsanddag-executor.tsnow import from there. No behavior change — neither file passes a tracker tosafeSendMessage, so the guard remains dormant infrastructure (same asupstream/devtoday).UnknownErrorTrackerinstantiations. No callsite signatures changed. No tests changed; existing suite covers all callsites. Activating the guard in DAG execution is appropriate follow-up work — see "Note for maintainer" below.UX Journey
This is a refactor below the user-facing layer; runtime behavior is identical. The UX journey diagram below is what would happen if a tracker were eventually wired up — not what happens today.
Today (this PR)
Future (if a follow-up wires up a tracker)
Architecture Diagram
Before
After
Connection inventory:
executor.tsexecutor-shared.tslogSendErrorhelperdag-executor.tsexecutor-shared.tssafeSendMessage+type SendMessageContext; deletes local copiesexecutor-shared.tsLabel Snapshot
risk: lowsize: Mworkflowsworkflows:executor-sharedChange Metadata
refactorworkflowsLinked Issue
Validation Evidence (required)
check:bundledpassestype-checkpasses (all packages)lintpasses (0 errors, 0 warnings)format:checkpassesEvidence:
bun run validateexit code 0 against this branch (rebased ontoupstream/devat2945f2ec, then re-validated after addressing review).Security Impact (required)
Compatibility / Migration
safeSendMessageretain identical signatures and behavior. The function still accepts an optionalUnknownErrorTracker(no callsite passes one today, in either file, on eitherupstream/devor this branch).Human Verification (required)
The duplication was surfaced by running
archon-architectagainst this codebase as the architectural-review test for PR #1479. The workflow's analysis identifiedsafeSendMessageduplication as the most acute correctness drift in the executor pair. The resulting commit was filed as a verification PR on my fork (now closed) before being lifted to this upstream contribution and updated based on review.safeSendMessagehappy path, FATAL throw, and TRANSIENT log+continue.upstream/devproduced one trivial import-block conflict indag-executor.ts(combining a recently-addedformatSubprocessFailureimport with the newsafeSendMessage/type SendMessageContextimports), resolved by keeping all four. No semantic conflict.UnknownErrorTrackeris and was dormant on both sides — no callsite instantiates one. The guard's threshold check (UNKNOWN_ERROR_THRESHOLD = 3) is unchanged from the pre-existingexecutor.tslogic.UNKNOWN → TRANSIENT → UNKNOWN → UNKNOWNsequence no longer trips the threshold incorrectly.UNKNOWN_ERROR_THRESHOLDpath. Code path verified statically; no callsite instantiates a tracker, so the path is unreachable in production today.Side Effects / Blast Radius (required)
safeSendMessageexactly as before; only the symbol's source file has changed.'platform_message_send_failed') and the new structured'platform.critical_message_send_failed'event continue to surface every send failure for observability.Rollback Plan (required)
git revert <merge-sha>— single commit, three files, no schema or external interface changes outside this scope.executor.tsordag-executor.ts(missing imports), or duplicated symbols inexecutor-shared.ts. CI's existingtype-checkstep would catch all of these.Risks and Mitigations
executor.ts/dag-executor.tsproduce additional import-block conflicts before merge.dag-executor.tsbusiness logic, not the import block; risk is low.Updates from review
CodeRabbit raised three items; addressed in commit amend:
executor-shared.ts:599): pre-existing bug inherited fromexecutor.ts. Fixed — tracker now resets on any non-UNKNOWN error type, not only on success.executor.ts:64): renamed'Critical message send failed'→'platform.critical_message_send_failed'to match the repo'sdomain.action_statePino convention.Note for maintainer
I deliberately kept this PR scoped to extraction so the refactor is review-friendly and easy to revert. The
UnknownErrorTrackerguard was already dormant onupstream/devbefore this PR (no callsite inexecutor.tsinstantiated one); this PR preserves that state in the consolidated location.If you'd prefer this PR also wire up the
UnknownErrorTrackerindag-executor.ts(and possiblyexecutor.ts's callsites) to actually deliver the abort behavior in production, happy to add that on top — just let me know which scope you'd prefer. Otherwise I'd suggest activation as a follow-up PR so the design choices (per-conversation tracker? per-DAG-run? where does it live?) get their own discussion.Summary by CodeRabbit
Improvements
Tests