Skip to content

fix: Windows HOME resolution and MCP plugin warning suppression#1134

Closed
MrFadiAi wants to merge 1 commit into
coleam00:devfrom
MrFadiAi:fix/windows-home-mcp-warnings
Closed

fix: Windows HOME resolution and MCP plugin warning suppression#1134
MrFadiAi wants to merge 1 commit into
coleam00:devfrom
MrFadiAi:fix/windows-home-mcp-warnings

Conversation

@MrFadiAi

@MrFadiAi MrFadiAi commented Apr 12, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Fix Windows path resolution bug where HOME environment variable is unset, causing CLI and server to fail loading ~/.archon/.env configuration
  • Suppress non-fatal MCP connection failures from user-level plugins (e.g., telegram) in headless workflow subprocesses — only surface warnings for workflow-configured MCP servers

Changes

Windows HOME Resolution (@archon/cli, @archon/server)

  • Replace process.env.HOME with getArchonHome() from @archon/paths for cross-platform config path resolution in both CLI and server entry points
  • On Windows, HOME is often unset — getArchonHome() correctly falls back to USERPROFILE or HOMEDRIVE+HOMEPATH

MCP Plugin Warning Suppression (@archon/workflows, @archon/core)

  • Track workflow-configured MCP server names in the DAG executor
  • Filter MCP connection failure messages to only surface warnings for servers explicitly configured via mcp: in workflow YAML
  • User-level plugin MCP failures (e.g., telegram) are logged at debug level instead of warn and not sent to the user
  • Pass settings: { enabledPlugins: {} } to Claude Agent SDK subprocess to attempt disabling user plugins

Test plan

  • Verified on Windows 11 where HOME is unset — CLI and server now correctly load ~/.archon/.env
  • Verified archon workflow run archon-assist "Say hello" completes without telegram MCP warning
  • TypeScript type-check passes for all packages
  • ESLint + Prettier pass (pre-commit hook)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Improved MCP server connection error handling by distinguishing between workflow-level and plugin-level failures; only workflow failures are reported to users, while plugin failures are logged internally.
  • New Features

    • Claude agent subprocess now exclusively uses explicitly configured MCP servers for enhanced predictability and control.

- Use getArchonHome() instead of process.env.HOME for cross-platform
  config path resolution in CLI and server entry points
- Suppress non-fatal MCP connection failures from user-level plugins
  (e.g., telegram) in headless workflow subprocesses
- Only surface MCP failures for workflow-configured servers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Apr 12, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

Updates env file path resolution across CLI, server, and tests to use a centralized getArchonHome() function. Disables user plugin execution in Claude subprocess by adding plugin settings. Modifies workflow executor to differentiate and filter MCP server connection failures between workflow-specific and plugin-specific sources.

Changes

Cohort / File(s) Summary
Environment Path Standardization
packages/cli/src/cli.ts, packages/cli/src/cli.test.ts, packages/server/src/index.ts
Replaced manual home directory resolution with centralized getArchonHome() function for computing the global ~/.archon/.env file path. Updated imports and adjusted test setup to reflect the new path resolution method.
Claude Provider Configuration
packages/core/src/providers/claude.ts
Added settings: { enabledPlugins: {} } to Claude Agent SDK options to disable user and plugin execution within the subprocess, restricting activity to supplied MCP servers only.
MCP Failure Filtering
packages/workflows/src/dag-executor.ts
Implemented differentiation between workflow-specific and plugin-specific MCP server connection failures in system message handling. Only workflow MCP failures are surfaced to users via safeSendMessage; plugin failures are logged at debug level and suppressed from user output.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰 Paths now home, with getArchon's grace,
MCP failures sorted to their rightful place—
Plugins hush while workflows speak,
Claude's plugins rest, no longer meek!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description covers the main problem, solution, and test plan, but is missing critical template sections including architecture diagrams, validation evidence, security impact, compatibility analysis, human verification details, and rollback plan. Complete the description template by adding Architecture Diagrams (Before/After), Validation Evidence (test command outputs), Security Impact assessment, Compatibility/Migration details, Human Verification scope, Side Effects/Blast Radius analysis, and Rollback Plan.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically identifies the two main changes: Windows HOME resolution fix and MCP plugin warning suppression, matching the core objectives.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/cli/src/cli.test.ts (1)

241-246: ⚠️ Potential issue | 🟠 Major

Test is environment-dependent and can be flaky.

This path now targets the real user ~/.archon/.env when present, so test outcomes can vary by machine/user file contents. Prefer a temp .env fixture to keep this deterministic.

Proposed deterministic test adjustment
   it('should allow ~/.archon/.env to override Bun-auto-loaded vars via override:true', async () => {
     const { config } = await import('dotenv');
-    const { resolve } = await import('path');
-    const { existsSync } = await import('fs');
+    const { resolve, join } = await import('path');
+    const { mkdtemp, writeFile, rm } = await import('fs/promises');
+    const { tmpdir } = await import('os');

     // Simulate Bun auto-loading a stale value
     process.env.TEST_ARCHON_OVERRIDE = 'from-cwd-env';

-    // Write a temporary env content and load with override
-    const globalEnvPath = resolve(getArchonHome(), '.env');
-    if (existsSync(globalEnvPath)) {
-      const result = config({ path: globalEnvPath, override: true });
-      // If ~/.archon/.env exists and has DATABASE_URL, it should override
-      expect(result.error).toBeUndefined();
-    }
+    const tempDir = await mkdtemp(join(tmpdir(), 'archon-cli-test-'));
+    const testEnvPath = resolve(tempDir, '.env');
+    await writeFile(testEnvPath, 'TEST_ARCHON_OVERRIDE=from-archon-env\n', 'utf8');
+    const result = config({ path: testEnvPath, override: true });
+    expect(result.error).toBeUndefined();
+    expect(process.env.TEST_ARCHON_OVERRIDE).toBe('from-archon-env');
+    await rm(tempDir, { recursive: true, force: true });

     // Clean up
     delete process.env.TEST_ARCHON_OVERRIDE;
   });

As per coding guidelines: "Keep tests deterministic — no flaky timing or network dependence without guardrails; ensure local validation commands (bun run validate) map directly to CI expectations (Determinism + Reproducibility)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/src/cli.test.ts` around lines 241 - 246, The test reads the real
user home (~/.archon/.env) via getArchonHome() and becomes flaky; instead create
a temporary directory and a temp .env fixture, write the expected DATABASE_URL
into that file, mock or override getArchonHome() to return the temp dir (or set
the env value used by getArchonHome()), then call resolve(getArchonHome(),
'.env') and load with config({ path: ..., override: true }) and assert
result.error is undefined; finally clean up the temp file/dir. Target symbols:
globalEnvPath, getArchonHome, resolve, existsSync, and config.
🧹 Nitpick comments (2)
packages/core/src/providers/claude.ts (1)

388-391: Scope plugin suppression to workflow/headless calls only.

This hard-codes plugin disabling for every ClaudeProvider call. If this provider is reused outside DAG/headless runs, it can unintentionally disable user-intended plugins there too. Prefer a typed AgentRequestOptions flag (for example, disableUserPlugins?: boolean) and apply settings.enabledPlugins only when that flag is set by workflow execution paths.

Proposed direction
-        settings: { enabledPlugins: {} },
+        ...(requestOptions?.disableUserPlugins === true
+          ? { settings: { enabledPlugins: {} } }
+          : {}),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/providers/claude.ts` around lines 388 - 391, The provider
currently unconditionally injects settings.enabledPlugins into every
ClaudeProvider call; change this to a conditional based on a new optional
boolean flag on the request type (e.g., add disableUserPlugins?: boolean to
AgentRequestOptions) and only set settings: { enabledPlugins: {} } on the
outgoing request when AgentRequestOptions.disableUserPlugins === true (default
false). Update the types for AgentRequestOptions, the call site(s) that create
requests for workflow/headless runs to set disableUserPlugins: true, and the
ClaudeProvider request-building logic (referencing ClaudeProvider and
settings.enabledPlugins) so normal non-workflow calls keep user plugins enabled.
packages/workflows/src/dag-executor.ts (1)

1037-1071: Minor duplication in filtering logic.

The code filters failedEntries twice with the same split(' (')[0] operation. While this is readable and the arrays are small, a single-pass partition would be slightly more efficient.

♻️ Optional: single-pass partition
-          const workflowFailures = failedEntries.filter(entry => {
-            const name = entry.split(' (')[0];
-            return workflowMcpNames.has(name);
-          });
-
-          if (workflowFailures.length > 0) {
+          const workflowFailures: string[] = [];
+          const pluginFailures: string[] = [];
+          for (const entry of failedEntries) {
+            const name = entry.split(' (')[0];
+            if (workflowMcpNames.has(name)) {
+              workflowFailures.push(entry);
+            } else {
+              pluginFailures.push(entry);
+            }
+          }
+
+          if (workflowFailures.length > 0) {
             const workflowMsg = `MCP server connection failed: ${workflowFailures.join(', ')}`;
             getLog().warn(
               { nodeId: node.id, mcpStatus: workflowMsg },
               'dag.mcp_server_connection_failed'
             );
             const delivered = await safeSendMessage(
               platform,
               conversationId,
               workflowMsg,
               nodeContext
             );
             if (!delivered) {
               getLog().error(
                 { nodeId: node.id, mcpStatus: workflowMsg, workflowRunId: workflowRun.id },
                 'dag.mcp_connection_failure_delivery_failed'
               );
             }
           }
-          // User-plugin MCP failures: log at debug, don't surface to user
-          const pluginFailures = failedEntries.filter(entry => {
-            const name = entry.split(' (')[0];
-            return !workflowMcpNames.has(name);
-          });
+
           if (pluginFailures.length > 0) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/workflows/src/dag-executor.ts` around lines 1037 - 1071, The code
currently computes workflowFailures and pluginFailures by filtering
failedEntries twice using entry.split(' (')[0]; change this to a single-pass
partition over failedEntries (e.g., a for loop or Array.reduce) that extracts
const name = entry.split(' (')[0] once and pushes the entry into either
workflowFailures or pluginFailures based on workflowMcpNames.has(name); keep the
subsequent logic that logs workflowFailures, calls safeSendMessage, and logs
delivery errors unchanged, only replace the duplicate filtering with the
single-pass partition that produces workflowFailures and pluginFailures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@packages/cli/src/cli.test.ts`:
- Around line 241-246: The test reads the real user home (~/.archon/.env) via
getArchonHome() and becomes flaky; instead create a temporary directory and a
temp .env fixture, write the expected DATABASE_URL into that file, mock or
override getArchonHome() to return the temp dir (or set the env value used by
getArchonHome()), then call resolve(getArchonHome(), '.env') and load with
config({ path: ..., override: true }) and assert result.error is undefined;
finally clean up the temp file/dir. Target symbols: globalEnvPath,
getArchonHome, resolve, existsSync, and config.

---

Nitpick comments:
In `@packages/core/src/providers/claude.ts`:
- Around line 388-391: The provider currently unconditionally injects
settings.enabledPlugins into every ClaudeProvider call; change this to a
conditional based on a new optional boolean flag on the request type (e.g., add
disableUserPlugins?: boolean to AgentRequestOptions) and only set settings: {
enabledPlugins: {} } on the outgoing request when
AgentRequestOptions.disableUserPlugins === true (default false). Update the
types for AgentRequestOptions, the call site(s) that create requests for
workflow/headless runs to set disableUserPlugins: true, and the ClaudeProvider
request-building logic (referencing ClaudeProvider and settings.enabledPlugins)
so normal non-workflow calls keep user plugins enabled.

In `@packages/workflows/src/dag-executor.ts`:
- Around line 1037-1071: The code currently computes workflowFailures and
pluginFailures by filtering failedEntries twice using entry.split(' (')[0];
change this to a single-pass partition over failedEntries (e.g., a for loop or
Array.reduce) that extracts const name = entry.split(' (')[0] once and pushes
the entry into either workflowFailures or pluginFailures based on
workflowMcpNames.has(name); keep the subsequent logic that logs
workflowFailures, calls safeSendMessage, and logs delivery errors unchanged,
only replace the duplicate filtering with the single-pass partition that
produces workflowFailures and pluginFailures.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c58d79d8-8e7b-46a7-a670-0e0fe9c298c3

📥 Commits

Reviewing files that changed from the base of the PR and between eb75ab6 and 43c0b97.

📒 Files selected for processing (5)
  • packages/cli/src/cli.test.ts
  • packages/cli/src/cli.ts
  • packages/core/src/providers/claude.ts
  • packages/server/src/index.ts
  • packages/workflows/src/dag-executor.ts

@Wirasm

Wirasm commented Apr 21, 2026

Copy link
Copy Markdown
Collaborator

Thanks for raising this, @MrFadiAi — closing in favor of a fresh fix on current dev.

This PR is 9 days stale and all three changes have drifted:

  1. Windows HOME fix — already shipped. packages/cli/src/cli.ts and packages/server/src/index.ts no longer reference process.env.HOME; PR CLI silently strips repo-local .env vars; logs say (0) from .env instead of stripped N keys #1302 moved env loading into loadArchonEnv() (@archon/paths/env-loader), which uses getArchonHome() internally and handles the Windows case (USERPROFILE / HOMEDRIVE+HOMEPATH / ARCHON_HOME override). No further change needed.

  2. settings: { enabledPlugins: {} } in claude.ts — file moved. packages/core/src/providers/claude.ts was extracted to packages/providers/src/claude/provider.ts during the provider registry refactor, so the patch can't apply.

  3. MCP plugin warning suppression in dag-executor.ts — core idea is good, but the surrounding code has been touched by feat(isolation,workflows): worktree location + per-workflow isolation policy #1310 and feat(paths,workflows): unify ~/.archon/{workflows,commands,scripts} + drop globalSearchPath (closes #1136) #1315 and needs a manual rebase.

The MCP noise filter (point 3) is the piece still worth landing. We'll re-do it on current dev with tests and post the follow-up PR here — you'll be credited via Co-authored-by: on the commit.

@Wirasm Wirasm closed this Apr 21, 2026
Wirasm added a commit that referenced this pull request Apr 22, 2026
…#1327)

* fix(workflows): filter user-plugin MCP noise out of workflow warnings

Before this change, the dag-executor surfaced every entry in the Claude
SDK's "MCP server connection failed: …" system message to the user. That
message includes user-level plugin MCPs inherited from ~/.claude/ (e.g.
`telegram`) that fail to connect in the headless workflow subprocess —
they're non-actionable noise for the workflow author.

Fix:
- Pre-compute the set of workflow-configured MCP server names per node
  by parsing the `mcp:` config file once at the start of
  executeNodeInternal. No caller-facing API change; no duplication of
  the provider's env-var expansion logic (we only need the keys).
- Split the system-message handler: the `MCP server connection failed:`
  path now surfaces only the subset of failing names that match the
  node's configured set; user-plugin failures are debug-logged as
  `dag.mcp_plugin_connection_suppressed`. The `⚠️` branch is unchanged.

Supersedes #1134 (closed as stale — the Windows HOME fix in that PR was
already shipped via #1302, and the claude.ts enabledPlugins change
targeted a file that has since moved into @archon/providers).

Credits @MrFadiAi for identifying and reporting the underlying issue.

Co-authored-by: Fadi Ai <MrFadiAi@users.noreply.github.com>

* fix(workflows): preserve MCP failure status in filtered message + observability

Address review feedback on PR #1327:

- parseMcpFailureServerNames now returns {name, segment} entries so the
  forwarded "MCP server connection failed: ..." message preserves the
  per-server status detail (e.g. "(timeout)", "(disconnected)") that the
  bare-name reconstruction was dropping.
- loadConfiguredMcpServerNames now debug-logs read failures as
  dag.mcp_filter_config_read_failed instead of swallowing them silently.
  A transient EMFILE/EBUSY at filter time would otherwise silently
  reclassify a real workflow-MCP failure as plugin noise.
- Add 4 integration tests through executeDagWorkflow covering the mixed
  workflow/plugin split, all-plugin suppression, no-mcp:-config nodes,
  and the unchanged ⚠️ warning path.
- Drop a WHAT comment above configuredMcpNames and a temporal phrase
  ("before this filter landed") that would rot on merge.
- Document the filtering boundary in guides/mcp-servers.md and add a
  troubleshooting row for users debugging silently-suppressed plugin MCPs.

---------

Co-authored-by: Fadi Ai <MrFadiAi@users.noreply.github.com>
@Wirasm Wirasm mentioned this pull request Apr 22, 2026
prospapledge88 added a commit to prospapledge88/Archon that referenced this pull request May 5, 2026
…h 7 (#12)

* fix(workflows): make archon-adversarial-dev sed replacement macOS-safe (coleam00#1155)

* fix(workflows): make adversarial init sed portable on macOS

* chore: regenerate bundled-defaults after adversarial-dev sed fix

Sync generated bundle with the new temp-file sed pattern in
archon-adversarial-dev.yaml so check:bundled passes and binary
distributions ship the macOS-safe version.

---------

Co-authored-by: laplace young <yangqk12@whu.edu.cn>
Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>

* fix(workflows): filter user-plugin MCP noise out of workflow warnings (coleam00#1327)

* fix(workflows): filter user-plugin MCP noise out of workflow warnings

Before this change, the dag-executor surfaced every entry in the Claude
SDK's "MCP server connection failed: …" system message to the user. That
message includes user-level plugin MCPs inherited from ~/.claude/ (e.g.
`telegram`) that fail to connect in the headless workflow subprocess —
they're non-actionable noise for the workflow author.

Fix:
- Pre-compute the set of workflow-configured MCP server names per node
  by parsing the `mcp:` config file once at the start of
  executeNodeInternal. No caller-facing API change; no duplication of
  the provider's env-var expansion logic (we only need the keys).
- Split the system-message handler: the `MCP server connection failed:`
  path now surfaces only the subset of failing names that match the
  node's configured set; user-plugin failures are debug-logged as
  `dag.mcp_plugin_connection_suppressed`. The `⚠️` branch is unchanged.

Supersedes coleam00#1134 (closed as stale — the Windows HOME fix in that PR was
already shipped via coleam00#1302, and the claude.ts enabledPlugins change
targeted a file that has since moved into @archon/providers).

Credits @MrFadiAi for identifying and reporting the underlying issue.

Co-authored-by: Fadi Ai <MrFadiAi@users.noreply.github.com>

* fix(workflows): preserve MCP failure status in filtered message + observability

Address review feedback on PR coleam00#1327:

- parseMcpFailureServerNames now returns {name, segment} entries so the
  forwarded "MCP server connection failed: ..." message preserves the
  per-server status detail (e.g. "(timeout)", "(disconnected)") that the
  bare-name reconstruction was dropping.
- loadConfiguredMcpServerNames now debug-logs read failures as
  dag.mcp_filter_config_read_failed instead of swallowing them silently.
  A transient EMFILE/EBUSY at filter time would otherwise silently
  reclassify a real workflow-MCP failure as plugin noise.
- Add 4 integration tests through executeDagWorkflow covering the mixed
  workflow/plugin split, all-plugin suppression, no-mcp:-config nodes,
  and the unchanged ⚠️ warning path.
- Drop a WHAT comment above configuredMcpNames and a temporal phrase
  ("before this filter landed") that would rot on merge.
- Document the filtering boundary in guides/mcp-servers.md and add a
  troubleshooting row for users debugging silently-suppressed plugin MCPs.

---------

Co-authored-by: Fadi Ai <MrFadiAi@users.noreply.github.com>

* test(workflows): add anyFailed status derivation coverage for DAG executor (coleam00#1403)

PIV Task 1: Adds three new tests in a dedicated describe block
'executeDagWorkflow -- final status derivation' covering the anyFailed
branch (dag-executor.ts ~line 2956) that previously had no direct test:
- one success + one independent failure calls failWorkflowRun (not completeWorkflowRun)
- multiple successes + one failure calls failWorkflowRun (not completeWorkflowRun)
- trigger_rule: none_failed skips dependent node but anyFailed still marks run failed

Fixes coleam00#1381.

* fix(workflow): migrate piv-loop plan handoff to $ARTIFACTS_DIR (coleam00#1398)

* fix(workflow): migrate piv-loop plan handoff to $ARTIFACTS_DIR (coleam00#1380)

The create-plan node used a relative path (.claude/archon/plans/{slug}.plan.md)
that the AI agent would sometimes write to a different location, breaking all
downstream nodes that glob for the plan file. Migrated all plan/progress file
references to $ARTIFACTS_DIR/plan.md and $ARTIFACTS_DIR/progress.txt, matching
the pattern used by archon-fix-github-issue and other workflows.

Changes:
- Replace slug-based plan path with $ARTIFACTS_DIR/plan.md in create-plan node
- Replace ls -t glob discovery with direct $ARTIFACTS_DIR/plan.md reads in
  refine-plan, code-review, and fix-feedback nodes
- Replace empty-string guard with file-existence check in implement-setup bash
- Migrate progress.txt references in implement loop to $ARTIFACTS_DIR/
- Add explicit plan/progress paths in finalize node
- Regenerated bundled-defaults.generated.ts

Fixes coleam00#1380

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(workflow): address review findings in archon-piv-loop

- Rename 'Step 2: Write the Plan' to 'Step 2: Plan File Location' to
  eliminate the duplicate heading that collided with Step 3's identical
  title in the create-plan node
- Guard implement-setup against a 0-task plan file: exit 1 with a
  clear error when no '### Task N:' sections are found, preventing a
  silent no-op implement loop
- Remove 2>/dev/null from code-review commit so pre-commit hook failures
  and other stderr are visible to the agent instead of silently swallowed
- Replace '|| true' on git push in finalize with an explicit WARNING echo
  so push failures (auth, upstream conflict, no remote) surface to the
  agent rather than being silently ignored
- Regenerate bundled-defaults.generated.ts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore(workflows): regenerate bundled defaults to match opus[1m] alias

The bundle was stale relative to the YAML sources after coleam00#1395 merged —
check:bundled was failing CI. Regenerated; no YAML edits.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore(workflows): switch default Opus pin to opus[1m] alias (coleam00#1395)

Anthropic's Opus 4.7 landed 2026-04-16; on the Anthropic API, opus /
opus[1m] now resolve to 4.7 with a 1M context window at standard
pricing. Using the alias instead of the hard-pinned claude-opus-4-6[1m]
lets bundled default workflows auto-track the recommended Opus version.

No explicit effort is set, so nodes inherit the per-model default
(xhigh on 4.7, high on 4.6).

* fix(workflows): approval gate bypass after reject-with-redraft on resume (coleam00#1435)

* fix(workflows): approval gate bypass after reject-with-redraft on resume

When an approval node was rejected with on_reject.prompt, the synthetic
PromptNode built to run the on_reject prompt reused the approval gate's
own node ID. executeNodeInternal then wrote a node_completed event with
that ID, causing getCompletedDagNodeOutputs to treat the gate as already
completed on the next resume — bypassing the human gate entirely.

Fix: give the synthetic node the ID `${node.id}:on_reject` so its
node_completed event has a distinct step_name that won't match the
approval gate slot in priorCompletedNodes.

Adds a regression test asserting no node_completed event with the
approval gate's ID is written during on_reject execution.

Fixes coleam00#1429

* test(workflows): add positive assertion and SSE side-effect comment for on_reject synthetic node

Add complementary positive assertion to the regression test to verify that
node_completed is written exactly once with step_name 'review:on_reject',
ensuring future refactors that suppress the event entirely would be caught.

Add inline comment in executeApprovalNode documenting the known SSE side-effect:
node_started/node_completed events with nodeId='review:on_reject' flow through
the SSE pipeline into the web UI, resulting in a transient phantom node in the
execution view. This is cosmetic-only — the human gate contract is preserved.

* simplify: reduce duplicate cast pattern in on_reject test assertions

* fix(workflows): concise failure messages for bash/script nodes (coleam00#1389) (coleam00#1393)

* fix(workflows): concise failure messages for bash/script nodes (coleam00#1389)

When a `bash:` or `script:` node fails, the executor was embedding
`err.message` verbatim into the user-visible error. For inline scripts run
via `bash -c <body>` / `bun -e <body>`, Node's `promisify(execFile)` puts
the entire substituted script body into `err.message`, `err.cmd`, and the
first line of `err.stack` — and the Pino log serialized all three, repeating
the body ~3× in one structured log line. The actionable diagnostic was
buried under kilobytes of echoed source.

Route both catch-block default branches through a new
`formatSubprocessFailure(err, label)` helper in `executor-shared.ts` that:

- strips the `Command failed: <cmd>` prefix line (which carries the body)
- prefers `err.stderr` — Bun/bash emit the actionable diagnostic there
- tail-truncates at 2 KB with a `…[truncated]` marker
- returns a controlled `logFields` subset (`exitCode`, `killed`, `stderrTail`)
  so Pino never re-serializes `err.message` / `err.stack` / `err.cmd`

Also drops the script-node `stderrHint` concatenation — stderr is already
handled by the helper, so the previous code appended it twice.

Timeout / ENOENT / EACCES branches are preserved verbatim.

Fixes coleam00#1389

* fix(workflows): address PR review feedback for coleam00#1389

- Run formatSubprocessFailure unconditionally so timeout / ENOENT / EACCES
  branches also get sanitized log fields (the timeout message also embeds
  the `Command failed: bash -c <body>` line).
- Drop `errType: err.constructor.name` (always 'Error' in production) and
  replace with `nodeType: 'bash' | 'script'` for actual discriminating value.
- Replace chained `||` + ternary in diagnostic selection with explicit
  if/else for readability.
- Simplify exit suffix guard: `err.code != null` instead of double typeof.
- Make stderrTail emptiness check explicit.
- Drop `RawSubprocessError` export (no external consumer) and widen `code`
  to `number | string | null` to mirror Node's ExecFileException.
- Tighten test length bound to <2100 (was <4000 / <2200) so a doubling of
  SUBPROCESS_ERROR_MAX_CHARS would actually trip the assertion.
- Replace broad regex assertion with `.toContain('[eval]')` — the location
  marker is the strongest signal that diagnostic content survived.
- Strip issue-number citations from describe block and test comments.
- Update script-nodes.md to distinguish stderr behavior on success vs
  failure paths.
- Add CHANGELOG Unreleased / Fixed entry for the user-visible change.

* chore(workflows): regenerate bundled defaults after Tier 6 picks

* docs: add CHANGELOG entry for Tier 6 workflow polish batch

---------

Co-authored-by: CauchYoung <2024302072042@whu.edu.cn>
Co-authored-by: laplace young <yangqk12@whu.edu.cn>
Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>
Co-authored-by: Rasmus Widing <152263317+Wirasm@users.noreply.github.com>
Co-authored-by: Fadi Ai <MrFadiAi@users.noreply.github.com>
Co-authored-by: Cole Medin <cole@dynamous.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: cjnprospa <sirhcle.j23@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants