fix: Windows HOME resolution and MCP plugin warning suppression#1134
fix: Windows HOME resolution and MCP plugin warning suppression#1134MrFadiAi wants to merge 1 commit into
Conversation
- Use getArchonHome() instead of process.env.HOME for cross-platform config path resolution in CLI and server entry points - Suppress non-fatal MCP connection failures from user-level plugins (e.g., telegram) in headless workflow subprocesses - Only surface MCP failures for workflow-configured servers Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughUpdates env file path resolution across CLI, server, and tests to use a centralized Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/cli/src/cli.test.ts (1)
241-246:⚠️ Potential issue | 🟠 MajorTest is environment-dependent and can be flaky.
This path now targets the real user
~/.archon/.envwhen present, so test outcomes can vary by machine/user file contents. Prefer a temp.envfixture to keep this deterministic.Proposed deterministic test adjustment
it('should allow ~/.archon/.env to override Bun-auto-loaded vars via override:true', async () => { const { config } = await import('dotenv'); - const { resolve } = await import('path'); - const { existsSync } = await import('fs'); + const { resolve, join } = await import('path'); + const { mkdtemp, writeFile, rm } = await import('fs/promises'); + const { tmpdir } = await import('os'); // Simulate Bun auto-loading a stale value process.env.TEST_ARCHON_OVERRIDE = 'from-cwd-env'; - // Write a temporary env content and load with override - const globalEnvPath = resolve(getArchonHome(), '.env'); - if (existsSync(globalEnvPath)) { - const result = config({ path: globalEnvPath, override: true }); - // If ~/.archon/.env exists and has DATABASE_URL, it should override - expect(result.error).toBeUndefined(); - } + const tempDir = await mkdtemp(join(tmpdir(), 'archon-cli-test-')); + const testEnvPath = resolve(tempDir, '.env'); + await writeFile(testEnvPath, 'TEST_ARCHON_OVERRIDE=from-archon-env\n', 'utf8'); + const result = config({ path: testEnvPath, override: true }); + expect(result.error).toBeUndefined(); + expect(process.env.TEST_ARCHON_OVERRIDE).toBe('from-archon-env'); + await rm(tempDir, { recursive: true, force: true }); // Clean up delete process.env.TEST_ARCHON_OVERRIDE; });As per coding guidelines: "Keep tests deterministic — no flaky timing or network dependence without guardrails; ensure local validation commands (
bun run validate) map directly to CI expectations (Determinism + Reproducibility)".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/cli/src/cli.test.ts` around lines 241 - 246, The test reads the real user home (~/.archon/.env) via getArchonHome() and becomes flaky; instead create a temporary directory and a temp .env fixture, write the expected DATABASE_URL into that file, mock or override getArchonHome() to return the temp dir (or set the env value used by getArchonHome()), then call resolve(getArchonHome(), '.env') and load with config({ path: ..., override: true }) and assert result.error is undefined; finally clean up the temp file/dir. Target symbols: globalEnvPath, getArchonHome, resolve, existsSync, and config.
🧹 Nitpick comments (2)
packages/core/src/providers/claude.ts (1)
388-391: Scope plugin suppression to workflow/headless calls only.This hard-codes plugin disabling for every
ClaudeProvidercall. If this provider is reused outside DAG/headless runs, it can unintentionally disable user-intended plugins there too. Prefer a typedAgentRequestOptionsflag (for example,disableUserPlugins?: boolean) and applysettings.enabledPluginsonly when that flag is set by workflow execution paths.Proposed direction
- settings: { enabledPlugins: {} }, + ...(requestOptions?.disableUserPlugins === true + ? { settings: { enabledPlugins: {} } } + : {}),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/providers/claude.ts` around lines 388 - 391, The provider currently unconditionally injects settings.enabledPlugins into every ClaudeProvider call; change this to a conditional based on a new optional boolean flag on the request type (e.g., add disableUserPlugins?: boolean to AgentRequestOptions) and only set settings: { enabledPlugins: {} } on the outgoing request when AgentRequestOptions.disableUserPlugins === true (default false). Update the types for AgentRequestOptions, the call site(s) that create requests for workflow/headless runs to set disableUserPlugins: true, and the ClaudeProvider request-building logic (referencing ClaudeProvider and settings.enabledPlugins) so normal non-workflow calls keep user plugins enabled.packages/workflows/src/dag-executor.ts (1)
1037-1071: Minor duplication in filtering logic.The code filters
failedEntriestwice with the samesplit(' (')[0]operation. While this is readable and the arrays are small, a single-pass partition would be slightly more efficient.♻️ Optional: single-pass partition
- const workflowFailures = failedEntries.filter(entry => { - const name = entry.split(' (')[0]; - return workflowMcpNames.has(name); - }); - - if (workflowFailures.length > 0) { + const workflowFailures: string[] = []; + const pluginFailures: string[] = []; + for (const entry of failedEntries) { + const name = entry.split(' (')[0]; + if (workflowMcpNames.has(name)) { + workflowFailures.push(entry); + } else { + pluginFailures.push(entry); + } + } + + if (workflowFailures.length > 0) { const workflowMsg = `MCP server connection failed: ${workflowFailures.join(', ')}`; getLog().warn( { nodeId: node.id, mcpStatus: workflowMsg }, 'dag.mcp_server_connection_failed' ); const delivered = await safeSendMessage( platform, conversationId, workflowMsg, nodeContext ); if (!delivered) { getLog().error( { nodeId: node.id, mcpStatus: workflowMsg, workflowRunId: workflowRun.id }, 'dag.mcp_connection_failure_delivery_failed' ); } } - // User-plugin MCP failures: log at debug, don't surface to user - const pluginFailures = failedEntries.filter(entry => { - const name = entry.split(' (')[0]; - return !workflowMcpNames.has(name); - }); + if (pluginFailures.length > 0) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/workflows/src/dag-executor.ts` around lines 1037 - 1071, The code currently computes workflowFailures and pluginFailures by filtering failedEntries twice using entry.split(' (')[0]; change this to a single-pass partition over failedEntries (e.g., a for loop or Array.reduce) that extracts const name = entry.split(' (')[0] once and pushes the entry into either workflowFailures or pluginFailures based on workflowMcpNames.has(name); keep the subsequent logic that logs workflowFailures, calls safeSendMessage, and logs delivery errors unchanged, only replace the duplicate filtering with the single-pass partition that produces workflowFailures and pluginFailures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@packages/cli/src/cli.test.ts`:
- Around line 241-246: The test reads the real user home (~/.archon/.env) via
getArchonHome() and becomes flaky; instead create a temporary directory and a
temp .env fixture, write the expected DATABASE_URL into that file, mock or
override getArchonHome() to return the temp dir (or set the env value used by
getArchonHome()), then call resolve(getArchonHome(), '.env') and load with
config({ path: ..., override: true }) and assert result.error is undefined;
finally clean up the temp file/dir. Target symbols: globalEnvPath,
getArchonHome, resolve, existsSync, and config.
---
Nitpick comments:
In `@packages/core/src/providers/claude.ts`:
- Around line 388-391: The provider currently unconditionally injects
settings.enabledPlugins into every ClaudeProvider call; change this to a
conditional based on a new optional boolean flag on the request type (e.g., add
disableUserPlugins?: boolean to AgentRequestOptions) and only set settings: {
enabledPlugins: {} } on the outgoing request when
AgentRequestOptions.disableUserPlugins === true (default false). Update the
types for AgentRequestOptions, the call site(s) that create requests for
workflow/headless runs to set disableUserPlugins: true, and the ClaudeProvider
request-building logic (referencing ClaudeProvider and settings.enabledPlugins)
so normal non-workflow calls keep user plugins enabled.
In `@packages/workflows/src/dag-executor.ts`:
- Around line 1037-1071: The code currently computes workflowFailures and
pluginFailures by filtering failedEntries twice using entry.split(' (')[0];
change this to a single-pass partition over failedEntries (e.g., a for loop or
Array.reduce) that extracts const name = entry.split(' (')[0] once and pushes
the entry into either workflowFailures or pluginFailures based on
workflowMcpNames.has(name); keep the subsequent logic that logs
workflowFailures, calls safeSendMessage, and logs delivery errors unchanged,
only replace the duplicate filtering with the single-pass partition that
produces workflowFailures and pluginFailures.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c58d79d8-8e7b-46a7-a670-0e0fe9c298c3
📒 Files selected for processing (5)
packages/cli/src/cli.test.tspackages/cli/src/cli.tspackages/core/src/providers/claude.tspackages/server/src/index.tspackages/workflows/src/dag-executor.ts
|
Thanks for raising this, @MrFadiAi — closing in favor of a fresh fix on current This PR is 9 days stale and all three changes have drifted:
The MCP noise filter (point 3) is the piece still worth landing. We'll re-do it on current |
…#1327) * fix(workflows): filter user-plugin MCP noise out of workflow warnings Before this change, the dag-executor surfaced every entry in the Claude SDK's "MCP server connection failed: …" system message to the user. That message includes user-level plugin MCPs inherited from ~/.claude/ (e.g. `telegram`) that fail to connect in the headless workflow subprocess — they're non-actionable noise for the workflow author. Fix: - Pre-compute the set of workflow-configured MCP server names per node by parsing the `mcp:` config file once at the start of executeNodeInternal. No caller-facing API change; no duplication of the provider's env-var expansion logic (we only need the keys). - Split the system-message handler: the `MCP server connection failed:` path now surfaces only the subset of failing names that match the node's configured set; user-plugin failures are debug-logged as `dag.mcp_plugin_connection_suppressed`. The `⚠️ ` branch is unchanged. Supersedes #1134 (closed as stale — the Windows HOME fix in that PR was already shipped via #1302, and the claude.ts enabledPlugins change targeted a file that has since moved into @archon/providers). Credits @MrFadiAi for identifying and reporting the underlying issue. Co-authored-by: Fadi Ai <MrFadiAi@users.noreply.github.com> * fix(workflows): preserve MCP failure status in filtered message + observability Address review feedback on PR #1327: - parseMcpFailureServerNames now returns {name, segment} entries so the forwarded "MCP server connection failed: ..." message preserves the per-server status detail (e.g. "(timeout)", "(disconnected)") that the bare-name reconstruction was dropping. - loadConfiguredMcpServerNames now debug-logs read failures as dag.mcp_filter_config_read_failed instead of swallowing them silently. A transient EMFILE/EBUSY at filter time would otherwise silently reclassify a real workflow-MCP failure as plugin noise. - Add 4 integration tests through executeDagWorkflow covering the mixed workflow/plugin split, all-plugin suppression, no-mcp:-config nodes, and the unchanged⚠️ warning path. - Drop a WHAT comment above configuredMcpNames and a temporal phrase ("before this filter landed") that would rot on merge. - Document the filtering boundary in guides/mcp-servers.md and add a troubleshooting row for users debugging silently-suppressed plugin MCPs. --------- Co-authored-by: Fadi Ai <MrFadiAi@users.noreply.github.com>
…h 7 (#12) * fix(workflows): make archon-adversarial-dev sed replacement macOS-safe (coleam00#1155) * fix(workflows): make adversarial init sed portable on macOS * chore: regenerate bundled-defaults after adversarial-dev sed fix Sync generated bundle with the new temp-file sed pattern in archon-adversarial-dev.yaml so check:bundled passes and binary distributions ship the macOS-safe version. --------- Co-authored-by: laplace young <yangqk12@whu.edu.cn> Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com> * fix(workflows): filter user-plugin MCP noise out of workflow warnings (coleam00#1327) * fix(workflows): filter user-plugin MCP noise out of workflow warnings Before this change, the dag-executor surfaced every entry in the Claude SDK's "MCP server connection failed: …" system message to the user. That message includes user-level plugin MCPs inherited from ~/.claude/ (e.g. `telegram`) that fail to connect in the headless workflow subprocess — they're non-actionable noise for the workflow author. Fix: - Pre-compute the set of workflow-configured MCP server names per node by parsing the `mcp:` config file once at the start of executeNodeInternal. No caller-facing API change; no duplication of the provider's env-var expansion logic (we only need the keys). - Split the system-message handler: the `MCP server connection failed:` path now surfaces only the subset of failing names that match the node's configured set; user-plugin failures are debug-logged as `dag.mcp_plugin_connection_suppressed`. The `⚠️ ` branch is unchanged. Supersedes coleam00#1134 (closed as stale — the Windows HOME fix in that PR was already shipped via coleam00#1302, and the claude.ts enabledPlugins change targeted a file that has since moved into @archon/providers). Credits @MrFadiAi for identifying and reporting the underlying issue. Co-authored-by: Fadi Ai <MrFadiAi@users.noreply.github.com> * fix(workflows): preserve MCP failure status in filtered message + observability Address review feedback on PR coleam00#1327: - parseMcpFailureServerNames now returns {name, segment} entries so the forwarded "MCP server connection failed: ..." message preserves the per-server status detail (e.g. "(timeout)", "(disconnected)") that the bare-name reconstruction was dropping. - loadConfiguredMcpServerNames now debug-logs read failures as dag.mcp_filter_config_read_failed instead of swallowing them silently. A transient EMFILE/EBUSY at filter time would otherwise silently reclassify a real workflow-MCP failure as plugin noise. - Add 4 integration tests through executeDagWorkflow covering the mixed workflow/plugin split, all-plugin suppression, no-mcp:-config nodes, and the unchanged⚠️ warning path. - Drop a WHAT comment above configuredMcpNames and a temporal phrase ("before this filter landed") that would rot on merge. - Document the filtering boundary in guides/mcp-servers.md and add a troubleshooting row for users debugging silently-suppressed plugin MCPs. --------- Co-authored-by: Fadi Ai <MrFadiAi@users.noreply.github.com> * test(workflows): add anyFailed status derivation coverage for DAG executor (coleam00#1403) PIV Task 1: Adds three new tests in a dedicated describe block 'executeDagWorkflow -- final status derivation' covering the anyFailed branch (dag-executor.ts ~line 2956) that previously had no direct test: - one success + one independent failure calls failWorkflowRun (not completeWorkflowRun) - multiple successes + one failure calls failWorkflowRun (not completeWorkflowRun) - trigger_rule: none_failed skips dependent node but anyFailed still marks run failed Fixes coleam00#1381. * fix(workflow): migrate piv-loop plan handoff to $ARTIFACTS_DIR (coleam00#1398) * fix(workflow): migrate piv-loop plan handoff to $ARTIFACTS_DIR (coleam00#1380) The create-plan node used a relative path (.claude/archon/plans/{slug}.plan.md) that the AI agent would sometimes write to a different location, breaking all downstream nodes that glob for the plan file. Migrated all plan/progress file references to $ARTIFACTS_DIR/plan.md and $ARTIFACTS_DIR/progress.txt, matching the pattern used by archon-fix-github-issue and other workflows. Changes: - Replace slug-based plan path with $ARTIFACTS_DIR/plan.md in create-plan node - Replace ls -t glob discovery with direct $ARTIFACTS_DIR/plan.md reads in refine-plan, code-review, and fix-feedback nodes - Replace empty-string guard with file-existence check in implement-setup bash - Migrate progress.txt references in implement loop to $ARTIFACTS_DIR/ - Add explicit plan/progress paths in finalize node - Regenerated bundled-defaults.generated.ts Fixes coleam00#1380 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(workflow): address review findings in archon-piv-loop - Rename 'Step 2: Write the Plan' to 'Step 2: Plan File Location' to eliminate the duplicate heading that collided with Step 3's identical title in the create-plan node - Guard implement-setup against a 0-task plan file: exit 1 with a clear error when no '### Task N:' sections are found, preventing a silent no-op implement loop - Remove 2>/dev/null from code-review commit so pre-commit hook failures and other stderr are visible to the agent instead of silently swallowed - Replace '|| true' on git push in finalize with an explicit WARNING echo so push failures (auth, upstream conflict, no remote) surface to the agent rather than being silently ignored - Regenerate bundled-defaults.generated.ts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * chore(workflows): regenerate bundled defaults to match opus[1m] alias The bundle was stale relative to the YAML sources after coleam00#1395 merged — check:bundled was failing CI. Regenerated; no YAML edits. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore(workflows): switch default Opus pin to opus[1m] alias (coleam00#1395) Anthropic's Opus 4.7 landed 2026-04-16; on the Anthropic API, opus / opus[1m] now resolve to 4.7 with a 1M context window at standard pricing. Using the alias instead of the hard-pinned claude-opus-4-6[1m] lets bundled default workflows auto-track the recommended Opus version. No explicit effort is set, so nodes inherit the per-model default (xhigh on 4.7, high on 4.6). * fix(workflows): approval gate bypass after reject-with-redraft on resume (coleam00#1435) * fix(workflows): approval gate bypass after reject-with-redraft on resume When an approval node was rejected with on_reject.prompt, the synthetic PromptNode built to run the on_reject prompt reused the approval gate's own node ID. executeNodeInternal then wrote a node_completed event with that ID, causing getCompletedDagNodeOutputs to treat the gate as already completed on the next resume — bypassing the human gate entirely. Fix: give the synthetic node the ID `${node.id}:on_reject` so its node_completed event has a distinct step_name that won't match the approval gate slot in priorCompletedNodes. Adds a regression test asserting no node_completed event with the approval gate's ID is written during on_reject execution. Fixes coleam00#1429 * test(workflows): add positive assertion and SSE side-effect comment for on_reject synthetic node Add complementary positive assertion to the regression test to verify that node_completed is written exactly once with step_name 'review:on_reject', ensuring future refactors that suppress the event entirely would be caught. Add inline comment in executeApprovalNode documenting the known SSE side-effect: node_started/node_completed events with nodeId='review:on_reject' flow through the SSE pipeline into the web UI, resulting in a transient phantom node in the execution view. This is cosmetic-only — the human gate contract is preserved. * simplify: reduce duplicate cast pattern in on_reject test assertions * fix(workflows): concise failure messages for bash/script nodes (coleam00#1389) (coleam00#1393) * fix(workflows): concise failure messages for bash/script nodes (coleam00#1389) When a `bash:` or `script:` node fails, the executor was embedding `err.message` verbatim into the user-visible error. For inline scripts run via `bash -c <body>` / `bun -e <body>`, Node's `promisify(execFile)` puts the entire substituted script body into `err.message`, `err.cmd`, and the first line of `err.stack` — and the Pino log serialized all three, repeating the body ~3× in one structured log line. The actionable diagnostic was buried under kilobytes of echoed source. Route both catch-block default branches through a new `formatSubprocessFailure(err, label)` helper in `executor-shared.ts` that: - strips the `Command failed: <cmd>` prefix line (which carries the body) - prefers `err.stderr` — Bun/bash emit the actionable diagnostic there - tail-truncates at 2 KB with a `…[truncated]` marker - returns a controlled `logFields` subset (`exitCode`, `killed`, `stderrTail`) so Pino never re-serializes `err.message` / `err.stack` / `err.cmd` Also drops the script-node `stderrHint` concatenation — stderr is already handled by the helper, so the previous code appended it twice. Timeout / ENOENT / EACCES branches are preserved verbatim. Fixes coleam00#1389 * fix(workflows): address PR review feedback for coleam00#1389 - Run formatSubprocessFailure unconditionally so timeout / ENOENT / EACCES branches also get sanitized log fields (the timeout message also embeds the `Command failed: bash -c <body>` line). - Drop `errType: err.constructor.name` (always 'Error' in production) and replace with `nodeType: 'bash' | 'script'` for actual discriminating value. - Replace chained `||` + ternary in diagnostic selection with explicit if/else for readability. - Simplify exit suffix guard: `err.code != null` instead of double typeof. - Make stderrTail emptiness check explicit. - Drop `RawSubprocessError` export (no external consumer) and widen `code` to `number | string | null` to mirror Node's ExecFileException. - Tighten test length bound to <2100 (was <4000 / <2200) so a doubling of SUBPROCESS_ERROR_MAX_CHARS would actually trip the assertion. - Replace broad regex assertion with `.toContain('[eval]')` — the location marker is the strongest signal that diagnostic content survived. - Strip issue-number citations from describe block and test comments. - Update script-nodes.md to distinguish stderr behavior on success vs failure paths. - Add CHANGELOG Unreleased / Fixed entry for the user-visible change. * chore(workflows): regenerate bundled defaults after Tier 6 picks * docs: add CHANGELOG entry for Tier 6 workflow polish batch --------- Co-authored-by: CauchYoung <2024302072042@whu.edu.cn> Co-authored-by: laplace young <yangqk12@whu.edu.cn> Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com> Co-authored-by: Rasmus Widing <152263317+Wirasm@users.noreply.github.com> Co-authored-by: Fadi Ai <MrFadiAi@users.noreply.github.com> Co-authored-by: Cole Medin <cole@dynamous.ai> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: cjnprospa <sirhcle.j23@gmail.com>
Summary
HOMEenvironment variable is unset, causing CLI and server to fail loading~/.archon/.envconfigurationChanges
Windows HOME Resolution (
@archon/cli,@archon/server)process.env.HOMEwithgetArchonHome()from@archon/pathsfor cross-platform config path resolution in both CLI and server entry pointsHOMEis often unset —getArchonHome()correctly falls back toUSERPROFILEorHOMEDRIVE+HOMEPATHMCP Plugin Warning Suppression (
@archon/workflows,@archon/core)mcp:in workflow YAMLdebuglevel instead ofwarnand not sent to the usersettings: { enabledPlugins: {} }to Claude Agent SDK subprocess to attempt disabling user pluginsTest plan
HOMEis unset — CLI and server now correctly load~/.archon/.envarchon workflow run archon-assist "Say hello"completes without telegram MCP warning🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
New Features