Skip to content

fix(cli): lazy-import bundled skill so non-setup commands work without source files#1394

Merged
Wirasm merged 1 commit into
devfrom
fix/lazy-bundled-skill-imports
Apr 28, 2026
Merged

fix(cli): lazy-import bundled skill so non-setup commands work without source files#1394
Wirasm merged 1 commit into
devfrom
fix/lazy-bundled-skill-imports

Conversation

@Wirasm

@Wirasm Wirasm commented Apr 24, 2026

Copy link
Copy Markdown
Collaborator

Summary

bundled-skill.ts does 18 top-level import … with { type: 'text' } statements that resolve at module load — build time for bun build --compile (content embedded in binary), but every archon invocation for bun link linked-source installs. If any of those source files are missing or moved, the CLI cannot start at all, even for commands like archon --help that don't touch the skill.

The skill content is data the binary deploys via archon setup, not data the CLI reads at runtime. The only production consumer of BUNDLED_SKILL_FILES is copyArchonSkill() in setup.ts:1452. Moving the import inside that function makes the linked-source install robust (only archon setup triggers the bundled-skill module load) while preserving compiled-binary behavior (Bun's bundler statically analyses literal-string import() and embeds the chunk).

Verification

  • bun run type-check — clean across all 10 packages
  • bun run lint, bun run format:check — clean
  • bun test packages/cli/src/commands/setup.test.ts — 38 pass, 1 skip, 0 fail
  • bun build --compile --minify --target=bun — succeeds; the compiled binary still embeds the skill content (verified by grepping a known SKILL.md frontmatter string out of the 69M binary — found 1 match)
  • Compiled binary smoke: --help runs cleanly, no module-load crash

Repro the original bug

With a bun link install of archon:

mv .claude/skills/archon /tmp/  # simulate accidental move/delete
archon --help
# error: Cannot find module '../../../.claude/skills/archon/SKILL.md'
#   from '<repo>/packages/cli/src/bundled-skill.ts'
mv /tmp/archon .claude/skills/  # restore

After this fix, archon --help runs without the source files. archon setup still requires them (or the embedded chunk in compiled mode).

Adjacent

Found while debugging a parallel agent's failing archon invocation during workshop prep. Same family as the four other DX issues filed today:

This one is the cleanest — concrete bug, two-file diff, no API/UX change for users.

Test plan

  • Verify CI passes the full validate suite
  • Manual smoke on a bun link install: temporarily move .claude/skills/archon, confirm archon --help and archon workflow list still work
  • Manual smoke: archon setup against a temp dir, confirm skill files written

Summary by CodeRabbit

  • Refactor

    • Enhanced the skill setup operation to handle asynchronous operations more effectively, with improved error handling integration during the installation process.
  • Tests

    • Updated test suite to verify proper asynchronous behavior in skill installation operations.

… crash on missing source

The 18 top-level `import … with { type: 'text' }` statements in
`bundled-skill.ts` resolve at module load. For `bun build --compile` that's
build time, so the binary embeds the strings and works regardless of any
on-disk skill files. For `bun link` (linked-source) installs that's every
`archon` invocation — including `archon --help`, which doesn't even use the
skill content. If any of the 18 source files are missing or moved, the
import fails and the CLI cannot start at all.

The skill content is data the binary deploys via `archon setup`, not data
the CLI needs at runtime. There's only one consumer in production code:
`copyArchonSkill()` in `setup.ts`. Moving the import into that function as
a dynamic import preserves the compiled-binary behavior (Bun's bundler
statically analyses literal-string `import()` and embeds the chunk —
verified by grepping the SKILL.md frontmatter out of a freshly compiled
binary) while making the linked-source install resilient: only `archon
setup` triggers the bundled-skill module load now. Verified: a known skill
string appears in the compiled binary 1×, and `archon --help` no longer
needs the source files to start.

`copyArchonSkill()` becomes async because the dynamic import is a Promise.
The single production call site is already in an async function and gets
an `await`. The four `setup.test.ts` cases become async too.
@coderabbitai

coderabbitai Bot commented Apr 24, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 550852eb-38aa-4ecd-872f-88e03ff2ffcd

📥 Commits

Reviewing files that changed from the base of the PR and between 2c15439 and 636541a.

📒 Files selected for processing (2)
  • packages/cli/src/commands/setup.test.ts
  • packages/cli/src/commands/setup.ts

📝 Walkthrough

Walkthrough

The copyArchonSkill function transitions from synchronous to asynchronous, using dynamic imports to load bundled skill files at invocation time rather than module load time. The setupCommand function is updated to await this operation, and all corresponding test cases are converted to use async/await syntax.

Changes

Cohort / File(s) Summary
Setup Function Implementation
packages/cli/src/commands/setup.ts
copyArchonSkill function signature changed from synchronous void to asynchronous Promise<void>. Now performs dynamic import('../bundled-skill') at invocation time. setupCommand updated to await the skill copy operation within the existing error handling flow.
Test Suite Updates
packages/cli/src/commands/setup.test.ts
All test case callbacks converted to async. Each copyArchonSkill(target) invocation replaced with await copyArchonSkill(target). Assertions remain unchanged.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 From sync to async, we hop with grace,
Dynamic imports load at their own pace,
Await the skill, let errors flow free,
Tests now dance in async symphony!
—The Code Rabbit 🐾✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Description check ❓ Inconclusive The description covers the core problem, solution, and verification, but is structured as free-form narrative rather than following the template's required sections. Reorganize using the template structure: add UX Journey (before/after diagrams), Architecture Diagram, Label Snapshot, Validation Evidence table format, and explicit Human Verification section for clarity.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: converting a top-level import to lazy-loading to fix linked-source CLI robustness.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/lazy-bundled-skill-imports

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Wirasm

Wirasm commented Apr 27, 2026

Copy link
Copy Markdown
Collaborator Author

Hi @Wirasm — thanks for opening this PR.

This repository uses a PR template at .github/pull_request_template.md with several required sections. A few of them appear to be empty or placeholder here:

  • UX Journey
  • Architecture Diagram
  • Label Snapshot
  • Change Metadata
  • Linked Issue
  • Compatibility / Migration
  • Human Verification
  • Side Effects / Blast Radius
  • Rollback Plan
  • Risks and Mitigations

Could you fill those out (even briefly)? The template helps reviewers understand scope, risk, and rollback — it speeds up review significantly.

If a section genuinely doesn't apply, just write "N/A" in it rather than leaving it blank.

@Wirasm

Wirasm commented Apr 28, 2026

Copy link
Copy Markdown
Collaborator Author

Multi-Agent Review Summary

Ran code-reviewer, comment-analyzer, pr-test-analyzer, code-simplifier, and docs-impact on this PR.

Critical Issues

None.

Important Issues

None.

Suggestions

Agent Suggestion Location
code-simplifier JSDoc could be tightened from 5 sentences to 3 without losing information — current text restates the same logical point a few times packages/cli/src/commands/setup.ts:1447-1459
pr-test-analyzer No test asserts the dynamic-import failure path produces a user-actionable message vs. a raw Cannot find module trace. Low priority (6/10) — failure mode is unlikely since the specifier is a literal string the compile step embeds packages/cli/src/commands/setup.ts (around new await import('../bundled-skill'))

Strengths

  • All callers of copyArchonSkill updated to await (one production site at setup.ts:1844, four test sites). No missed call sites in the codebase.
  • Bun's --compile static analysis of literal-string dynamic import() is documented behavior; the JSDoc claim about embedding the chunk is accurate.
  • Test updates are mechanically correct — async/await added in matching pairs, no floating promises.
  • The four existing tests still validate real output (file existence, non-empty content, overwrite behavior, missing-parent creation), which implicitly covers the dynamic-import resolution path on source builds.
  • JSDoc earns its place: dynamic import as a load-deferral strategy is genuinely non-obvious and removing the explanation would invite a refactor back to a static import.

Documentation Issues

None. copyArchonSkill / BUNDLED_SKILL_FILES are not referenced in CLAUDE.md, README.md, or packages/docs-web/. Pure internal change.

Note on a Flagged-Then-Dismissed Finding

The comment-analyzer flagged a "critical" inaccuracy claiming the JSDoc says 18 imports while there are actually 19. Re-counted manually with grep -c: there are 18 text-imports in bundled-skill.ts. The JSDoc is correct; the agent miscounted.

Verdict

READY TO MERGE — small, focused fix; correct shape (async + dynamic import() is the minimal form here, no simpler alternative); no behavior change for users, no API change, no doc impact. Manual smoke per the PR description is the right verification path for this class of fix (install-mode-specific module-load timing).

Recommended Actions

  1. Optional: tighten the JSDoc per the simplification suggestion.
  2. Optional: a regression test that mocks the dynamic import to fail and asserts the surfaced error is useful — but only if you want belt-and-suspenders coverage; not blocking.

@Wirasm Wirasm merged commit 7d06773 into dev Apr 28, 2026
4 checks passed
@Wirasm Wirasm deleted the fix/lazy-bundled-skill-imports branch April 28, 2026 12:20
@Wirasm Wirasm mentioned this pull request Apr 29, 2026
leex279 pushed a commit that referenced this pull request May 1, 2026
Resolves conflict in packages/cli/src/commands/setup.ts: PR extracted
copyArchonSkill into ./skill while dev (#1394) added a lazy-import fix
inline. Kept PR's structure (function lives in skill.ts) and applied
dev's lazy-import pattern there so archon --help no longer crashes when
source skill files are missing on linked-source installs.

- packages/cli/src/commands/skill.ts: dynamic-import bundled-skill;
  copyArchonSkill is now async
- packages/cli/src/commands/setup.ts: await copyArchonSkill at the
  setup-wizard call site
- packages/cli/src/commands/skill.test.ts: await the now-async helper

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
prospapledge88 added a commit to prospapledge88/Archon that referenced this pull request May 5, 2026
* fix(core/test): split connection.test.ts from DB-test batch to avoid mock pollution (coleam00#1269)

messages.test.ts uses mock.module('./connection', ...) at module-load time.
Per CLAUDE.md:131 (Bun issue oven-sh/bun#7823), mock.module() is process-
global and irreversible. When Bun pre-loads all test files in a batch, the
mock shadows the real connection module before connection.test.ts runs,
causing getDatabaseType() to always return the mocked value regardless of
DATABASE_URL.

Move connection.test.ts into its own `bun test` invocation immediately
after postgres.test.ts (which runs alone) and before the big DB/utils/
config/state batch that contains messages.test.ts. This follows the same
isolation pattern already used for command-handler, clone, postgres, and
path-validation tests.

* fix(setup): align PORT default on 3090 across .env.example, wizard, and JSDoc (coleam00#1152) (coleam00#1271)

The server's getPort() fallback changed from 3000 to 3090 in the Hono
migration (coleam00#318), but .env.example, the setup wizard's generated .env,
and the JSDoc describing the fallback were not updated — leaving three
different sources of truth for "the default PORT."

When the wizard writes PORT=3000 to ~/.archon/.env (which the Hono
server loads with override: true, while Vite only reads repo-local
.env), the two processes can land on different ports silently. That
mismatch is the real mechanism behind the failure described in coleam00#1152.

- .env.example: comment out PORT, document 3090 as the default
- packages/cli/src/commands/setup.ts: wizard no longer writes PORT=3000
  into the generated .env; fix the "Additional Options" note
- packages/cli/src/commands/setup.test.ts: assert no bare PORT= line and
  the commented default is present
- packages/core/src/utils/port-allocation.ts: fix stale JSDoc "default
  3000" -> "default 3090"
- deploy/.env.example: keep Docker default at 3000 (compose/Caddy target
  that) but annotate it so users don't copy it for local dev

Single source of truth for the local-dev default is now basePort in
port-allocation.ts.

* fix(providers/claude): use || instead of ?? in hasExplicitTokens to handle empty-string env vars (coleam00#1028)

Closes coleam00#1027

* chore(deps): bump claude-agent-sdk to 0.2.121, codex-sdk to 0.125.0 (coleam00#1460)

Both SDKs were ~30 patch releases behind. Validation suite passes
(type-check, lint, format, tests across all 10 packages) without code
changes. The only sustained Claude SDK behavior change in the range —
v0.2.111's options.env overlay/replace flap, since reverted to overlay —
is a no-op for Archon, which already passes { ...process.env } as the
SDK env.

* fix(cli): lazy-import bundled skill files so non-setup commands don't crash on missing source (coleam00#1394)

The 18 top-level `import … with { type: 'text' }` statements in
`bundled-skill.ts` resolve at module load. For `bun build --compile` that's
build time, so the binary embeds the strings and works regardless of any
on-disk skill files. For `bun link` (linked-source) installs that's every
`archon` invocation — including `archon --help`, which doesn't even use the
skill content. If any of the 18 source files are missing or moved, the
import fails and the CLI cannot start at all.

The skill content is data the binary deploys via `archon setup`, not data
the CLI needs at runtime. There's only one consumer in production code:
`copyArchonSkill()` in `setup.ts`. Moving the import into that function as
a dynamic import preserves the compiled-binary behavior (Bun's bundler
statically analyses literal-string `import()` and embeds the chunk —
verified by grepping the SKILL.md frontmatter out of a freshly compiled
binary) while making the linked-source install resilient: only `archon
setup` triggers the bundled-skill module load now. Verified: a known skill
string appears in the compiled binary 1×, and `archon --help` no longer
needs the source files to start.

`copyArchonSkill()` becomes async because the dynamic import is a Promise.
The single production call site is already in an async function and gets
an `await`. The four `setup.test.ts` cases become async too.

* fix(claude): stop passing --no-env-file to native binary in dev mode (coleam00#1461)

* fix(claude): stop passing --no-env-file to native binary in dev mode

The Claude Agent SDK switched from shipping `cli.js` inside the package
to per-platform native binaries via optional deps somewhere in the
0.2.x series. As of 0.2.121 there is no `cli.js` in the SDK package;
dev mode resolves to `@anthropic-ai/claude-agent-sdk-darwin-arm64/claude`
(Mach-O). That native binary rejects `--no-env-file` with
`error: unknown option '--no-env-file'` and the subprocess exits 1.

`shouldPassNoEnvFile` was returning true on `cliPath === undefined` on
the assumption that "dev mode = JS executable run via Bun". That
assumption is dead. Tighten the predicate to only return true on an
explicit `.js` suffix, so we only emit the flag when the SDK is going
to spawn a Bun-runnable script.

CWD `.env` leak protection is unaffected. `stripCwdEnv()` in
`@archon/paths` (coleam00#1067) deletes Bun-auto-loaded `.env`/`.env.local`/
`.env.development`/`.env.production` keys from `process.env` at every
Archon entry point before any subprocess is spawned. The native Claude
binary does not auto-load `.env` from its cwd either. `--no-env-file`
was belt-and-suspenders for the JS-via-Bun case only.

Verified end-to-end with a sentinel: added a unique
`ARCHON_LEAK_SENTINEL_$$` to Archon's `.env`, ran e2e-claude-smoke
with a bash probe checking the subprocess env. stderr shows
`[archon] stripped 23 keys from /Users/rasmus/Projects/cole/Archon
(.env, .env.local)` — sentinel was deleted. Bash node prints
`PASS: simple='4', no sentinel leak`. Workflow completes cleanly,
no `--no-env-file` rejection from the SDK binary.

bun run validate: green across all 10 packages.

* fix(claude): address review on coleam00#1461 (stale docs + test gaps)

Critical: file-level JSDoc at provider.ts:18 still claimed dev mode
resolves cli.js. Updated to reflect SDK 0.2.x's switch to per-platform
native binaries.

Important: security.md still listed --no-env-file as item 2 of
target-repo .env isolation. Scoped that bullet to legacy
Bun-runnable JS entry points and called out that native binaries
don't auto-load .env from cwd. Added an Unreleased Fixed entry to
CHANGELOG.md. Updated binary-resolver.ts JSDoc title that referenced
cli.js.

Polish: widened the predicate to accept .mjs and .cjs (also
Bun-runnable JS — matches the SDK's own internal extension list).
Dropped the redundant `passesNoEnvFile` log field that mirrored
`isJsExecutable`. Added unit cases for .mjs/.cjs (now true) and
.ts/.tsx/.jsx (deliberately false — never SDK entry points).

Added an integration test that mocks resolveClaudeBinaryPath to
return a .js path and asserts executableArgs: ['--no-env-file']
flows through buildBaseClaudeOptions all the way to the SDK call —
catches future regressions in the conditional spread.

bun run validate: green across all 10 packages.

* fix(orchestrator): clear stale session ID on error_during_execution to prevent infinite failure loop (coleam00#1294)

* fix(orchestrator): clear stale session ID on error_during_execution to prevent infinite failure loop

When a Claude API session expires (e.g. after container restart), the orchestrator
persists the new (failed) session ID from the error result, causing every subsequent
message in that conversation to hit the same error — an infinite failure loop.

Fix: on error_during_execution result, set assistant_session_id to NULL instead of
persisting the failed session ID. The next message starts a fresh session with full
context rebuilt from the DB. Conversation history is unaffected since it lives in
remote_agent_messages, independent of the Claude session.

Changes:
- updateSession() and tryPersistSessionId() now accept string | null
- Both handleStreamMode and handleBatchMode clear session ID on error_during_execution

Fixes coleam00#1280

* test(orchestrator): add stale session clearing tests + address review feedback

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
Signed-off-by: kagura-agent <kagura.agent.ai@gmail.com>

---------

Signed-off-by: kagura-agent <kagura.agent.ai@gmail.com>
Co-authored-by: Claude Opus 4 (1M context) <noreply@anthropic.com>

* fix(claude): honor CLAUDE_BIN_PATH in dev mode for libc-mismatch hosts (coleam00#1481)

* fix(claude): honor CLAUDE_BIN_PATH in dev mode for libc-mismatch hosts

The Claude Agent SDK auto-resolves its bundled native binary in
[linux-x64-musl, linux-x64] order. On glibc Linux hosts (Ubuntu/Debian/
Fedora), Bun installs both via optionalDependencies and the musl variant
is picked first; its ELF interpreter (/lib/ld-musl-x86_64.so.1) does not
exist on glibc, so spawn fails and the SDK reports a misleading "binary
not found" — the file is on disk, the loader is not.

The documented escape hatch CLAUDE_BIN_PATH was dead code in dev mode:
the resolver early-returned undefined when BUNDLED_IS_BINARY=false before
ever reading the env var. The only workaround was patching node_modules.

Move the env-var block above the BUNDLED_IS_BINARY return. Config-file
path stays binary-mode-only — it's per-repo, not per-machine; env is the
right knob for libc mismatches.

Behavior preserved:
- env unset                  → unchanged (undefined in dev, autodetect/throw in binary)
- env set + file exists      → resolved (was binary-only; now also dev)
- env set + file missing     → clear error (was binary-only; now also dev)

Closes coleam00#1474

* chore(claude): address CodeRabbit review on coleam00#1481

- CHANGELOG entry under [Unreleased] / Fixed describing the dev-mode
  CLAUDE_BIN_PATH escape hatch (previously ignored). Notes that
  config-file path remains binary-mode-only and that env-loading +
  target-repo .env isolation are unchanged downstream.
- Empty-string test pinning that CLAUDE_BIN_PATH='' falls through
  to undefined rather than throwing — protects against a future
  predicate typo that would treat empty as "set".
- One-line note in ai-assistants.md "Binary path configuration"
  section pointing dev-mode users at the env-var override for the
  glibc/musl mismatch case.

Skipped from the review:
- The other two docs-page rewrites (configuration.md /
  troubleshooting.md): the error message itself names CLAUDE_BIN_PATH,
  and coleam00#1474 documents the use case publicly. One mention in
  ai-assistants.md is enough for discovery.
- Type-style consistency tweaks in the test file: pure bikeshed.

* fix(deps): bump hono to ^4.12.16 and @hono/node-server to ^1.19.13 (closes coleam00#1484) (coleam00#1499)

* fix(orchestrator): create ~/.archon/workspaces before AI provider spawn (coleam00#1529)

* fix(orchestrator): create ~/.archon/workspaces before AI provider spawn

On a fresh install, ~/.archon/workspaces doesn't exist yet. The
orchestrator passes that path as cwd to the AI provider, which calls
spawn() — which raises ENOENT. The error is then misclassified as
"binary not found" in the friendly-error path, surfacing as an
incorrect "Claude binary not found" message.

Add ensureArchonWorkspacesPath() in @archon/paths that mkdir -p's the
directory and returns the path. Use it at the orchestrator's spawn-cwd
site so the directory is guaranteed to exist before spawn().

Other call sites of getArchonWorkspacesPath() (workflow discovery,
path-prefix comparisons) only consume the path string and don't need
the directory to exist; they keep using the pure getter.

Closes coleam00#1528

* test(orchestrator): assert ensureArchonWorkspacesPath is called

Capture the @archon/paths mock as a named variable and assert it was
called in the syncWorkspace handleMessage path. Without this, the test
suite passes even if orchestrator-agent.ts:824 reverts to the
non-ensuring getArchonWorkspacesPath() variant — exactly the regression
that surfaced as 'Claude Code native binary not found' in coleam00#1528.

* docs(changelog): add Tier 1 batch 2 cherry-pick entry

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Signed-off-by: kagura-agent <kagura.agent.ai@gmail.com>
Co-authored-by: Rasmus Widing <152263317+Wirasm@users.noreply.github.com>
Co-authored-by: DIY Smart Code <thomas@thirty3.de>
Co-authored-by: Cocoon-Break <54054995+kuishou68@users.noreply.github.com>
Co-authored-by: Kagura <kagura.agent.ai@gmail.com>
Co-authored-by: Claude Opus 4 (1M context) <noreply@anthropic.com>
Co-authored-by: Yasser <116118149+YrFnS@users.noreply.github.com>
Co-authored-by: Truffle <truffleagent@gmail.com>
Co-authored-by: cjnprospa <sirhcle.j23@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant