Summary
archon workflow run <name> from any CLI invocation silently hangs forever at dag_node_started. No error, no logs after that line, no child process, no network activity. Reproduced on both Linux x64 (Ubuntu 24.04, v0.3.5 via direct binary) and macOS arm64 (26.2 Tahoe, v0.3.5 via direct binary).
Root Cause
The CLI uses dotenv to load environment variables, and dotenv loads .env from the process CWD, not from ~/.archon/.env. When a user runs archon workflow run from inside a project directory (the normal case — Archon's whole design is about per-codebase workflows), dotenv picks up the project's .env file instead of ~/.archon/.env.
Any auth configuration in ~/.archon/.env (including CLAUDE_USE_GLOBAL_AUTH=false and an explicit CLAUDE_CODE_OAUTH_TOKEN / CLAUDE_API_KEY) is silently ignored. Archon then falls back to the global auth path, which tries to spawn a claude CLI subprocess, and that subprocess deadlocks without ever producing an event.
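To make the hypothesis above concrete, here is a minimal shell sketch of the lookup a CWD-based loader performs. The function name `which_env_file` and its arguments are hypothetical and not part of Archon. It only models the behavior described in this report and does not inspect Archon itself.

```shell
# which_env_file DIR [ARCHON_ENV]
# Prints the .env path a CWD-based loader would resolve when started in DIR,
# then notes whether that path is the per-user Archon file.
# Hypothetical helper for illustration only.
which_env_file() (
  dir=${1:-$PWD}
  archon_env=${2:-$HOME/.archon/.env}
  cwd_env="$dir/.env"

  if [ -f "$cwd_env" ]; then
    echo "cwd-based loader reads: $cwd_env"
  else
    echo "cwd-based loader reads: (no .env in $dir)"
  fi

  # Plain string comparison; symlinks and relative paths are not resolved.
  if [ "$cwd_env" = "$archon_env" ]; then
    echo "this is the Archon config file"
  else
    echo "Archon config at $archon_env is NOT read"
  fi
)
```

Calling `which_env_file ~/some-git-repo` prints the project's .env path when that file exists, which is the situation the reproduction below sets up.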
Reproduction
# Fresh install on Mac
curl -L -o ~/.local/bin/archon https://github.com/coleam00/Archon/releases/download/v0.3.5/archon-darwin-arm64
chmod +x ~/.local/bin/archon
archon setup # choose Claude + global auth
# Configure explicit auth (should override global)
echo 'CLAUDE_USE_GLOBAL_AUTH=false' >> ~/.archon/.env
echo 'CLAUDE_CODE_OAUTH_TOKEN=sk-ant-oat01-...' >> ~/.archon/.env
# Run from a project dir (the normal case)
cd ~/some-git-repo
archon workflow run archon-assist 'Reply with OK' --no-worktree
Expected: Archon uses the explicit OAuth token and returns a reply within 30-60 seconds.
Actual: Workflow logs workflow_starting → dag_node_started → [archon-assist] Started → silence. After 30 minutes, dag_node_idle_timeout_reached fires. The server log shows authMode:"global" (not explicit), proving the CLAUDE_USE_GLOBAL_AUTH=false override was never loaded.
Diagnostic Evidence
Run archon workflow run twice — once from $HOME, once from a project directory. Check ~/.archon/archon.db events:
SELECT event_type, step_name, data FROM remote_agent_workflow_events
WHERE workflow_run_id = '<run_id>'
ORDER BY created_at;
Both runs produce only two events: workflow_started and node_started. No node_completed, no message_received, no error. The Bun process sits in S (sleeping) state with zero children, zero network sockets, and ~0.4s CPU time after 60+ seconds of wall time.
Workaround
Use archon serve + the HTTP API instead:
# Run from $HOME so dotenv loads the right file
cd ~
nohup archon serve --port 3090 &
# Use the HTTP API for everything
curl -X POST http://localhost:3090/api/conversations -d '{}' -H 'Content-Type: application/json'
curl -X POST http://localhost:3090/api/conversations/<conv_id>/message \
-d '{"message":"test"}' -H 'Content-Type: application/json'
archon serve reads ~/.archon/.env correctly (we can see [dotenv] injecting env (12) from .archon/.env in its log) and the log shows authMode:"explicit" with the OAuth token working.
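A second possible workaround, untested against Archon, is to export the contents of ~/.archon/.env into the shell before invoking the CLI from the project directory. The sketch below assumes the file contains only plain shell-compatible KEY=VALUE lines, and that Archon's loader keeps variables that are already set in the process environment (this is dotenv's documented default, but it has not been verified for the Archon binary). The function names `load_env_file` and `archon_with_user_env` are hypothetical.

```shell
# load_env_file FILE
# Exports every KEY=VALUE assignment in FILE into the current shell.
# Assumes FILE is plain shell-compatible KEY=VALUE syntax.
load_env_file() {
  [ -f "$1" ] || { echo "missing env file: $1" >&2; return 1; }
  set -a      # auto-export every variable assigned while sourcing
  . "$1"
  set +a
}

# archon_with_user_env ARGS...
# Hypothetical wrapper: load the per-user file, then run archon with the
# original arguments from the current directory.
archon_with_user_env() {
  load_env_file "$HOME/.archon/.env" && archon "$@"
}

# Example (not executed here):
#   cd ~/some-git-repo
#   archon_with_user_env workflow run archon-assist 'Reply with OK' --no-worktree
```

If the server log still shows authMode:"global" after this, the assumption about variable precedence does not hold and the archon serve workaround above remains the safer option.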
Fix Suggestions
- CLI should load ~/.archon/.env explicitly, not rely on dotenv's CWD-based loading. Or load both and merge with ~/.archon/.env taking precedence.
- Log which .env file was loaded so this failure mode is diagnosable: dotenv_loaded_from: /home/user/some-repo/.env would have saved us several hours.
- Warn when CLAUDE_USE_GLOBAL_AUTH=false is set but neither CLAUDE_CODE_OAUTH_TOKEN nor CLAUDE_API_KEY is present; it currently falls back to global silently.
- Detect CLAUDECODE=1 in process env and either warn or auto-switch to explicit-token mode to avoid the nested Claude Code subprocess deadlock.
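The last two suggestions can be approximated today as a preflight check run before archon. This is a rough sketch under stated assumptions: the function name `archon_auth_preflight` and the warning wording are hypothetical, and the check reads only the current shell environment, so it does not see what Archon itself loads from any .env file.

```shell
# archon_auth_preflight
# Prints a warning for each of the two misconfigurations described in the
# suggestions above. Hypothetical helper; reads only the current environment.
archon_auth_preflight() {
  # Explicit auth requested, but no credential to use with it.
  if [ "${CLAUDE_USE_GLOBAL_AUTH:-}" = "false" ] \
     && [ -z "${CLAUDE_CODE_OAUTH_TOKEN:-}" ] \
     && [ -z "${CLAUDE_API_KEY:-}" ]; then
    echo "warn: CLAUDE_USE_GLOBAL_AUTH=false but no CLAUDE_CODE_OAUTH_TOKEN or CLAUDE_API_KEY set"
  fi

  # Running inside Claude Code while global auth is still in effect.
  if [ "${CLAUDECODE:-}" = "1" ] \
     && [ "${CLAUDE_USE_GLOBAL_AUTH:-}" != "false" ]; then
    echo "warn: CLAUDECODE=1 with global auth; nested claude subprocess may hang"
  fi
}
```

The function prints nothing when neither condition applies, so it can be chained in front of any archon command without adding noise.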
Environment
- Archon: v0.3.5 (both darwin-arm64 and linux-x64)
- Claude Code: 2.1.83 (Mac) and 2.1.100 (Linux)
- Runtime: Bun (bundled into the Archon binary)
- OS: macOS 26.2 Tahoe arm64, Ubuntu 24.04.4 LTS x64
Happy to provide more logs or DB dumps on request.
Related issue #2
Title: v0.3.5: archon serve hardcodes skipPlatformAdapters:true — Telegram/Discord/Slack adapters are unreachable
The `cli.serve` command in v0.3.5 invokes `startServer({skipPlatformAdapters: !0, ...})` unconditionally (decompiled from the bundled binary). Every supported platform adapter is gated on `!e.skipPlatformAdapters` at startup, so none of them can ever start via `archon serve` regardless of what's in `~/.archon/.env`.
The setup wizard happily prompts for Telegram / Discord / Slack / Gitea / GitLab tokens and writes them to `~/.archon/.env`, but nothing in the serve code path reads them at runtime. Users expecting to run an Archon bot will silently see `"activePlatforms":["Web"]` and no platform events.
**Suggested fix:** Add a `--with-platforms` CLI flag (or an `archonConfig.enable_platform_adapters: true` setting) that passes `skipPlatformAdapters: false` into `startServer`. Alternatively, make it the default and add `--web-only` as the opt-out for users who specifically want a web-UI-only install.
**Current workaround:** None within Archon itself. Route platform traffic through a separate app that calls Archon's HTTP API. We're using our own Telegram bot with three custom tools (`archon_workflow_run`, `_status`, `_results`) that talk to Archon over HTTP instead of using the native Telegram adapter.