…be (#1359)

The pre-flight binary smoke does a bare `bun build --compile`; it deliberately skips `scripts/build-binaries.sh` to stay fast. That means packages/paths/src/bundled-build.ts retains its dev defaults, including BUNDLED_IS_BINARY = false.

version.ts branches on BUNDLED_IS_BINARY: when true it returns the embedded string; when false it calls getDevVersion(), which reads package.json at `SCRIPT_DIR/../../../../package.json`. Inside a compiled binary SCRIPT_DIR resolves under `$bunfs/root/`, the walk produces a CWD-relative path that doesn't exist, and the smoke aborts with "Failed to read version: package.json not found", a false positive.

Hit during the 0.3.8 release attempt: the real Pi lazy-load fix was working end-to-end; the smoke test was the only thing failing.

Use --help instead. It exercises the same module-init graph (so it still catches the real failure modes the skill lists: Pi package.json init crash, Bun --bytecode bugs, CJS wrapper issues, circular imports under minify) but has no dev/binary branch, so no false positive.

Also add a longer comment block explaining why --help is preferred, so this doesn't get "normalized" back to `version` by a future drive-by.
The brew path of /test-release runs `brew uninstall` in Phase 5 to leave the system in its pre-test state. For operators using the dual-homebrew pattern (renamed brew binary at `/opt/homebrew/bin/archon-stable` so it coexists with a `bun link` dev `archon`), that uninstall wipes the Cellar dir the `archon-stable` symlink points into → `archon-stable` becomes dangling → `brew cleanup` sweeps it away on the next brew op. Next time the operator wants stable, they have to manually re-run `brew-upgrade-archon`.

Fix: make the skill aware of `archon-stable` and restore it transparently.

- Phase 2 item 4: detect the `archon-stable` symlink before any brew op; export `ARCHON_STABLE_WAS_INSTALLED=yes` so Phase 5 knows to restore it. Only triggers for the brew path (curl-mac/curl-vps don't touch brew so they leave `archon-stable` alone).
- Phase 5 brew path: after `brew uninstall + untap`, if the flag was set, re-tap + re-install + rename. Verifies the restored `archon-stable` reports a version and warns (non-fatal) if the rename target is missing. Documents the tradeoff: the restored version is "whatever the tap ships today", not necessarily the pre-test version; usually that's what the operator wants (the release they just tested becomes stable) but the back-version-QA case requires a manual `brew-upgrade-archon` after.
- Phase 1 confirmation banner now mentions that `archon-stable` will be preserved so the operator isn't surprised by the reinstall during Phase 5.

No changes to curl-mac/curl-vps paths. No changes to Phase 4 test suite.
… a compiled binary (#1360)

v0.3.9 made Pi boot-safe: lazy-loading its imports meant `archon version` no longer crashed on `@mariozechner/pi-coding-agent/dist/config.js`'s module-init `readFileSync(getPackageJsonPath())`. That's what the `provider-lazy-load.test.ts` regression test guards.

The fix was only half the problem though. When a Pi workflow actually runs, sendQuery() triggers the dynamic import, and Pi's config.js module-init fires then, hitting the exact same ENOENT on `dirname(process.execPath)/package.json`. Discovered by running `archon workflow run test-pi` against a locally-compiled 0.3.9 binary:

    [main] Failed: ENOENT: no such file or directory, open '/private/tmp/package.json'
        at readFileSync (unknown)
        at <anonymous> (/$bunfs/root/archon-providertest:184:7889)
        at init_config

Boot-safe ≠ runtime-safe. The `/test-release` run for 0.3.9 passed because it only exercised `archon-assist` (Claude); Pi was never actually invoked on the released binary.

Fix: before the dynamic `import('@mariozechner/pi-coding-agent')` in sendQuery, install a PI_PACKAGE_DIR shim. Pi's config.js checks `process.env.PI_PACKAGE_DIR` first in its `getPackageDir()` and short-circuits the `dirname(process.execPath)` walk. We write a minimal `{name, version, piConfig:{}}` stub to `tmpdir()/archon-pi-shim/package.json` (idempotent via an existsSync check) and set the env var. Pi only reads `piConfig.name`, `piConfig.configDir`, and `version` from that file, all optional, so the stub surface is genuinely minimal.

Localized to PiProvider: no global state, no mutation of any shared config, no upstream fork. Claude and Codex providers are unaffected (their SDKs don't have this class of module-init side effect).

Verified end-to-end: built a compiled archon binary with this patch, ran `archon workflow run test-pi --no-worktree` (Pi workflow with model `anthropic/claude-haiku-4-5`), got a clean response. Before the patch, same binary crashed at `dag_node_started` with the ENOENT above.

Regression test added: asserts `PI_PACKAGE_DIR` is set after sendQuery hits even its fast-fail "no model" path. Together with the existing `provider-lazy-load.test.ts` (boot-safe) this covers both halves.
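
The shim described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the code from the PR: the function name `installPiPackageDirShim` and the stub's `name` value are hypothetical, and only the stub shape, the `tmpdir()/archon-pi-shim/package.json` location, the existsSync idempotency check, and the `PI_PACKAGE_DIR` env var come from the commit message.

```typescript
import { existsSync, mkdirSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical helper name; the real PiProvider code may be structured differently.
function installPiPackageDirShim(version: string): string {
  const shimDir = join(tmpdir(), 'archon-pi-shim');
  const stubPath = join(shimDir, 'package.json');

  // Idempotent: only write the stub when it is not already on disk.
  if (!existsSync(stubPath)) {
    mkdirSync(shimDir, { recursive: true });
    // The `name` value is an assumption; the commit only gives the stub shape.
    const stub = { name: 'archon-pi-shim', version, piConfig: {} };
    writeFileSync(stubPath, JSON.stringify(stub, null, 2));
  }

  // Must be set before the dynamic import so Pi's module-init sees it.
  process.env.PI_PACKAGE_DIR = shimDir;
  return shimDir;
}
```

A caller would presumably invoke this immediately before the dynamic `import('@mariozechner/pi-coding-agent')` in sendQuery, so the env var is in place when Pi's config.js module-init runs.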
… and Codex (#1361)

Both binary resolvers previously stopped at env-var + explicit config and threw a "not found" error when neither was set. Users who followed the upstream-recommended install flow (Anthropic's `curl install.sh` for Claude, `npm install -g @openai/codex`) still had to manually set either `CLAUDE_BIN_PATH` / `CODEX_BIN_PATH` or the corresponding config field before any workflow could run.

Add a tier-N autodetect step between the explicit config tier and the install-instructions throw. Purely additive: env and config still win when set (precedence covered by new tests). On autodetect miss, the same install-instructions error fires as before.

Claude probe list (verified against docs.claude.com "Uninstall Claude Code → Native installation" section):
- $HOME/.local/bin/claude (mac/linux native installer)
- $USERPROFILE\.local\bin\claude.exe (Windows native installer)

Codex probe list (verified against openai/codex README; npm global-install puts the binary at `{npm_prefix}/bin/<name>` on POSIX, `{npm_prefix}\<name>.cmd` on Windows):
- $HOME/.npm-global/bin/codex (user-set `npm config set prefix`)
- /opt/homebrew/bin/codex (mac arm64 with homebrew-node)
- /usr/local/bin/codex (mac intel / linux system node)
- %APPDATA%\npm\codex.cmd (Windows npm global default)
- $HOME\.npm-global\codex.cmd (Windows user-set prefix)

Not probed (explicit override still required):
- Custom npm prefixes: `npm root -g` would need a subprocess per resolve, too much surface for a probe helper
- `brew install --cask codex`: cask layout isn't a PATH binary
- Manual GitHub Releases extracts: placement is user-determined
- `~/.bun/bin/codex`: not documented in openai/codex README

Pi provider intentionally has no equivalent change: the Pi SDK is bundled into the archon binary (no subprocess), so there's no "binary" to resolve. Pi auth lives at `~/.pi/agent/auth.json` which the SDK already finds by default, and the PR A shim (`PI_PACKAGE_DIR`) handles the package-dir case via Pi's own documented escape hatch.

E2E verified: removed both config entries from ~/.archon/config.yaml, rebuilt compiled binary, ran `archon workflow run archon-assist` and a Codex workflow. Logs showed `source: 'autodetect'` for both, responses returned cleanly.
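
The tier ordering above (env, then explicit config, then autodetect, then the install-instructions error) could look something like the sketch below. The names `resolveBinaryPath`, `ResolveInput`, and the `'env'` / `'config'` source labels are hypothetical; only `source: 'autodetect'` appears in the logs quoted in the commit. The filesystem check is injected so the precedence can be exercised without touching disk.

```typescript
type BinarySource = 'env' | 'config' | 'autodetect';

interface ResolveInput {
  envPath?: string;                    // e.g. the value of CLAUDE_BIN_PATH, if set
  configPath?: string;                 // explicit config field, if set
  probePaths: string[];                // candidate install locations, in probe order
  exists: (path: string) => boolean;   // injected so tests can fake the filesystem
  installHint: string;                 // text for the install-instructions error
}

// Hypothetical resolver: explicit settings always win, autodetect is the last
// tier before the error, matching the "purely additive" description.
function resolveBinaryPath(input: ResolveInput): { path: string; source: BinarySource } {
  if (input.envPath) return { path: input.envPath, source: 'env' };
  if (input.configPath) return { path: input.configPath, source: 'config' };

  const found = input.probePaths.find((candidate) => input.exists(candidate));
  if (found !== undefined) return { path: found, source: 'autodetect' };

  throw new Error(`Binary not found. ${input.installHint}`);
}
```

In real use, `exists` would likely be `existsSync` from `node:fs` and `probePaths` would be built from `homedir()`; whether the actual resolvers also validate env and config paths is not stated in the commit, so this sketch does not.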
…ry autodetect test The native-installer autodetect test computed its expected path from process.env.HOME, but the implementation uses node:os homedir(). On Windows, HOME is typically unset (Windows uses USERPROFILE), so the test fell back to '/Users/test' while the resolver returned the real home dir — making the spy's path-equality check fail and breaking CI on windows-latest. Mirror the implementation by importing homedir() from node:os and joining with node:path so the expected path matches the actual platform-resolved home and separator. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
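
As a rough illustration of the test fix, the expected path can be derived the same way the resolver derives it, rather than from `process.env.HOME`. The helper name `expectedClaudeNativePath` is hypothetical; the `.local/bin/claude` and `claude.exe` locations come from the probe list in the autodetect commit.

```typescript
import { homedir } from 'node:os';
import { join } from 'node:path';

// Hypothetical test helper: mirrors the implementation by using homedir() and
// join(), so the home directory and path separator match the running platform.
function expectedClaudeNativePath(platform: string = process.platform): string {
  const binary = platform === 'win32' ? 'claude.exe' : 'claude';
  return join(homedir(), '.local', 'bin', binary);
}
```

Because both sides now go through `homedir()`, the comparison no longer depends on whether HOME happens to be set in the CI environment.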
…ver (#1365) Reported in #1365: a user running `archon serve` with DISCORD_BOT_TOKEN set but the "Message Content Intent" toggle disabled in the Discord Developer Portal saw the entire server crash with `Used disallowed intents`. Discord rejects the gateway connection (close code 4014) when a privileged intent is requested without being enabled, and the unguarded `await discord.start()` propagated the error all the way up, taking the web UI down with it. Wrap discord.start() in try/catch — log the failure with an actionable hint (special-cased for the disallowed-intent error) and continue running. Other adapters and the web UI come up regardless. The shutdown handler already uses optional chaining (`discord?.stop()`) so nulling discord after a failed start is safe. Other adapters (Telegram, Slack, GitHub, Gitea, GitLab) have the same unguarded-start pattern but are out of scope for this fix — addressing them is tracked separately. Also expanded the Discord setup docs with a caution callout that names the exact error string and the new log event so users can grep for both. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
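
The guard described above might be sketched as follows. Everything here is illustrative: the `ChatAdapter` interface, both function names, the hint wording, and the log line format are assumptions, since the commit names neither the log event nor the exact message. Only the `Used disallowed intents` error string and the log-and-continue behavior come from the source.

```typescript
// Hypothetical minimal adapter shape for the sketch.
interface ChatAdapter {
  start(): Promise<void>;
  stop(): void;
}

// Turns a start failure into a message plus an actionable hint, special-casing
// the disallowed-intent error reported in the issue.
function describeStartFailure(error: unknown): { message: string; hint: string } {
  const message = error instanceof Error ? error.message : String(error);
  const hint = /disallowed intents/i.test(message)
    ? 'Enable the "Message Content Intent" toggle in the Discord Developer Portal, then restart.'
    : 'Check DISCORD_BOT_TOKEN and network access, then restart.';
  return { message, hint };
}

// Returns the adapter on success and null on failure, so a later
// `discord?.stop()` in the shutdown handler stays safe.
async function startOptionalAdapter(
  adapter: ChatAdapter,
  log: (line: string) => void
): Promise<ChatAdapter | null> {
  try {
    await adapter.start();
    return adapter;
  } catch (error) {
    const { message, hint } = describeStartFailure(error);
    log(`discord start failed: ${message}. ${hint}`);
    return null;
  }
}
```

Returning null rather than rethrowing is what lets the other adapters and the web UI keep starting after a Discord failure.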
* docs(script-nodes): add dedicated guide and teach the archon skill how to write them

Script nodes (script:) have been a first-class DAG node type since v0.3.3 but were documented only as one-liners in CLAUDE.md and a CI smoke test. Claude Code reading the archon skill would see "Four Node Types: command, prompt, bash, loop" and reach for bash+node/python one-liners instead of a proper script node, losing bun's --no-env-file isolation, uv's --with dependency pins, and the .archon/scripts/ reuse story.

- New packages/docs-web/src/content/docs/guides/script-nodes.md mirroring the structure of loop-nodes.md / approval-nodes.md: schema, inline vs named dispatch, runtime/deps semantics, scripts directory precedence (repo > home), extension-runtime mapping, env isolation, stdout/stderr contract, patterns, and the explicit list of ignored AI fields.
- guides/authoring-workflows.md and guides/index.md updated so the new guide is discoverable from both the node-types table and the guides landing page.
- reference/variables.md calls out the no-shell-quote difference between bash: and script: substitution, a subtle correctness trap when adapting a bash pattern into a script node.
- Sidebar order bumped +1 on hooks/mcp-servers/skills/global-workflows/remotion-workflow to slot script-nodes at order 5 next to the other node-type guides.
- .claude/skills/archon/SKILL.md: replaces stale "Four Node Types" (which also silently omitted approval and cancel) with the accurate seven, with a script-node code block showing both inline and named patterns.
- references/workflow-dag.md: full Script Node section covering dispatch, resolution, deps, stdout contract, and the list of AI-only fields that are ignored; validation-rules list updated.
- references/dag-advanced.md and references/variables.md: retry-support line corrected; no-shell-quote note added.
- examples/dag-workflow.yaml: added an extract-labels TypeScript script node and updated the header comment.
* fix(docs): review follow-ups for script-node guide - skills example: extract-labels was reading process.env.ISSUE_JSON which is never set; use String.raw`$fetch-issue.output` so the upstream bash node's JSON is actually consumed - guides/script-nodes.md + skills/workflow-dag.md: idle_timeout is accepted but ignored on script (and bash) nodes — executeScriptNode only reads node.timeout. Clarify that script/bash use `timeout`, not idle_timeout - archon-workflow-builder.yaml: prompt enumerated only bash/prompt/command/loop, so the AI builder could never propose script or approval nodes. Add both (plus examples + rule about script output not being shell-quoted) and regenerate bundled defaults - book/dag-workflows.md + book/quick-reference.md + adapters/web.md: fill in the node-type references that were missing script, approval, and cancel. adapters/web.md also overclaimed "loop" in the palette — NodePalette.tsx only drags command/prompt/bash, so note that the other kinds are YAML-only
…nv gaps, add good-practices + troubleshooting (#1363) * fix(skill/when): document the full `when:` operator set and compound expressions The skill reference previously stated "operators: ==, != only" which is materially wrong — the condition evaluator supports ==, !=, <, >, <=, >= plus && / || compound expressions with && binding tighter than ||, plus dot-notation JSON field access. An agent authoring a workflow from the skill would think half the operators don't exist. Replaces the single-sentence section with a structured reference covering: - All six comparison operators (string and numeric modes) - Compound expressions with precedence rules and short-circuit eval - JSON dot notation semantics and failure modes - The fail-closed rules in full (invalid expression, non-numeric side, missing field, skipped upstream) Grounded in packages/workflows/src/condition-evaluator.ts. * feat(skill): document Approval and Cancel node types Approval and cancel nodes are first-class DAG node types (approval since the workflow lifecycle work in #871, cancel as a guarded-exit primitive) but the skill never described either one. An agent reading the skill and asked to "add a review gate before implementation" or "stop the workflow if the input is unsafe" would fall back to bash + exit 1, losing the proper semantics (cancelled vs. failed, on_reject AI rework, web UI auto-resume). 
Approval node coverage (references/workflow-dag.md, SKILL.md): - Full configuration block with message, capture_response, on_reject - The interactive: true workflow-level requirement for web UI delivery - Approve/reject commands across all platforms (CLI, slash, natural language) and the capture_response → $node-id.output flow - Ignored-fields list + the on_reject.prompt AI sub-node exception Cancel node coverage (references/workflow-dag.md, SKILL.md): - Single-field schema (cancel: "<reason>") - Lifecycle: cancelled (not failed); in-flight parallel nodes stopped; no DAG auto-resume path - The "cancel: vs bash-exit-1" decision rule (expected precondition miss vs. check itself failing) - Two canonical patterns — upstream-classification gate, pre-expensive-step gate Validation-rules list updated to enumerate approval/cancel constraints (message non-empty, on_reject.max_attempts range 1-10, cancel reason non-empty), plus a forward note that script: joins the mutually-exclusive set once PR #1362 lands. Placement in both files is after the Loop section and before the validation section, so this commit stays additive with respect to PR #1362's Script node insertion between Bash and Loop — rebase is clean. * feat(skill): document workflow-level fields beyond name/provider/model The skill's Schema section previously showed only name, description, provider, and model at the workflow level — which is most of a stub. Agents asked to "use the 1M-context Claude beta" or "run this under a network sandbox" or "add a fallback model in case Opus rate-limits" had no way to discover that any of these fields existed at the workflow level. 
Adds a comprehensive Workflow-Level Fields section covering: - Core: name, description, provider, model, interactive (with explicit callout that interactive: true is REQUIRED for approval/loop gates on web UI — a common footgun) - Isolation: worktree.enabled for pin-on/pin-off (the only worktree field at workflow level; baseBranch/copyFiles/path/initSubmodules are config.yaml only, so a cross-reference points there) - Claude SDK advanced: effort, thinking, fallbackModel, betas, sandbox, with explicit per-node-only exceptions (maxBudgetUsd, systemPrompt) - Codex-specific: modelReasoningEffort (with note that it's NOT the same as Claude's effort — this has confused users), webSearchMode, additionalDirectories - A complete worked example combining sandbox + approval + interactive All fields cross-referenced against packages/workflows/src/schemas/workflow.ts and packages/workflows/src/schemas/dag-node.ts. * feat(skill/loop): document interactive loops and gate_message Interactive loop nodes pause between iterations for human feedback via /workflow approve — used by archon-piv-loop and archon-interactive-prd. The skill's Loop Nodes section previously omitted both interactive: true and gate_message entirely, so an agent writing a guided-refinement workflow wouldn't know the feature exists or that gate_message is required at parse time. 
Adds: - interactive and gate_message rows to the config table (marking gate_message as required when interactive: true — enforced by the loader's superRefine) - A dedicated "Interactive Loops" subsection explaining the 6-step iterate-pause-approve-resume flow - Explicit call-out that $LOOP_USER_INPUT populates ONLY on the first iteration of a resumed session — easy to miss and a common surprise - Workflow-level interactive: true requirement for web UI delivery (loader warning otherwise) so the full-flow example is complete - Note that until_bash substitution DOES shell-quote $nodeId.output (unlike script bodies) — called out since the audit surfaced this inconsistency * fix(skill/cli): complete the CLI command reference with missing lifecycle commands The CLI reference previously documented only list, run, cleanup, validate, complete, version, setup, and chat — missing nearly every workflow lifecycle command an agent needs to operate a paused, failed, or stuck run. The interactive-workflows reference assumed these commands existed without actually documenting them. 
Adds full documentation for: - archon workflow status — show running workflow(s) - archon workflow approve <run-id> [comment] — resume approval gate (also populates $LOOP_USER_INPUT on interactive loops and the gate node's output when capture_response: true) - archon workflow reject <run-id> [reason] — reject gate; cancels or triggers on_reject rework depending on node config - archon workflow cancel <run-id> — terminate running/paused with in-flight subprocess kill - archon workflow abandon <run-id> — mark stuck row cancelled without subprocess kill (for orphan-cleanup after server crashes — matches the #1216 precedent) - archon workflow resume <run-id> [message] — force-resume specific run (auto-resume is default; this is for explicit override) - archon workflow cleanup [days] — disk hygiene for old terminal runs (with explicit callout that it does NOT transition 'running' rows, a common confusion) - archon workflow event emit — used inside loop prompts for state signalling; documented so agents don't invent their own mechanism - archon continue <branch> [flags] [msg] — iterative-session entry point with --workflow and --no-context flags Also: - Adds --allow-env-keys flag to the `workflow run` flag table with audit-log context and the env-leak-gate remediation use case - Adds an "Auto-resume without --resume" note disambiguating when --resume is needed vs. when auto-resume handles it - Adds --include-closed flag to `isolation cleanup`, which was previously missing; converts the flag list to a structured table - Explains the cancel/abandon distinction (live subprocess vs. orphan) All grounded in packages/cli/src/commands/workflow.ts, continue.ts, and isolation.ts. 
* feat(skill/repo-init): add scripts/ and state/, three-path env model, per-project env injection The repo-init reference was missing two first-class .archon/ directories (scripts/ since v0.3.3, state/ since the workflow-state feature) and had nothing to say about env — the #1 thing a user hits on first-run when their repo has a .env file with API keys. Directory tree updates: - Adds .archon/scripts/ with the extension->runtime rule (.ts/.js -> bun, .py -> uv) so agents know where to put named scripts referenced by script: nodes. - Adds .archon/state/ with explicit "always gitignore" callout — these are runtime artifacts, not source. Previously undocumented in the skill. - Adds .archon/.env (repo-scoped Archon env) and distinguishes it from the target repo's top-level .env. - Adds a "What each directory is for" list so the structure isn't just a tree with no narrative. .gitignore guidance: - state/ and .env added as must-gitignore (state/ matches CLAUDE.md and reference/archon-directories.md — skill was lagging). - mcp/ demoted to conditional — gitignore only if you hardcode secrets. New "Three-Path Env Model" section: - ~/.archon/.env (trusted, user), <cwd>/.archon/.env (trusted, repo), <cwd>/.env (UNTRUSTED, target project — stripped from subprocess env). - Precedence (override: true across archon-owned paths) and the observable [archon] loaded N keys / stripped K keys log lines so operators can verify what actually happened. - Decision tree for where to put API keys vs. target-project env vs. things Archon shouldn't touch. - Links to archon setup --scope home|project with --force for writing to the right file with timestamped backups. New "Per-Project Env Injection" section: - Documents both managed surfaces: .archon/config.yaml env: block (git-committed, $REF expansion) and Web UI Settings → Projects → Env Vars (DB-stored, never returned over API). 
- Names every execution surface that receives the injected vars: Claude/Codex/Pi subprocess, bash: nodes, script: nodes, and direct codebase-scoped chat.
- Documents the env-leak gate with all 5 remediation paths so an agent hitting "Cannot register: env has sensitive keys" knows the options.

Grounded in CHANGELOG v0.3.7 (three-path env + setup flags), v0.3.0 (env-leak gate), and reference/security.md on the docs site.

* fix(skill/authoring-commands): correct override paths and add home-scoped commands

The file-location and discovery sections described an override layout that does not match the actual resolver. It showed:

    .archon/commands/defaults/archon-assist.md # Overrides the bundled

and claimed `.archon/commands/defaults/` was where repo-level overrides lived. In fact the resolver (executor-shared.ts:152-200 + command-validation.ts) walks `.archon/commands/` 1 level deep and uses basename matching; putting `archon-assist.md` at the top of `.archon/commands/` is the canonical way to override the bundled version. The `defaults/` subfolder is an Archon-internal convention for shipping bundled defaults, not a user-facing override pattern.

Also, home-scoped commands (`~/.archon/commands/`, shipped in v0.3.7) were completely absent; agents authoring personal helpers wouldn't know they could live at the user level and be shared across every repo.

Changes:
- File Location section now shows all three discovery scopes (repo, home, bundled) with precedence ordering and 1-level subfolder rules
- Duplicate-basename rule documented as a user error surface
- Discovery and Priority section rewritten with accurate 3-step lookup order, with no more references to the nonexistent defaults/ override path
- Adds the Web UI "Global (~/.archon/commands/)" palette label note so users authoring helpers for the builder know what to expect

No code changes: this is a pure fix of stale/incorrect skill reference material.
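
To make the lookup rules concrete, here is a small sketch of basename matching across the three scopes. It is not the resolver from executor-shared.ts: `resolveCommand` and `ScopeListing` are hypothetical names, the repo, home, bundled ordering is assumed from the order in which the commit lists the scopes, and treating a duplicate basename within one scope as a thrown error is likewise an assumption.

```typescript
import { basename } from 'node:path';

type Scope = 'repo' | 'home' | 'bundled';

interface ScopeListing {
  scope: Scope;
  // Paths relative to that scope's commands directory, using "/" separators.
  files: string[];
}

// Hypothetical lookup: scopes are searched in the order given, files are
// considered only at the top level or one subfolder deep, and a command is
// matched by basename without the .md extension.
function resolveCommand(
  name: string,
  listings: ScopeListing[]
): { scope: Scope; file: string } | null {
  for (const { scope, files } of listings) {
    const matches = files.filter(
      (file) => file.split('/').length <= 2 && basename(file, '.md') === name
    );
    if (matches.length > 1) {
      throw new Error(`Duplicate command "${name}" in ${scope} scope: ${matches.join(', ')}`);
    }
    const first = matches[0];
    if (first !== undefined) return { scope, file: first };
  }
  return null;
}
```

Under these assumptions, a top-level `archon-assist.md` in the repo scope shadows the bundled `defaults/archon-assist.md`, which is the override behavior the corrected docs describe.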
* feat(skill): add workflow good-practices and troubleshooting reference pages Closes two gaps from the audit. The skill previously had zero guidance on designing multi-node workflows (what to avoid, what to reach for first, how to structure artifact chains) and zero guidance on where to look when things go wrong (log paths, env-leak gate remediations, orphan-row cleanup, resume semantics). New references/good-practices.md (9 Good Practices + 7 Anti-Patterns): - Use deterministic nodes (bash:/script:) for deterministic work, AI for reasoning — the single biggest quality lever - output_format required whenever downstream when: reads a field — the most common source of "workflow silently routes wrong" - trigger_rule: none_failed_min_one_success after conditional branches — the classic bug where all_success fails because a skipped when:-gated branch doesn't count as a success - context: fresh requires artifacts for state passing — commands must explicitly "read $ARTIFACTS_DIR/..." when downstream of fresh - Cheap models (haiku) for glue, strong for substance - Workflow descriptions as routing affordances - Validate (archon validate workflows) + smoke-run before shipping - Artifact-chain-first design - worktree.enabled: true for code-changing workflows (reversibility) - Anti-patterns with before/after YAML examples for each (AI-for-tests, free-form when: matching, context: fresh without artifacts, long flat AI-node layers, secrets in YAML, retry on loop nodes, tiny max_iterations, missing workflow-level interactive:, tool-restricted MCP nodes) New references/troubleshooting.md: - Log location (~/.archon/workspaces/<owner>/<repo>/logs/<run-id>.jsonl) with jq recipes for common queries (last assistant message, failed events, full stream) - Artifact location for cross-node handoff debugging - 9 Common Failure Modes, each with root cause + concrete fix: - $BASE_BRANCH unresolvable - Env-leak gate (5 remediations) - Claude/Codex binary not found (compiled-binary-only) - 
"running" forever (AI working / orphan / idle_timeout) - Mid-workflow failure and auto-resume semantics - Approval gate missing on web UI (workflow-level interactive:) - MCP plugin connection noise (filtered by design) - Empty $nodeId.output / field access (4 causes) - Diagnostic command cheat sheet (list, status, isolation list, validate, tail-log, --verbose, LOG_LEVEL=debug) - Escalation protocol (version + validate + log tail + CHANGELOG + issue) SKILL.md routing table now dispatches "Workflow good practices / anti-patterns" and "Troubleshoot a failing / stuck workflow" to the new references so an agent can find them without having to know they exist. * docs(book): update node-types coverage from four to all seven The book is the curated first-contact reading path (landing page → "Get Started" → /book/). Both dag-workflows.md and quick-reference.md were stuck on "four node types" — missing script, approval, and cancel. A user reading the book as their first introduction would form an incomplete mental model, then find three more node types in the reference section later with no explanation of when they arrived. book/dag-workflows.md: - "four node types" → "seven node types. 
Exactly one mode field is required per node"
- Table now lists Command, Prompt, Bash, Script, Loop, Approval, Cancel with one-line "when to use" for each, and cross-links to the dedicated guide pages for Script / Loop / Approval
- New sections below the table for Script (inline + named examples with runtime and deps), Approval (with the interactive: true workflow-level note that's easy to miss), and Cancel (guarded-exit pattern), keeping the existing narrative shape for Bash and Loop

book/quick-reference.md:
- Node Options table now includes script, approval, cancel rows
- agents row added (inline sub-agents, Claude-only)
- New "Script-specific fields" and "Approval-specific fields" subsections so the cheat-sheet is actually complete rather than pointing users elsewhere for the required constraints
- Retry row callout that loop nodes hard-error on retry (previously omitted)
- bash timeout note widened to cover script timeout (same semantics)

Both files are docs-web content; the CI build on the docs-script-nodes PR (#1362) previously validated the Starlight build path with a similar table addition, so this should render clean.

* fix(skill/cli): remove nonexistent `archon workflow cancel`, fix workflow status jq recipe

Two accuracy issues from the PR code-reviewer (comment 4311243858).

C1: `archon workflow cancel <run-id>` does NOT exist as a CLI subcommand. The switch at packages/cli/src/cli.ts:318-485 dispatches on list / run / status / resume / abandon / approve / reject / cleanup / event; running `archon workflow cancel` hits the default case and exits with "Unknown workflow subcommand: cancel" (cli.ts:478-484).
Active cancellation is only available via:
- /workflow cancel <run-id> chat slash command (all platforms)
- Cancel button on the Web UI dashboard
- POST /api/workflows/runs/{runId}/cancel REST endpoint

cli-commands.md: removed the `### archon workflow cancel <run-id>` subsection; kept the `abandon` subsection but made it explicit that abandon does NOT kill a subprocess. Added a call-out box at the bottom of the abandon section explaining where to go for actual cancellation.

troubleshooting.md "running forever" section: split the original cancel-vs-abandon advice into three bullets: Web UI / CLI abandon (for orphans, no subprocess kill) / chat `/workflow cancel` (for live runs that need interruption). Added an explicit "there is no archon workflow cancel CLI subcommand" parenthetical since the wrong command was being suggested in flow.

I1: the `archon workflow list --json` diagnostic used an incorrect jq filter. workflow list's --json output (workflow.ts:185-219) has shape { workflows: [{ name, description, provider?, model?, ... }], errors: [...] } with no `runs` field, so `jq '.workflows[] | select(.runs)'` returns empty unconditionally. Replaced with `archon workflow status --json | jq '.runs[]'`, which matches the actual shape of workflowStatusCommand at workflow.ts:852+ ({ runs: WorkflowRun[] }). Also tightened the narration to distinguish JSON from human-readable status output.

No change to the commit history in this PR; these are follow-up fixes to claims I introduced in earlier commits of this branch (f10b989 for C1, 66d2b86 for I1).

* fix(skill): remove env-leak gate references (feature was removed in provider extraction)

C2 from the PR code-reviewer (comment 4311243858).

The pre-spawn env-leak gate was removed from the codebase during the provider-extraction refactor; see TODO(#1135) at packages/providers/src/claude/provider.ts:908. Zero hits for --allow-env-keys / allowEnvKeys / allow_env_keys / allow_target_repo_keys across packages/.
The CLI's parseArgs (cli.ts:182-208) has no --allow-env-keys option, and because parseArgs uses strict: false, an unknown --allow-env-keys would be silently ignored rather than error.

What remains accurate and is NOT touched:
- Three-Path Env Model section (user/repo archon-owned envs are loaded; target repo <cwd>/.env keys are stripped from process.env at boot) still correctly describes current behavior, grounded in packages/paths/src/strip-cwd-env.ts + env-integration.test.ts
- Per-Project Env Injection section (Option 1: .archon/config.yaml env: block; Option 2: Web UI Settings → Projects → Env Vars) is unchanged; both remain the sanctioned way to get env vars into subprocesses

Removed claims (all three files):
- cli-commands.md: --allow-env-keys flag row in the workflow run flags table
- repo-init.md: the "Env-leak gate" subsection at the end of Per-Project Env Injection listing 5 remediations (all of which reference UI/CLI/config surfaces that don't exist). Replaced with a succinct callout that explains the actual current behavior (target repo .env keys are stripped; workflows that need those values should use managed injection) so the reader still gets the "where to put my env vars" answer
- troubleshooting.md: the "Cannot register: codebase has sensitive env keys" section (error message that can no longer be emitted)

If the env-leak gate is ever resurrected per TODO(#1135), the docs can be re-added then. The CHANGELOG v0.3.0 entry describing the gate is a historical record of past behavior and does not need to be rewritten.

* fix(skill/troubleshooting): correct JSONL event type names and field name

C3 from the PR code-reviewer (comment 4311243858).
The troubleshooting reference's event-types table used _started / _completed / _failed suffixes, but packages/workflows/src/logger.ts:19-30 shows the actual WorkflowEvent.type enum is: workflow_start | workflow_complete | workflow_error | assistant | tool | validation | node_start | node_complete | node_skipped | node_error The second jq recipe also queried `.event` but the discriminator is `.type`. Fixes: - Event table: renamed columns (_started → _start, _completed → _complete, _failed → _error). Explicitly called out the field name as `type` so the reader knows what jq selector to use - Replaced the "tool_use / tool_result" row with a single `tool` row and listed its actual payload fields (tool_name, tool_input, duration_ms, tokens) — tool_use/tool_result are SDK message kinds that appear within the AI stream, not top-level log event types - Added a `validation` row (was missing; it's emitted by workflow-level validation calls with `check` and `result` fields) - Removed `retry_attempt` row — this event type is not emitted to the JSONL file. Retry bookkeeping goes through pino logs, not the workflow log file - Added an explicit callout that loop_iteration_started / loop_iteration_completed (and other emitter-only events) go through the workflow event emitter + DB workflow_events table, NOT the JSONL file. Pointed readers to the DB or Web UI for loop-level detail. 
This distinguishes the two parallel event systems — easy to conflate (store.ts:11-17 uses _started/_completed/_failed for the DB side, logger.ts uses _start/_complete/_error for JSONL) - Fixed the "all failed events" jq recipe: .event → .type and _failed → _error - Minor cleanup: the inline "tool_use events" mention in the "running forever" section said the wrong event name — updated to "tool or assistant events in the tail" Grounded in packages/workflows/src/logger.ts (canonical JSONL event shape) and packages/workflows/src/store.ts (the parallel DB event naming, which the reviewer correctly flagged as different and worth keeping distinct). * fix(skill): two stragglers from the code-reviewer audit Cleanup of two references that slipped through the earlier C1 and C3 fixes: - references/troubleshooting.md:126: \`node_failed\` → \`node_error\` (the "Node output is empty" diagnostics section references the JSONL log, which uses the logger.ts enum — not the DB workflow_events table which does use \`node_failed\`). The C3 fix corrected the event table and one jq recipe but missed this inline mention. - references/interactive-workflows.md:106: removed \`archon workflow cancel <run-id>\` (nonexistent CLI subcommand) from the troubleshooting bullet. This was pre-existing before the hardening PR but fell within the C1 remediation scope. Replaced with the correct triage: reject (approval gate only) vs abandon (orphan cleanup, no subprocess kill) vs chat /workflow cancel (actual subprocess termination). Grounded in the same sources as the earlier C1/C3 commits: packages/cli/src/cli.ts:318-485 (no cancel case) and packages/workflows/src/logger.ts:19-30 (JSONL type enum). * feat(skill): point to archon.diy as the canonical docs source The skill had no reference to archon.diy (the live docs site built from packages/docs-web/). Several reference files said "see the docs site" without naming the URL, leaving the agent to guess or grep the repo for the hostname. 
An agent with the skill loaded should know that when the distilled reference pages don't cover a case, the full canonical docs are one WebFetch away. SKILL.md: new "Richer Context: archon.diy" section between Routing and Running Workflows. Covers: - When to reach for the live docs (longer examples, tutorial framing, features the skill only mentions in passing, "where's that documented?" user questions) - URL map — 13 starting points covering getting-started, book (tutorial series), guides/ (authoring + per-node-type + per-node-feature), reference/ (variables, CLI, security, architecture, configuration, troubleshooting), adapters/, deployment/ - Precedence: skill refs first (context-cheap, tuned for agents), docs site as escalation. Prevents agents defaulting to WebFetch when a local skill ref already covers the answer Also upgrades the 5 existing generic "docs site" mentions across reference files to concrete archon.diy URLs with anchor fragments where helpful: - good-practices.md: Inline sub-agents pattern → archon.diy/guides/ authoring-workflows/#inline-sub-agents - troubleshooting.md: "Install page on the docs site" → archon.diy/ getting-started/installation/ - workflow-dag.md: "Workflow Description Best Practices" → anchor link; sandbox schema reference → archon.diy/guides/authoring-workflows/ #claude-sdk-advanced-options - repo-init.md: Security Model reference → archon.diy/reference/ security/#target-repo-env-isolation (deep-link into the section that covers the <cwd>/.env strip behavior) URL source of truth: astro.config.mjs:5 (site: 'https://archon.diy'). URL structure mirrors packages/docs-web/src/content/docs/<section>/ <page>.md — verified by the 62 pages the docs build produces.
Anthropic's Opus 4.7 landed 2026-04-16; on the Anthropic API, opus / opus[1m] now resolve to 4.7 with a 1M context window at standard pricing. Using the alias instead of the hard-pinned claude-opus-4-6[1m] lets bundled default workflows auto-track the recommended Opus version. No explicit effort is set, so nodes inherit the per-model default (xhigh on 4.7, high on 4.6).
* fix(workflow): migrate piv-loop plan handoff to $ARTIFACTS_DIR (#1380) The create-plan node used a relative path (.claude/archon/plans/{slug}.plan.md) that the AI agent would sometimes write to a different location, breaking all downstream nodes that glob for the plan file. Migrated all plan/progress file references to $ARTIFACTS_DIR/plan.md and $ARTIFACTS_DIR/progress.txt, matching the pattern used by archon-fix-github-issue and other workflows. Changes: - Replace slug-based plan path with $ARTIFACTS_DIR/plan.md in create-plan node - Replace ls -t glob discovery with direct $ARTIFACTS_DIR/plan.md reads in refine-plan, code-review, and fix-feedback nodes - Replace empty-string guard with file-existence check in implement-setup bash - Migrate progress.txt references in implement loop to $ARTIFACTS_DIR/ - Add explicit plan/progress paths in finalize node - Regenerated bundled-defaults.generated.ts Fixes #1380 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(workflow): address review findings in archon-piv-loop - Rename 'Step 2: Write the Plan' to 'Step 2: Plan File Location' to eliminate the duplicate heading that collided with Step 3's identical title in the create-plan node - Guard implement-setup against a 0-task plan file: exit 1 with a clear error when no '### Task N:' sections are found, preventing a silent no-op implement loop - Remove 2>/dev/null from code-review commit so pre-commit hook failures and other stderr are visible to the agent instead of silently swallowed - Replace '|| true' on git push in finalize with an explicit WARNING echo so push failures (auth, upstream conflict, no remote) surface to the agent rather than being silently ignored - Regenerate bundled-defaults.generated.ts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * chore(workflows): regenerate bundled defaults to match opus[1m] alias The bundle was stale relative to the YAML sources after #1395 merged — check:bundled was failing CI. 
Regenerated; no YAML edits. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
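The 0-task guard added to implement-setup can be sketched as follows. The real node is a bash script; this TypeScript version is only an illustration of the check, and `countPlanTasks` and `assertPlanHasTasks` are hypothetical names, not functions from the repo.

```typescript
// Hypothetical helper: counts '### Task N:' headings in the plan file's text.
function countPlanTasks(planText: string): number {
  const matches = planText.match(/^### Task \d+:/gm);
  return matches ? matches.length : 0;
}

// Hypothetical guard: throwing stands in for the script's "exit 1 with a clear error".
function assertPlanHasTasks(planText: string): number {
  const count = countPlanTasks(planText);
  if (count === 0) {
    throw new Error("plan.md has no '### Task N:' sections; refusing to start the implement loop");
  }
  return count;
}
```

Failing loudly here is what prevents the silent no-op implement loop described above.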
…cutor (#1403)
PIV Task 1: Adds three new tests in a dedicated describe block
'executeDagWorkflow -- final status derivation' covering the anyFailed branch
(dag-executor.ts ~line 2956) that previously had no direct test:
- one success + one independent failure calls failWorkflowRun (not completeWorkflowRun)
- multiple successes + one failure calls failWorkflowRun (not completeWorkflowRun)
- trigger_rule: none_failed skips dependent node but anyFailed still marks run failed
Fixes #1381.
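The behavior these tests pin down can be reduced to a small sketch. This is not the code in dag-executor.ts; `deriveFinalStatus` and the state names are assumptions chosen for illustration.

```typescript
type NodeState = "completed" | "failed" | "skipped";
type RunStatus = "completed" | "failed";

// Hypothetical reduction of the anyFailed branch: one failed node fails the run,
// even when other nodes succeeded or a dependent node was skipped.
function deriveFinalStatus(states: NodeState[]): RunStatus {
  const anyFailed = states.some((s) => s === "failed");
  return anyFailed ? "failed" : "completed";
}
```

The third test case corresponds to `["failed", "skipped"]`: the skip does not mask the failure.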
New reference for the archon skill: a single-glance lookup of which parameter works on which node type, an intent-based "how do I..." table, a consolidated silent-failure catalog, and an inline agents: section (previously only referenced via archon.diy). Purpose is complementary, not duplicative: - workflow-dag.md remains the authoring guide - dag-advanced.md remains the hooks/MCP/skills/retry deep-dive - good-practices.md remains the patterns and anti-patterns - parameter-matrix.md is the grep-this-first lookup when you know the outcome you want but not which field gets you there Also registers the new reference in SKILL.md routing table.
Add explicit references to .github/PULL_REQUEST_TEMPLATE.md in both CONTRIBUTING.md and CLAUDE.md, plus a reminder to link issues with Closes/Fixes/Resolves so they auto-close on merge. Repo-triage runs were flagging dozens of partially-filled or unlinked PRs each cycle.
…riage (#1428) * feat(workflows): add maintainer-standup workflow for daily PR/issue triage Daily morning briefing that pulls origin/dev, triages all open PRs and assigned issues against direction.md, and surfaces progress vs. the previous run. Designed for live-checkout use (worktree.enabled: false) so it can read its own state. Layout under .archon/maintainer-standup/: - direction.md (committed) — project north-star: what Archon IS / IS NOT. Drives PR P4 polite-decline classification with cited clauses. - README.md / profile.md.example — setup docs and template for new maintainers. - profile.md, state.json, briefs/YYYY-MM-DD.md — gitignored, per-maintainer. Engine: - 3 parallel gather scripts in .archon/scripts/maintainer-standup-*.ts (git-status, gh-data, read-context) — bun runtime, JSON stdout. - Synthesis node: command file with output_format schema for { brief_markdown, next_state }. - Persist node: tiny inline bun script writes both to disk. Run-to-run continuity: state.json carries observed_prs/issues snapshots, so the next run can detect what merged, what closed, what the maintainer shipped, and which carry-over items aged past N days. Also adds .archon/** to the ESLint global ignore list (matches the existing .claude/skills/** pattern) since .archon/ is user content and not part of any tsconfig project. * fix(maintainer-standup): address CodeRabbit review on #1428 - gh-data: bump --limit 100 → 1000 on all_open_prs and warn loudly when the cap is hit; preserves the observed_prs invariant the next-run "resolved since last run" diff depends on. (CodeRabbit critical) - maintainer-standup.md: clarify P1 CI signal — the gathered payload only carries mergeStateStatus, not statusCheckRollup; for borderline P1s, drill in via `gh pr checks <n>`. (CodeRabbit minor) - workflow.yaml persist: write briefs under local YYYY-MM-DD (sv-SE locale) instead of UTC ISO date, so an evening run doesn't file tomorrow's brief and break recent_briefs lookups. 
(CodeRabbit minor) - workflow.yaml persist: wrap state/brief writes in try/catch; on failure dump brief_markdown and next_state to stderr so a 5-minute Sonnet synthesis isn't lost to a transient disk error. (CodeRabbit minor) - gh-data + git-status: switch from execSync (shell-string) to execFileSync (argv array) for git/gh invocations. Defense-in-depth against shell metacharacters in values that pass through (esp. the gh_handle from profile.md). (CodeRabbit nitpick)
Add optional `tags: string[]` to `workflowBaseSchema`. Explicit values take precedence over keyword inference; `tags: []` suppresses inference end-to-end; omitting the field falls back to inference (backwards compatible). Non-array values are warned about and ignored, matching the sibling `worktree`/`additionalDirectories` patterns.
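The precedence rules can be sketched like this. `resolveTags` and its callbacks are hypothetical, not the loader's API, and treating an ignored value the same as an omitted one is an assumption about what "ignore" means here.

```typescript
// Hypothetical resolver for the tags field.
function resolveTags(
  raw: unknown,
  inferTags: () => string[],
  warn: (message: string) => void,
): string[] {
  if (raw === undefined) return inferTags(); // field omitted: fall back to inference
  if (Array.isArray(raw) && raw.every((t) => typeof t === "string")) {
    return raw as string[]; // explicit values win; an empty array suppresses inference
  }
  warn("tags must be an array of strings; ignoring value");
  return inferTags(); // assumption: an ignored value behaves like an omitted one
}
```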
…ows under maintainer/ (#1430) * feat(workflows): add maintainer-review-pr and group maintainer workflows under .archon/workflows/maintainer/ Adds the maintainer-review-pr workflow — a Pi/Minimax-based PR triage flow that gates on direction alignment, scope focus, and PR-template quality before doing any deep review. If the gate clears, runs the five review aspects (code/error-handling/test-coverage/comment-quality/ docs-impact) as parallel Archon nodes and auto-posts a synthesized review comment. If the gate fails (direction conflict, multiple concerns, sprawling scope), drafts a polite-decline comment and pauses for the maintainer's approval before posting. Reorganizes the existing maintainer-standup workflow into the same subfolder so all maintainer-facing workflows live together. Subfolder grouping is supported by the workflow loader (1 level deep, resolution by filename). What lands: - .archon/workflows/maintainer/maintainer-standup.yaml (moved from .archon/workflows/maintainer-standup.yaml) - .archon/workflows/maintainer/maintainer-review-pr.yaml (new) - .archon/commands/maintainer-review-{gate,code-review,error-handling, test-coverage,comment-quality,docs-impact,synthesize,report}.md (new, Pi-tuned variants of the existing review-agent commands so they avoid Claude-only Task / sub-agent patterns) Pi/Minimax integration: - Uses provider: pi, model: minimax/MiniMax-M2.7 — verified via the e2e-minimax-smoke test that Pi correctly routes to Minimax (session jsonl confirms provider=minimax) and that Pi's best-effort output_format parser handles the gate's nested schema. - Two test runs landed real comments: a direction-decline on PR #1335 and a deep-review on PR #1369. Both were posted to GitHub via the workflow's gh pr comment node. * chore(workflows): also group repo-triage under .archon/workflows/maintainer/ repo-triage is the third maintainer-facing workflow alongside maintainer-standup and maintainer-review-pr; group it in the same subfolder for consistency. 
Subfolder resolution is by filename so the workflow name is unchanged.
…r unmapped providers (#1284) Closes #1096. - Switch Pi provider model lookup from pi-ai's getModel() (static catalog only) to ModelRegistry.create(authStorage).find() so user-configured custom models in ~/.pi/agent/models.json (LM Studio, ollama, llamacpp, custom OpenAI-compatible endpoints) are discoverable. - Remove the local lookupPiModel helper. - For env-var-mapped providers (anthropic, openai, etc.) still throw with a pi /login hint when credentials are missing. For unmapped providers, log pi.auth_missing at info and continue so local models that don't need credentials work without ceremony. - Surface modelRegistry.getError() in the not-found message and emit pi.model_not_found so users debugging custom-provider configs see the real cause (e.g. missing baseUrl in models.json). - Guard AuthStorage.create() and ModelRegistry.create() with try/catch so a malformed ~/.pi/agent/auth.json surfaces with Pi-framed context instead of a raw SDK stack trace. - Document the credential-free path for local providers in ai-assistants.md. Co-authored-by: Matt Chapman <Matt@NinjitsuWeb.com>
…add e2e-minimax-smoke (#1431) * chore(workflows): group all smoke-test workflows under .archon/workflows/test-workflows/ Move the 7 existing e2e-*.yaml smoke tests plus the new e2e-minimax-smoke test into a dedicated subfolder. Subfolder grouping is supported by the workflow loader (1 level deep, resolution by filename) so workflow names are unchanged. Mirrors the .archon/workflows/maintainer/ split landing in #1430. Also adds e2e-minimax-smoke.yaml — a sanity check that Pi correctly routes to Minimax M2.7 via the user's local pi auth, and that Pi's best-effort output_format parser handles a small nested schema. Asserts routing by reading the most recent Pi session jsonl rather than asking the model to self-identify (LLMs are unreliable narrators about their own identity, especially when Pi's system prompt mentions other providers as defaults). * fix(e2e-minimax-smoke): address CodeRabbit review on #1431 - Widen find window from -mmin -3 to -mmin -10. The smoke's three Pi nodes plus the assert can collectively run several minutes on slow networks; 3 minutes was tight enough to false-FAIL on a healthy run. (CodeRabbit minor) - Drop non-deterministic `head -1` over `find` output. find doesn't guarantee any order; on a tie, the wrong file would be picked. Now iterates all matching sessions and breaks on first one carrying the routing signal — any match is sufficient evidence. (CodeRabbit minor) - Replace single-regex `'"provider":"minimax".*"modelId":"MiniMax-M2.7"'` with two separate greps joined by `&&`. JSON field order isn't part of Pi's contract; a future Pi release reordering `provider` and `modelId` in the model_change event would silently false-FAIL the original pattern. The new check is order-independent. (CodeRabbit major)
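The order-independence fix is easy to see in a sketch. The smoke test itself uses grep in bash; the TypeScript below only illustrates the logic, and the function names and the sample event line are assumptions.

```typescript
// The original single-pattern check: only matches when provider precedes modelId.
const orderedPattern = /"provider":"minimax".*"modelId":"MiniMax-M2\.7"/;

// Hypothetical equivalent of the two separate greps joined by &&.
function sessionShowsRouting(sessionText: string): boolean {
  return sessionText.includes('"provider":"minimax"') && sessionText.includes('"modelId":"MiniMax-M2.7"');
}

// Any matching session is sufficient evidence, so no ordering of files is needed.
function anySessionShowsRouting(sessions: string[]): boolean {
  return sessions.some(sessionShowsRouting);
}

// Illustrative event line with the two fields in the "other" order.
const reordered = '{"type":"model_change","modelId":"MiniMax-M2.7","provider":"minimax"}';
```

With `reordered`, the single regex fails while the two independent checks still pass.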
Six findings, two majors and four minors/nitpicks: - gate.md L17 vs L77: resolved conflicting input-source instructions. Body claimed "all inline, no extra fetch" while a later phase permitted reading PULL_REQUEST_TEMPLATE.md. Now: explicit "one allowed extra read" callout in Phase 1 + matching wording in Gate C. (CodeRabbit major) - gate.md fenced blocks: added missing language identifiers (text/json/ markdown) to satisfy markdownlint MD040. (CodeRabbit minor) - gate.md L155 + read-context.ts: deterministic clock. The 3-day deadline was anchored to prior_state.last_run_at, which can be stale and produce past-dated deadlines. Moved both today and deadline_3d into the read-context.ts output (computed via sv-SE locale → ISO date in local time) and instructed the gate to use $read-context.output.deadline_3d directly. LLMs are unreliable at calendar arithmetic; this avoids it entirely. (CodeRabbit major) - maintainer-review-pr.yaml fetch-diff: dropped 2>/dev/null on gh pr diff so auth / network / deleted-PR failures fail the node instead of feeding an empty diff to the gate. Empty-but-successful diff (PR has no changes) is now an explicit marker the gate can detect. (CodeRabbit minor) - maintainer-review-pr.yaml approve-unclear: added capture_response: true so the maintainer's approve comment flows to the report node. Reject reasoning is already captured by Archon's run record. (CodeRabbit minor) - maintainer-review-pr.yaml post-decline + report.md: the gh pr edit --add-label call previously swallowed all errors with || true and the report still claimed the label was applied. Now writes applied/skipped to $ARTIFACTS_DIR/.label-applied + the gh stderr to .label-error so the report can describe the actual outcome. (CodeRabbit nitpick)
…ume (#1435) * fix(workflows): approval gate bypass after reject-with-redraft on resume When an approval node was rejected with on_reject.prompt, the synthetic PromptNode built to run the on_reject prompt reused the approval gate's own node ID. executeNodeInternal then wrote a node_completed event with that ID, causing getCompletedDagNodeOutputs to treat the gate as already completed on the next resume — bypassing the human gate entirely. Fix: give the synthetic node the ID `${node.id}:on_reject` so its node_completed event has a distinct step_name that won't match the approval gate slot in priorCompletedNodes. Adds a regression test asserting no node_completed event with the approval gate's ID is written during on_reject execution. Fixes #1429 * test(workflows): add positive assertion and SSE side-effect comment for on_reject synthetic node Add complementary positive assertion to the regression test to verify that node_completed is written exactly once with step_name 'review:on_reject', ensuring future refactors that suppress the event entirely would be caught. Add inline comment in executeApprovalNode documenting the known SSE side-effect: node_started/node_completed events with nodeId='review:on_reject' flow through the SSE pipeline into the web UI, resulting in a transient phantom node in the execution view. This is cosmetic-only — the human gate contract is preserved. * simplify: reduce duplicate cast pattern in on_reject test assertions
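A minimal sketch of why the distinct ID matters on resume. `completedNodeIds` is a hypothetical stand-in for the completed-node lookup, not the real getCompletedDagNodeOutputs; only the `${node.id}:on_reject` format comes from the commit.

```typescript
interface NodeCompletedEvent {
  type: "node_completed";
  step_name: string;
}

// Hypothetical stand-in for how resume decides which nodes are already done.
function completedNodeIds(events: NodeCompletedEvent[]): Set<string> {
  return new Set(events.map((e) => e.step_name));
}

// The synthetic prompt node gets its own ID so it cannot occupy the gate's slot.
function onRejectNodeId(approvalNodeId: string): string {
  return `${approvalNodeId}:on_reject`;
}

// Before the fix the on_reject prompt reused the gate's ID; after it, the IDs differ.
const beforeFix = completedNodeIds([{ type: "node_completed", step_name: "review" }]);
const afterFix = completedNodeIds([{ type: "node_completed", step_name: onRejectNodeId("review") }]);
```

`beforeFix.has("review")` is the bypass: the gate looks completed even though no human approved it.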
…e checkout (#1438)
* feat(workflows): add mutates_checkout field to skip path-lock for concurrent runs

Add `mutates_checkout: boolean` (optional, default true) to the workflow schema. When set to false, the executor skips the path-exclusive lock that serializes all runs on the same working path, allowing N concurrent runs on the same live checkout.

The primary use case is `maintainer-review-pr`, which reads shared state but writes only to per-run artifact paths and GitHub PR comments; two parallel reviews of different PRs should not fail with "Workflow already active on this path".

Changes:
- `schemas/workflow.ts`: add optional `mutates_checkout` field
- `loader.ts`: parse and propagate the field (warn-and-ignore on invalid values)
- `executor.ts`: wrap path-lock guard in `if (workflow.mutates_checkout !== false)`
- `executor.test.ts`: two new tests in the concurrent-run guard suite
- `maintainer-review-pr.yaml`: opt in with `mutates_checkout: false`

* test(workflows): add loader tests for mutates_checkout parsing
- Add 5 tests covering false, true, omitted, and invalid (string "yes") values
- Invalid non-boolean values are dropped with a warning; this is now explicitly tested
- Remove the // end mutates_checkout guard trailing comment (no precedent in file)
- Clarify loader comment: "parse/warn pattern" not "warn-and-ignore pattern" to avoid implying the return style matches interactive

* simplify: collapse nodeType/aiFields pair into single nonAiNode object in parseDagNode
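The guard's semantics can be sketched as below. Only the `workflow.mutates_checkout !== false` condition is taken from the commit; `WorkflowLike`, `tryStartRun`, and the in-memory `Set` are hypothetical stand-ins for the executor's path-exclusive lock.

```typescript
interface WorkflowLike {
  name: string;
  mutates_checkout?: boolean;
}

// Omitted or true keeps the lock; only an explicit false opts out.
function needsPathLock(workflow: WorkflowLike): boolean {
  return workflow.mutates_checkout !== false;
}

// Hypothetical in-memory guard standing in for the executor's path lock.
function tryStartRun(activePaths: Set<string>, workflow: WorkflowLike, path: string): boolean {
  if (needsPathLock(workflow)) {
    if (activePaths.has(path)) return false; // "Workflow already active on this path"
    activePaths.add(path);
  }
  return true;
}
```

Comparing against `false` rather than testing truthiness is what keeps the default at "lock" for workflows that omit the field.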
…es (#1434) * docs: replace String.raw with direct assignment in script node examples String.raw`$nodeId.output` fails silently when substituted output contains a backtick, terminating the template literal early and producing cryptic parse errors. JSON is valid JS expression syntax, so direct assignment is safe for all valid JSON values including those with backticks. - Replace String.raw pattern in dag-workflow.yaml example - Replace String.raw pattern in archon-workflow-builder.yaml template - Add CAUTION bullet in workflow-dag.md Script Node section - Add Silent Failures item #14 in parameter-matrix.md - Add Starlight caution aside in script-nodes.md - Extend script bodies bullet in variables.md - Regenerate bundled-defaults.generated.ts Fixes #1427 * docs: fix Rule 6 in generate-yaml prompt to distinguish bun vs uv patterns Rule 6 still referenced JSON.parse after the example was updated to direct assignment, creating a contradiction for the AI code generator. Update the prose to explicitly distinguish TypeScript/bun (direct assignment) from Python/uv (json.loads), matching the updated embedded example.
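The failure mode can be reproduced in a few lines. This is a sketch: `substituteOutput` is a hypothetical stand-in for the engine's compile-time substitution, `$gate.output` is an illustrative reference, and `new Function` is used only to test whether the resulting script body still parses.

```typescript
// Hypothetical stand-in for substituting $nodeId.output into a script body.
// split/join avoids the special "$" handling of String.prototype.replace.
function substituteOutput(scriptBody: string, outputJson: string): string {
  return scriptBody.split("$gate.output").join(outputJson);
}

// Returns true when the substituted body still parses as JavaScript.
function parses(body: string): boolean {
  try {
    new Function(body);
    return true;
  } catch {
    return false;
  }
}

// An upstream output whose text happens to contain backticks.
const outputJson = JSON.stringify({ note: "wrap `names` in backticks" });

const viaStringRaw = substituteOutput(
  "const data = JSON.parse(String.raw`$gate.output`); return data.note;",
  outputJson,
);
const viaDirect = substituteOutput("const data = $gate.output; return data.note;", outputJson);
```

`viaStringRaw` no longer parses because the first backtick in the data closes the template literal, while `viaDirect` stays a valid object-literal assignment.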
…s/experimental/ Move two repo-scoped workflows that were sitting untracked at the workflow root into a dedicated subfolder. Subfolder grouping is supported by the loader (1 level deep, resolution by filename), so workflow names are unchanged and the /release skill still resolves archon-release correctly. Files moved: - archon-fix-github-issue-experimental.yaml — Path-A variant of the issue-fix workflow used today to land #1434, #1435, #1438. - archon-release.yaml — the live release workflow used by the /release skill end-to-end (validate -> binary smoke -> version bump -> changelog -> approval -> commit -> PR -> tag -> Homebrew formula update).
…des (#1387) executeBashNode previously only merged explicit envVars on top of process.env. The three well-known workflow directories (artifactsDir, logDir, baseBranch) were passed as function parameters and used for compile-time substitution of $ARTIFACTS_DIR / $LOG_DIR / $BASE_BRANCH in the script body, but were never added to the subprocess environment. As a result, any script that relied on shell-runtime expansion — e.g. JSON_FILE="${ARTIFACTS_DIR}/foo.output.json" inside a heredoc, an inherited helper script, or a `bash -c` subshell — saw the variable unset and silently fell back to its default (typically an empty string or "."), writing artifacts to the workflow cwd instead of the nominal artifacts directory. Always build subprocessEnv from process.env plus the three well-known directories, then allow explicit envVars to override. Compile-time substitution behavior is unchanged; existing scripts that do not reference these variables are unaffected; user-supplied envVars still win on conflict.
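The merge order described above amounts to three spreads. A minimal sketch, assuming the environment variable names match the substitution variables (`ARTIFACTS_DIR`, `LOG_DIR`, `BASE_BRANCH`); `buildSubprocessEnv` and its types are hypothetical, not the signature used in executeBashNode.

```typescript
type Env = Record<string, string | undefined>;

interface WellKnownValues {
  artifactsDir: string;
  logDir: string;
  baseBranch: string;
}

// Later spreads win: base env first, then the well-known values, then explicit envVars.
function buildSubprocessEnv(
  baseEnv: Env,
  values: WellKnownValues,
  envVars: Record<string, string> = {},
): Env {
  return {
    ...baseEnv,
    ARTIFACTS_DIR: values.artifactsDir,
    LOG_DIR: values.logDir,
    BASE_BRANCH: values.baseBranch,
    ...envVars,
  };
}
```

In real use `baseEnv` would be `process.env`; it is a parameter here so the sketch has no runtime dependency.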
…1426) * fix(workflow): substitute \$nodeId.output refs in approval messages Approval node messages were emitted as raw strings, bypassing the substituteNodeOutputRefs() pass that prompt/bash/loop/cancel nodes all run. This made interactive workflows like atlas-onboard show literal "\$gather-context.output.repo_name" placeholders to humans at HITL gates, leaving them unable to know what they were approving. Fix: rendered the approval.message through substituteNodeOutputRefs once at the top of the standard approval gate path, then used the resolved string in all 4 emission sites (safeSendMessage, createWorkflowEvent, pauseWorkflowRun, event-emitter). Test: new dag-executor.test case wires a structured-output upstream node into an approval node and asserts pauseWorkflowRun receives the substituted message ("Repo: hcr-els | App: CCELS | Port: 3012") rather than the literal placeholders. Repro: any workflow with an approval node whose message references \$nodeId.output[.field]. Observed in the wild on atlas-onboard's confirm-context HITL gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(workflow): extend approval-substitution test to cover all 4 emission sites Per CodeRabbit review: the original test only verified pauseWorkflowRun received the substituted message, but the fix touches 4 emission sites. A future regression at safeSendMessage / createWorkflowEvent / event-emitter would silently leave the test passing while users still saw raw $node.output placeholders. Adds two additional assertions: - platform.sendMessage prompt contains substituted message + does NOT contain literal $gather-context.output placeholders - The persisted approval_requested workflow event's data.message is substituted Event-emitter assertion deferred (no existing pattern for spying on the global emitter in this test file). 
Covering two of the three secondary surfaces closes the practical regression risk: both are user-visible (chat prompt + audit-log event), while the emitter is internal only.
Test count: 7 pass / 22 expect() (was 18). Full suite: 193 pass / 353 expect(), no regressions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
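The substitution the approval path was missing can be sketched as follows. `substituteRefsSketch` is a simplified, hypothetical stand-in for substituteNodeOutputRefs: it handles only `$nodeId.output.field`, and the `app_name` and `port` field names are invented for the example (only `repo_name` appears in the report).

```typescript
type NodeOutputs = Record<string, Record<string, string | number>>;

// Simplified stand-in: resolves $nodeId.output.field and leaves unknown refs untouched.
function substituteRefsSketch(message: string, outputs: NodeOutputs): string {
  return message.replace(
    /\$([A-Za-z_][\w-]*)\.output\.(\w+)/g,
    (ref: string, nodeId: string, field: string) => {
      const value = outputs[nodeId]?.[field];
      return value === undefined ? ref : String(value);
    },
  );
}

// Resolve once, then hand the same string to every emission site.
const resolvedMessage = substituteRefsSketch(
  "Repo: $gather-context.output.repo_name | App: $gather-context.output.app_name | Port: $gather-context.output.port",
  { "gather-context": { repo_name: "hcr-els", app_name: "CCELS", port: 3012 } },
);
```

Resolving once at the top of the gate path, rather than at each call site, is what keeps the four emission sites from drifting apart.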
#1367) * feat(workflows): expose $LOOP_PREV_OUTPUT in loop node prompts (#1286) Adds a new substitution variable that carries the previous loop iteration's cleaned output into the next iteration's prompt. Empty on iteration 1; the prior iteration's output (after stripCompletionTags) on iteration 2+. Why: fresh_context: true loops have no way to reference what the previous pass produced or why it failed without dragging the full session forward. $LOOP_PREV_OUTPUT closes that gap with zero session-cost — same trust boundary as $nodeId.output, no new external surface. Changes: - packages/workflows/src/executor-shared.ts: substituteWorkflowVariables accepts a 10th positional loopPrevOutput arg and substitutes $LOOP_PREV_OUTPUT (defaults to ''). - packages/workflows/src/dag-executor.ts: executeLoopNode passes lastIterationOutput on iteration 2+ (and explicit '' on iteration 1 / the first iteration of an interactive resume, since lastIterationOutput is a per-call variable that does not survive resume metadata). - Unit tests: 3 new cases in executor-shared.test.ts. - Integration tests: 2 new cases in dag-executor.test.ts verifying the prompt sent to the AI on iter 1 vs iter 2, and that the value reflects cleaned output (no <promise> tags). - Docs: variables.md, loop-nodes.md (new "Retry-on-failure" pattern), CLAUDE.md variable reference. Backward compatibility: prompts that don't reference $LOOP_PREV_OUTPUT are unaffected. All 843 workflow tests + type-check + lint + format:check + bun run validate pass locally. 
* docs: address coderabbit review on variables/loop-nodes - variables.md: include $LOOP_PREV_OUTPUT in substitution-order list and availability table to match the new variable row at line 30 - loop-nodes.md: document the interactive-resume exception where the first iteration after an approval-gate resume still receives an empty $LOOP_PREV_OUTPUT regardless of iteration number (per dag-executor.ts L1781-1783 where i === startIteration always clears prev output) * docs(changelog): add Unreleased entry for $LOOP_PREV_OUTPUT (#1367 review) * test(loop): add resume-from-approval integration test for $LOOP_PREV_OUTPUT (#1367 review) Per maintainer-review-pr suggestion (Wirasm): two-call integration test covering the resume-from-approval scenario. - Call 1: fresh interactive loop pauses at the gate after iteration 1 and asserts $LOOP_PREV_OUTPUT substitutes to empty on iter 1 (no prior output) plus the gate pause is recorded. - Call 2: resumed run with metadata.approval populated. The first resumed iteration must substitute $LOOP_PREV_OUTPUT to '', NOT to the paused run's iter-1 output (which lived in a different process and is not persisted). $LOOP_USER_INPUT still flows through as normal. Locks the documented invariant at dag-executor.ts:1769-1772. --------- Co-authored-by: voidborne-d <DottyEstradalco@allergist.com>
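How the variable flows between iterations can be sketched like this. It is not the code in executeLoopNode: `runLoopSketch`, `stripTagsSketch`, and the `runIteration` callback (standing in for the AI call) are hypothetical, and the tag-stripping regex is only a guess at what stripCompletionTags removes.

```typescript
// Hypothetical stand-in for stripCompletionTags: drops <promise>...</promise> markers.
function stripTagsSketch(output: string): string {
  return output.replace(/<promise>[\s\S]*?<\/promise>/g, "").trim();
}

// Builds the prompt for each iteration; runIteration stands in for the AI call.
function runLoopSketch(
  template: string,
  maxIterations: number,
  runIteration: (prompt: string, iteration: number) => string,
): string[] {
  const prompts: string[] = [];
  let lastIterationOutput = ""; // empty on iteration 1
  for (let i = 1; i <= maxIterations; i++) {
    const prompt = template.split("$LOOP_PREV_OUTPUT").join(lastIterationOutput);
    prompts.push(prompt);
    lastIterationOutput = stripTagsSketch(runIteration(prompt, i));
  }
  return prompts;
}
```

Because `lastIterationOutput` is a local variable, a resumed run starts again from the empty string, which matches the interactive-resume exception documented above.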
…1457)
The brief was missing a key signal: when contributors reply on PRs or issues, the maintainer wouldn't see it explicitly. Empirically, replies on reviewed PRs were buried under aggregate updatedAt timestamps with no indication of WHO replied or WHAT they said.

This adds a new "Replies waiting on you" section to the daily brief, sourced from two paginated GitHub API calls scoped by since=last_run_at:
- /repos/{o}/{r}/issues/comments   PR + issue conversation comments
- /repos/{o}/{r}/pulls/comments    inline code-review comments

Filters applied:
- Skip the maintainer's own comments (gh_handle from profile.md)
- Skip GitHub bot accounts (login ending in [bot]): coderabbitai, chatgpt-codex-connector, dependabot, etc. They post a constant churn of automated review comments that drowns out human replies; the maintainer wants the latter.

Output is grouped by PR/issue number with kind classification:
- issue             comment on a non-PR issue
- pr_conversation   PR conversation-level comment
- pr_review         inline code-review comment (most actionable, since it usually needs a code-level response; kind upgrades to pr_review whenever review comments arrive on a PR that also has conversation ones)

Sorted by recency (newest reply first). Synthesizer reads gh-data.output.replies_since_last_run and renders a section.

Verified on a backdated state.json (last_run_at = yesterday morning): 22 human replies on 22 PRs/issues, bot noise filtered (32 → 22 after the [bot] filter). Surfaces exactly the contributor responses to yesterday's review comments and direction questions.
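The filter, grouping, and kind-upgrade rules can be sketched as one pass over the comments. The `RawComment` shape is an assumption made for illustration and does not match the GitHub API's field names; `groupReplies` is likewise hypothetical.

```typescript
interface RawComment {
  number: number; // PR or issue number
  author: string; // GitHub login
  createdAt: string; // ISO timestamp
  source: "issue_comment" | "review_comment";
  isPullRequest: boolean;
}

type ReplyKind = "issue" | "pr_conversation" | "pr_review";

interface ReplyGroup {
  number: number;
  kind: ReplyKind;
  latest: string;
  count: number;
}

function groupReplies(comments: RawComment[], ghHandle: string): ReplyGroup[] {
  const groups = new Map<number, ReplyGroup>();
  for (const c of comments) {
    // Skip the maintainer's own comments and bot accounts.
    if (c.author === ghHandle || c.author.endsWith("[bot]")) continue;
    const kind: ReplyKind =
      c.source === "review_comment" ? "pr_review" : c.isPullRequest ? "pr_conversation" : "issue";
    const group = groups.get(c.number);
    if (!group) {
      groups.set(c.number, { number: c.number, kind, latest: c.createdAt, count: 1 });
      continue;
    }
    group.count += 1;
    if (kind === "pr_review") group.kind = "pr_review"; // review comments upgrade the group
    if (c.createdAt > group.latest) group.latest = c.createdAt;
  }
  // Newest reply first; same-format ISO timestamps compare correctly as strings.
  return Array.from(groups.values()).sort((a, b) => (a.latest < b.latest ? 1 : a.latest > b.latest ? -1 : 0));
}
```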
The maintainer-standup brief had no signal for "I already triaged that
PR via maintainer-review-pr 2 days ago" — it just kept listing reviewed
PRs in P1-P4 with no acknowledgement of prior work. Result: maintainer
ends up re-skimming the same PR several mornings in a row.
This adds a shared persistent state file at:
.archon/maintainer-standup/reviewed-prs.json (gitignored, per-maintainer)
shape:
  {
    "1338": {
      "reviewed_at": "2026-04-27T16:34:57Z",
      "gate_verdict": "review",   // review | decline | needs_split | unclear
      "run_id": "..."
    },
    ...
  }
Three pieces:
1. WRITER — new `record-review` script node in maintainer-review-pr.yaml,
runs after whichever branch fired (post-review / post-decline /
approve-unclear) with trigger_rule: one_success. Inline bun script;
reads $gate.output.verdict, $ARTIFACTS_DIR/.pr-number, and
$WORKFLOW_ID; appends/upserts the entry. report node now depends on
record-review so the state write happens before the run completes.
2. READER — read-context.ts loads reviewed-prs.json into a new
reviewed_prs field on the standup gather output. Same pattern as
prior_state and recent_briefs.
3. SURFACE — maintainer-standup command file gets a Phase 2h rule:
when listing PRs in P1-P4 / Polite-decline sections, append:
- "✓ reviewed Nd ago" for review-branch entries
- "✓ declined Nd ago" for decline / needs_split branches
- "✓ triaged Nd ago (unclear)" for unclear branch
and a STALENESS marker — compare reviewed_at to PR's updatedAt; if
contributor pushed since the prior review, append
"⚠ contributor pushed since" so the maintainer knows the prior pass
may need to be re-run.
Plus a one-shot backfill script:
.archon/scripts/maintainer-standup-backfill-reviews.ts
Scans the maintainer's gh comments in the last 7 days, pattern-matches
"## Review Summary" / direction-clause-citation / split-up wording, and
populates reviewed-prs.json. Idempotent; existing entries (from real
workflow runs) take precedence over backfilled ones (the writer-node
record is more authoritative than a body-pattern guess). Uses 64MB
maxBuffer on the gh exec because --paginate over 7 days of an active
repo's comments easily exceeds Node's default 1MB.
Backfill verified: 363 comments scanned, 18 matched, 17 unique PRs
populated — exactly the 17 PRs we reviewed via the workflow yesterday.
The new state file is gitignored alongside the existing per-maintainer
files (profile.md, state.json, briefs/).
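The SURFACE rule and the backfill precedence can be sketched in code. In the workflow the suffix is produced by the synthesis prompt, not by a function; `reviewSuffix` and `mergeBackfill` are hypothetical helpers that only make the rules concrete, and whole-day rounding with `Math.floor` is an assumption.

```typescript
type GateVerdict = "review" | "decline" | "needs_split" | "unclear";

interface ReviewedEntry {
  reviewed_at: string;
  gate_verdict: GateVerdict;
  run_id: string;
}

// Hypothetical renderer for the Phase 2h suffix, including the staleness marker.
function reviewSuffix(entry: ReviewedEntry, prUpdatedAt: string, now: Date): string {
  const reviewedMs = Date.parse(entry.reviewed_at);
  const days = Math.floor((now.getTime() - reviewedMs) / 86400000);
  const label =
    entry.gate_verdict === "review"
      ? `✓ reviewed ${days}d ago`
      : entry.gate_verdict === "unclear"
        ? `✓ triaged ${days}d ago (unclear)`
        : `✓ declined ${days}d ago`;
  const stale = Date.parse(prUpdatedAt) > reviewedMs;
  return stale ? `${label} ⚠ contributor pushed since` : label;
}

// Backfill precedence: entries written by real workflow runs win over backfilled guesses.
function mergeBackfill(
  existing: Record<string, ReviewedEntry>,
  backfilled: Record<string, ReviewedEntry>,
): Record<string, ReviewedEntry> {
  return { ...backfilled, ...existing };
}
```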
…1460) Both SDKs were ~30 patch releases behind. Validation suite passes (type-check, lint, format, tests across all 10 packages) without code changes. The only sustained Claude SDK behavior change in the range — v0.2.111's options.env overlay/replace flap, since reverted to overlay — is a no-op for Archon, which already passes { ...process.env } as the SDK env.
…t scope (#1724)

The maintainer-review-pr workflow's docs-impact reviewer has been flagging "missing CHANGELOG entry" at MEDIUM (and HIGH) on multiple PRs since we started using it. The project doesn't follow per-PR CHANGELOG maintenance: the `/release` skill generates entries from squash-commit history when cutting a release, so contributors writing per-PR entries would create duplicate work and merge conflicts.

Removes:
- the "Migration → CHANGELOG.md" bullet from the per-change analysis list
- `CHANGELOG.md` from the "specific places to check" enumeration
- "changelog entry" from the MEDIUM severity bucket heading

Adds an explicit callout that CHANGELOG.md is out of scope for this review, with a one-line explanation of where it gets generated.

Keeps the rest of docs-impact behavior unchanged: public APIs, CLI flags, env vars, and user-facing behavior changes are still in scope across the docs site and CLAUDE.md.

Project-local command file (`.archon/commands/`), loaded from disk per run, so the change takes effect on the next maintainer-review-pr invocation.
… substitution corruption (fixes #1717) (#1718)

* fix(workflows): write large node outputs to temp file to prevent bash substitution corruption (#1717)

When a bash node references $nodeId.output from an upstream node whose output exceeds ~32KB, inlining the full value as a bash -c argument causes silent data corruption. This adds a size threshold (NODE_OUTPUT_FILE_THRESHOLD = 32KB): outputs below it are still shell-quoted inline; outputs at or above it are written to a temp file in logDir and substituted with $(cat '<path>') so bash reads the value at runtime without argv size issues.

Affected paths: executeBashNode and loop-node until_bash.

Closes #1717

* fix(workflows): wrap shellQuoteOrFile writeFileSync in try/catch with fallback

Address review feedback from @Wirasm:
- Wrap writeFileSync in try/catch so disk-full or permission errors produce a structured log instead of an unhandled exception
- Fall back to inline shell-quoting on failure (pre-file-spill behavior)
- Add test for fallback path using a non-existent directory

Signed-off-by: kagura-agent <kagura.agent.ai@gmail.com>

---------

Signed-off-by: kagura-agent <kagura.agent.ai@gmail.com>
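The threshold-plus-fallback behavior described above could look roughly like the following. This is a minimal sketch, not the actual implementation: the signature of `shellQuoteOrFile`, the `fileName` parameter, the byte-based size check, and the `console.warn` stand-in for structured logging are all assumptions.

```typescript
import { writeFileSync } from 'node:fs';
import { join } from 'node:path';

const NODE_OUTPUT_FILE_THRESHOLD = 32 * 1024; // 32KB, as stated in the commit message

// POSIX single-quote escaping: close the quote, emit an escaped quote, reopen.
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Hypothetical signature: small values are quoted inline, large values are
// spilled to a file in logDir and read back by bash at runtime.
function shellQuoteOrFile(value: string, logDir: string, fileName: string): string {
  if (Buffer.byteLength(value, 'utf8') < NODE_OUTPUT_FILE_THRESHOLD) {
    return shellQuote(value);
  }
  const filePath = join(logDir, fileName);
  try {
    writeFileSync(filePath, value, 'utf8');
  } catch (error) {
    // Disk-full or permission errors: log and fall back to inline quoting.
    console.warn(`failed to spill node output to ${filePath}: ${String(error)}`);
    return shellQuote(value);
  }
  return `$(cat ${shellQuote(filePath)})`;
}
```

One design note: `$(cat ...)` strips trailing newlines, so a consumer that depends on them would need a different substitution form.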
…#1371)

* fix(providers/codex): create a fresh AbortController per retry attempt

Fixes #1266.

The codex provider's retry loop reused the caller's AbortSignal across every attempt. When attempt N's Codex subprocess crashes, Node's `spawn({ signal })` linkage aborts the shared signal as part of SIGTERM'ing the dying child. On attempt N+1, `runStreamed` passes that same (already-aborted) signal into the next `spawn`, which SIGTERMs the freshly-spawned child before it reads any input. The "Reading prompt from stdin..." line in the resulting error is Codex CLI's normal startup banner, not a crash locus.

The fix: signal assignment moves out of buildTurnOptions and into the retry loop. Each attempt gets a brand-new AbortController; the caller's signal (if provided) is chained in via a once-listener so cancellation still propagates. A try/finally removes the listener and aborts the per-attempt controller once the attempt terminates.

Regression tests:
- `retry after crash receives a fresh (non-aborted) AbortSignal` captures the signal passed to `runStreamed` at call-time (not by mock.calls reference, which would see the mutated .signal after reassignment) and asserts attempt 1 got a distinct, non-aborted signal.
- `caller abort forwards into the active per-attempt signal` aborts the caller mid-attempt and asserts the per-attempt signal observes it.
- Two existing tests updated: `buildTurnOptions` no longer attaches the caller's signal, so both "passes signal in TurnOptions" tests now assert presence of an AbortSignal without identity-equality against the caller's.

Without the fix, these four tests fail and the rest pass (47/51). With the fix, all 51 pass.

Out of scope: the binary HTTP timeout class-B path in the issue.

* fix(providers/codex): synchronous abort check at stream entry, plus review-pass fixes

`streamCodexEvents` now checks `abortSignal?.aborted` before entering the `for await` so a caller abort that lands between attempt setup and the first event surfaces immediately instead of waiting on the next event or end-of-stream. The existing between-events check is retained.

Also from the same review pass:
- Pi shim: wrap `mkdirSync`/`writeFileSync` in try/catch so EACCES/ENOSPC surfaces as a classified "Pi shim setup failed at <dir>" instead of a raw node:fs error.
- Codex retry path: `getLog().debug` before throwing the model-access error from a retry-attempt `startThread`; the outer query_error log only runs for retryable errors.
- Docs: ai-assistants.md and configuration.md updated for Claude `~/.local/bin/claude` autodetect; ai-assistants.md gains a Codex autodetect bullet listing the five probed paths.
- Tests: `homedir()` instead of `process.env.HOME ?? '/Users/test'` to match the implementation; Windows autodetect probe covered; config-over-autodetect precedence covered.
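The per-attempt controller pattern from the first commit above can be sketched in isolation. The helper name `runWithRetries` and its callback signature are hypothetical; only the shape (fresh controller per attempt, caller signal chained via a once-listener, cleanup in `finally`) comes from the commit message.

```typescript
// Hypothetical retry helper: each attempt gets its own AbortController so a
// signal aborted during attempt N can never poison attempt N+1.
async function runWithRetries<T>(
  attempt: (signal: AbortSignal, attemptIndex: number) => Promise<T>,
  maxAttempts: number,
  callerSignal?: AbortSignal
): Promise<T> {
  let lastError: unknown = new Error('no attempts were made');
  for (let i = 0; i < maxAttempts; i++) {
    // Do not start a new attempt once the caller has cancelled.
    if (callerSignal?.aborted) throw new Error('aborted by caller');

    const attemptController = new AbortController();
    const forwardAbort = (): void => attemptController.abort();
    // Chain the caller's signal in so cancellation still propagates.
    callerSignal?.addEventListener('abort', forwardAbort, { once: true });
    try {
      return await attempt(attemptController.signal, i);
    } catch (error) {
      lastError = error;
    } finally {
      // Cleanup as described in the commit: drop the listener and abort
      // the per-attempt controller once the attempt has terminated.
      callerSignal?.removeEventListener('abort', forwardAbort);
      attemptController.abort();
    }
  }
  throw lastError;
}
```

A caller would pass something like `(signal) => runStreamed({ signal })` as the attempt callback; that call shape is likewise an assumption for illustration.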
… every codebase registration (#1729)

* fix(core): resolve default assistant via config + folder detection on every codebase registration

## Summary

- Extract `resolveDefaultAssistant(repoPath)` helper into `packages/core/src/config/resolve-assistant.ts` with precedence: `.codex` / `.claude` folder → `loadConfig().assistant` → first built-in provider → `'claude'`.
- Call the helper from `clone.ts` (replacing the inline block) and from the three forge adapters (`github`, `gitlab`, `gitea`) which previously passed no `ai_assistant_type` and silently defaulted to `'claude'` regardless of the configured assistant.
- `createCodebase` stays a thin DB function with the simple `?? 'claude'` fallback. No dynamic-import config-loading inside the DB layer.
- Lazy-load `@archon/providers` inside the helper so the resolve module doesn't pull provider SDK chains at every adapter import site (which previously broke adapter tests that mock `@archon/paths` without `BUNDLED_IS_BINARY`).
- New `resolve-assistant.test.ts` uses `spyOn` (not `mock.module`) for `loadConfig` and `getRegisteredProviders` so the spies cleanly `mockRestore()` and do not pollute `config-loader.test.ts` running in the same batch.
- `config-loader.test.ts` switches its file I/O mocks from `mock.module('./config-loader', ...)` to `mock.module('fs/promises', ...)` for cross-Bun-version compatibility.
- New `clone.test.ts` cases verify the configured-provider and loadConfig-failure fallbacks via the integration path.

Fixes #1580.

## Test plan

- [x] `bun test src/db/adapters/sqlite.test.ts src/db/codebases.test.ts ... src/config/ src/state/`: exact CI batch, 366 pass / 0 fail
- [x] `bun --filter @archon/core --filter @archon/adapters --filter @archon/server test`: only pre-existing macOS-only telegram-markdown failures (unrelated, present on dev)
- [x] `bun run type-check`, `bun run lint`, `bun run format:check` all clean

* fix(core): also lazy-load loadConfig in resolve-assistant

config-loader.ts eagerly imports @archon/providers (for runtime validation of the configured assistant ID against the registry), which transitively loads claude/codex binary-resolvers and their BUNDLED_IS_BINARY dependency on @archon/paths. Static-importing loadConfig at module top therefore forces every caller of resolve-assistant (including the three forge adapters) to pull that chain too. Adapter tests that mock @archon/paths without BUNDLED_IS_BINARY then break on Linux.

Move the loadConfig import to a dynamic import inside the function body alongside the getRegisteredProviders one. Nothing in this module is loaded eagerly anymore.

* fix(core): pass repoPath to loadConfig in resolve-assistant

loadConfig() without a path only merges the global config; the repo's own .archon/config.yaml (which can set assistant: pi at the project level) is silently skipped. Pass repoPath so repo-level config is honored during registration.

Add a call-contract assertion in resolve-assistant.test.ts so a future regression that drops the path is caught.

Surfaced by CodeRabbit review on #1729.

* fix(adapters): import resolve-assistant via deep subpath to avoid config-loader chain

@archon/core/config/index.ts does `export * from './config-loader'`, which forces eager loading of config-loader.ts (and its top-level @archon/providers import) at every site that imports from @archon/core/config. The three forge adapters were importing the helper through that barrel, which pulled in the binary-resolver chain and broke @archon/adapters tests on Linux (mocked @archon/paths without BUNDLED_IS_BINARY).

Add a dedicated './config/resolve-assistant' subpath export and switch the three forge adapters to it. Only resolve-assistant.ts is loaded; no transitive @archon/providers at module-load time.
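The precedence chain in the summary can be sketched with the lookups injected as dependencies, which also mirrors why the real module lazy-loads them. Everything here is a hypothetical shape: the `ResolveDeps` interface, the function name, checking `.codex` before `.claude`, and falling through on a config-load failure are assumptions rather than the actual `resolveDefaultAssistant` code.

```typescript
import { join } from 'node:path';

// Hypothetical dependency bundle standing in for the lazily imported
// loadConfig and getRegisteredProviders calls.
interface ResolveDeps {
  folderExists: (path: string) => boolean;
  loadConfiguredAssistant: (repoPath: string) => Promise<string | undefined>;
  listBuiltInProviders: () => Promise<string[]>;
}

// Precedence: marker folder -> configured assistant -> first built-in -> 'claude'.
async function resolveDefaultAssistantSketch(repoPath: string, deps: ResolveDeps): Promise<string> {
  // Assumption: `.codex` is probed before `.claude`.
  if (deps.folderExists(join(repoPath, '.codex'))) return 'codex';
  if (deps.folderExists(join(repoPath, '.claude'))) return 'claude';

  try {
    // repoPath is passed through so repo-level config is honored.
    const configured = await deps.loadConfiguredAssistant(repoPath);
    if (configured) return configured;
  } catch {
    // Assumption: a config-load failure falls through to the next step.
  }

  const builtIns = await deps.listBuiltInProviders();
  return builtIns[0] ?? 'claude';
}
```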
…dirs (#1723) (#1737)

* fix(providers/claude): reject directory paths and expand npm package dirs

The Claude binary resolver validated configured paths with existsSync, which returns true for directories. Users on Windows who installed Claude Code via npm and configured claudeBinaryPath to the npm platform-package directory (e.g. ...\@Anthropic-AI\claude-code-win32-x64) hit a confusing SDK-side ReferenceError ("Claude Code native binary not found at <path>") because the SDK's child_process.spawn(directory) failed with ENOENT.

Replace the existence-only check with a pathKind() helper that distinguishes file / directory / missing, and transparently expand a configured directory to the platform-appropriate child executable (claude.exe on Windows, claude on Unix) when present. A directory without the expected binary now produces a directory-specific error that tells the user what to fix.

The autodetect branch already targets a file path directly and is unchanged.

Fixes #1723

* fix(providers/claude): address self-review: broken-symlink test + codex TODO

- Add a regression test for pathKind() returning 'missing' on a broken symlink (uses a real tmp symlink so the statSync ENOENT path is actually exercised, not mocked).
- Add a TODO marker in the Codex resolver pointing at #1723. The Codex resolver has the identical existsSync-on-directory gap; left unfixed in this PR to avoid scope creep but now discoverable from the file itself when a Codex bug report lands or someone does a deliberate parity pass.

* fix(providers/claude): address review: autodetect parity, EACCES breadcrumb, doc updates

Extends #1723 fix per multi-agent PR review:
- Autodetect branch now uses pathKind === 'file' instead of fileExists so a directory at ~/.local/bin/claude no longer slips past validation and crashes the SDK as ENOENT (matches the env/config branches).
- pathKind catches now distinguish ENOENT/ENOTDIR from other stat errors (EACCES, ELOOP, etc.) and emit a WARN log line with the error code so operators have a triage breadcrumb for permission issues that would otherwise surface as the misleading "file does not exist".
- Extract CLAUDE_BINARY_NAME constant (was duplicated 7 times across source + tests) and export PathKind type so test mockReturnValue calls are type-checked against the union rather than being unknown strings.
- Inline expandDirectoryToExecutable into validateAndExpand (single caller, body shorter than its JSDoc). Drop the WHAT-restating first sentence of validateAndExpand's docstring.
- Strip the "Wrapped for spyOn parity" clause from pathKind's JSDoc; it contradicted the accurate first sentence and implied the design was testability-driven rather than classification-driven.
- Align spy declarations to `| undefined` in binary-resolver.test.ts to match the dev-mode file. Drop the now-unused fileExistsSpy.
- Add pathKind happy-path tests (real file → 'file', real dir → 'directory'). Without these, a typo like isFile() → isDirectory() would pass every existing test because all resolver tests spy through pathKind and never exercise the real statSync logic.
- Add two dev-mode tests for CLAUDE_BIN_PATH-as-directory. The env branch runs validateAndExpand before the BUNDLED_IS_BINARY guard, so dev users get expansion too; pin the contract.
- Add a Windows autodetect-rejects-directory regression test.

Docs: surface the new directory-accepting behavior so Windows users who install via npm can discover it without re-reading the source.
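A rough sketch of the classification and directory expansion described above follows. The names `pathKind`, `PathKind`, `validateAndExpand`, and `CLAUDE_BINARY_NAME` come from the commit messages, but the bodies, the error wording, the `console.warn` logging, and the choice to classify sockets and other special files as 'missing' are assumptions.

```typescript
import { statSync } from 'node:fs';
import { join } from 'node:path';

type PathKind = 'file' | 'directory' | 'missing';

const CLAUDE_BINARY_NAME = process.platform === 'win32' ? 'claude.exe' : 'claude';

// Classifies a path via statSync instead of existsSync, which cannot tell
// a file from a directory.
function pathKind(path: string): PathKind {
  try {
    const stats = statSync(path);
    if (stats.isFile()) return 'file';
    if (stats.isDirectory()) return 'directory';
    return 'missing'; // assumption: other file types are treated as unusable
  } catch (error) {
    const code = (error as { code?: string }).code;
    // ENOENT/ENOTDIR mean "not there"; anything else gets a triage breadcrumb.
    if (code !== 'ENOENT' && code !== 'ENOTDIR') {
      console.warn(`stat failed for ${path} (${code ?? 'unknown error'})`);
    }
    return 'missing';
  }
}

// Accepts a file as-is, expands a directory to the expected child binary,
// and rejects everything else with a path-specific message.
function validateAndExpand(configuredPath: string): string {
  const kind = pathKind(configuredPath);
  if (kind === 'file') return configuredPath;
  if (kind === 'directory') {
    const candidate = join(configuredPath, CLAUDE_BINARY_NAME);
    if (pathKind(candidate) === 'file') return candidate;
    throw new Error(`${configuredPath} is a directory without ${CLAUDE_BINARY_NAME}`);
  }
  throw new Error(`${configuredPath} does not exist`);
}
```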
Capture the acceptance criteria and maintenance policy for community providers in direction.md so PR triage stops devolving into ad-hoc 'should this match Pi or not' debates.

Policy in brief:
- Coding-agent SDK required (no raw chat.completions wrappers; Pi already covers ~20 LLM backends via one harness)
- Match the Pi pattern: provider class + options translator + event bridge + capability matrix, registered with builtIn: false, tests at parity with the Pi suite, docs page in ai-assistants.md
- No cap on acceptance
- Contributor + community maintain; non-functional providers get deprecated and removed in the next minor unless someone fixes them

Cite as direction.md §community-providers when triaging.
…shes after SDK cleanup (#1735) (#1739)

The codex-sdk's own finally calls child.removeAllListeners() + child.kill() before Archon's retry-loop finally runs. The subsequent attemptController.abort() fires Node's internal spawn-signal abort listener on the now-listenerless child, surfacing an uncaught AbortError that bypasses try/catch.

The per-attempt AbortController is short-lived and goes out of scope at iteration end, so no explicit abort() cleanup is needed. Caller signal cancellation is unaffected (removed via removeEventListener in the same finally block).

Closes #1735
#1391) (#1730)

* feat(workflows): add always_run node opt-out for resume caching

Closes #1391.

Adds an optional `always_run: boolean` field on every DAG node. When `true`, the node re-executes on resume even if it completed in the prior run. The resume pre-populate filters out always_run node IDs, and the per-node skip-check is gated by `!node.always_run`.

Use case: producers whose exit code does not validate their output (bash that writes a file the consumer parses, code generators, fetch scripts). Today a successful-but-garbage producer stays cached across every resume; the only escape is renaming the node.

Default is unchanged. Normal cached nodes in the same run still skip. Emits a new `dag.node_always_run_resume_forced` log event so operators can see the flag firing.

* workflows: emit node_always_run_reset event on resume opt-out

The always_run resume-forced path only wrote a structured log line. The prior_success skip path writes a DB workflow_event, so resume forensics could see skipped nodes but not nodes that were reset from the skip list. Add a symmetric node_always_run_reset event with the prior output so operators can reconstruct resume decisions from the workflow_events table.

Drop the trailing PR reference from the comment; surrounding text explains intent.
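The resume filter described in the first commit can be illustrated with a small helper. The `DagNode` shape is trimmed to the two fields that matter here, and `nodesToSkipOnResume` is a hypothetical name, not the executor's actual function.

```typescript
// Trimmed, hypothetical node shape: only the fields relevant to resume.
interface DagNode {
  id: string;
  always_run?: boolean;
}

// Returns the IDs that may be skipped on resume: nodes that completed in
// the prior run, minus any node that opted out via always_run.
function nodesToSkipOnResume(nodes: DagNode[], priorCompleted: Set<string>): Set<string> {
  const skip = new Set<string>();
  for (const node of nodes) {
    if (priorCompleted.has(node.id) && !node.always_run) {
      skip.add(node.id);
    }
  }
  return skip;
}
```

With nodes `fetch` (`always_run: true`) and `parse`, both completed previously, only `parse` ends up in the skip set, so `fetch` re-executes on resume.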
…ed YAMLs (#1733)

Fixes #1535

The workflow-builder's generate-yaml node did not explicitly require generated workflows to reference $ARGUMENTS (or $USER_MESSAGE). When the AI generated single-node workflows that accept user input, it described the input in prose but omitted the $ARGUMENTS substitution variable. The harness captured the user's invocation message but never injected it into the node's conversation.

Changes:
- Add rule 13 to generate-yaml prompt: every workflow that accepts user input MUST reference $ARGUMENTS in at least one node prompt
- Add validation warning in validate-yaml when neither $ARGUMENTS nor $USER_MESSAGE appears in the generated YAML
- Regenerate bundled defaults
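The validation warning amounts to a substring check over the generated YAML text. The following is a hedged sketch; the function name and the warning wording are made up for illustration, and the real validate-yaml node may apply the check differently.

```typescript
// Hypothetical check: returns a warning when the generated YAML never
// references either user-input substitution variable.
function missingUserInputWarning(yamlText: string): string | undefined {
  if (yamlText.includes('$ARGUMENTS') || yamlText.includes('$USER_MESSAGE')) {
    return undefined;
  }
  return 'workflow references neither $ARGUMENTS nor $USER_MESSAGE; user input will not reach any node';
}
```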
…chon-refactor-safely (#1734)

The analyze-impact and plan-refactor nodes are intentionally read-only (denied_tools: [Write, Edit, Bash]) but their prompts instructed the AI to write files. This caused the AI to waste turns searching for unavailable tools, and the plan/analysis was never persisted to disk. The execute-refactor node then failed to read the plan file, resulting in zero work done despite the workflow reporting completed.

Changes:
- Update prompts to output analysis/plan directly (captured as node output) instead of attempting file writes
- Add persist-impact and persist-plan bash nodes to bridge the context boundary by writing node outputs to $ARTIFACTS_DIR files
- Update dependency chain: plan-refactor depends on persist-impact, execute-refactor depends on persist-plan

Closes #1477
#1728)

* fix(providers): expand ${VAR_NAME} brace syntax in MCP config env vars (fixes #1612)

Add two-group regex alternation to expandEnvVarsInRecord so both $VAR and ${VAR} forms are expanded in env/headers values. Add 5 tests for the new brace-form behavior and update MCP servers docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore(ai-layer): evolve AI Layer from PIV run

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
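A two-group alternation of the kind described could be written as below. This is a sketch under assumptions: the real `expandEnvVarsInRecord` signature is not shown in the commit, so the explicit `env` parameter, the variable-name character class, and the empty-string substitution for unset variables are all guesses.

```typescript
// Group 1 captures the braced form ${VAR}; group 2 captures the bare form $VAR.
const ENV_VAR_PATTERN = /\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)/g;

// Hypothetical variant that takes the environment explicitly for testability.
function expandEnvVarsSketch(
  record: Record<string, string>,
  env: Record<string, string | undefined>
): Record<string, string> {
  const expanded: Record<string, string> = {};
  for (const key of Object.keys(record)) {
    expanded[key] = record[key].replace(
      ENV_VAR_PATTERN,
      (_match: string, braced: string | undefined, bare: string | undefined) => {
        const name = braced ?? bare ?? '';
        return env[name] ?? ''; // assumption: unset variables expand to ''
      }
    );
  }
  return expanded;
}
```

Listing the braced alternative first matters only for readability here, since `$` followed by `{` can never match the bare-form group.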
…prove/reject (#1743)

* Fix: workflow approve/resume discovery for worktree runs (#1663)

When a workflow paused at an approval gate is resumed via `workflow approve` or `workflow resume`, the CLI re-invoked `workflowRunCommand` with `run.working_path` as the discovery cwd. If `working_path` is a worktree or workspace clone that does not contain the user's local (often untracked) workflow YAML, discovery failed with "Workflow 'foo' not found" before execution could begin.

Separate the discovery path from the execution path by adding an optional `discoveryCwd` to `WorkflowRunOptions`. Resume, approve, and reject now look up the codebase and pass `codebase.default_cwd` as `discoveryCwd`, so the source repo is searched even when `working_path` lives elsewhere. The execution cwd and the existing `findResumableRun` keying are unchanged.

Changes:
- Add `WorkflowRunOptions.discoveryCwd`; use it for `loadWorkflows` in `workflowRunCommand`
- `workflowResumeCommand`, `workflowApproveCommand`, and `workflowRejectCommand` resolve `codebase.default_cwd` (with graceful fallback) and pass it through
- Tests covering discovery from `codebase.default_cwd` and fallback to `working_path` when no codebase is available

Fixes #1663

* chore(workflows): regenerate bundled defaults after default YAML updates

* fix: address review findings from PR #1743

- C1: Remove Write from denied_tools on analyze-impact and plan-refactor nodes in archon-refactor-safely.yaml; prompts write to $ARTIFACTS_DIR/*.md
- H1: Add else branch with warn log when codebase record not found (null return) at all three discoveryCwd sites (resume/approve/reject)
- H2: Log discovery path when discoveryCwd is set so the searched path is visible to users debugging workflow-not-found errors
- I1: Add two regression tests for workflowRejectCommand discoveryCwd path (codebase found and fallback-when-null), mirroring approve/resume parity
- Fix mock pollution: remove duplicate getWorkflowRun mockResolvedValueOnce in "throws when on_reject configured but working_path is null" test whose extra queued value leaked into subsequent tests
- L3: Drop caller enumeration from discoveryCwd JSDoc; keep only the why
- L4: Update codebaseId inline comment to include reject as a caller
- L6: Fix workflowRejectCommand JSDoc to describe the auto-resume branch
- M1: Add CHANGELOG entry for the #1663 fix under [Unreleased]
- M2: Rename stale test name "fall through to auto-registration" to accurately describe the warn-and-fallback behavior on getCodebase failure
- Regenerate bundled-defaults.generated.ts after YAML changes

* simplify: merge redundant priorCompletedNodes checks into single if/else
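The discovery/execution split with its warn-and-fallback branch can be sketched as a tiny resolver. `default_cwd` and `working_path` are named in the commit; the function name, the `CodebaseRecord` shape, and the injected `warn` callback are hypothetical.

```typescript
// Hypothetical minimal record: only default_cwd matters for discovery.
interface CodebaseRecord {
  default_cwd: string;
}

// Picks where workflow YAML discovery should search. Execution still uses
// workingPath; only discovery is redirected to the source repo.
function resolveDiscoveryCwd(
  workingPath: string,
  codebase: CodebaseRecord | null,
  warn: (message: string) => void = console.warn
): string {
  if (codebase === null) {
    warn(`codebase record not found; falling back to working_path ${workingPath}`);
    return workingPath;
  }
  return codebase.default_cwd;
}
```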
…ixes #1738) (#1742)

User-bubble <p> and the .chat-markdown typography rules had no overflow-wrap, so long URLs and tokens broke out of the max-w-[70%] container.

- MessageBubble: add break-words + min-w-0 to the flex-1 paragraph so it can shrink below intrinsic content width.
- index.css: add overflow-wrap: break-word to .chat-markdown p, li, td, and a. Code blocks already use overflow-x-auto and are excluded.
* docs(brand): add brand foundation page on archon.diy

- Mount the canonical Archon brand sheet at `/brand/` in the docs site (Penpot-exported standalone HTML, top-right "Console →" cross-link surgically removed via a re-runnable patch script).
- Add a Starlight overview page with a Quick reference (gradient, surface) and an embedded full brand sheet.
- Sidebar gains a "🎨 Brand" entry between Roadmap and The Book of Archon.
- Fix the dark-mode active sidebar link being unreadable (`color: var(--sl-color-white)`).
- Require future UI changes to align with the brand foundation (new "UI and Visual Design" section in root CLAUDE.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(brand): switch foundation.html to plain source files, drop decoder scripts

The brand sheet now ships as plain Penpot-exported source (Brand.html shell, brand-app.jsx, logo.jsx, tweaks-panel.jsx, standalone-tweaks-toggle.jsx, app.css, archon-logo.png) and is edited like any other code in the repo: open the JSX, change it, refresh the page.

- public/brand/foundation.html now loads React + Babel from unpkg (with integrity hashes) and compiles the JSX in the browser. Adds one local override: hide the Penpot Tweaks toggle on the public site.
- brand-app.jsx carries our single local delta: the top-right "Console →" cross-link is removed (the sibling Archon Console doc isn't published).
- public/brand/README.md documents what each file owns and the local delta.
- The 1.5 MB self-extracting bundle and the scripts/brand/ decoder pipeline (_find-console.ts, _dump.ts, _patch.ts) are deleted.

Net: the repo loses ~1.5 MB of opaque base64 + 4 maintenance scripts; gains ~85 KB of editable source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(web): recognize loop and approval node types in DAG builder

resolveNodeDisplay() fell through to the 'prompt' fallback for loop and approval nodes, giving them nodeType='prompt' with no promptText. useBuilderValidation then raised false-positive "prompt cannot be empty" errors for both node types.

Changes:
- dag-layout.ts: add loop and approval cases to resolveNodeDisplay()
- DagNodeComponent.tsx: extend nodeType union; add TYPE_CONFIG entries and getContentPreview cases for loop and approval
- index.css: add --node-loop (teal) and --node-approval (amber) tokens

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(web): add unit and integration tests for loop/approval DAG node types

Tests requested by Wirasm for PR #1722:
- resolveNodeDisplay(): loop node → { label, nodeType, promptText }, approval → { label, nodeType }
- dagNodesToReactFlow() integration: asserts loop and approval nodes have correct nodeType in output
- getContentPreview(): loop multi-line prompt returns first line; approval returns empty string
- Exports getContentPreview from DagNodeComponent.tsx to make it testable
- Extends test script to cover src/components/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: robby_kei <robby_kei@linecorp.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(providers): add GitHub Copilot provider configuration types

Define CopilotProviderDefaults with model, reasoning effort, and auth options
Include system message injection and CLI path configuration support

* feat(providers): add GitHub Copilot community provider integration

Implement full provider with session management, streaming, and binary resolution
Include comprehensive test coverage and lazy-load SDK pattern

* feat(providers): add Copilot provider registration and exports

Export CopilotProvider, config parser, and binary resolver utilities
Register Copilot provider in community providers initialization

* test(e2e): add GitHub Copilot provider smoke and abort tests

Include streaming verification, token validation, and interrupt handling
Verify connectivity, output plumbing, and session management

* feat(copilot): add reasoning effort alias and session timeout improvements

Map Archon `max` effort to SDK `xhigh` and extend sendAndWait timeout to 60min
Handle fork-session requests with fresh session creation fallback

* feat(copilot): add environment variable override support and auto model default

Add COPILOT_MODEL env var with envOverrides tracking across config system
Update provider to default model to 'auto' and enhance settings UI

* docs(copilot): clarify session option handling comment

* feat(copilot): add MCP, skills, agents, and structured output support

Implement full Copilot SDK feature translation including tool restrictions, session config assembly, and best-effort JSON parsing for structured output

* feat(copilot): respect useLoggedInUser to override env token

test(copilot): cover env token precedence and override behavior

* refactor(copilot): remove isCopilotModelCompatible and model-ref

delete model-ref.ts and model-ref.test.ts
update copilot index and registration to drop isCopilotModelCompatible export

* fix(struct-out): enforce object requirement for structured output parsing

return undefined if parsed JSON is not an object
add tests covering non-object JSON in structured output parsing

* feat(copilot): add isExecutableFile check for Copilot binary

implement isExecutableFile using stat/access and use it in path resolution
update errors to reference executable file and chmod guidance

* feat(copilot): add PATH lookup for copilot binary resolution

export resolveFromPath and prefer PATH result when executable

* ci(workflows): migrate and add Copilot CI workflows

- rename e2e-copilot-abort.yaml to test-workflows/e2e-copilot-abort.yaml
- add e2e-copilot-all-features.yaml and relocate smoke workflow to test-workflows

* refactor(shared): centralize structured-output parsing and skills

update providers to re-export shared implementations
expose shared utilities: tryParseStructuredOutput, augmentPromptForJsonSchema

* feat(registry): register Copilot community provider

update registry tests to cover copilot provider registration
verify no collision with built-ins and copilot appears in lists

* feat(copilot): defer session error warning and harden abort flow

update event-bridge to emit no system chunk on session.error
add provider-hardening tests for abort, trim model config and cleanup

* ci(workflow): simplify output capture in e2e-copilot-smoke workflow

* ci(workflows): restructure Copilot e2e workflows for clarity

refactor multiple files into sections for fixtures, demos, and checks

* ci(workflow): remove e2e-copilot-all-features workflow

* feat(workflows): add e2e-copilot-all-nodes-smoke workflow

delete old e2e-copilot-smoke workflow
extend Copilot smoke tests to cover all node types and structured outputs

* refactor(config): remove envOverrides support and COPILOT_MODEL usage

use DEFAULT_AI_ASSISTANT env var to select default ai assistant
update tests and docs to reflect new default and env var usage

* docs: update Copilot docs and env sample

* feat(copilot): implement token precedence for Copilot auth

introduce COPILOT_GITHUB_TOKEN and generic GH tokens; track tokenSource
reorder provider registration to register Pi before Copilot

* feat(copilot): improve binary resolution and skill dir validation

use isExecutableFile for vendor and autodetect checks
validate skill names to reject absolute or traversal paths

* fix: address review feedback on Copilot community provider

- Add packages/providers/src/shared/structured-output.test.ts covering augmentPromptForJsonSchema, the happy-path clean parse, fence stripping (both ```json and bare ```), the forward-brace scan recovery for reasoning-model prose preamble, fence + preamble combo, whitespace trimming, invalid JSON, empty input, and the bare-primitive rejection contract (null/number/string/boolean).
- Add packages/providers/src/shared/skills.test.ts covering empty/null inputs, non-string and empty-string skipping, missing skills, cwd vs home resolution order, cwd-shadows-home semantics, deduplication, and the name-only contract (rejection of absolute paths, nested paths, and parent traversal). Uses a staged temp HOME so reads are isolated.
- Wire both new test files into packages/providers/package.json so they run in CI as separate bun test invocations.
- Add `copilot` to the registered-providers list in the validation error example at guides/authoring-workflows.md, add a Copilot bullet to the Model strings section, and add an AI Providers -- Copilot env-var subsection plus DEFAULT_AI_ASSISTANT enumeration to reference/configuration.md.

The two duplicate-import HIGH findings from the May 14 review were hallucinations (the imports don't exist in the current branch), so they need no fix.

* chore(rebase): resolve semantic conflicts from dev

- Update loadMcpConfig import to ../../mcp/config; #1459 (Codex MCP nodes) extracted it out of claude/provider.ts into its own module.
- Regenerate bun.lock from current dev (configVersion: 1). Old commits on this branch carried configVersion: 0; rebased forward unchanged but produced different transitive resolution on install (telegram markdown tests fail locally despite identical telegramify-markdown pin). bun install re-adds @github/copilot-sdk on top of the fresh lockfile.

* test(copilot): address CodeRabbit feedback on shared/skills tests

- Stage the home copy of `delta` in `.agents` (not `.claude`) so the "prefers cwd over home" precedence test actually verifies precedence within `.agents`. Previously the home copy was in `.claude`, which could not have beaten the cwd `.agents` copy regardless of the resolver's behavior.
- Add explicit return types on `makeFakeWorld` and the inner `stageSkill` to satisfy the project's strict TS annotation rule.

* fix(providers): address remaining Wirasm review items

- pi/event-bridge.ts: consolidate the `export-from` + `import-from` pair on shared/structured-output into the idiomatic `import { X }; export { X };` form. The preceding comment already promised "import once for local use and re-export" but the prior order said the opposite.
- authoring-workflows.md: add `copilot` to the prose listing of registered providers (the example validation error string below it already includes copilot).

* chore(copilot): drop stale "Claude's loadMcpConfig" attribution

#1459 (Codex MCP nodes) extracted loadMcpConfig out of claude/provider.ts into a shared mcp/config.ts module. Update the applyMcpServers docblock to reflect that the helper is shared, not Claude-specific.

---------

Co-authored-by: Daniel Scholl <daniel.scholl@microsoft.com>
Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>
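The structured-output contract listed in the review-feedback commit (clean parse, fence stripping, forward-brace scan past a prose preamble, rejection of bare primitives) can be sketched as follows. Only the name `tryParseStructuredOutput` comes from the commits; the body, the return type, and the decision to accept arrays as objects are assumptions, and the shared implementation may differ.

```typescript
// The fence marker is built at runtime so this example contains no literal
// triple backticks.
const FENCE = '`'.repeat(3);
const FENCE_PATTERN = new RegExp(FENCE + '(?:json)?\\s*([\\s\\S]*?)\\s*' + FENCE);

// Parses a candidate and keeps it only if the result is a non-null object.
function parseObject(candidate: string): object | undefined {
  try {
    const parsed: unknown = JSON.parse(candidate);
    // Bare primitives (null, numbers, strings, booleans) are rejected.
    // Assumption: arrays pass, since typeof [] is 'object'.
    return typeof parsed === 'object' && parsed !== null ? parsed : undefined;
  } catch {
    return undefined;
  }
}

function tryParseStructuredOutput(raw: string): object | undefined {
  let text = raw.trim();
  if (text === '') return undefined;

  // 1. Strip a fenced block (tagged json or bare), even after a preamble.
  const fenced = FENCE_PATTERN.exec(text);
  if (fenced) text = (fenced[1] ?? '').trim();

  // 2. Happy path: the whole text is valid JSON.
  const direct = parseObject(text);
  if (direct !== undefined) return direct;

  // 3. Forward-brace scan: try each '{' up to the last '}' to skip prose.
  const end = text.lastIndexOf('}');
  let start = text.indexOf('{');
  while (start !== -1 && start < end) {
    const recovered = parseObject(text.slice(start, end + 1));
    if (recovered !== undefined) return recovered;
    start = text.indexOf('{', start + 1);
  }
  return undefined;
}
```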
…1384)
* feat(providers): add OpenCode community provider with correct capabilities
- Add OpenCode provider using @opencode-ai/sdk
- Support both embedded server and external server modes
- Implement session resume, MCP, structured output, env injection
- Correctly declare capabilities: hooks, skills, agents, toolRestrictions, effortControl, thinkingControl all supported
- Add model/agent validation (one required)
- Include E2E smoke workflow and registry tests
- Update docs with auth guidance and feature table
* feat(providers/opencode): remove agent field - use Archon's own agent impl
Archon has its own agent implementation and should not delegate to OpenCode's agent profiles. Removed the agent field from:
- OpencodeProviderDefaults interface
- parseOpencodeConfig parsing
- streamOpencodeSession function
- Updated capabilities to agents: false
Model is now required (no agent fallback).
* feat(providers/opencode): enable agents support with adaptation layer
- Flip agents capability from false to true
- Add agent adaptation layer that maps nodeConfig.agents to OpenCode API:
  - Agent selection by sorted key order
  - Model override from agent config
  - Tools permissions map (deny wins)
- Add 4 tests for agent adaptation behavior
- Update smoke test to verify agent field works
* fix(providers/opencode): address PR review feedback
- Fix assert node to fail with exit 1 when pattern not found
- Set effortControl/thinkingControl to false (not wired to SDK)
- Replace generic 'terminated' with specific crash patterns
- Add TODO for health endpoint (SDK limitation)
- Fix race condition in releaseEmbeddedRuntime
- Call iterator.return on abort in abortableStream
- Tighten isOpencodeModelCompatible validation
- Add agent field to OpencodeProviderDefaults type
* fix(providers/opencode): address Oracle validation issues
- Fix race condition: capture runtime instance at acquire time
- Add agent field parsing in parseOpencodeConfig
- Tighten isOpencodeModelCompatible to trim whitespace
- Update registry test for effortControl/thinkingControl
* fix(providers/opencode): address all CodeRabbit review feedback
- Replace session.create() health check with global.health() (stateless)
- Yield terminal result chunk when stream ends before session.idle
- Move comment under agent: field in ai-assistants.md
- Change 'Inline sub-agents' support to ⚠️ Partial
- Preserve insertion order in selectPrimaryAgent (remove .sort())
- Remove redundant nodeConfig argument from streamOpencodeSession
- Preserve error structure in session.error handler (err.cause)
- Consolidate model-ref validation (parseModelRef in registration.ts)
- Update test mocks to include global.health()
* fix(providers/opencode): address latest CodeRabbit review feedback
- Add warning when multiple agents configured (first wins)
- Add 2s timeout to global.health() probe
- Add TODO for skipped abort test
- Consolidate imports in registration.ts
- Fix TypeScript error: use deferred pattern for creationPromise
* fix(providers/opencode): address remaining PR review feedback
- Fix deferred pattern hang: wire both resolve and reject in deferred promise so startup errors propagate to callers (3137799074)
- Fix server close leak: decouple server.close() from cache identity check in releaseEmbeddedRuntime (3137799084)
- Update TODO reference to follow-up issue #1400 for abort test (3136883117)
* fix(providers/opencode): use direct HTTP fetch for health check
The SDK's global.health() method only exists in v2, but we import from the root SDK which uses the old client. Switch to direct HTTP fetch to /global/health endpoint for checking existing servers.
- Remove global.health from OpencodeClientLike interface
- Use fetch() directly with 2s timeout for health check
- Update tests to mock fetch for health check scenarios
* fix(workflows): bash quoting for linux compatibility
* refactor(providers/opencode): decompose provider into focused modules
Extract runtime, session, multi-agent, agent-config, agent-fs, and error handling into separate files to reduce provider.ts complexity. Add inline multi-agent e2e workflow and expand test coverage.
* Self AI Review suggestion.
* chore: update opencode e2e smoke test with hooks coverage + refresh docs
Add hook-node to e2e smoke workflow covering PreToolUse/PostToolUse hooks (10 node types total). Switch smoke model to cpamc/minimax. Remove deprecated baseUrl option and refresh feature support table in docs.
* chore(providers/opencode): improve abort error logging and multi-agent e2e workflow
* test(workflows): use default model for opencode e2e tests
Switch from cpamc/minimax to opencode/big-pickle (provider default) for general e2e testing of OpenCode provider.
* fix: match homebrew formula to upstream/dev
* fix(providers/opencode): address code review findings
- Add CHANGELOG.md entry for assistants.opencode provider (#1703)
- Elevate silent debug catches to warn level with context (session, multi-agent, runtime)
- Preserve error cause chain in retry loop (provider.ts)
- Include retry count in final throw message
- Fix doc typo: cofnig -> config
- Update CLAUDE.md monorepo layout with community/opencode/
* chore: align SDK versions with origin/dev
* version downgrade fix.
* Add opencode-ai sdk
* fix: enable abort test and remove redundant isModelCompatible
- Enable skipped abort test with deterministic setTimeout timing
- Remove unused isOpencodeModelCompatible function from registration
- Remove isModelCompatible test from registry tests
- Update bundled defaults with archon-four-role-loop workflow
* chore: regenerate bun.lock to sync with package.json after rebase
CI was failing on 'lockfile had changes, but lockfile is frozen': the lockfile was missing the overrides entries (@hono/node-server, flatted, follow-redirects, path-to-regexp, qs) and had a stale @archon/providers version (0.3.9 → 0.3.12) after rebasing onto current dev. Net diff: +11/-8 in bun.lock, no source changes.
* chore: regenerate bundled defaults to sync with current commands state
CI failed on 'bundled-defaults.generated.ts is stale' after the lockfile fix unblocked the install step. The generated file was 1 line out of date relative to current dev's command set (drift from rebases). Functional diff is +1/-2 (a single trailing-newline difference in one embedded command); full diff is large only because the file inlines all commands as TypeScript strings. This is mechanical: produced by 'bun run generate:bundled' with no other changes.
---------
Co-authored-by: cropse <cropse0219@gmail.com>
Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>
* fix: coalesce transient chat status updates
* feat(web): improve streaming thinking and tool readability
…es (#1523)
When a workflow run is approved/rejected via the Web UI but `tryAutoResumeAfterGate` cannot auto-resume (because there is no `parent_conversation_id`, the parent conversation is gone, or the parent sits on a non-web platform such as Slack/Telegram/GitHub/CLI), the success message said only "Send a message to continue" / "On-reject prompt will run on resume". A web-UI user whose run originated from a terminal has no obvious next step from that text and the run sits in `failed` status.
Both approve and reject (on_reject branch) now include the exact `archon workflow resume <runId>` command in the non-auto-resumed response, so the web-UI surface always carries an actionable next step. The auto-resume happy path and the no-on_reject cancellation path are unchanged. The Resume endpoint's CLI hints (covered by #1329) are not touched.
Closes #1522.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* experiment(console): scaffold primitives-first web UI spike at /console
Greenfield spike of Archon's web UI built around four primitives — Project,
Run, Workflow, Worktree — to validate a simpler mental model before any
migration. Lives under packages/web/src/experiments/console/, mounted at
/console/* outside the shared Layout so it does not inherit the production
TopNav. ESLint no-restricted-imports scope forbids coupling to @/components,
@/contexts, @/hooks, @/routes, @/stores, @tanstack/react-query so the spike
stays extractable or disposable.
Surface:
- Project rail (Discord-style 44x44 tiles with deterministic hashed colors)
with ALL scope toggle, remove-project via right-click, Add Project dialog.
- Runs view split into Active (rich cards for running/paused, pulsing blue
LiveDot for running + amber for paused) and Recent (compact monospace rows
for completed/failed/cancelled). Attention model: running is attention,
completed is audit trail.
- DraftRunCard: inline "start a run" primitive that lives at the top of
the Active list. Collapsed = thin "+ Start a new run" row; expanded = full
card with workflow picker + context textarea. Same shape as a paused
approval card; N keybind expands.
- ApprovalPanel with ApprovalContext preview — shows the actual last
agent message so users see the question being asked, not just the gate
label. Supports capture_response gates and traditional approve/reject.
- Run detail page — header with live-ticking elapsed, StreamToolbar with
Tool calls / System / Graph toggles persisted to localStorage, stream of
StreamCards (message / tool / artifact / node_transition), state-
sensitive ActionBar (cancel/resume/abandon/re-run). Relative timestamps
(+MM:SS from run start) via a small StreamContext provider.
- RunGraphPanel sidebar — dagre TB layout, parallel nodes side-by-side,
loop/approval/bash/command/script/prompt glyphs, status derived from
node_transition events, click a node to scroll it into view.
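Two of the small rules in the list above, the deterministic tile color and the `+MM:SS` relative timestamp, could be sketched roughly as follows. The helper names (`hueForProject`, `formatRelative`) and the particular hash are assumptions for illustration; the commit does not show the real implementation.

```typescript
// Hypothetical helpers; names and the hash function are assumptions.

/** Map a project id to a stable hue in [0, 360) so the same project always gets the same color. */
function hueForProject(id: string): number {
  let hash = 0;
  for (let i = 0; i < id.length; i++) {
    // Simple rolling hash; `| 0` keeps the value in int32 range.
    hash = (hash * 31 + id.charCodeAt(i)) | 0;
  }
  return Math.abs(hash) % 360;
}

/** Format an event time as +MM:SS relative to the run start (both in epoch milliseconds). */
function formatRelative(runStartMs: number, eventMs: number): string {
  // Clamp to zero so clock skew never produces a negative offset.
  const totalSeconds = Math.max(0, Math.floor((eventMs - runStartMs) / 1000));
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  const pad = (n: number): string => (n < 10 ? `0${n}` : String(n));
  return `+${pad(minutes)}:${pad(seconds)}`;
}
```

Because the hue depends only on the id, no color needs to be stored per project.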
Skill API (packages/web/src/experiments/console/skills/) is the single
mutation surface: listProjects/getProject/addProjectBy{Url,Path}/
removeProject/listWorkflows/getWorkflowGraph/listWorktrees/listRuns/
getRun/startRun/cancelRun/approveRun/rejectRun/resumeRun/abandonRun/
listMessages. Every UI action calls exactly one verb; internal
orchestrators (CLI, Claude Code skill, future LLM driver) hit the same
contract. startRun hides the legacy conversation coupling as a two-call
createConversation -> runWorkflow sequence.
State layer (store/cache.ts) is a Map + subs + useEntity hook. No React
Query, no Zustand, ~100 LOC. Polling fallback every 3s until SSE lands.
Warm theme scoped to .console-root (theme.css) — espresso surfaces,
tangerine accent reserved for CTAs, ocean-blue running, teal-green
completed/approve, amber paused, warm rose-red failed. Production theme
untouched.
Preview route at /console/_preview renders every status, every origin,
swatches for each token.
Milestones done: M1 scaffold, M2 skill+store+populated feed, M3 run
detail + event stream, M5 DraftRunCard, M3 polish (sticky toolbar,
compact tool cards, relative timestamps, empty/system filter, compact
user chips, graph sidebar). Pending: M4 SSE live updates, M6 polish.
* experiment(console): widen project rail with editable title + locator, fix invalidate-without-reload
Rail goes from 44x44 abbreviation tiles to a 240px sidebar of two-line rows:
small color dot + title + monospace locator (owner/repo from a git URL, last
two path segments otherwise). Title is editable per-project — double-click to
rename, Enter saves, Esc cancels, blank reverts to the API name. Override
persists in localStorage (console:displayName:<id>) via a small useDisplayName
hook so the spike stays self-contained. Right-click still removes.
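As a rough sketch of the locator rule described above (owner/repo for a git URL, last two path segments otherwise): `formatLocator` is a hypothetical name, and the URL shapes it recognises are an assumption, not necessarily what the spike handles.

```typescript
// Hypothetical sketch of the locator rule; the recognised URL shapes are assumptions.
function formatLocator(source: string): string {
  // Match https://host/owner/repo(.git) and git@host:owner/repo(.git).
  const gitMatch = source.match(
    /^(?:https?:\/\/[^/]+\/|git@[^:]+:)([^/]+)\/([^/]+?)(?:\.git)?\/?$/,
  );
  if (gitMatch) return `${gitMatch[1]}/${gitMatch[2]}`;
  // Otherwise treat the value as a filesystem path and keep the last two segments.
  const segments = source.split(/[\\/]+/).filter((s) => s.length > 0);
  return segments.slice(-2).join("/");
}
```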
Also fixes a latent bug in store/cache.ts: invalidate() and refetch() cleared
the cache and notified subscribers but never re-ran the loader, so add/remove
project and the run-action / approval flows all required a page reload to
reflect new state. useEntity now registers its loader and ensureLoad() refires
it on any cleared key that still has an active subscriber.
ProjectTile is left in place — still used by /console/_preview.
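The invalidate-then-reload behavior can be illustrated with a minimal Map + subscribers cache. This is a sketch under assumptions, not the spike's store/cache.ts: the class name, method names, and the choice to register the loader at subscribe time are all hypothetical, and in-flight request deduplication is left out for brevity.

```typescript
// Hypothetical sketch of a Map + subscribers cache; not the actual store/cache.ts.
type Loader<T> = () => Promise<T>;
type Listener = () => void;

class EntityCache {
  private values = new Map<string, unknown>();
  private listeners = new Map<string, Set<Listener>>();
  private loaders = new Map<string, Loader<unknown>>();

  /** Subscribe to a key and register its loader; returns an unsubscribe function. */
  subscribe<T>(key: string, loader: Loader<T>, listener: Listener): () => void {
    this.loaders.set(key, loader);
    const set = this.listeners.get(key) ?? new Set<Listener>();
    set.add(listener);
    this.listeners.set(key, set);
    if (!this.values.has(key)) this.ensureLoad(key);
    return () => {
      set.delete(listener);
      if (set.size === 0) {
        this.listeners.delete(key);
        this.loaders.delete(key);
      }
    };
  }

  get<T>(key: string): T | undefined {
    return this.values.get(key) as T | undefined;
  }

  /** Clear a key, notify subscribers, and refire the loader if anyone still listens. */
  invalidate(key: string): void {
    this.values.delete(key);
    this.notify(key);
    this.ensureLoad(key);
  }

  private ensureLoad(key: string): void {
    const loader = this.loaders.get(key);
    const hasSubscriber = (this.listeners.get(key)?.size ?? 0) > 0;
    if (!loader || !hasSubscriber) return;
    void loader().then(
      (value) => {
        this.values.set(key, value);
        this.notify(key);
      },
      () => {
        // Leave the key empty on failure; a real cache would record the error.
      },
    );
  }

  private notify(key: string): void {
    this.listeners.get(key)?.forEach((listener) => listener());
  }
}
```

The important line is the `ensureLoad` call inside `invalidate`: without it the key is cleared and subscribers are notified, but nothing ever repopulates the value.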
* experiment(console): fix startRun — pass platform id to dispatch, recover run id by polling
Two bugs were preventing workflows from launching from the spike:
1. The dispatch call was sending conv.id (DB UUID) where the route looks the
conversation up via findConversationByPlatformId. The lookup silently
returned null, the orchestrator dispatched against an unknown reference,
and no workflow_run was ever created. Fix: pass conv.conversationId (the
web-<ts>-<rand> platform id) to /api/workflows/:name/run. Keep conv.id (the
DB UUID) for the parent-conversation match in the recovery step.
2. POST /api/workflows/:name/run returns { accepted, status } — never a run
id, since the workflow_run row is written asynchronously inside the
orchestrator after the HTTP response returns. The old extractRunId() always
threw. Replace with pollForRun(): fetches /api/dashboard/runs filtered by
codebaseId, matches on parent_conversation_id === conv.id, returns the
first hit. Bound at 30s / 400ms interval to absorb cold-start worktree and
isolation-env setup; timeout message points users to the active list since
the run is almost certainly already running by then.
* experiment(console): make startRun optimistic — dispatch and let the runs feed surface the new run
Submit-button no longer blocks for up to 30s while the orchestrator spins up
worktrees and isolation envs. startRun now does just the two dispatch calls
and returns; the workflow_run row appears in the active list as ambient runs
polling picks it up. DraftRunCard fires an immediate invalidate('runs') after
dispatch to nudge the next refetch instead of waiting up to 3s for the next
poll tick.
Drops pollForRun + the runId return value — callers were the navigate-to-run-
detail path only, which traded one bad UX (30s spinner) for another (forced
context switch away from the runs list right after starting). The active card
that appears within a few seconds is a better affordance.
* experiment(console): port to Archon brand foundation — duotone gradient + Geist
Replace the warm espresso/tangerine palette with the cool charcoal +
brand-magenta-to-teal duotone from the Archon brand standalone. All changes
remain scoped under `.console-root` so the production /app surface is
untouched.
theme.css
- Surfaces shift hue 40° → 265° (warm → cool charcoal)
- Accent tokens point at --brand-magenta; --success uses --brand-teal so
affirmative reads as brand
- --brand-gradient + .brand-text / .brand-bar / .brand-bar-soft utilities
added (gradient-soft is the translucent wash used for selected states)
- --accent-ring set to 30% alpha magenta, matching the brand spec
- Geist + Geist Mono loaded from Google Fonts on console route mount
only; .console-root font-family override + higher-specificity .font-mono
rule beat Tailwind v4's @theme inline literal
Components
- ConsoleApp wordmark: .brand-text on "Archon"
- DraftRunCard: 4px gradient strip as an absolute child (keeps the card's
overflow:visible so the workflow picker dropdown can escape); Start run
button background is the duotone bar
- FilterChips: active filter shows a 2px gradient underline pill
- ProjectRail: ALL projects pill now uses brand-bar-soft instead of the
chunky 2px ring with offset
* experiment(console): brand the run detail page
The first brand pass cascaded surfaces + accents into the detail view via
tokens but never threaded the gradient itself through, leaving the timeline
visually flat. This adds three brand moments:
RunDetailHeader
- 1px brand-gradient strip along the bottom edge (replaces the flat
border-border line) anchors the detail view in the same way the
DraftRunCard strip anchors the runs feed
- Run id renders with .brand-text so the focal piece of mono data carries
the duotone
StreamCard
- YOU pill: accent-soft background + brand-magenta text
- AGENT pill: success-soft background + brand-teal text
Role pills now read as the brand duotone across every exchange — magenta
for user (presence/authorship), teal for agent (execution/affirmative)
RunGraphPanel
- Same 1px gradient strip under the GRAPH label; bumps the label color
from tertiary to secondary so the panel header doesn't disappear
theme.css
- Adds --running-soft / --success-soft / --warning-soft / --error-soft
translucent companions for status colors (StreamCard now consumes
--success-soft; the others are there for symmetry)
* fix(experiment/console): surface tool calls from workflow_events
Two independent bugs caused tool calls to never render on the run detail
page despite the toggle being on.
primitives/event.ts
- Server emits `tool_called` / `tool_completed`; the normalizer matched
`tool_started` (a name that's never written). Result: 43 tool_called
events fell through to the text-fallback branch and rendered as
junk-string placeholders elsewhere
- Field names were also wrong: read `toolName` / `args` / `durationMs`
instead of the snake_case `tool_name` / `tool_input` / `duration_ms`
actually present in the JSONB payload, so the few tool_completed
events that did match the branch produced empty entries that
downstream filters dropped
components/RunStream.tsx
- Even with the normalizer fixed, RunStream explicitly skipped
`tool_call` events under the assumption that conversation metadata
is canonical. That's true for Claude (the SDK persists into
message.metadata.toolCalls) but false for Pi / Codex / bash nodes,
which only emit workflow events. Now: if no message carries inline
tool calls, the paired workflow tool events are surfaced instead.
Pairing matches each tool_called to the next unclaimed tool_completed
in the same step so the duration shows correctly.
routes/RunDetailPage.tsx
- Toolbar `toolCallCount` mirrors the same source-of-truth rule so the
"X tool calls" header counts the rendered events, not just the
(empty) inline metadata
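The pairing rule described above (each `tool_called` claims the next unclaimed `tool_completed` in the same step) might look something like this. Only the event names and the snake_case payload fields come from the commit message; the surrounding shapes (`RawToolEvent`, `ToolCallView`, the `step` field) are assumptions.

```typescript
// Hypothetical shapes; only the event names and snake_case fields come from the commit message.
interface RawToolEvent {
  event_type: "tool_called" | "tool_completed";
  step: string;
  data: { tool_name: string; tool_input?: unknown; duration_ms?: number };
}

interface ToolCallView {
  step: string;
  toolName: string;
  input: unknown;
  durationMs: number | null; // null when no completion could be paired
}

function pairToolEvents(events: RawToolEvent[]): ToolCallView[] {
  const claimed = new Set<number>();
  const calls: ToolCallView[] = [];
  events.forEach((event, index) => {
    if (event.event_type !== "tool_called") return;
    let durationMs: number | null = null;
    // Find the next unclaimed completion in the same step, after this call.
    for (let j = index + 1; j < events.length; j++) {
      const candidate = events[j];
      if (!candidate || claimed.has(j)) continue;
      if (candidate.event_type === "tool_completed" && candidate.step === event.step) {
        claimed.add(j);
        durationMs = candidate.data.duration_ms ?? null;
        break;
      }
    }
    calls.push({
      step: event.step,
      toolName: event.data.tool_name,
      input: event.data.tool_input,
      durationMs,
    });
  });
  return calls;
}
```

Claiming by index keeps two back-to-back calls in one step from both attaching to the same completion.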
* fix(experiment/console): tab bar for Log/Graph, wire System toggle, fix subfoldered workflow 404
Detail page now has a Log / Graph tab pair instead of a fixed-width log
with an optional right-rail graph. Both views get the full main content
area (next to the project rail); switching between them is a toggle, not
a side-by-side compromise.
StreamToolbar
- Hosts the tab pair (Log / Graph) on the left with the gradient
underline indicating the active tab
- "X messages · Y tool calls" + Tool calls / System checkboxes only
render when the Log tab is active — irrelevant in Graph view
RunGraphPanel
- Drops the fixed 420px aside chrome; renders as full-width content
- Bigger node dimensions (160×40, 56/20 sep) for the larger canvas
- Returns to centered overflow-auto when content exceeds viewport
RunDetailPage
- `view: 'log' | 'graph'` state persisted to localStorage
- Layout switches single-view; Log view drops the 820px max-width so
the stream uses the full main area
- Clicking a graph node switches to Log and scrolls to that node's
transition
System toggle (the second half of the fix)
- workflow_started / workflow_completed / workflow_failed were
falling through to the text-fallback branch, rendering as junk
`workflow_started — {payload}` strings
- Added SystemEvent kind + explicit branch in `toRunEvent`; surfaced
in RunStream as compact rows behind the System toggle
- Error events also flow into the same system bucket
Graph 404 fix
- The single-fetch `/api/workflows/:name` endpoint doesn't recurse
into `.archon/workflows/<subdir>/`; subfoldered workflows like
`maintainer/maintainer-review-pr.yaml` were unreachable
- `getWorkflowGraph` now goes through the list endpoint (which does
recurse) and filters by name. One extra row of JSON, but the graph
now resolves for every workflow Archon knows about
* feat(experiment/console): live updates via SSE, drop 3s polling
Replaces the per-page 3s setInterval polling loops in RunsPage and
RunDetailPage with subscriptions to the server's existing SSE streams.
Events flow through the existing cache: an SSE message invalidates the
relevant cache keys, useEntity refetches authoritative state, the UI
re-renders. No partial in-memory event-payload merging — keeps the wire
shape decoupled from React state.
lib/sse.ts
- useDashboardSSE subscribes to /api/stream/__dashboard__ and
invalidates runs:* (and run:<id> if the event carries a runId) on
workflow_status / dag_node events. Mounted from RunsPage.
- useRunStreamSSE subscribes to /api/stream/<conversationPlatformId>
and invalidates run:<id> + messages:<convId> on text / tool_call /
tool_result / workflow_* events. A 100ms coalesce timer dedupes
bursts from streamed text. No-ops while the conversation id is
still null (e.g. before the run detail loads).
RunsPage
- Drops the 3s setInterval that re-fetched listRuns; calls
useDashboardSSE instead.
RunDetailPage
- Drops the 3s setInterval that re-fetched getRun + listMessages;
calls useRunStreamSSE with the platform conversation id.
EventSource auto-reconnects on transient failures, so no explicit
recovery logic is needed; permanent close happens at unmount.
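The 100ms coalesce step can be sketched as a small helper that collects cache keys during a window and flushes them once. `createCoalescer` and its injectable scheduler are hypothetical; the scheduler parameter exists only so the behavior can be exercised without real timers.

```typescript
// Hypothetical sketch; `createCoalescer` and the injectable scheduler are assumptions.
type Schedule = (fn: () => void, delayMs: number) => void;

function createCoalescer(
  flush: (keys: string[]) => void,
  delayMs = 100,
  schedule: Schedule = (fn, ms) => {
    setTimeout(fn, ms);
  },
): (key: string) => void {
  const pending = new Set<string>();
  let timerArmed = false;
  return (key: string): void => {
    pending.add(key); // duplicates within the window collapse into one entry
    if (timerArmed) return;
    timerArmed = true;
    schedule(() => {
      timerArmed = false;
      const keys = Array.from(pending);
      pending.clear();
      flush(keys);
    }, delayMs);
  };
}
```

In an SSE handler, each incoming event would call the returned function with the key it affects, and `flush` would invalidate each key once per window rather than once per streamed token.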
* feat(experiment/console): make System toggle reveal real diagnostic content
The toggle was technically working but only added two thin rows
(workflow_started / workflow_completed) for Pi-driven runs that lack
system-role messages. Functional but invisible. This pass turns it into
the framework-chatter view it should always have been.
What System now reveals
- Workflow lifecycle: workflow_started / workflow_completed /
workflow_failed (existing, now styled to stand out)
- Skipped-node reasons: when a node is skipped, an inline second line
on the NodeDivider shows `reason when_condition · expr ...` — catches
DAG-branching surprises without making the user open the YAML
- Workflow dispatch metadata: assistant messages with
`category: workflow_dispatch_status` (carrying a workflowDispatch
blob) now collapse into a compact 'Workflow dispatch' system row
displaying the workflow name, instead of being rendered as agent
prose. Same for any message whose metadata.category starts with
workflow_ or system_
- Empty / no-signal messages: previously dropped by isMeaningful();
now surface as 'Noise' rows so the timeline is gap-less and SDK
plumbing chatter is visible
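The category rule in the list above (anything starting with `workflow_` or `system_` is treated as system chatter) is small enough to state in code. The exported `isSystemCategory` exists per this commit, but its real signature and body are not shown, so this is an assumed version.

```typescript
// Assumed implementation; the real isSystemCategory in message.ts may differ.
function isSystemCategory(category: string | undefined): boolean {
  if (!category) return false;
  return category.startsWith("workflow_") || category.startsWith("system_");
}
```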
Styling
- System rows now use brand-teal for the pill label + a translucent
teal top hairline (instead of a flat charcoal border on all sides).
Border colors land via inline style because the console's
.console-root * { border-color: var(--border) } rule outweighs
Tailwind utility-class color in the cascade; this finally makes
border-success/30 and friends paint the intended hue too
Cleanup
- StreamCard kind styles now own their full border (width + sides +
color) rather than splitting between the base class and a partial
override
- message.ts exports isSystemCategory + WorkflowDispatchMeta so
RunStream can keep the rendering decision local
- event.ts NodeTransitionEvent carries skipReason + skipExpr;
NodeDivider accepts them and renders only when showDetail is true
* feat(experiment/console): cost on cards + reject-with-reason expander
Cost on cards
- Read `metadata.total_cost_usd` into a typed `Run.costUsd: number | null`
- formatCost picks precision by magnitude: $24.35 / $0.023 / $0.0082
- Surfaces on RecentRunRow (between elapsed and origin badge), on
ActiveRunCard (between origin and elapsed), and on the run detail
header (between origin and elapsed). Hidden when null
- typeof === 'number' guard so demo runs without the field don't blow
up at .toFixed()
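A formatter consistent with the three example outputs above could look like this. The cutoffs (1 and 0.01) are inferred from those examples and are an assumption; the actual formatCost may pick precision differently.

```typescript
// Thresholds are inferred from the example outputs; the real cutoffs may differ.
function formatCost(costUsd: number | null | undefined): string | null {
  // typeof guard: runs without the field return null and the UI hides the cost.
  if (typeof costUsd !== "number" || !Number.isFinite(costUsd)) return null;
  const abs = Math.abs(costUsd);
  const digits = abs >= 1 ? 2 : abs >= 0.01 ? 3 : 4;
  return `$${costUsd.toFixed(digits)}`;
}
```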
Reject-with-reason
- ApprovalPanel now has two distinct flows instead of one shared field
+ Approve / Continue: one click, single-line input above for an
optional comment captured as $<node-id>.output
+ Reject: two-step. First click reveals a 3-row textarea with a red
"REASON FOR REJECTING · REQUIRED" label; confirm only enables
with non-empty text
- Cmd+Enter confirms reject, Esc cancels back to idle
- Reduces accidental rejects (which previously fired on any click of
a single button when the input happened to be non-empty) and makes
the reviewer's reasoning explicit and unavoidable
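The two-step reject flow reads naturally as a tiny state machine. The sketch below is hypothetical: the state and action names are invented, and treating whitespace-only text as empty is an assumption beyond the "non-empty text" rule stated above.

```typescript
// Hypothetical reducer for the two-step reject flow; state and action names are assumptions.
type RejectState = { mode: "idle" } | { mode: "rejecting"; reason: string };
type RejectAction =
  | { type: "clickReject" }
  | { type: "editReason"; reason: string }
  | { type: "cancel" };

function rejectReducer(state: RejectState, action: RejectAction): RejectState {
  switch (action.type) {
    case "clickReject":
      // First click only reveals the textarea; it never submits.
      return state.mode === "idle" ? { mode: "rejecting", reason: "" } : state;
    case "editReason":
      return state.mode === "rejecting" ? { mode: "rejecting", reason: action.reason } : state;
    case "cancel":
      // Esc returns to idle and discards the draft reason.
      return { mode: "idle" };
  }
}

/** Confirm is enabled only once a reason has been typed (whitespace-only counts as empty here). */
function canConfirmReject(state: RejectState): boolean {
  return state.mode === "rejecting" && state.reason.trim().length > 0;
}
```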
* feat(experiment/console): per-project env vars dialog
A gear icon on each project row in the rail (visible on hover / always
on the selected row) opens an EnvVarsDialog modal that lists, adds, and
removes per-project environment variables. Wires straight into the
existing GET/PUT/DELETE /api/codebases/:id/env endpoints.
Design notes
- The server never returns values, only keys — the UI mirrors that
constraint (no "reveal" affordance, no edit-in-place). To rotate a
secret the user adds a new value at the same key; the server
overwrites
- Key input auto-uppercases for the conventional ENV_VAR_NAME look;
value input uses type=password so it doesn't shoulder-surf
- Cache invalidates on every dialog open so external edits (CLI, other
web sessions) show up — without it the in-memory cache pinned the
stale empty list across close/reopen
- skill.listEnvVarKeys / setEnvVar / deleteEnvVar live in a new
skills/envVars.ts module, exported through skills/index.ts to match
the existing skill-verb surface
* feat(experiment/console): artifact tab with sidebar + viewer
Adds a third tab on the run detail page that lets you browse and read the
files a run wrote to disk — the new go-to surface for plans, reports, PR
diffs, and synthesis docs that workflows produce as their actual output.
Server: GET /api/runs/:runId/artifacts
- Walks the run's artifact directory (recursively, dotfiles skipped)
- Returns { files: [{ path, size, modifiedAt }] }
- Needed because workflow_artifact events are empty for nearly every
run we have — bash/script nodes write straight to $ARTIFACTS_DIR
without emitting an event, so an event-driven file list shows nothing
- Reuses the same owner/repo derivation + path-escape guards the
existing /api/artifacts/:runId/* handler uses
Client: ArtifactPanel
- 260px sidebar lists every file with size + parent-dir hint; clicking
a row loads it into the main viewer
- Viewer renders .md / .mdx through react-markdown + GFM + rehype-
highlight (same stack the old UI used), everything else as
pre-formatted monospace text
- Auto-selects the first file on mount so the tab isn't empty
- "open raw ↗" link in the file header for downloads or PR pasting
- Empty-state copy points at $ARTIFACTS_DIR so users understand what
fills the panel
StreamToolbar
- Tabs now accept an optional count; Artifacts shows it ("ARTIFACTS 7")
so users can tell at a glance whether a run produced anything
RunDetailPage
- The artifact-list useEntity is hoisted above the early returns so
React's hook order stays stable (the obvious-in-retrospect bug that
hit the first attempt — early returns after running detail-related
hooks meant the artifacts hook didn't fire on the loading render)
- Cache key is K.artifacts(runId), shared between the tab badge and
the panel so navigating to the tab doesn't refetch
* feat(console + server): file upload on DraftRunCard
Server
- /api/workflows/:name/run now accepts multipart/form-data alongside the
existing application/json. conversationId + message + files[] (max 5,
≤10 MB each). Body schema dropped from the OpenAPI route config so
@hono/zod-openapi doesn't try to validate multipart against the JSON
shape — same pattern sendMessageRoute uses. Handler manually branches
on content-type
- persistUploadedFiles helper lifted out of sendMessageRoute so both
routes go through the same validate-write-rollback logic. Returns
either { ok: true, savedFiles, uploadDir } or a structured error the
caller forwards via apiError. sendMessageRoute is untouched for this
pass; could be refactored to use the helper later
- extraContext.attachedFiles + filesToCleanup are passed straight to
dispatchToOrchestrator so cleanup happens inside the lock handler,
after handleMessage completes — matches the freeform-message flow
Client
- skill.startRun gains an optional files: File[]. With files, posts
multipart (browser-set boundary); without, keeps the JSON path
- DraftRunCard handles three input paths the chat input has always
handled: drag-and-drop on the whole card, paste of clipboard images
inside the textarea, and a paperclip button that opens the file
picker. Same MAX_FILES=5 and MAX_FILE_BYTES=10 MB caps the server
enforces, surfaced as inline errors
- File chips render above the start row with name + size + remove (X).
Drag-over shows a brand-gradient-soft overlay with a "drop files to
attach" pill so the affordance is obvious without persistent chrome
- Collapse / submit both clear the file list so reopening the card
starts clean
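The client-side caps could be checked with something like the following. The constants mirror the limits named in this commit; the function name, the error strings, and the reading of "10 MB" as 10 * 1024 * 1024 bytes are assumptions.

```typescript
// Caps mirror the commit message; the function, error strings, and byte math are assumptions.
const MAX_FILES = 5;
const MAX_FILE_BYTES = 10 * 1024 * 1024; // assumes binary megabytes

interface FileLike {
  name: string;
  size: number;
}

/** Returns one message per violated cap; an empty array means the selection is acceptable. */
function validateAttachments(files: FileLike[]): string[] {
  const errors: string[] = [];
  if (files.length > MAX_FILES) {
    errors.push(`Too many files: ${files.length} (max ${MAX_FILES})`);
  }
  for (const file of files) {
    if (file.size > MAX_FILE_BYTES) {
      errors.push(`${file.name} exceeds 10 MB`);
    }
  }
  return errors;
}
```

Checking on the client only improves feedback; the server still enforces the same limits.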
* feat(experiment/console): open-in-IDE, rerun, SSE-drop safety net
Three tier-2 affordances that each accelerate the iteration loop without
adding chrome.
Open in IDE
- vscode://file/<workingPath> button on ActiveRunCard (hover), every
RecentRunRow (hover), and RunDetailHeader (always visible)
- Hidden when /api/health reports is_docker=true. The first request
defaults isDocker to true so a flash of broken links inside Docker
never happens — matches the old UI's safer default
- new lib/health.ts exposes useIsDocker() (cached via useEntity on the
'health' key so all callers share one fetch) and openInIde(path)
which normalises backslashes on Windows paths the same way the old
Header.tsx did
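The URL construction with backslash normalisation might look like this. `toVscodeUrl` is a hypothetical name, and collapsing the leading slash for POSIX paths is an assumption about how the real openInIde behaves.

```typescript
// Hypothetical sketch; the real openInIde may build the URL differently.
function toVscodeUrl(workingPath: string): string {
  const normalized = workingPath.replace(/\\/g, "/"); // C:\repo\app -> C:/repo/app
  // Avoid a double slash when the path already starts with one.
  const suffix = normalized.startsWith("/") ? normalized : `/${normalized}`;
  return `vscode://file${suffix}`;
}
```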
Rerun
- ↻ button on completed/failed/cancelled RecentRunRows. Navigates to
/console/p/<id>?rerun=1&workflow=<name>&message=<userMessage> with
URLSearchParams so spaces / unicode survive
- DraftRunCard watches searchParams: when rerun=1 arrives (whether by
fresh mount or by within-component navigation) it expands the card,
fills the workflow picker + textarea, then strips the params via
setSearchParams(..., { replace: true }) so a reload doesn't re-fire
- Deliberately depends on [searchParams] not [] — the rerun click
typically lands while DraftRunCard is already mounted (same project
route, search-param-only change). The empty-deps version was the
bug that made the first attempt look like nothing happened
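The rerun round trip through the query string can be sketched as a pair of helpers. The route shape comes from the commit message; `buildRerunUrl` and `readRerun` are hypothetical names, and in the component the reading side would run inside an effect that depends on `searchParams`.

```typescript
// Hypothetical helpers; the route shape comes from the commit message, the names do not.
function buildRerunUrl(projectId: string, workflow: string, message: string): string {
  // URLSearchParams handles escaping, so spaces and unicode survive the round trip.
  const params = new URLSearchParams({ rerun: "1", workflow, message });
  return `/console/p/${encodeURIComponent(projectId)}?${params.toString()}`;
}

interface RerunRequest {
  workflow: string;
  message: string;
}

/** Returns the prefill values when rerun=1 is present, otherwise null. */
function readRerun(search: string): RerunRequest | null {
  const params = new URLSearchParams(search);
  if (params.get("rerun") !== "1") return null;
  return {
    workflow: params.get("workflow") ?? "",
    message: params.get("message") ?? "",
  };
}
```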
SSE-drop safety net
- 30s setInterval on RunDetailPage that invalidates K.run(runId) +
K.messages(convId) while status is running or paused
- Stops automatically the moment status flips terminal, so it's not
polling proper, just a heartbeat refetch that catches SSE streams
that dropped without us noticing (network hiccup, mobile sleep/wake)
- Replaces nothing — the existing useRunStreamSSE keeps streaming
when the connection is alive; this is purely a "if we missed the
terminal event, find it within 30s" insurance
* fix(experiment/console): project rail — selection visible, identity vs status, real path locator
Five compounding issues in the project rail, all addressed.
1. Routing param read (the load-bearing bug)
ProjectRail mounts outside the inner <Routes> (it's sibling to <main>
in ConsoleApp), so useParams() returns {} for it. `scope` was
always 'all'; the ALL PROJECTS button was always aria-pressed=true;
the selected ProjectRow never received selected:true and therefore
never showed the ring or background. Fix: useLocation() + a regex
pull on `/console/p/:id`.
2. Selection is now unmistakable
Each row paints a 4px brand-gradient left strip + bg-surface-elevated +
brighter title when selected. Replaces the magenta ring (which was
invisible against the dark inset background even when it did fire).
The gradient strip rounds at the corners via rounded-l-md so we
don't need overflow-hidden on the row — which had been clipping the
⋯ menu dropdown.
3. Identity vs status disambiguated
The hash-coloured dot was identity (project tile color) but read as
a status indicator. Replaced with a 20×20 rounded square showing the
project's first letter on the hash-coloured background — clearly a
"this is which project" affordance, can't be confused with status.
4. Activity status, when it exists
Right-side dot is now real: pulsing blue when the project has a
running run, pulsing amber when paused, solid red when only failed
runs are recent. Idle projects show nothing. Sources data from the
shared K.runs('all') cache (so the dashboard SSE invalidation we
already have keeps it live; no extra fetch). Priority: running >
paused > failed-only, so a project with one running and one failed
run reads as "running", not "broken".
5. Locator below the name = the actual local path
formatProjectLocator now returns `~/path/to/project` (homedir
shortened). The old `owner/repo` derivation was identical to the
project name for github projects, so the row read as duplicated
text. After rename, the path stays as a stable identity anchor —
which is what the user wanted: "rename a project but still show the
path below."
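Items 1 and 4 above each reduce to a small pure function. The sketch below is illustrative only: `projectIdFromPath` and `activityFor` are hypothetical names, and the status strings are assumed to match the run statuses used elsewhere in the console.

```typescript
// Hypothetical sketch; function names and status strings are assumptions.

/** Pull the project id out of the pathname, since useParams() is empty outside <Routes>. */
function projectIdFromPath(pathname: string): string | null {
  const match = /^\/console\/p\/([^/]+)/.exec(pathname);
  const id = match?.[1];
  return id !== undefined ? decodeURIComponent(id) : null;
}

type Activity = "running" | "paused" | "failed" | null;

/** Priority: running > paused > failed-only; idle projects yield null. */
function activityFor(statuses: string[]): Activity {
  if (statuses.includes("running")) return "running";
  if (statuses.includes("paused")) return "paused";
  if (statuses.includes("failed")) return "failed";
  return null;
}
```

A null project id corresponds to the "all" scope in the rail.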
Bonus fixes
ALL PROJECTS button: same selection treatment as project rows
(strip + elevated bg), sentence case label ("All projects"), uses
an `∗` avatar in a small square — visually consistent with rows.
Remove project is now discoverable: ⋯ menu button on hover (always
visible on the selected row), opens a small dropdown with "Remove
project". Right-click still works for power users and now also
opens the same menu.
Add project hover treatment normalised to border-bright/surface-hover
to match the rest of the rail (used to be magenta).
* refactor(experiment/console): drop avatar + activity dot from project rail
Both added more noise than signal:
- The first-letter avatar carried no information for owner/repo names
(we were rendering the owner's first letter). Removed it entirely
rather than try to derive something cleverer
- The right-side activity dot lit up red for any project with a
failed run in recent history. That's a thing that happened, not
something the user needs to act on from the rail. Removed
The rail row is now: optional gradient strip when selected, title,
path subtitle, hover actions (gear + ⋯). Selection is still
unmistakable via the brand strip + elevated background + brighter
title color. Width is reclaimed for the path (Widinglabs/sasha-demo's
full ~/Projects/mine/sasha now fits where it was truncated before).
Also drops the matching ∗ avatar from the "All projects" row for
consistency, and the K.runs('all') fetch + deriveActivityByProject
helper that only existed to feed the now-gone status dots.
* feat(console + old ui): real logo, drop spike chrome, cross-UI switch buttons
Console header
- Replace text-only "Archon" with the actual shield mark from
packages/web/public/favicon.png (the existing brand mark) +
gradient wordmark
- Drop the "spike" badge — the experiment is real enough now; the
"console" tag stays as a "this is a separate surface" hint
- Drop the stray "m2 populated" telemetry text in the right slot;
replaced with a small "← Old UI" link so users always have an
escape hatch back to the classic chrome
Old UI TopNav
- Add a gradient "Try the new console →" CTA between the last tab
and the version readout. Inline-styled with the brand
magenta → violet → teal gradient because the old UI's token set
doesn't include the brand-gradient variables (those live in the
console-scoped theme.css)
- Sized to read as a primary CTA without dominating the nav. Arrow
nudges 2px on hover for an inviting affordance
* tweak(old ui): rename console CTA to 'Try the new console UI'
* fix(experiment/console + server): satisfy validate suite after rebase
Type-check
- Demo run factories in RunsPage and PreviewPage now include
costUsd: null so the test fixtures match the Run type that was
extended with the new cost field
- startRun's HttpError throw on multipart failure now passes the
URL path as the 2nd arg (HttpError takes status/path/body) so
the upload-error path constructs correctly
Server test
- /api/workflows/:name/run only forwards the message metadata 4th
arg to addMessage when files are present, so the JSON path keeps
the 3-arg signature the existing api.workflow-runs.test asserted
Format
- prettier --write on eslint.config.mjs and theme.css
Telegram-markdown blockquote tests are 3 pre-existing failures on dev
(verified by checking out dev's adapters/ before the run) — unrelated
to this PR's scope.
* fix(console): correct silent invalidate + recover errored entries (C1+C2)
The cache's invalidate(prefix) checked `key === prefix || key.startsWith(`${prefix}:`)`
so passing 'runs:' looked for 'runs::' — three callers (ApprovalPanel
approve/reject, RunActionBar cancel/resume/abandon) silently did nothing,
and the runs feed only refreshed on the next SSE event. Drop the trailing
colon at the three sites.
Separately, errored cache entries lived only in the `errors` Map, but
invalidate() walked `cache.keys()` only — so a failed fetch was stuck
until full page reload. Extend the walk to both maps so recovery works.
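A minimal sketch of the two fixes, assuming a cache keyed by colon-delimited strings. Only the matching rule is taken from the commit text; the `entries` and `errors` maps and the `invalidate` return value are illustrative stand-ins, not the console's actual store.

```typescript
// Hypothetical stand-ins for the console cache (names are assumptions).
const entries = new Map<string, unknown>();
const errors = new Map<string, unknown>();

// Matching rule quoted in the commit message: exact key, or `${prefix}:` child.
function matches(key: string, prefix: string): boolean {
  return key === prefix || key.startsWith(`${prefix}:`);
}

// Walk both maps so an errored entry can be refetched after invalidation.
function invalidate(prefix: string): string[] {
  const dropped: string[] = [];
  for (const store of [entries, errors]) {
    for (const key of Array.from(store.keys())) {
      if (matches(key, prefix)) {
        store.delete(key);
        dropped.push(key);
      }
    }
  }
  return dropped;
}

entries.set("runs:all", []);
entries.set("projects", []);
errors.set("runs:42", new Error("fetch failed"));

const withColon = invalidate("runs:"); // looks for "runs::", matches nothing
const withoutColon = invalidate("runs"); // drops the cached and the errored entry
```

Passing `'runs:'` is a silent no-op under this rule, which is why the callers were changed rather than the matcher.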
* fix(server): guard new artifacts route + register OpenAPI (C3+I1+I3+I4)
Convert GET /api/runs/:runId/artifacts from raw app.get() to
registerOpenApiRoute against a typed schema (ArtifactFile +
ListArtifactsResponse in workflow.schemas.ts). The route was the only
recently-added endpoint bypassing the project's OpenAPI rule
(CLAUDE.md L25) without a constraint that justifies it — the response
is plain JSON of a fixed shape. Generated types now include it, so
skills/runs.ts re-exports the schema type instead of maintaining a
parallel hand-written interface (I3).
Other guards on the same handler:
- I1: defense-in-depth path-containment check on the resolved
artifact directory. A maliciously crafted codebase name (`..` in
owner/repo) would have escaped ARCHON_HOME; now blocked with a
400 + artifacts.path_escape_blocked log
- I4: getCodebase() now wrapped in try/catch, mirroring the
getWorkflowRun() block above it. DB errors produce a logged 500
instead of an unlogged crash
- I3: stat() error swallow narrowed — ENOENT/EACCES are skipped
(file deleted mid-walk, permission flip) but unknown errors now
propagate to the outer artifacts.walk_failed log + 500 response,
so we never return a half-list silently
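The path-containment guard (I1) can be sketched as below. This is an illustration under stated assumptions: the `artifacts/<owner>/<repo>/<runId>` layout and both function names are hypothetical, and the real handler may resolve the directory differently.

```typescript
import * as path from "node:path";

// True when `candidate` resolves to `root` itself or to a path inside it.
function isContained(root: string, candidate: string): boolean {
  const rel = path.relative(path.resolve(root), path.resolve(candidate));
  if (rel === "") return true;
  const escapes = rel === ".." || rel.startsWith(".." + path.sep);
  return !escapes && !path.isAbsolute(rel);
}

// Hypothetical layout: <home>/artifacts/<owner>/<repo>/<runId>.
// Returns null when the resolved directory would leave `home`.
function resolveArtifactDir(
  home: string,
  owner: string,
  repo: string,
  runId: string,
): string | null {
  const dir = path.resolve(home, "artifacts", owner, repo, runId);
  return isContained(home, dir) ? dir : null;
}

const safeDir = resolveArtifactDir("/tmp/archon-home", "acme", "demo", "run1");
const escapedDir = resolveArtifactDir("/tmp/archon-home", "..", "..", "run1");
```

Comparing resolved paths rather than raw strings is what catches `..` segments hidden inside an owner or repo name.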
* fix(console): real defects from review (CR-1..CR-5, CR-7, CR-9, I2, I5)
- AddProjectDialog: import FormEvent from 'react' instead of relying on
the ambient React namespace which isn't actually imported here. Real
type bug in strict-mode setups (CR-1)
- lib/sse: route EventSource opens through SSE_BASE_URL so dev bypasses
the Vite proxy. The proxy buffers SSE; bare paths reintroduce the
buffering useSSE already worked around in the old UI (CR-2)
- DraftRunCard: guard Enter-submit during IME composition. Without the
e.nativeEvent.isComposing check, Japanese/Chinese/Korean candidate
selection dispatches the run prematurely (CR-3)
- display-name: wrap localStorage in try/catch. Private-browsing modes
throw SecurityError and crashed the rail row on mount (CR-4)
- ActiveRunCard: add role/tabIndex/onKeyDown so the card is operable
with Enter/Space, matching RecentRunRow which already had this (CR-5)
- eslint.config: harden import-restriction patterns. * → ** so nested
paths (@/components/layout/foo) can't slip past, and the @/lib/api
restriction now applies to all named imports rather than only the
default. Generated types from @/lib/api.generated are still allowed
via a different module path (CR-7)
- NodeDivider: only emit the scroll-anchor id on 'started' transitions
so multiple transitions for the same node don't produce duplicate
ids in the DOM. The graph 'jump to node' still works (it lands on
the entry point, which is the right target anyway) (CR-9)
- primitives/workflow: toWorkflow now preserves 'global' as a distinct
source. Previously `raw.source === 'project' ? 'project' : 'bundled'`
silently demoted home-scoped (~/.archon/workflows) workflows to the
bundled badge + sort rank (I2)
- lib/sse: SSE onerror logs at console.warn when readyState is CLOSED,
so dropped streams aren't completely silent (I5)
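The IME guard from CR-3 can be sketched as a pure predicate. `KeyLike` and `shouldSubmit` are hypothetical names, and treating Shift+Enter as a newline is an assumption; in a React handler the flag would be read from `e.nativeEvent.isComposing`, as the commit describes.

```typescript
// Structural subset of a keyboard event, so the sketch runs without a DOM.
interface KeyLike {
  key: string;
  shiftKey: boolean;
  isComposing: boolean;
}

// Enter submits only when no IME composition is in progress.
function shouldSubmit(e: KeyLike): boolean {
  if (e.isComposing) return false; // Enter is confirming an IME candidate
  return e.key === "Enter" && !e.shiftKey; // Shift+Enter newline is an assumption
}

const plainEnter = shouldSubmit({ key: "Enter", shiftKey: false, isComposing: false });
const composingEnter = shouldSubmit({ key: "Enter", shiftKey: false, isComposing: true });
```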
* fix(console): SPA nav + nullable project type + truncate multipart errors (CR-6, CR-8, S4)
- TopNav and ConsoleApp: swap <a href> for <Link to> on the cross-UI
switch buttons. Same React app, same DOM tree, no need to trigger a
full reload (CR-6)
- RunsPage and RunDetailPage: useEntity<Project | null> instead of
useEntity<Project> with a Promise.resolve(null as unknown as Project)
loader. Removes the type cast and keeps downstream readers honest
about nullability — added explicit `if (detail === null)` guard in
RunDetailPage where the type narrowed (CR-8)
- skills/startRun: multipart error path now truncates to 200 chars
matching requestJson, so an HTML 502 body doesn't land in the error
toast as raw markup (S4 from multi-agent review)
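The 200-character cap on error bodies (S4) could look roughly like this. `summarizeErrorBody` is a hypothetical helper, not the name used in skills/startRun or requestJson, and the trailing ellipsis is an assumption.

```typescript
const MAX_ERROR_CHARS = 200; // the limit named in the commit message

// Cap a response body before it reaches a toast, so an HTML error page
// from a proxy does not render as a wall of markup.
function summarizeErrorBody(body: string, max: number = MAX_ERROR_CHARS): string {
  return body.length > max ? `${body.slice(0, max)}…` : body;
}

const shortBody = summarizeErrorBody("Bad Gateway");
const longBody = summarizeErrorBody("<html>" + "x".repeat(500) + "</html>");
```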
* test(server): 6 tests for GET /api/runs/:runId/artifacts (I6)
Cover the branches that can be tested without mocking fs/promises:
- 400 for invalid run ids that fail the [A-Za-z0-9_-] regex guard
- 404 when the workflow run does not exist
- 200 + empty files when run has no codebase_id (orphan)
- 200 + empty files when codebase name lacks owner/repo shape
- 500 when the codebase DB lookup throws
- 400 when the resolved artifact dir escapes ARCHON_HOME
(defense-in-depth path-containment guard)
Multipart-dispatch unit testing would require mocking c.req.parseBody —
deferring; the end-to-end multipart round-trip was verified during
development against a real workflow with server-side
`run_workflow.files_uploaded` log + upload dir written under
~/.archon/artifacts/uploads/. The existing JSON-path tests continue
to assert addMessage is called with 3 args (not 4) for the JSON branch.
Tweaks to the test harness:
- paths mock now exports getArchonHome and getRunArtifactsPath so the
new handler can resolve a deterministic test path
- getCodebase is now a top-level mockGetCodebase that supports
.mockImplementationOnce per-test
* docs: register new artifacts endpoint + clean stale references (I7+C4+S5)
CLAUDE.md
- Add GET /api/runs/:runId/artifacts to the API Endpoints section
- Extend the directory tree to mention packages/web/src/experiments/
(lint-guarded in-repo spike directory, currently hosting /console)
- Update the registerOpenApiRoute rule to enumerate the two narrow
exceptions: raw-content wildcard routes (e.g.
/api/artifacts/:runId/*) and multipart-or-JSON routes (drop
request.body from the route config; handler parses both)
docs-web/reference/api.md
- Add the artifacts row to the Runs table + a 'List Run Artifacts'
section with curl
- Expand the 'Run a Workflow' example to show the new multipart
branch alongside the existing JSON one
packages/web/src/experiments/console/README.md
- Replace the dead /Users/rasmus/.claude/plans/quiet-twirling-bentley.md
link with a Status section noting that milestone planning has been
superseded by PR-template-driven feedback
packages/web/src/experiments/console/lib/format.ts
- Drop the orphan JSDoc that described formatProjectLocator above the
formatCost function
packages/web/src/experiments/console/theme.css
- The 'maps --color-* to --base vars' line invented terminology that
doesn't exist in Tailwind. Replace with the accurate version:
@theme inline defines color tokens that reference plain CSS vars,
redefining those vars inside .console-root cascades through every
utility that reads them
packages/server/src/routes/api.ts
- persistUploadedFiles docstring no longer claims to be shared by
both message + workflow routes (only run uses it today;
sendMessageRoute still inlines the same logic and could migrate
in a separate pass)
store/cache.ts and routes/RunDetailPage.tsx
- Drop the (M4) milestone references — the SSE wiring landed weeks
ago; the comments now describe the actual lib/sse.ts coupling
* feat(console): neovim-style keymap for project / workflow / run selection
Adds a light-modal keymap so picking a project, picking a workflow, and
starting a run can all be driven from the keyboard:
- p anywhere: full-screen project palette (subsequence fuzzy match,
↑↓/Enter/Esc, listbox + combobox a11y)
- n in a project: opens the draft card and auto-summons the workflow
picker; closing the picker hands focus to the context textarea
- ? anywhere: keyboard shortcuts overlay (esc/? to dismiss)
- runs feed: j/k move, gg/G jump, Enter open, Esc clear, / focus search,
1-5 filter by status (with magenta selection ring + scroll-into-view)
- run detail: 1/2/3 tabs, t/s toggle tool / system rows, a/r approve /
reject (paused only), Esc/h back to runs
Shared infrastructure in lib/keymap.ts: chord buffer with 500ms window,
input + modal-dialog guards so route bindings don't leak through when a
palette is open. Help catalogue lives in lib/shortcuts.ts and is kept
in sync per-page.
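A chord buffer with a 500ms window might be sketched as follows. This is not the code in lib/keymap.ts: the `ChordBuffer` class, its binding map, and the injectable clock are assumptions made so the timing can be exercised without a browser.

```typescript
class ChordBuffer {
  private pending = "";
  private lastAt = 0;
  private readonly bindings: Map<string, () => void>;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(
    bindings: Map<string, () => void>,
    windowMs: number = 500,
    now: () => number = () => Date.now(),
  ) {
    this.bindings = bindings;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Feed one key; returns the binding that fired, or null.
  press(key: string): string | null {
    const t = this.now();
    if (t - this.lastAt > this.windowMs) this.pending = ""; // chord expired
    this.lastAt = t;
    // Try extending the pending chord first, then the key on its own.
    const candidates = this.pending ? [this.pending + key, key] : [key];
    for (const candidate of candidates) {
      const action = this.bindings.get(candidate);
      if (action) {
        this.pending = "";
        action();
        return candidate;
      }
      const isPrefix = Array.from(this.bindings.keys()).some((b) => b.startsWith(candidate));
      if (isPrefix) {
        this.pending = candidate; // wait for the next key
        return null;
      }
    }
    this.pending = "";
    return null;
  }
}

let fakeNow = 0;
const fired: string[] = [];
const keymap = new ChordBuffer(
  new Map<string, () => void>([
    ["gg", () => { fired.push("top"); }],
    ["j", () => { fired.push("down"); }],
  ]),
  500,
  () => fakeNow,
);
```

The input and modal-dialog guards mentioned above would sit in front of `press` and are left out of this sketch.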
…slash commands (#1757)
* feat(slack): umbrella Slack UX upgrade: buttons, status, reactions, slash commands
Single Slack adapter PR pulling together the in-thread interactivity primitives the team will need on a shared instance:
- Interactive Block Kit Approve/Reject buttons on approval gates
- Cancel button on a per-run status message edited in place as DAG nodes progress
- Lifecycle reactions on the triggering message (🔄 → ✅ / ❌)
- Native `/archon` and `/archon-workflow` slash commands (Socket Mode, no URL needed)
- `_part i/n_` annotations on long replies split across multiple messages
- Italic cost/token footer after direct-chat replies and on terminal workflow status
Approve/Reject/Cancel buttons call existing platform-agnostic operations (approveWorkflow / rejectWorkflow / abandonWorkflow); no schema or workflow engine changes. Authorization re-uses the existing SLACK_ALLOWED_USER_IDS whitelist for button clicks and slash commands.
Per-user attribution in thread context is intentionally deferred to a separate PR; it needs a user_id column on conversations/messages/workflow_runs and orchestrator plumbing.
* fix(adapters): declare @archon/providers as workspace dep
CI's stricter package-resolution caught that @archon/adapters imports @archon/providers/types (TokenUsage) without declaring the workspace dependency. Locally bun resolved it transitively via @archon/core; CI's clean install does not.
* fix(slack): address coderabbit review
- Drop ephemeral denial from slash command auth path so unauthorized users are silently rejected, matching the existing app_mention / message.im pattern. Posting a denial leaks that a bot is listening.
- Surface failureReason on cancelled runs too, not just failed. The type already documents this for both terminal states.
- Stop forwarding raw error messages to Slack when a cancel click fails. Backend / DB errors stay in server logs; user sees a generic "check the server logs or try again" line.
Adds a test for the cancelled-with-reason rendering.
* fix(slack): address 6-agent PR review
Critical:
- Declare @archon/workflows as an explicit workspace dep on @archon/adapters (same class of fix as the providers one). Resolves today via hoisting but breaks under stricter installs.
- Split workflow-bridge.test.ts into its own bun test invocation so its irreversible mock.module() calls on @archon/core and @archon/workflows/event-emitter cannot leak into the slack/telegram batch.
- Fix "trailing-edge" debounce comments: the implementation is leading-edge. Document the Slack chat.update rate limit as the 500ms rationale.
Important:
- Wire slackBridge.detach() into the server graceful shutdown path so the event subscription doesn't leak and a pending chat.update can't fire against a closed Bolt socket.
- Drop dead `comment` plumbing through handleApprovalDecision / applyResolutionEdit / buildApprovalResolutionBlocks; Block Kit buttons have no UI to capture it.
- Widen the action-handler try/catch to also cover applyResolutionEdit so block-builder or chat.update failures don't bubble as unhandled rejections.
- Cancel-click with missing run state now logs and posts an ephemeral acknowledgement (using the button message's channel/ts) so the user isn't left wondering whether the click registered.
- Use Bolt's BlockButtonAction / ButtonAction types directly on the app.action() registrations instead of the ad-hoc ActionBody / ActionElement aliases.
Test coverage:
- Slash command silent-rejection of unauthorized users.
- triggeringMessages 1000-entry FIFO eviction at the cap boundary.
- Slash command seed-post failure → ephemeral error + handler not called.
- Single-chunk message path skips the _part i/n_ footer.
- rejectWorkflow → { cancelled: true, maxAttemptsReached: true } branch.
Docs:
- architecture.md IPlatformAdapter listing includes sendResultFooter.
- approval-nodes.md mentions the Slack in-thread Approve button.
- CLAUDE.md test-isolation batch count for @archon/adapters updated to 6 (was 3; pre-existing drift, now also accounts for workflow-bridge).
Polish:
- removeReactionSafe gets the same intentional-fallback comment as addReactionSafe (no_reaction is a normal terminal-state interleave).
- IPlatformAdapter.sendResultFooter signature uses TokenUsage directly.
- Drop "for v1" tag on the unhandled-event comment.
- Remove what-comments from blocks.ts / blocks.test.ts / adapter.ts.
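The capped FIFO behaviour that the triggeringMessages test exercises can be sketched with a plain `Map`, which iterates in insertion order. `FifoMap` is a hypothetical name, and the adapter's real structure may differ.

```typescript
// Size-capped map that evicts the oldest insertion first.
class FifoMap<K, V> {
  private readonly items = new Map<K, V>();
  private readonly cap: number;

  constructor(cap: number) {
    this.cap = cap;
  }

  set(key: K, value: V): void {
    // Map.set on an existing key keeps its original position, so this is
    // strict FIFO: updating a value does not make the key "newer".
    this.items.set(key, value);
    while (this.items.size > this.cap) {
      const oldest = this.items.keys().next().value as K;
      this.items.delete(oldest);
    }
  }

  get(key: K): V | undefined {
    return this.items.get(key);
  }

  get size(): number {
    return this.items.size;
  }
}

// A cap of 3 keeps the example small; the commit text names a cap of 1000.
const seen = new FifoMap<string, number>(3);
["a", "b", "c", "d"].forEach((k, i) => seen.set(k, i));
```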
)
* fix(orchestrator): resume interactive workflows on chat platforms (#1741)
Interactive approval-gate and interactive-loop workflows started from Slack, Telegram, Discord, or GitHub never resumed after the user provided their answer: each approval response triggered a brand-new workflow run from node 0 in a fresh worktree, re-asking the same questions indefinitely.
The cause was a `platform.getPlatformType() === 'web'` gate that wrapped the entire resume-detection block in `dispatchOrchestratorWorkflow`, leaving all chat platforms to unconditionally fall through to a fresh `executeWorkflow`. The chat-side `resumeRun` mechanism that previously handled this was removed in #915 (natural-language approval routing) without lifting the resume lookup out of the web branch.
Changes:
- Restructure dispatchOrchestratorWorkflow so resume detection (findResumableRunByParentConversation + hydrateResumableRun) runs for every platform; only the background-dispatch branch remains web-only
- Add codebaseId parameter to findResumableRunByParentConversation so persistent chat conversation IDs (Telegram chat_id, Slack thread) cannot resume a stale run from a different project
- Add tests for chat resume, codebase scoping, and fresh-run fallback
Fixes #1741
* test(orchestrator): strengthen mock coverage and add web non-interactive resume test
- Add hydrateResumableRun to executor mock in orchestrator.test.ts to mirror the real module exports and prevent opaque TypeErrors for future test contributors
- Add test asserting that a web non-interactive workflow with a resumable run resumes foreground rather than dispatching a fresh background run, pinning the priority order of the if/else if dispatch block
* simplify: inline single-use mock vars in orchestrator.test.ts
…1703) (#1746) createCodebase() hardcoded 'claude' as the fallback when ai_assistant_type was not provided. Now checks process.env.DEFAULT_AI_ASSISTANT first, consistent with how getOrCreateConversation() resolves the default. Falls back to 'claude' only when both the parameter and env var are unset.
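The fallback order described here can be sketched as a single expression. `resolveAssistantType` is a hypothetical helper, and taking the environment as a parameter is a choice made for testability; createCodebase() itself may read `process.env` directly.

```typescript
// Explicit parameter first, then DEFAULT_AI_ASSISTANT, then 'claude'.
function resolveAssistantType(
  explicit: string | undefined,
  env: Record<string, string | undefined>,
): string {
  return explicit ?? env.DEFAULT_AI_ASSISTANT ?? "claude";
}

const fromParam = resolveAssistantType("codex", { DEFAULT_AI_ASSISTANT: "pi" });
const fromEnv = resolveAssistantType(undefined, { DEFAULT_AI_ASSISTANT: "pi" });
const fromFallback = resolveAssistantType(undefined, {});
```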
…or and workflow runs (#1783)
* feat(core): plumb user_id from chat/forge adapters through orchestrator and workflow runs
Adds remote_agent_users + remote_agent_user_identities tables (Archon identity + per-platform mapping, UNIQUE(platform, platform_user_id)) and threads a resolved user_id through HandleMessageContext into the orchestrator, workflow executor, and isolation resolver. Every new conversation, message, workflow_run, and isolation_environment row created from Slack/Telegram/Discord/GitHub now carries attribution.
Slack additionally enriches first-sight users with their real name via users.info (requires bot scope users:read; reinstall the app to grant). Telegram/Discord derive display name from the inbound event payload. GitHub resolves event.comment.user.login or event.sender.login on each webhook. Resolution failure warn-logs and continues; it never drops a message.
Schema is additive and nullable everywhere: existing rows remain valid with NULL, ON DELETE SET NULL on every new FK. Web POST /api/conversations and the CLI continue to write NULL user_id; those surfaces become attributed in a follow-up PR. Solo installs with GITHUB_TOKEN are unchanged.
Race-safe create-on-first-sight: UNIQUE(platform, platform_user_id) trips on concurrent first-sight webhooks; the losing transaction rolls back and we re-SELECT the winner's identity. Orphaned identities (user row deleted out from under them) are auto-repaired.
Foundation for the small-team Archon initiative. Follow-ups will swap the shared GITHUB_TOKEN for a GitHub App and wire per-user GitHub tokens via device flow.
* fix(core): address PR review: FK semantics, narrowed error handling, identity type union
Critical fix: SQLite migrateColumns ALTERs now include ON DELETE SET NULL on all four new user_id / created_by_user_id FK columns. Upgraded SQLite DBs previously inherited the default NO ACTION ≈ RESTRICT semantic, contradicting the PR's documented "no destructive cascade on user deletion" guarantee.
Hardening on the new user-identity surface:
- findOrCreateUserByPlatformIdentity narrows its race-recovery catch to true UNIQUE-constraint violations (PG sqlstate 23505 or SQLite "UNIQUE constraint failed" message). Any other error logs as user.create_failed and propagates; no more masking generic DB failures as recoverable races.
- backfillDisplayName wraps its UPDATE in try/catch with a dedicated warn event. A failed opportunistic backfill must not silently fail the entire user resolution path; the caller already has the resolved user row.
- repairOrphanedIdentity now logs user.identity_orphan_repair_failed on transaction failure (previously surfaced only as a generic resolve_failed upstream).
- New IdentityPlatform literal union ('slack' | 'telegram' | 'discord' | 'github' | 'web' | 'cli') replaces the unconstrained `platform: string` on UserIdentity and the findOrCreate signature. Typos now fail at compile time rather than silently breaking the UNIQUE(platform, platform_user_id) invariant.
- user.create_started/_completed/_failed are now properly paired per the project event-naming convention.
Slack adapter:
- users.info missing_scope WARN now gated by an instance flag so it fires once per adapter lifetime instead of once per unknown user. The misconfiguration is permanent; flooding logs after every restart in a 100-user workspace was the wrong shape.
- users_info_failed log strips err (which can include err.data with workspace metadata) in favor of structured errMessage / slackErrorCode / slackUserId fields. No PII through the log pipeline.
Server resolver:
- resolveUserId exported for testability and now logs as a single static event server.user_resolve_failed (platform in structured fields) instead of the templated ${platform}.user_resolve_failed which collided with the GitHub adapter's own event name.
- Dead `=== null` branch removed (TypeScript already narrows the type).
GitHub adapter:
- User-identity resolution moved up to immediately after self-filtering + @mention checks. Now runs before the codebase-ensure and comment-history Octokit calls so resolution can't be silently skipped by an upstream Octokit failure (which was masking a missing-mock bug in the existing test suite).
Tests:
- New packages/server/src/resolve-user-id.test.ts covers the never-throws contract that three adapter handlers depend on. 6 cases including the static-event-name regression.
- GitHub adapter test now mocks @archon/core/db/users and covers the comment.user.login ?? sender.login attribution fallback in both directions, plus a never-throws case for resolution failure.
- users.test.ts gains the asymmetric-backfill case, backfill-failure-does-not-block-resolution case, both PG and SQLite UNIQUE-error shapes for race recovery, and a non-UNIQUE-rethrows-without-recovery case that explicitly counts the query calls.
- isolation-environments.test.ts adds a "ON CONFLICT does NOT update created_by_user_id" regression guard so a copy-paste in the SET clause can't silently transfer ownership across re-activations.
Comment cleanup: stripped the (PR-A) / (until PR-C) / pre-PR-A history labels from production types, migrations, and source files. They were PR-state markers that would rot on merge; the substantive WHY content stays.
Docs:
- CLAUDE.md table count: 8 → 10; users and user_identities documented.
- docs-web/reference/database.md: 8 → 10 with explicit ON DELETE semantics and a note that re-running 000_combined.sql is idempotent and picks up the new ALTERs.
- docs-web/reference/architecture.md: 7-table diagram → 10-table; full schema block extended with the new tables and user_id columns.
- docs-web/adapters/slack.md: users:read scope added to the Bot Token Scopes setup with a note about graceful degradation if omitted.
Skipped (with reason):
- Converting User/UserIdentity to z.infer<typeof schema>: all sibling row interfaces in types/index.ts are hand-crafted; doing this for just the two new types creates inconsistency. A separate consistency pass should convert the whole file, not selectively.
- Threading userId into the four web/CLI addMessage callsites: those surfaces don't have an auth flow yet, so threading now means passing `undefined` from every caller. Added explicit TODOs at each callsite pointing at the upcoming web/CLI auth work instead.
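The narrowed race-recovery catch can be sketched as below. The two error shapes (a `code` of `23505` for PostgreSQL, a message containing "UNIQUE constraint failed" for SQLite) come from the commit text; `isUniqueViolation` and the generic `findOrCreate` wrapper are hypothetical and do not mirror the signature of findOrCreateUserByPlatformIdentity.

```typescript
// True only for UNIQUE-constraint violations; everything else must propagate.
function isUniqueViolation(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { code?: unknown; message?: unknown };
  if (e.code === "23505") return true; // PostgreSQL unique_violation SQLSTATE
  return typeof e.message === "string" && e.message.includes("UNIQUE constraint failed");
}

// Create-on-first-sight: on a lost race, re-read the row the winner inserted.
async function findOrCreate<T>(
  find: () => Promise<T | null>,
  create: () => Promise<T>,
): Promise<T> {
  const existing = await find();
  if (existing !== null) return existing;
  try {
    return await create();
  } catch (err) {
    if (!isUniqueViolation(err)) throw err; // not a race: do not mask it
    const winner = await find();
    if (winner !== null) return winner;
    throw err;
  }
}

const pgShape = isUniqueViolation({ code: "23505", message: "duplicate key value" });
const sqliteShape = isUniqueViolation(new Error("UNIQUE constraint failed: users.platform"));
const otherError = isUniqueViolation(new Error("connection reset"));
```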
…tion routing (#1788)
Phase 2 of the team-foundation PRD. Replaces the bot's single shared GITHUB_TOKEN PAT with a registered GitHub App that supports multi-installation token routing from day one.
New @archon/core/github-auth/ module wrapping @octokit/auth-app with a three-level cache:
- lookupCache: owner/repo → installationId (1h TTL; evicted on 401)
- tokenCache: installationId → access token (1h GitHub TTL, refreshed 5min before expiry)
- octokitCache: installationId → Octokit (per-installation auth strategy; evicted on 401 so the SDK's hidden internal token state can't keep serving the dead token)
GitHubAdapter takes a `GitHubAuth` discriminated union at construction. All 4 Octokit callsites (postComment / listComments / repos.get / pulls.get) plus the clone path route through resolveOctokit + a withTokenRefresh wrapper that calls invalidateRepo and retries once on 401. Webhook event.installation.id primes the lookup cache to skip a round-trip. Secondary self-filter compares against `<slug>[bot]` in App mode (via a botLogin getter, distinct from botMention) so PR-C's per-user tokens won't trip it.
Server bootstrap detects App vs PAT mode via env and fails fast if both are configured. In App mode it registers the provider on a module singleton consumed by createWorkflowDeps(), so the workflow executor's bash/script subprocesses inherit a fresh GH_TOKEN/GITHUB_TOKEN.
New POST /internal/git-credential endpoint (App mode only) backs a POSIX git credential helper installed at clone time, covering workflows that outlive the 1h installation-token expiry. The public-bind guard runs BEFORE Bun.serve so a rejected config never opens the listening socket; opt-out via ARCHON_ALLOW_INTERNAL_ON_PUBLIC_BIND=1 for deployments where the reverse proxy already drops /internal/*.
Refactor + extracted helpers in server/src/github-auth-bootstrap.ts (selectGitHubAuthMode + parseGitCredentialPath) so the security-critical decisions are testable in isolation without spinning up Hono.
Backwards compat: solo installs running GITHUB_TOKEN only see zero functional change. All 54 existing PAT-mode adapter tests pass unchanged.
Tests added: 23 strictly-mocked auth-module tests (PRD Q7: no live api.github.com in CI); 10 new App-mode adapter tests (multi-install routing, payload short-circuit, 401 retry + retry-on-retry propagation, AppNotInstalledError surfacing, clone-token resolution, post-clone credential helper install); 20 server-bootstrap unit tests (dual-mode fail-fast, /internal path validation incl. traversal + null bytes).
Closes phase 2 of .claude/PRPs/prds/github-app-and-user-identity.prd.md. Depends on #1783 (PR-A user-identity foundation).
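The tokenCache rule (reuse a token until it is within five minutes of expiry, then mint a new one) can be sketched as follows. This is a simplified illustration: `InstallationTokenCache`, its synchronous `mint` callback, and the injectable clock are assumptions, and the real module mints tokens asynchronously through @octokit/auth-app.

```typescript
interface CachedToken {
  token: string;
  expiresAt: number; // epoch milliseconds
}

class InstallationTokenCache {
  private readonly tokens = new Map<number, CachedToken>();
  private readonly mint: (installationId: number) => CachedToken;
  private readonly now: () => number;
  private readonly refreshMarginMs: number;

  constructor(
    mint: (installationId: number) => CachedToken,
    now: () => number = () => Date.now(),
    refreshMarginMs: number = 5 * 60 * 1000,
  ) {
    this.mint = mint;
    this.now = now;
    this.refreshMarginMs = refreshMarginMs;
  }

  // Serve the cached token unless it expires within the refresh margin.
  get(installationId: number): string {
    const hit = this.tokens.get(installationId);
    if (hit && hit.expiresAt - this.now() > this.refreshMarginMs) return hit.token;
    const fresh = this.mint(installationId);
    this.tokens.set(installationId, fresh);
    return fresh.token;
  }

  // Called after a 401 so the next get() mints a new token.
  invalidate(installationId: number): void {
    this.tokens.delete(installationId);
  }
}

let nowMs = 0;
let minted = 0;
const tokenCache = new InstallationTokenCache(
  (id) => {
    minted += 1;
    return { token: `tok-${id}-${minted}`, expiresAt: nowMs + 60 * 60 * 1000 };
  },
  () => nowMs,
);
```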
Release 0.4.0
GitHub App auth for the bot, multi-user attribution, Slack UX overhaul, experimental `/console`, two new community providers (OpenCode and GitHub Copilot), Codex MCP support, and broad workflow/provider hardening.
Added
- … `GITHUB_TOKEN` PAT with a GitHub App + multi-installation routing. Each repo resolves to the installation that owns it; tokens are minted on demand, cached per installation, refreshed before expiry, and never persisted. Includes a loopback-only `/internal/git-credential` endpoint (with a hard `127.0.0.1` bind check, opt-out via `ARCHON_ALLOW_INTERNAL_ON_PUBLIC_BIND=1`) so long-running workflow `git` operations can transparently refresh installation tokens via a `git-credential-archon` helper installed into the worktree's `.git/config` (feat(core): swap shared GitHub PAT for GitHub App with multi-installation routing #1788).
- `user_id` is now plumbed from chat and forge adapters through the orchestrator into `conversations`, `messages`, `workflow_runs`, and `isolation_environments`. New `users` and `user_identities` tables map platform identities (Slack U-id, Telegram chat id, Discord snowflake, GitHub login) to an Archon-internal user, created lazily on first sight (feat(core): plumb user_id from chat/forge adapters through orchestrator and workflow runs #1783).
- … `/console`, mounted as an isolated in-repo spike under `packages/web/src/experiments/console/`. Lint-guarded against importing production web modules so it can be dropped in or deleted cleanly (feat(web): experimental console UI at /console #1747).
- `assistants.opencode` provider: community provider that runs OpenCode as an embedded runtime, with per-node agent materialization, multi-agent sessions, structured output, token usage, and multi-agent MCP tool execution (feat(providers): add OpenCode community provider with agents support #1384).
- … `builtIn: false` provider in the registry (feat(providers): add GitHub Copilot community provider #1505).
- … `loadMcpConfig` module: pass `mcp: <path>` on a Codex node and the config is translated to Codex's `mcp_servers` overrides at runtime. MCP client errors are surfaced to the workflow author as `system` chunks when MCP is explicitly configured for the node (feat(workflows): support Codex MCP nodes #1459).
- `always_run` node opt-out for resume caching: opt-out for nodes that must re-execute on every resume rather than being skipped as "already completed" (closes Auto-resume cache traps workflows on "successful" producer with bad output; only escape is renaming the node #1391, feat(workflows): add always_run node opt-out for resume caching (closes #1391) #1730).
- … `packages/docs-web/src/content/docs/brand/` (docs(brand): add brand foundation page on archon.diy #1745).
- … `piv-system-evolution` and `archon-comprehensive-mr-review`.
Changed
- … `loop` and `approval` node types and renders them correctly (fix(web): recognize loop and approval node types in DAG builder #1744).
- … `source/` subdirectory to match the standard workspace layout (fix(adapters): place webhook clones in workspace source/ subdirectory #1554).
- `safeSendMessage` consolidated into `executor-shared` to remove duplication across executor variants (refactor(workflows): consolidate duplicated safeSendMessage into executor-shared #1496).
Fixed
- `workflow approve/resume/reject` no longer fail with "Workflow not found" when the run's working path is a worktree or workspace clone. Resume, approve, and reject now use `codebase.default_cwd` for workflow YAML discovery, falling back to `working_path` when no codebase record is found. Fixes workflow approve/resume fails for project-scoped workflows with worktree.enabled: false #1663 (fix(cli): use source checkout cwd for workflow discovery on resume/approve/reject #1743).
- `DEFAULT_AI_ASSISTANT` is now read in `createCodebase` so the env var actually controls the default assistant for newly registered codebases (fixes feat(db): codebase ai_assistant_type should inherit defaultAssistant, not hardcode "claude" #1703, fix(db): read defaultAssistant from config in createCodebase (fixes #1703) #1746).
- `decide` node hardened against non-JSON ai-review output so a prose-prefixed verdict doesn't crash the workflow.
- … `${VAR_NAME}` brace syntax in addition to `$VAR_NAME` (fix(providers): expand ${VAR_NAME} brace syntax in MCP config env vars #1728).
- `archon-refactor-safely` persists read-only node outputs via bash bridges so downstream nodes can reference them (fix(workflows): persist read-only node outputs via bash bridges in archon-refactor-safely #1734).
- … `$ARGUMENTS` into generated YAMLs so user arguments reach the first node (fix(workflows): ensure workflow-builder injects $ARGUMENTS in generated YAMLs #1733).
- … `attemptController.abort()` that crashed after SDK cleanup (bug(providers/codex): `attemptController.abort()` in retry finally crashes via codex-sdk's removeAllListeners (regression from #1371) #1735, fix(providers/codex): remove stale attemptController.abort() that crashes after SDK cleanup #1739); fresh `AbortController` per retry attempt so a previously-aborted controller can't kill the new attempt (bug(codex): retry loop reuses caller AbortSignal; crash on attempt N poisons attempt N+1 #1266, fix(providers/codex): fresh AbortController per retry attempt (#1266) #1371).
- … `claudeBinaryPath` and expands npm platform-package directories (e.g. `@anthropic-ai/claude-code-darwin-arm64`) to the bundled binary (Windows: `claudeBinaryPath` resolved from config but SDK fails with "native binary not found" #1723, fix(providers/claude): reject directory paths and expand npm package dirs (#1723) #1737).
- … `*_URL` env vars rather than assuming `github.com` (fixes `resolveForgeAuth` doesn't use `GITEA_TOKEN` for self-hosted Gitea instances with non-standard hostnames #1704, fix(clone): resolve forge auth via configured *_URL env vars (fixes #1704) #1706); non-GitHub forge URLs authenticate via `GITLAB_TOKEN` / `GITEA_TOKEN` (fixes Web UI clone handler only injects auth token for github.com URLs #1655, fix(clone): authenticate non-GitHub forge URLs via GITLAB_TOKEN / GITEA_TOKEN (fixes #1655) #1658).
- `ARCHON_STATE_JSON` marker extraction uses line-anchored regex so embedded marker-like strings in script output don't confuse the parser (fix(scripts): use line-anchored regex to extract ARCHON_STATE_JSON markers #1695).
- `condition_json_parse_failed` is now surfaced as a workflow error instead of silently skipping the conditional branch (Workflow exits success when condition_json_parse_failed cascades to silent node skips #1673, fix(workflows): surface condition_json_parse_failed as workflow error instead of silent skip (#1673) #1694).
Merging this PR releases 0.4.0 to main.
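For the `${VAR_NAME}` / `$VAR_NAME` item in the Fixed list, a minimal sketch of expanding both syntaxes in one pass. `expandEnvVars` is a hypothetical name, and leaving unknown variables untouched is an assumption; the provider's actual handling of unset variables is not stated in these notes.

```typescript
// Expand ${NAME} and $NAME references against an environment map.
function expandEnvVars(input: string, env: Record<string, string | undefined>): string {
  const pattern = /\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)/g;
  return input.replace(
    pattern,
    (match: string, braced: string | undefined, bare: string | undefined) => {
      const name = braced ?? bare;
      const value = name === undefined ? undefined : env[name];
      return value ?? match; // unknown variables stay as written (assumption)
    },
  );
}

const expanded = expandEnvVars("Bearer ${TOKEN} for $USER_ID", { TOKEN: "abc", USER_ID: "7" });
const untouched = expandEnvVars("${MISSING} and $ALSO_MISSING", {});
```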