
Release 0.4.1 #1793

Merged
Wirasm merged 149 commits into main from dev
May 28, 2026

@Wirasm commented May 28, 2026

Collaborator

Release 0.4.1

Hotfix for the v0.4.0 upgrade path.

Fixed

  • Upgrading from v0.3.x to v0.4.0 left every operation broken with Error: no such column: user_id. The v0.4.0 SQLite schema initializer (createSchema()) added two CREATE INDEX statements referencing user_id on conversations and workflow_runs, but the columns themselves are added by migrateColumns() — which runs after createSchema(). On any database created before v0.4.0, CREATE INDEX aborted the entire init block, the SqliteAdapter constructor threw, and every subsequent DB call failed. New users with a fresh ~/.archon/archon.db were unaffected because the columns are present from table creation. The fix moves both index creations into migrateColumns() so they run after the matching ALTER TABLE. A regression test seeds a pre-v0.4.0 schema and asserts the upgrade path now completes cleanly (fix(db/sqlite): defer user_id index creation until after column ALTER (v0.4.1 hotfix) #1792).
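
The ordering constraint behind this fix can be sketched as a pure planning function. This is an illustration under stated assumptions, not the actual Archon code: `planMigration`, `TableState`, the index names, and the `TEXT` column type are hypothetical. Only the two table names, the `user_id` column, and the ALTER-before-INDEX ordering come from the note above.

```typescript
// Hypothetical model of the fix: an index on user_id is planned only after
// the column it references is guaranteed to exist.
type TableState = { name: string; columns: string[] };

function planMigration(tables: TableState[]): string[] {
  const statements: string[] = [];
  for (const table of tables) {
    // Databases created before v0.4.0 lack user_id, so add it first.
    // The TEXT column type is an assumption for this sketch.
    if (!table.columns.includes('user_id')) {
      statements.push(`ALTER TABLE ${table.name} ADD COLUMN user_id TEXT`);
    }
    // Safe here: the column exists from table creation or from the ALTER above.
    statements.push(
      `CREATE INDEX IF NOT EXISTS idx_${table.name}_user_id ON ${table.name}(user_id)`
    );
  }
  return statements;
}

// A schema shaped like a pre-v0.4.0 database (no user_id columns yet).
const legacy: TableState[] = [
  { name: 'conversations', columns: ['id', 'title'] },
  { name: 'workflow_runs', columns: ['id', 'status'] },
];
const plan = planMigration(legacy);
```

For the legacy schema, each table's `ALTER TABLE` statement precedes its `CREATE INDEX`, which is the property the regression test described above is meant to protect.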

Merging this PR releases 0.4.1 to main.

github-actions Bot and others added 30 commits April 22, 2026 11:26
…be (#1359)

The pre-flight binary smoke does a bare `bun build --compile` — it
deliberately skips `scripts/build-binaries.sh` to stay fast. That means
packages/paths/src/bundled-build.ts retains its dev defaults, including
BUNDLED_IS_BINARY = false.

version.ts branches on BUNDLED_IS_BINARY: when true it returns the
embedded string; when false it calls getDevVersion(), which reads
package.json at `SCRIPT_DIR/../../../../package.json`. Inside a compiled
binary SCRIPT_DIR resolves under `$bunfs/root/`, the walk produces a CWD-
relative path that doesn't exist, and the smoke aborts with "Failed to
read version: package.json not found" — a false positive.

Hit during the 0.3.8 release attempt: the real Pi lazy-load fix was
working end-to-end; the smoke test was the only thing failing.

Use --help instead. It exercises the same module-init graph (so it still
catches the real failure modes the skill lists — Pi package.json init
crash, Bun --bytecode bugs, CJS wrapper issues, circular imports under
minify) but has no dev/binary branch, so no false positive.

Also add a longer comment block explaining why --help is preferred, so
this doesn't get "normalized" back to `version` by a future drive-by.

The brew path of /test-release runs `brew uninstall` in Phase 5 to leave the
system in its pre-test state. For operators using the dual-homebrew pattern
(renamed brew binary at `/opt/homebrew/bin/archon-stable` so it coexists with
a `bun link` dev `archon`), that uninstall wipes the Cellar dir the
`archon-stable` symlink points into → `archon-stable` becomes dangling →
`brew cleanup` sweeps it away on the next brew op. Next time the operator
wants stable, they have to manually re-run `brew-upgrade-archon`.

Fix: make the skill aware of `archon-stable` and restore it transparently.

- Phase 2 item 4: detect the `archon-stable` symlink before any brew op;
  export `ARCHON_STABLE_WAS_INSTALLED=yes` so Phase 5 knows to restore it.
  Only triggers for the brew path (curl-mac/curl-vps don't touch brew so
  they leave `archon-stable` alone).
- Phase 5 brew path: after `brew uninstall + untap`, if the flag was set,
  re-tap + re-install + rename. Verifies the restored `archon-stable`
  reports a version and warns (non-fatal) if the rename target is missing.
  Documents the tradeoff: the restored version is "whatever the tap ships
  today", not necessarily the pre-test version — usually that's what the
  operator wants (the release they just tested becomes stable) but the
  back-version-QA case requires a manual `brew-upgrade-archon` after.
- Phase 1 confirmation banner now mentions that `archon-stable` will be
  preserved so the operator isn't surprised by the reinstall during Phase 5.

No changes to curl-mac/curl-vps paths. No changes to Phase 4 test suite.
… a compiled binary (#1360)

v0.3.9 made Pi boot-safe: lazy-loading its imports meant `archon version`
no longer crashed on `@mariozechner/pi-coding-agent/dist/config.js`'s
module-init `readFileSync(getPackageJsonPath())`. That's what the
`provider-lazy-load.test.ts` regression test guards.

That fix covered only half the problem, though. When a Pi workflow actually
runs, sendQuery() triggers the dynamic import — and Pi's config.js
module-init fires then, hitting the exact same ENOENT on
`dirname(process.execPath)/package.json`. Discovered by running
`archon workflow run test-pi` against a locally-compiled 0.3.9 binary:

    [main] Failed: ENOENT: no such file or directory,
           open '/private/tmp/package.json'
        at readFileSync (unknown)
        at <anonymous> (/$bunfs/root/archon-providertest:184:7889)
        at init_config

Boot-safe ≠ runtime-safe. The `/test-release` run for 0.3.9 passed
because it only exercised `archon-assist` (Claude); Pi was never
actually invoked on the released binary.

Fix: before the dynamic `import('@mariozechner/pi-coding-agent')` in
sendQuery, install a PI_PACKAGE_DIR shim. Pi's config.js checks
`process.env.PI_PACKAGE_DIR` first in its `getPackageDir()` and
short-circuits the `dirname(process.execPath)` walk. We write a
minimal `{name, version, piConfig:{}}` stub to
`tmpdir()/archon-pi-shim/package.json` (idempotent — existsSync check)
and set the env var. Pi only reads `piConfig.name`, `piConfig.configDir`,
and `version` from that file, all optional, so the stub surface is
genuinely minimal.

Localized to PiProvider: no global state, no mutation of any shared
config, no upstream fork. Claude and Codex providers are unaffected
(their SDKs don't have this class of module-init side effect).

Verified end-to-end: built a compiled archon binary with this patch,
ran `archon workflow run test-pi --no-worktree` (Pi workflow with
model `anthropic/claude-haiku-4-5`), got a clean response. Before the
patch, same binary crashed at `dag_node_started` with the ENOENT above.

Regression test added: asserts `PI_PACKAGE_DIR` is set after sendQuery
hits even its fast-fail "no model" path. Together with the existing
`provider-lazy-load.test.ts` (boot-safe) this covers both halves.
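
A minimal sketch of the shim described above, assuming Node-compatible `fs`, `os`, and `path` modules. The helper name `ensurePiPackageDirShim` and the stub's `name` value are hypothetical. The shim directory, the `{name, version, piConfig:{}}` shape, the existsSync guard, and the `PI_PACKAGE_DIR` variable come from the commit message.

```typescript
import { existsSync, mkdirSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical helper: writes the stub once and points PI_PACKAGE_DIR at it.
function ensurePiPackageDirShim(version: string): string {
  const shimDir = join(tmpdir(), 'archon-pi-shim');
  const pkgPath = join(shimDir, 'package.json');
  // Idempotent: skip the write when the stub already exists.
  if (!existsSync(pkgPath)) {
    mkdirSync(shimDir, { recursive: true });
    // The name value is an assumption; the commit only specifies the shape.
    const stub = { name: 'archon-pi-shim', version, piConfig: {} };
    writeFileSync(pkgPath, JSON.stringify(stub, null, 2));
  }
  // Must be set before the dynamic import so the package-dir lookup short-circuits.
  process.env.PI_PACKAGE_DIR = shimDir;
  return shimDir;
}

const shimDir = ensurePiPackageDirShim('0.0.0');
```

Calling the helper a second time is a no-op for the file and simply re-sets the environment variable, so it is safe to run on every query.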
… and Codex (#1361)

Both binary resolvers previously stopped at env-var + explicit config and
threw a "not found" error when neither was set. Users who followed the
upstream-recommended install flow (Anthropic's `curl install.sh` for
Claude, `npm install -g @openai/codex`) still had to manually set either
`CLAUDE_BIN_PATH` / `CODEX_BIN_PATH` or the corresponding config field
before any workflow could run.

Add a tier-N autodetect step between the explicit config tier and the
install-instructions throw. Purely additive: env and config still win
when set (precedence covered by new tests). On autodetect miss, the same
install-instructions error fires as before.

Claude probe list (verified against docs.claude.com "Uninstall Claude
Code → Native installation" section):
  - $HOME/.local/bin/claude            (mac/linux native installer)
  - $USERPROFILE\.local\bin\claude.exe (Windows native installer)

Codex probe list (verified against openai/codex README; npm global-
install puts the binary at `{npm_prefix}/bin/<name>` on POSIX,
`{npm_prefix}\<name>.cmd` on Windows):
  - $HOME/.npm-global/bin/codex   (user-set `npm config set prefix`)
  - /opt/homebrew/bin/codex       (mac arm64 with homebrew-node)
  - /usr/local/bin/codex          (mac intel / linux system node)
  - %APPDATA%\npm\codex.cmd       (Windows npm global default)
  - $HOME\.npm-global\codex.cmd   (Windows user-set prefix)

Not probed (explicit override still required):
  - Custom npm prefixes — `npm root -g` would need a subprocess per
    resolve, too much surface for a probe helper
  - `brew install --cask codex` — cask layout isn't a PATH binary
  - Manual GitHub Releases extracts — placement is user-determined
  - `~/.bun/bin/codex` — not documented in openai/codex README

Pi provider intentionally has no equivalent change: the Pi SDK is
bundled into the archon binary (no subprocess), so there's no "binary"
to resolve. Pi auth lives at `~/.pi/agent/auth.json` which the SDK
already finds by default, and the PR A shim (`PI_PACKAGE_DIR`) handles
the package-dir case via Pi's own documented escape hatch.

E2E verified: removed both config entries from ~/.archon/config.yaml,
rebuilt compiled binary, ran `archon workflow run archon-assist` and a
Codex workflow. Logs showed `source: 'autodetect'` for both, responses
returned cleanly.
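
The tier order above (env, then explicit config, then autodetect, then the install-instructions error) can be sketched as follows. This is an illustration, not the resolver's real signature: `resolveBinary`, the injected `exists` callback, the `'env'` and `'config'` source labels, and the sample home directory are assumptions. The `'autodetect'` label and the POSIX Codex probe paths are taken from the commit message.

```typescript
type Resolved = { path: string; source: 'env' | 'config' | 'autodetect' };

// Hypothetical signature: `exists` is injected so the sketch never touches disk.
function resolveBinary(
  envPath: string | undefined,
  configPath: string | undefined,
  probePaths: string[],
  exists: (p: string) => boolean
): Resolved {
  // Explicit settings always win over autodetection.
  if (envPath) return { path: envPath, source: 'env' };
  if (configPath) return { path: configPath, source: 'config' };
  // Autodetect: first probe path that exists, in list order.
  for (const candidate of probePaths) {
    if (exists(candidate)) return { path: candidate, source: 'autodetect' };
  }
  // Same failure mode as before the autodetect tier was added.
  throw new Error('binary not found: set the env var or config field, or install it');
}

const fakeHome = '/home/dev'; // hypothetical home directory
const codexProbes = [
  `${fakeHome}/.npm-global/bin/codex`,
  '/opt/homebrew/bin/codex',
  '/usr/local/bin/codex',
];
const onDisk = new Set(['/usr/local/bin/codex']);
const found = resolveBinary(undefined, undefined, codexProbes, (p) => onDisk.has(p));
```

Injecting the existence check keeps the precedence logic testable on any platform, which matters given the Windows-specific probe paths.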
…ry autodetect test

The native-installer autodetect test computed its expected path from
process.env.HOME, but the implementation uses node:os homedir(). On
Windows, HOME is typically unset (Windows uses USERPROFILE), so the
test fell back to '/Users/test' while the resolver returned the real
home dir — making the spy's path-equality check fail and breaking CI
on windows-latest.

Mirror the implementation by importing homedir() from node:os and
joining with node:path so the expected path matches the actual
platform-resolved home and separator.
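
A short sketch of computing the expected path the way the paragraph above describes, from `homedir()` and `join` rather than `process.env.HOME`. The function name and the `platform` parameter are hypothetical. The `.local/bin/claude` and `claude.exe` locations are the native-installer paths listed in the autodetect commit.

```typescript
import { homedir } from 'node:os';
import { join } from 'node:path';

// Hypothetical test helper: derives the expected probe path from os.homedir(),
// which resolves on Windows even when HOME is unset.
function expectedNativeClaudePath(platform: string = process.platform): string {
  const binary = platform === 'win32' ? 'claude.exe' : 'claude';
  // join() uses the platform separator, so the result matches the resolver's.
  return join(homedir(), '.local', 'bin', binary);
}

const expectedPath = expectedNativeClaudePath();
```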

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ver (#1365)

Reported in #1365: a user running `archon serve` with DISCORD_BOT_TOKEN
set but the "Message Content Intent" toggle disabled in the Discord
Developer Portal saw the entire server crash with `Used disallowed
intents`. Discord rejects the gateway connection (close code 4014) when
a privileged intent is requested without being enabled, and the
unguarded `await discord.start()` propagated the error all the way up,
taking the web UI down with it.

Wrap discord.start() in try/catch — log the failure with an actionable
hint (special-cased for the disallowed-intent error) and continue
running. Other adapters and the web UI come up regardless. The shutdown
handler already uses optional chaining (`discord?.stop()`) so nulling
discord after a failed start is safe.
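
The guarded start can be sketched as below. The `ChatAdapter` interface, the function name, and the log wording are hypothetical. The try/catch around `start()`, the special case for the `Used disallowed intents` error, and the null-on-failure behaviour follow the description above.

```typescript
// Hypothetical adapter shape: only start() and stop() are assumed.
interface ChatAdapter {
  start(): Promise<void>;
  stop(): void;
}

async function startDiscordGuarded(
  adapter: ChatAdapter,
  log: (msg: string) => void
): Promise<ChatAdapter | null> {
  try {
    await adapter.start();
    return adapter;
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // Special-case the privileged-intent rejection with an actionable hint.
    const hint = message.includes('Used disallowed intents')
      ? ' (enable "Message Content Intent" in the Discord Developer Portal)'
      : '';
    log(`discord start failed: ${message}${hint}`);
    // Null lets the shutdown handler call discord?.stop() safely.
    return null;
  }
}

// Simulate the gateway rejecting a privileged intent.
const failing: ChatAdapter = {
  start: () => Promise.reject(new Error('Used disallowed intents')),
  stop: () => {},
};
const logs: string[] = [];
const started = startDiscordGuarded(failing, (m) => logs.push(m));
```

Because the failure is converted into a logged null instead of a rejected promise, the caller can go on to start the remaining adapters and the web UI.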

Other adapters (Telegram, Slack, GitHub, Gitea, GitLab) have the same
unguarded-start pattern but are out of scope for this fix — addressing
them is tracked separately.

Also expanded the Discord setup docs with a caution callout that names
the exact error string and the new log event so users can grep for
both.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs(script-nodes): add dedicated guide and teach the archon skill how to write them

Script nodes (script:) have been a first-class DAG node type since v0.3.3 but
were documented only as one-liners in CLAUDE.md and a CI smoke test. Claude
Code reading the archon skill would see "Four Node Types: command, prompt,
bash, loop" and reach for bash+node/python one-liners instead of a proper
script node — losing bun's --no-env-file isolation, uv's --with dependency
pins, and the .archon/scripts/ reuse story.

- New packages/docs-web/src/content/docs/guides/script-nodes.md mirroring the
  structure of loop-nodes.md / approval-nodes.md: schema, inline vs named
  dispatch, runtime/deps semantics, scripts directory precedence (repo > home),
  extension-runtime mapping, env isolation, stdout/stderr contract, patterns,
  and the explicit list of ignored AI fields.
- guides/authoring-workflows.md and guides/index.md updated so the new guide is
  discoverable from both the node-types table and the guides landing page.
- reference/variables.md calls out the no-shell-quote difference between
  bash: and script: substitution — a subtle correctness trap when adapting a
  bash pattern into a script node.
- Sidebar order bumped +1 on hooks/mcp-servers/skills/global-workflows/
  remotion-workflow to slot script-nodes at order 5 next to the other
  node-type guides.

- .claude/skills/archon/SKILL.md: replaces stale "Four Node Types" (which
  also silently omitted approval and cancel) with the accurate seven, with a
  script-node code block showing both inline and named patterns.
- references/workflow-dag.md: full Script Node section covering dispatch,
  resolution, deps, stdout contract, and the list of AI-only fields that are
  ignored; validation-rules list updated.
- references/dag-advanced.md and references/variables.md: retry-support line
  corrected; no-shell-quote note added.
- examples/dag-workflow.yaml: added an extract-labels TypeScript script node
  and updated the header comment.

* fix(docs): review follow-ups for script-node guide

- skills example: extract-labels was reading process.env.ISSUE_JSON which is
  never set; use String.raw`$fetch-issue.output` so the upstream bash node's
  JSON is actually consumed
- guides/script-nodes.md + skills/workflow-dag.md: idle_timeout is accepted
  but ignored on script (and bash) nodes — executeScriptNode only reads
  node.timeout. Clarify that script/bash use `timeout`, not idle_timeout
- archon-workflow-builder.yaml: prompt enumerated only bash/prompt/command/loop,
  so the AI builder could never propose script or approval nodes. Add both
  (plus examples + rule about script output not being shell-quoted) and
  regenerate bundled defaults
- book/dag-workflows.md + book/quick-reference.md + adapters/web.md: fill in
  the node-type references that were missing script, approval, and cancel.
  adapters/web.md also overclaimed "loop" in the palette — NodePalette.tsx
  only drags command/prompt/bash, so note that the other kinds are YAML-only
…nv gaps, add good-practices + troubleshooting (#1363)

* fix(skill/when): document the full `when:` operator set and compound expressions

The skill reference previously stated "operators: ==, != only" which is
materially wrong — the condition evaluator supports ==, !=, <, >, <=, >=
plus && / || compound expressions with && binding tighter than ||, plus
dot-notation JSON field access. An agent authoring a workflow from the
skill would think half the operators don't exist.

Replaces the single-sentence section with a structured reference covering:
- All six comparison operators (string and numeric modes)
- Compound expressions with precedence rules and short-circuit eval
- JSON dot notation semantics and failure modes
- The fail-closed rules in full (invalid expression, non-numeric side,
  missing field, skipped upstream)

Grounded in packages/workflows/src/condition-evaluator.ts.
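
To make the precedence and fail-closed rules concrete, here is a deliberately simplified sketch. It is not the code in condition-evaluator.ts: it assumes operands are already-substituted quoted strings or bare numbers, and it omits parentheses, `$node.output` substitution, and JSON dot notation. All function names are hypothetical.

```typescript
// Simplified sketch: operands are quoted strings or bare numbers only.
type Op = '==' | '!=' | '<=' | '>=' | '<' | '>';
const OPS: Op[] = ['==', '!=', '<=', '>=', '<', '>']; // two-char operators first

function unquote(raw: string): string {
  const t = raw.trim();
  const quoted = t.length >= 2 && /^(['"]).*\1$/.test(t);
  return quoted ? t.slice(1, -1) : t;
}

function compare(clause: string): boolean {
  const op = OPS.find((o) => clause.includes(o));
  if (!op) return false; // fail closed: not a valid comparison
  const idx = clause.indexOf(op);
  const left = unquote(clause.slice(0, idx));
  const right = unquote(clause.slice(idx + op.length));
  if (op === '==') return left === right;
  if (op === '!=') return left !== right;
  const a = Number(left);
  const b = Number(right);
  // Fail closed: ordering operators need numeric operands on both sides.
  if (left === '' || right === '' || Number.isNaN(a) || Number.isNaN(b)) return false;
  if (op === '<') return a < b;
  if (op === '>') return a > b;
  if (op === '<=') return a <= b;
  return a >= b;
}

function evaluateWhen(expr: string): boolean {
  // Splitting on || first makes each &&-group bind tighter than ||.
  return expr
    .split('||')
    .some((group) => group.split('&&').every((clause) => compare(clause)));
}
```

With this grouping, `1 == 2 && 1 == 1 || 1 == 1` is read as `(1 == 2 && 1 == 1) || (1 == 1)` and evaluates to true, while a non-numeric side on `<` fails closed to false.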

* feat(skill): document Approval and Cancel node types

Approval and cancel nodes are first-class DAG node types (approval since the
workflow lifecycle work in #871, cancel as a guarded-exit primitive) but the
skill never described either one. An agent reading the skill and asked to
"add a review gate before implementation" or "stop the workflow if the input
is unsafe" would fall back to bash + exit 1, losing the proper semantics
(cancelled vs. failed, on_reject AI rework, web UI auto-resume).

Approval node coverage (references/workflow-dag.md, SKILL.md):
- Full configuration block with message, capture_response, on_reject
- The interactive: true workflow-level requirement for web UI delivery
- Approve/reject commands across all platforms (CLI, slash, natural
  language) and the capture_response → $node-id.output flow
- Ignored-fields list + the on_reject.prompt AI sub-node exception

Cancel node coverage (references/workflow-dag.md, SKILL.md):
- Single-field schema (cancel: "<reason>")
- Lifecycle: cancelled (not failed); in-flight parallel nodes stopped;
  no DAG auto-resume path
- The "cancel: vs bash-exit-1" decision rule (expected precondition miss
  vs. check itself failing)
- Two canonical patterns — upstream-classification gate, pre-expensive-step
  gate

Validation-rules list updated to enumerate approval/cancel constraints
(message non-empty, on_reject.max_attempts range 1-10, cancel reason
non-empty), plus a forward note that script: joins the mutually-exclusive
set once PR #1362 lands.

Placement in both files is after the Loop section and before the validation
section, so this commit stays additive with respect to PR #1362's Script
node insertion between Bash and Loop — rebase is clean.

* feat(skill): document workflow-level fields beyond name/provider/model

The skill's Schema section previously showed only name, description, provider,
and model at the workflow level, which is little more than a stub. Agents asked to
"use the 1M-context Claude beta" or "run this under a network sandbox" or
"add a fallback model in case Opus rate-limits" had no way to discover
that any of these fields existed at the workflow level.

Adds a comprehensive Workflow-Level Fields section covering:
- Core: name, description, provider, model, interactive (with explicit
  callout that interactive: true is REQUIRED for approval/loop gates on
  web UI — a common footgun)
- Isolation: worktree.enabled for pin-on/pin-off (the only worktree field
  at workflow level; baseBranch/copyFiles/path/initSubmodules are
  config.yaml only, so a cross-reference points there)
- Claude SDK advanced: effort, thinking, fallbackModel, betas, sandbox,
  with explicit per-node-only exceptions (maxBudgetUsd, systemPrompt)
- Codex-specific: modelReasoningEffort (with note that it's NOT the same
  as Claude's effort — this has confused users), webSearchMode,
  additionalDirectories
- A complete worked example combining sandbox + approval + interactive

All fields cross-referenced against packages/workflows/src/schemas/workflow.ts
and packages/workflows/src/schemas/dag-node.ts.

* feat(skill/loop): document interactive loops and gate_message

Interactive loop nodes pause between iterations for human feedback via
/workflow approve — used by archon-piv-loop and archon-interactive-prd.
The skill's Loop Nodes section previously omitted both interactive: true
and gate_message entirely, so an agent writing a guided-refinement
workflow wouldn't know the feature exists or that gate_message is
required at parse time.

Adds:
- interactive and gate_message rows to the config table (marking
  gate_message as required when interactive: true — enforced by the
  loader's superRefine)
- A dedicated "Interactive Loops" subsection explaining the 6-step
  iterate-pause-approve-resume flow
- Explicit call-out that $LOOP_USER_INPUT populates ONLY on the first
  iteration of a resumed session — easy to miss and a common surprise
- Workflow-level interactive: true requirement for web UI delivery
  (loader warning otherwise) so the full-flow example is complete
- Note that until_bash substitution DOES shell-quote $nodeId.output
  (unlike script bodies) — called out since the audit surfaced this
  inconsistency

* fix(skill/cli): complete the CLI command reference with missing lifecycle commands

The CLI reference previously documented only list, run, cleanup, validate,
complete, version, setup, and chat — missing nearly every workflow
lifecycle command an agent needs to operate a paused, failed, or stuck
run. The interactive-workflows reference assumed these commands existed
without actually documenting them.

Adds full documentation for:
- archon workflow status — show running workflow(s)
- archon workflow approve <run-id> [comment] — resume approval gate
  (also populates $LOOP_USER_INPUT on interactive loops and the gate
  node's output when capture_response: true)
- archon workflow reject <run-id> [reason] — reject gate; cancels or
  triggers on_reject rework depending on node config
- archon workflow cancel <run-id> — terminate running/paused with
  in-flight subprocess kill
- archon workflow abandon <run-id> — mark stuck row cancelled without
  subprocess kill (for orphan-cleanup after server crashes — matches
  the #1216 precedent)
- archon workflow resume <run-id> [message] — force-resume specific
  run (auto-resume is default; this is for explicit override)
- archon workflow cleanup [days] — disk hygiene for old terminal runs
  (with explicit callout that it does NOT transition 'running' rows,
  a common confusion)
- archon workflow event emit — used inside loop prompts for state
  signalling; documented so agents don't invent their own mechanism
- archon continue <branch> [flags] [msg] — iterative-session entry
  point with --workflow and --no-context flags

Also:
- Adds --allow-env-keys flag to the `workflow run` flag table with
  audit-log context and the env-leak-gate remediation use case
- Adds an "Auto-resume without --resume" note disambiguating when
  --resume is needed vs. when auto-resume handles it
- Adds --include-closed flag to `isolation cleanup`, which was
  previously missing; converts the flag list to a structured table
- Explains the cancel/abandon distinction (live subprocess vs. orphan)

All grounded in packages/cli/src/commands/workflow.ts, continue.ts,
and isolation.ts.

* feat(skill/repo-init): add scripts/ and state/, three-path env model, per-project env injection

The repo-init reference was missing two first-class .archon/ directories
(scripts/ since v0.3.3, state/ since the workflow-state feature) and had
nothing to say about env — the #1 thing a user hits on first-run when
their repo has a .env file with API keys.

Directory tree updates:
- Adds .archon/scripts/ with the extension->runtime rule (.ts/.js -> bun,
  .py -> uv) so agents know where to put named scripts referenced by
  script: nodes.
- Adds .archon/state/ with explicit "always gitignore" callout — these
  are runtime artifacts, not source. Previously undocumented in the skill.
- Adds .archon/.env (repo-scoped Archon env) and distinguishes it from
  the target repo's top-level .env.
- Adds a "What each directory is for" list so the structure isn't just
  a tree with no narrative.

.gitignore guidance:
- state/ and .env added as must-gitignore (state/ matches CLAUDE.md and
  reference/archon-directories.md — skill was lagging).
- mcp/ demoted to conditional — gitignore only if you hardcode secrets.

New "Three-Path Env Model" section:
- ~/.archon/.env (trusted, user), <cwd>/.archon/.env (trusted, repo),
  <cwd>/.env (UNTRUSTED, target project — stripped from subprocess env).
- Precedence (override: true across archon-owned paths) and the
  observable [archon] loaded N keys / stripped K keys log lines so
  operators can verify what actually happened.
- Decision tree for where to put API keys vs. target-project env vs.
  things Archon shouldn't touch.
- Links to archon setup --scope home|project with --force for writing
  to the right file with timestamped backups.

New "Per-Project Env Injection" section:
- Documents both managed surfaces: .archon/config.yaml env: block
  (git-committed, $REF expansion) and Web UI Settings → Projects →
  Env Vars (DB-stored, never returned over API).
- Names every execution surface that receives the injected vars:
  Claude/Codex/Pi subprocess, bash: nodes, script: nodes, and direct
  codebase-scoped chat.
- Documents the env-leak gate with all 5 remediation paths so an agent
  hitting "Cannot register: env has sensitive keys" knows the options.

Grounded in CHANGELOG v0.3.7 (three-path env + setup flags), v0.3.0
(env-leak gate), and reference/security.md on the docs site.
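
The three-path model can be illustrated with a pure function over key/value maps. This sketch makes several assumptions that the commit message does not confirm: that repo-scoped values override home-scoped ones, that stripping happens before the trusted files are applied, and that the counts map onto the "loaded N keys / stripped K keys" log lines. `buildSubprocessEnv` and `EnvMap` are hypothetical names.

```typescript
type EnvMap = Record<string, string>;

// Hypothetical pure model of the three-path rules; the real loader reads files.
function buildSubprocessEnv(
  processEnv: EnvMap,
  homeEnv: EnvMap, // ~/.archon/.env (trusted, user)
  repoEnv: EnvMap, // <cwd>/.archon/.env (trusted, repo)
  targetEnv: EnvMap // <cwd>/.env (untrusted, target project)
): { env: EnvMap; loaded: number; stripped: number } {
  const env: EnvMap = { ...processEnv };
  let stripped = 0;
  // Strip every key that the target project's .env defines.
  for (const key of Object.keys(targetEnv)) {
    if (key in env) {
      delete env[key];
      stripped++;
    }
  }
  // ASSUMPTION: repo-scoped values override home-scoped ones.
  const trusted: EnvMap = { ...homeEnv, ...repoEnv };
  Object.assign(env, trusted);
  return { env, loaded: Object.keys(trusted).length, stripped };
}

const built = buildSubprocessEnv(
  { PATH: '/bin', DATABASE_URL: 'postgres://target' },
  { ANTHROPIC_API_KEY: 'from-home' },
  { ANTHROPIC_API_KEY: 'from-repo' },
  { DATABASE_URL: 'postgres://target' }
);
```

In this model the target project's `DATABASE_URL` never reaches the subprocess, while the Archon-owned API key does.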

* fix(skill/authoring-commands): correct override paths and add home-scoped commands

The file-location and discovery sections described an override layout that
does not match the actual resolver. It showed:

  .archon/commands/defaults/archon-assist.md  # Overrides the bundled

and claimed `.archon/commands/defaults/` was where repo-level overrides
lived. In fact the resolver (executor-shared.ts:152-200 + command-
validation.ts) walks `.archon/commands/` 1 level deep and uses basename
matching — putting `archon-assist.md` at the top of `.archon/commands/`
is the canonical way to override the bundled version. The `defaults/`
subfolder is an Archon-internal convention for shipping bundled defaults,
not a user-facing override pattern.

Also, home-scoped commands (`~/.archon/commands/`, shipped in v0.3.7)
were completely absent — agents authoring personal helpers wouldn't
know they could live at the user level and be shared across every repo.

Changes:
- File Location section now shows all three discovery scopes (repo,
  home, bundled) with precedence ordering and 1-level subfolder rules
- Duplicate-basename rule documented as a user error surface
- Discovery and Priority section rewritten with accurate 3-step lookup
  order — no more references to the nonexistent defaults/ override path
- Adds the Web UI "Global (~/.archon/commands/)" palette label note so
  users authoring helpers for the builder know what to expect

No code changes — this is a pure fix of stale/incorrect skill reference
material.
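
The lookup described above (three scopes, one folder level deep, basename matching) can be sketched like this. It is an illustration rather than the code in executor-shared.ts: the `resolveCommand` name, the input shape, the repo-then-home-then-bundled order, and throwing on a duplicate basename are assumptions made for the example.

```typescript
type Scope = 'repo' | 'home' | 'bundled';

// Hypothetical input: per scope, relative paths of discovered .md files,
// e.g. "archon-assist.md" or "triage/review.md".
function resolveCommand(
  commandName: string,
  files: Record<Scope, string[]>
): { scope: Scope; path: string } | null {
  const order: Scope[] = ['repo', 'home', 'bundled']; // assumed precedence
  for (const scope of order) {
    const matches = files[scope].filter((p) => {
      const parts = p.split('/');
      const base = parts[parts.length - 1];
      // At most one subfolder level, matched on basename only.
      return parts.length <= 2 && base === `${commandName}.md`;
    });
    if (matches.length > 1) {
      // Assumed handling: surface duplicate basenames as a user error.
      throw new Error(`duplicate command basename "${commandName}" in ${scope} scope`);
    }
    if (matches.length === 1) return { scope, path: matches[0] };
  }
  return null;
}

const discovered: Record<Scope, string[]> = {
  repo: ['archon-assist.md'],
  home: ['helpers/notes.md'],
  bundled: ['defaults/archon-assist.md'],
};
const winner = resolveCommand('archon-assist', discovered);
```

Here the repo-level `archon-assist.md` shadows the bundled default without needing a `defaults/` subfolder in the repo.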

* feat(skill): add workflow good-practices and troubleshooting reference pages

Closes two gaps from the audit. The skill previously had zero guidance on
designing multi-node workflows (what to avoid, what to reach for first,
how to structure artifact chains) and zero guidance on where to look
when things go wrong (log paths, env-leak gate remediations, orphan-row
cleanup, resume semantics).

New references/good-practices.md (9 Good Practices + 7 Anti-Patterns):

- Use deterministic nodes (bash:/script:) for deterministic work, AI for
  reasoning — the single biggest quality lever
- output_format required whenever downstream when: reads a field — the
  most common source of "workflow silently routes wrong"
- trigger_rule: none_failed_min_one_success after conditional branches —
  the classic bug where all_success fails because a skipped when:-gated
  branch doesn't count as a success
- context: fresh requires artifacts for state passing — commands must
  explicitly "read $ARTIFACTS_DIR/..." when downstream of fresh
- Cheap models (haiku) for glue, strong for substance
- Workflow descriptions as routing affordances
- Validate (archon validate workflows) + smoke-run before shipping
- Artifact-chain-first design
- worktree.enabled: true for code-changing workflows (reversibility)
- Anti-patterns with before/after YAML examples for each (AI-for-tests,
  free-form when: matching, context: fresh without artifacts, long flat
  AI-node layers, secrets in YAML, retry on loop nodes, tiny
  max_iterations, missing workflow-level interactive:, tool-restricted
  MCP nodes)
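
The trigger_rule point in the list above is easier to see with a small model. The status names and the `shouldRun` helper are hypothetical, and the rule semantics are inferred from the rule names and the description of the bug rather than taken from the executor.

```typescript
type NodeStatus = 'completed' | 'failed' | 'skipped';
type TriggerRule = 'all_success' | 'none_failed_min_one_success';

// Hypothetical evaluator: decides whether a node runs given its upstream statuses.
function shouldRun(rule: TriggerRule, upstream: NodeStatus[]): boolean {
  const succeeded = upstream.filter((s) => s === 'completed').length;
  const failed = upstream.filter((s) => s === 'failed').length;
  if (rule === 'all_success') {
    // A skipped branch is not a success, so one skip blocks the node.
    return succeeded === upstream.length;
  }
  // none_failed_min_one_success: skips are tolerated, failures are not.
  return failed === 0 && succeeded >= 1;
}

// One when:-gated branch ran, the other was skipped.
const afterBranch: NodeStatus[] = ['completed', 'skipped'];
const blocked = shouldRun('all_success', afterBranch);
const allowed = shouldRun('none_failed_min_one_success', afterBranch);
```

Under these assumptions the merge node after a conditional branch is blocked by `all_success` and proceeds with `none_failed_min_one_success`.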

New references/troubleshooting.md:

- Log location (~/.archon/workspaces/<owner>/<repo>/logs/<run-id>.jsonl)
  with jq recipes for common queries (last assistant message, failed
  events, full stream)
- Artifact location for cross-node handoff debugging
- 9 Common Failure Modes, each with root cause + concrete fix:
  - $BASE_BRANCH unresolvable
  - Env-leak gate (5 remediations)
  - Claude/Codex binary not found (compiled-binary-only)
  - "running" forever (AI working / orphan / idle_timeout)
  - Mid-workflow failure and auto-resume semantics
  - Approval gate missing on web UI (workflow-level interactive:)
  - MCP plugin connection noise (filtered by design)
  - Empty $nodeId.output / field access (4 causes)
- Diagnostic command cheat sheet (list, status, isolation list, validate,
  tail-log, --verbose, LOG_LEVEL=debug)
- Escalation protocol (version + validate + log tail + CHANGELOG + issue)

SKILL.md routing table now dispatches "Workflow good practices /
anti-patterns" and "Troubleshoot a failing / stuck workflow" to the new
references so an agent can find them without having to know they exist.

* docs(book): update node-types coverage from four to all seven

The book is the curated first-contact reading path (landing page → "Get
Started" → /book/). Both dag-workflows.md and quick-reference.md were
stuck on "four node types" — missing script, approval, and cancel. A user
reading the book as their first introduction would form an incomplete
mental model, then find three more node types in the reference section
later with no explanation of when they arrived.

book/dag-workflows.md:
- "four node types" → "seven node types. Exactly one mode field is
  required per node"
- Table now lists Command, Prompt, Bash, Script, Loop, Approval, Cancel
  with one-line "when to use" for each, and cross-links to the dedicated
  guide pages for Script / Loop / Approval
- New sections below the table for Script (inline + named examples with
  runtime and deps), Approval (with the interactive: true workflow-level
  note that's easy to miss), and Cancel (guarded-exit pattern) — keeping
  the existing narrative shape for Bash and Loop

book/quick-reference.md:
- Node Options table now includes script, approval, cancel rows
- agents row added (inline sub-agents, Claude-only)
- New "Script-specific fields" and "Approval-specific fields" subsections
  so the cheat-sheet is actually complete rather than pointing users
  elsewhere for the required constraints
- Retry row callout that loop nodes hard-error on retry — previously
  omitted
- bash timeout note widened to cover script timeout (same semantics)

Both files are docs-web content; the CI build on the docs-script-nodes
PR (#1362) previously validated the Starlight build path with a similar
table addition, so this should render clean.

* fix(skill/cli): remove nonexistent `archon workflow cancel`, fix workflow status jq recipe

Two accuracy issues from the PR code-reviewer (comment 4311243858).

C1: `archon workflow cancel <run-id>` does NOT exist as a CLI subcommand.
The switch at packages/cli/src/cli.ts:318-485 dispatches on list / run /
status / resume / abandon / approve / reject / cleanup / event — running
`archon workflow cancel` hits the default case and exits with "Unknown
workflow subcommand: cancel" (cli.ts:478-484). Active cancellation is
only available via:
  - /workflow cancel <run-id> chat slash command (all platforms)
  - Cancel button on the Web UI dashboard
  - POST /api/workflows/runs/{runId}/cancel REST endpoint

cli-commands.md: removed the `### archon workflow cancel <run-id>`
subsection; kept the `abandon` subsection but made it explicit that
abandon does NOT kill a subprocess. Added a call-out box at the bottom
of the abandon section explaining where to go for actual cancellation.

troubleshooting.md "running forever" section: split the original
cancel-vs-abandon advice into three bullets — Web UI / CLI abandon (for
orphans, no subprocess kill) / chat `/workflow cancel` (for live runs
that need interruption). Added an explicit "there is no archon workflow
cancel CLI subcommand" parenthetical since the wrong command was being
suggested in flow.

I1: the `archon workflow list --json` diagnostic used an incorrect jq
filter. workflow list's --json output (workflow.ts:185-219) has shape
{ workflows: [{ name, description, provider?, model?, ... }], errors: [...] }
with no `runs` field, so `jq '.workflows[] | select(.runs)'` returns empty
unconditionally. Replaced with `archon workflow status --json | jq '.runs[]'`,
which matches the actual shape of workflowStatusCommand at
workflow.ts:852+ ({ runs: WorkflowRun[] }). Also tightened the narration
to distinguish JSON from human-readable status output.

No change to the commit history in this PR — these are follow-up fixes
to claims I introduced in earlier commits of this branch (f10b989 for
C1, 66d2b86 for I1).

* fix(skill): remove env-leak gate references (feature was removed in provider extraction)

C2 from the PR code-reviewer (comment 4311243858). The pre-spawn env-leak
gate was removed from the codebase during the provider-extraction refactor
— see TODO(#1135) at packages/providers/src/claude/provider.ts:908. Zero
hits for --allow-env-keys / allowEnvKeys / allow_env_keys / allow_target_repo_keys
across packages/. The CLI's parseArgs (cli.ts:182-208) has no
--allow-env-keys option, and because parseArgs uses strict: false, an
unknown --allow-env-keys would be silently ignored rather than error.
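
The strict-mode behaviour can be demonstrated with `parseArgs` from `node:util`. This assumes the CLI's parseArgs is that function, which the commit message does not state outright, and the option table here is hypothetical.

```typescript
import { parseArgs } from 'node:util';

// ASSUMPTION: the CLI uses node:util's parseArgs; this option table is made up.
const options = { verbose: { type: 'boolean' } } as const;
const argv = ['--allow-env-keys', '--verbose'];

// strict: false accepts the unknown flag without throwing.
const lenient = parseArgs({ args: argv, options, strict: false });

// strict: true rejects the same input.
let strictError: unknown = null;
try {
  parseArgs({ args: argv, options, strict: true });
} catch (err) {
  strictError = err;
}
```

In the lenient case the known `--verbose` flag still parses normally, and because no code reads a value for the unknown flag, passing it has no effect and produces no error.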

What remains accurate and is NOT touched:
- Three-Path Env Model section (user/repo archon-owned envs are loaded;
  target repo <cwd>/.env keys are stripped from process.env at boot)
  still correctly describes current behavior, grounded in
  packages/paths/src/strip-cwd-env.ts + env-integration.test.ts
- Per-Project Env Injection section (Option 1: .archon/config.yaml env:
  block; Option 2: Web UI Settings → Projects → Env Vars) is unchanged —
  both remain the sanctioned way to get env vars into subprocesses

Removed claims (all three files):
- cli-commands.md: --allow-env-keys flag row in the workflow run flags
  table
- repo-init.md: the "Env-leak gate" subsection at the end of Per-Project
  Env Injection listing 5 remediations (all of which reference UI/CLI/
  config surfaces that don't exist). Replaced with a succinct callout
  that explains the actual current behavior — target repo .env keys are
  stripped, workflows that need those values should use managed
  injection — so the reader still gets the "where to put my env vars"
  answer
- troubleshooting.md: the "Cannot register: codebase has sensitive env
  keys" section (error message that can no longer be emitted)

If the env-leak gate is ever resurrected per TODO(#1135), the docs can be
re-added then. The CHANGELOG v0.3.0 entry describing the gate is a
historical record of past behavior and does not need to be rewritten.

* fix(skill/troubleshooting): correct JSONL event type names and field name

C3 from the PR code-reviewer (comment 4311243858). The troubleshooting
reference's event-types table used _started / _completed / _failed
suffixes, but packages/workflows/src/logger.ts:19-30 shows the actual
WorkflowEvent.type enum is:

  workflow_start | workflow_complete | workflow_error |
  assistant | tool | validation |
  node_start | node_complete | node_skipped | node_error

The second jq recipe also queried `.event` but the discriminator is `.type`.

Fixes:
- Event table: renamed columns (_started → _start, _completed → _complete,
  _failed → _error). Explicitly called out the field name as `type` so the
  reader knows what jq selector to use
- Replaced the "tool_use / tool_result" row with a single `tool` row and
  listed its actual payload fields (tool_name, tool_input, duration_ms,
  tokens) — tool_use/tool_result are SDK message kinds that appear within
  the AI stream, not top-level log event types
- Added a `validation` row (was missing; it's emitted by workflow-level
  validation calls with `check` and `result` fields)
- Removed `retry_attempt` row — this event type is not emitted to the
  JSONL file. Retry bookkeeping goes through pino logs, not the workflow
  log file
- Added an explicit callout that loop_iteration_started /
  loop_iteration_completed (and other emitter-only events) go through
  the workflow event emitter + DB workflow_events table, NOT the JSONL
  file. Pointed readers to the DB or Web UI for loop-level detail. This
  distinguishes the two parallel event systems — easy to conflate
  (store.ts:11-17 uses _started/_completed/_failed for the DB side,
  logger.ts uses _start/_complete/_error for JSONL)
- Fixed the "all failed events" jq recipe: .event → .type and _failed → _error
- Minor cleanup: the inline "tool_use events" mention in the "running
  forever" section said the wrong event name — updated to "tool or
  assistant events in the tail"

Grounded in packages/workflows/src/logger.ts (canonical JSONL event
shape) and packages/workflows/src/store.ts (the parallel DB event
naming, which the reviewer correctly flagged as different and worth
keeping distinct).
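
The corrected "all failed events" recipe can be sketched in TypeScript. This is a minimal sketch, assuming each JSONL line is a JSON object whose `type` field comes from the enum listed above; the `failedEvents` helper and the sample log lines are hypothetical and not part of the codebase.

```typescript
// Event type names as listed above for the JSONL log (logger.ts side).
type WorkflowEventType =
  | "workflow_start" | "workflow_complete" | "workflow_error"
  | "assistant" | "tool" | "validation"
  | "node_start" | "node_complete" | "node_skipped" | "node_error";

interface LogEvent {
  type: WorkflowEventType;
  [key: string]: unknown;
}

// Hypothetical helper: parse JSONL text and keep only the *_error events.
// The discriminator is `type`, not `event`.
function failedEvents(jsonl: string): LogEvent[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as LogEvent)
    .filter((event) => event.type.endsWith("_error"));
}

// Made-up sample log for illustration only.
const sampleLog = [
  '{"type":"workflow_start"}',
  '{"type":"node_start","node":"plan"}',
  '{"type":"node_error","node":"plan"}',
  '{"type":"workflow_error"}',
].join("\n");

const failures = failedEvents(sampleLog);
console.log(failures.map((event) => event.type));
```

Filtering on a `_failed` suffix would return nothing here, which is the kind of silent miss the fix addresses.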

* fix(skill): two stragglers from the code-reviewer audit

Cleanup of two references that slipped through the earlier C1 and C3 fixes:

- references/troubleshooting.md:126: `node_failed` → `node_error`
  (the "Node output is empty" diagnostics section references the JSONL
  log, which uses the logger.ts enum, not the DB workflow_events table,
  which does use `node_failed`). The C3 fix corrected the event table
  and one jq recipe but missed this inline mention.

- references/interactive-workflows.md:106: removed `archon workflow
  cancel <run-id>` (nonexistent CLI subcommand) from the
  troubleshooting bullet. This predates the hardening PR but fell
  within the C1 remediation scope. Replaced with the
  correct triage: reject (approval gate only) vs abandon (orphan
  cleanup, no subprocess kill) vs chat /workflow cancel (actual
  subprocess termination).

Grounded in the same sources as the earlier C1/C3 commits:
packages/cli/src/cli.ts:318-485 (no cancel case) and
packages/workflows/src/logger.ts:19-30 (JSONL type enum).

* feat(skill): point to archon.diy as the canonical docs source

The skill had no reference to archon.diy (the live docs site built from
packages/docs-web/). Several reference files said "see the docs site"
without naming the URL, leaving the agent to guess or grep the repo for
the hostname. An agent with the skill loaded should know that when the
distilled reference pages don't cover a case, the full canonical docs
are one WebFetch away.

SKILL.md: new "Richer Context: archon.diy" section between Routing and
Running Workflows. Covers:
- When to reach for the live docs (longer examples, tutorial framing,
  features the skill only mentions in passing, "where's that
  documented?" user questions)
- URL map — 13 starting points covering getting-started, book (tutorial
  series), guides/ (authoring + per-node-type + per-node-feature),
  reference/ (variables, CLI, security, architecture, configuration,
  troubleshooting), adapters/, deployment/
- Precedence: skill refs first (context-cheap, tuned for agents), docs
  site as escalation. Prevents agents defaulting to WebFetch when a
  local skill ref already covers the answer

Also upgrades the 5 existing generic "docs site" mentions across
reference files to concrete archon.diy URLs with anchor fragments where
helpful:
- good-practices.md: Inline sub-agents pattern → archon.diy/guides/
  authoring-workflows/#inline-sub-agents
- troubleshooting.md: "Install page on the docs site" → archon.diy/
  getting-started/installation/
- workflow-dag.md: "Workflow Description Best Practices" → anchor link;
  sandbox schema reference → archon.diy/guides/authoring-workflows/
  #claude-sdk-advanced-options
- repo-init.md: Security Model reference → archon.diy/reference/
  security/#target-repo-env-isolation (deep-link into the section that
  covers the <cwd>/.env strip behavior)

URL source of truth: astro.config.mjs:5 (site: 'https://archon.diy').
URL structure mirrors packages/docs-web/src/content/docs/<section>/
<page>.md — verified by the 62 pages the docs build produces.
Anthropic's Opus 4.7 landed 2026-04-16; on the Anthropic API, opus /
opus[1m] now resolve to 4.7 with a 1M context window at standard
pricing. Using the alias instead of the hard-pinned claude-opus-4-6[1m]
lets bundled default workflows auto-track the recommended Opus version.

No explicit effort is set, so nodes inherit the per-model default
(xhigh on 4.7, high on 4.6).
* fix(workflow): migrate piv-loop plan handoff to $ARTIFACTS_DIR (#1380)

The create-plan node used a relative path (.claude/archon/plans/{slug}.plan.md)
that the AI agent would sometimes write to a different location, breaking all
downstream nodes that glob for the plan file. Migrated all plan/progress file
references to $ARTIFACTS_DIR/plan.md and $ARTIFACTS_DIR/progress.txt, matching
the pattern used by archon-fix-github-issue and other workflows.

Changes:
- Replace slug-based plan path with $ARTIFACTS_DIR/plan.md in create-plan node
- Replace ls -t glob discovery with direct $ARTIFACTS_DIR/plan.md reads in
  refine-plan, code-review, and fix-feedback nodes
- Replace empty-string guard with file-existence check in implement-setup bash
- Migrate progress.txt references in implement loop to $ARTIFACTS_DIR/
- Add explicit plan/progress paths in finalize node
- Regenerated bundled-defaults.generated.ts

Fixes #1380

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(workflow): address review findings in archon-piv-loop

- Rename 'Step 2: Write the Plan' to 'Step 2: Plan File Location' to
  eliminate the duplicate heading that collided with Step 3's identical
  title in the create-plan node
- Guard implement-setup against a 0-task plan file: exit 1 with a
  clear error when no '### Task N:' sections are found, preventing a
  silent no-op implement loop
- Remove 2>/dev/null from code-review commit so pre-commit hook failures
  and other stderr are visible to the agent instead of silently swallowed
- Replace '|| true' on git push in finalize with an explicit WARNING echo
  so push failures (auth, upstream conflict, no remote) surface to the
  agent rather than being silently ignored
- Regenerate bundled-defaults.generated.ts
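
The 0-task guard lives in a bash node; the sketch below is a TypeScript analogue of the same check, assuming only the '### Task N:' heading format named above. `countTasks` and `assertPlanHasTasks` are hypothetical names, and the thrown error stands in for the script's `exit 1`.

```typescript
// Count '### Task N:' headings in the plan text.
function countTasks(planText: string): number {
  const matches = planText.match(/^### Task \d+:/gm);
  return matches ? matches.length : 0;
}

// Hypothetical guard: fail loudly instead of letting the implement loop
// run zero times and report success.
function assertPlanHasTasks(planText: string): number {
  const count = countTasks(planText);
  if (count === 0) {
    throw new Error("plan.md contains no '### Task N:' sections");
  }
  return count;
}

const plan = "# Plan\n\n### Task 1: add schema\n\n### Task 2: wire loader\n";
console.log(assertPlanHasTasks(plan));
```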

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore(workflows): regenerate bundled defaults to match opus[1m] alias

The bundle was stale relative to the YAML sources after #1395 merged —
check:bundled was failing CI. Regenerated; no YAML edits.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…cutor (#1403)

PIV Task 1: Adds three new tests in a dedicated describe block
'executeDagWorkflow -- final status derivation' covering the anyFailed
branch (dag-executor.ts ~line 2956) that previously had no direct test:
- one success + one independent failure calls failWorkflowRun (not completeWorkflowRun)
- multiple successes + one failure calls failWorkflowRun (not completeWorkflowRun)
- trigger_rule: none_failed skips dependent node but anyFailed still marks run failed

Fixes #1381.
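
The behaviour these tests pin down can be sketched as follows. This is a minimal sketch, assuming a per-node state map; `NodeState` and `deriveFinalStatus` are hypothetical names and not the actual dag-executor.ts code.

```typescript
type NodeState = "completed" | "failed" | "skipped";

// Hypothetical reduction: a single failed node marks the whole run failed,
// even when other nodes succeeded or dependents were skipped.
function deriveFinalStatus(states: Record<string, NodeState>): "completed" | "failed" {
  const anyFailed = Object.keys(states).some((id) => states[id] === "failed");
  return anyFailed ? "failed" : "completed";
}

console.log(deriveFinalStatus({ build: "completed", lint: "failed", deploy: "skipped" }));
```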
New reference for the archon skill: a single-glance lookup of which
parameter works on which node type, an intent-based "how do I..." table,
a consolidated silent-failure catalog, and an inline agents: section
(previously only referenced via archon.diy).

Purpose is complementary, not duplicative:
- workflow-dag.md remains the authoring guide
- dag-advanced.md remains the hooks/MCP/skills/retry deep-dive
- good-practices.md remains the patterns and anti-patterns
- parameter-matrix.md is the grep-this-first lookup when you know the
  outcome you want but not which field gets you there

Also registers the new reference in SKILL.md routing table.
Add explicit references to .github/PULL_REQUEST_TEMPLATE.md in both
CONTRIBUTING.md and CLAUDE.md, plus a reminder to link issues with
Closes/Fixes/Resolves so they auto-close on merge. Repo-triage runs
were flagging dozens of partially-filled or unlinked PRs each cycle.
…riage (#1428)

* feat(workflows): add maintainer-standup workflow for daily PR/issue triage

Daily morning briefing that pulls origin/dev, triages all open PRs and assigned
issues against direction.md, and surfaces progress vs. the previous run. Designed
for live-checkout use (worktree.enabled: false) so it can read its own state.

Layout under .archon/maintainer-standup/:
  - direction.md (committed) — project north-star: what Archon IS / IS NOT.
    Drives PR P4 polite-decline classification with cited clauses.
  - README.md / profile.md.example — setup docs and template for new maintainers.
  - profile.md, state.json, briefs/YYYY-MM-DD.md — gitignored, per-maintainer.

Engine:
  - 3 parallel gather scripts in .archon/scripts/maintainer-standup-*.ts
    (git-status, gh-data, read-context) — bun runtime, JSON stdout.
  - Synthesis node: command file with output_format schema for
    { brief_markdown, next_state }.
  - Persist node: tiny inline bun script writes both to disk.

Run-to-run continuity: state.json carries observed_prs/issues snapshots, so the
next run can detect what merged, what closed, what the maintainer shipped, and
which carry-over items aged past N days.

Also adds .archon/** to the ESLint global ignore list (matches the existing
.claude/skills/** pattern) since .archon/ is user content and not part of any
tsconfig project.

* fix(maintainer-standup): address CodeRabbit review on #1428

- gh-data: bump --limit 100 → 1000 on all_open_prs and warn loudly when
  the cap is hit; preserves the observed_prs invariant the next-run
  "resolved since last run" diff depends on. (CodeRabbit critical)
- maintainer-standup.md: clarify P1 CI signal — the gathered payload only
  carries mergeStateStatus, not statusCheckRollup; for borderline P1s,
  drill in via `gh pr checks <n>`. (CodeRabbit minor)
- workflow.yaml persist: write briefs under local YYYY-MM-DD (sv-SE
  locale) instead of UTC ISO date, so an evening run doesn't file
  tomorrow's brief and break recent_briefs lookups. (CodeRabbit minor)
- workflow.yaml persist: wrap state/brief writes in try/catch; on
  failure dump brief_markdown and next_state to stderr so a 5-minute
  Sonnet synthesis isn't lost to a transient disk error. (CodeRabbit minor)
- gh-data + git-status: switch from execSync (shell-string) to
  execFileSync (argv array) for git/gh invocations. Defense-in-depth
  against shell metacharacters in values that pass through (esp. the
  gh_handle from profile.md). (CodeRabbit nitpick)
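
The local-date change can be illustrated with a short sketch. It assumes a runtime with ICU locale data (the default in current Node and Bun builds); `localDateStamp` and `utcDateStamp` are hypothetical helper names.

```typescript
// Local calendar date as YYYY-MM-DD. The sv-SE locale formats dates in
// ISO order, so no manual padding is needed.
function localDateStamp(date: Date): string {
  return date.toLocaleDateString("sv-SE");
}

// UTC calendar date, shown for comparison. In timezones behind UTC an
// evening run already falls on the next UTC day.
function utcDateStamp(date: Date): string {
  return date.toISOString().slice(0, 10);
}

// 23:30 local time on 27 April 2026 (months are zero-based).
const evening = new Date(2026, 3, 27, 23, 30);
console.log(localDateStamp(evening), utcDateStamp(evening));
```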
Add optional `tags: string[]` to `workflowBaseSchema`. Explicit values take precedence over keyword inference; `tags: []` suppresses inference end-to-end; omitting the field falls back to inference (backwards compatible). Non-array values warn-and-ignore matching the sibling `worktree`/`additionalDirectories` patterns.
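
The precedence rules can be sketched as below. This is a minimal sketch under stated assumptions: `inferTags` is a hypothetical stand-in for keyword inference, `resolveTags` is a hypothetical name, and treating a non-array value like an omitted field (fall back to inference after warning) is one reading of "warn-and-ignore".

```typescript
// Hypothetical stand-in for keyword inference from the workflow name.
function inferTags(workflowName: string): string[] {
  return workflowName.includes("review") ? ["review"] : [];
}

// Hypothetical resolver for the optional `tags` field.
function resolveTags(
  rawTags: unknown,
  workflowName: string,
  warn: (message: string) => void,
): string[] {
  // Omitted: fall back to inference (backwards compatible).
  if (rawTags === undefined) return inferTags(workflowName);
  // Non-array: warn and ignore the field (assumed to behave like omission).
  if (!Array.isArray(rawTags)) {
    warn(`ignoring non-array tags for ${workflowName}`);
    return inferTags(workflowName);
  }
  // Explicit array wins, including [] which suppresses inference.
  // Dropping non-string entries is an assumption of this sketch.
  return rawTags.filter((tag): tag is string => typeof tag === "string");
}

const warnings: string[] = [];
const collect = (message: string): void => { warnings.push(message); };
console.log(resolveTags(["ops"], "maintainer-review-pr", collect));
console.log(resolveTags([], "maintainer-review-pr", collect));
console.log(resolveTags(undefined, "maintainer-review-pr", collect));
```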
…ows under maintainer/ (#1430)

* feat(workflows): add maintainer-review-pr and group maintainer workflows under .archon/workflows/maintainer/

Adds the maintainer-review-pr workflow — a Pi/Minimax-based PR triage
flow that gates on direction alignment, scope focus, and PR-template
quality before doing any deep review. If the gate clears, runs the
five review aspects (code/error-handling/test-coverage/comment-quality/
docs-impact) as parallel Archon nodes and auto-posts a synthesized
review comment. If the gate fails (direction conflict, multiple
concerns, sprawling scope), drafts a polite-decline comment and pauses
for the maintainer's approval before posting.

Reorganizes the existing maintainer-standup workflow into the same
subfolder so all maintainer-facing workflows live together. Subfolder
grouping is supported by the workflow loader (1 level deep, resolution
by filename).

What lands:

- .archon/workflows/maintainer/maintainer-standup.yaml (moved from
  .archon/workflows/maintainer-standup.yaml)
- .archon/workflows/maintainer/maintainer-review-pr.yaml (new)
- .archon/commands/maintainer-review-{gate,code-review,error-handling,
  test-coverage,comment-quality,docs-impact,synthesize,report}.md (new,
  Pi-tuned variants of the existing review-agent commands so they avoid
  Claude-only Task / sub-agent patterns)

Pi/Minimax integration:

- Uses provider: pi, model: minimax/MiniMax-M2.7 — verified via the
  e2e-minimax-smoke test that Pi correctly routes to Minimax (session
  jsonl confirms provider=minimax) and that Pi's best-effort
  output_format parser handles the gate's nested schema.
- Two test runs landed real comments: a direction-decline on PR #1335
  and a deep-review on PR #1369. Both were posted to GitHub via the
  workflow's gh pr comment node.

* chore(workflows): also group repo-triage under .archon/workflows/maintainer/

repo-triage is the third maintainer-facing workflow alongside maintainer-standup and maintainer-review-pr; group it in the same subfolder for consistency. Subfolder resolution is by filename so the workflow name is unchanged.
…r unmapped providers (#1284)

Closes #1096.

- Switch Pi provider model lookup from pi-ai's getModel() (static catalog
  only) to ModelRegistry.create(authStorage).find() so user-configured
  custom models in ~/.pi/agent/models.json (LM Studio, ollama, llamacpp,
  custom OpenAI-compatible endpoints) are discoverable.
- Remove the local lookupPiModel helper.
- For env-var-mapped providers (anthropic, openai, etc.) still throw
  with a pi /login hint when credentials are missing. For unmapped
  providers, log pi.auth_missing at info and continue so local models
  that don't need credentials work without ceremony.
- Surface modelRegistry.getError() in the not-found message and emit
  pi.model_not_found so users debugging custom-provider configs see the
  real cause (e.g. missing baseUrl in models.json).
- Guard AuthStorage.create() and ModelRegistry.create() with try/catch
  so a malformed ~/.pi/agent/auth.json surfaces with Pi-framed context
  instead of a raw SDK stack trace.
- Document the credential-free path for local providers in ai-assistants.md.

Co-authored-by: Matt Chapman <Matt@NinjitsuWeb.com>
…add e2e-minimax-smoke (#1431)

* chore(workflows): group all smoke-test workflows under .archon/workflows/test-workflows/

Move the 7 existing e2e-*.yaml smoke tests plus the new e2e-minimax-smoke
test into a dedicated subfolder. Subfolder grouping is supported by the
workflow loader (1 level deep, resolution by filename) so workflow names
are unchanged. Mirrors the .archon/workflows/maintainer/ split landing
in #1430.

Also adds e2e-minimax-smoke.yaml — a sanity check that Pi correctly
routes to Minimax M2.7 via the user's local pi auth, and that Pi's
best-effort output_format parser handles a small nested schema. Asserts
routing by reading the most recent Pi session jsonl rather than asking
the model to self-identify (LLMs are unreliable narrators about their
own identity, especially when Pi's system prompt mentions other
providers as defaults).

* fix(e2e-minimax-smoke): address CodeRabbit review on #1431

- Widen find window from -mmin -3 to -mmin -10. The smoke's three Pi
  nodes plus the assert can collectively run several minutes on slow
  networks; 3 minutes was tight enough to false-FAIL on a healthy run.
  (CodeRabbit minor)
- Drop non-deterministic `head -1` over `find` output. find doesn't
  guarantee any order; on a tie, the wrong file would be picked. Now
  iterates all matching sessions and breaks on first one carrying the
  routing signal — any match is sufficient evidence. (CodeRabbit minor)
- Replace single-regex `'"provider":"minimax".*"modelId":"MiniMax-M2.7"'`
  with two separate greps joined by `&&`. JSON field order isn't part of
  Pi's contract; a future Pi release reordering `provider` and `modelId`
  in the model_change event would silently false-FAIL the original
  pattern. The new check is order-independent. (CodeRabbit major)
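
The order-independent check is written as two greps in the workflow; the sketch below shows the same idea in TypeScript. `hasRoutingSignal` and `anySessionRouted` are hypothetical names, and the sample session lines are made up to show both field orders.

```typescript
// Two independent substring checks, so the relative order of the
// "provider" and "modelId" fields does not matter.
function hasRoutingSignal(sessionText: string): boolean {
  return (
    sessionText.includes('"provider":"minimax"') &&
    sessionText.includes('"modelId":"MiniMax-M2.7"')
  );
}

// Any matching session is sufficient evidence, so no ordering of the
// candidate files is needed.
function anySessionRouted(sessions: string[]): boolean {
  return sessions.some(hasRoutingSignal);
}

const providerFirst = '{"type":"model_change","provider":"minimax","modelId":"MiniMax-M2.7"}';
const modelFirst = '{"type":"model_change","modelId":"MiniMax-M2.7","provider":"minimax"}';
console.log(hasRoutingSignal(providerFirst), hasRoutingSignal(modelFirst));
```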
Six findings, two majors and four minors/nitpicks:

- gate.md L17 vs L77: resolved conflicting input-source instructions.
  Body claimed "all inline, no extra fetch" while a later phase
  permitted reading PULL_REQUEST_TEMPLATE.md. Now: explicit "one
  allowed extra read" callout in Phase 1 + matching wording in Gate C.
  (CodeRabbit major)

- gate.md fenced blocks: added missing language identifiers (text/json/
  markdown) to satisfy markdownlint MD040. (CodeRabbit minor)

- gate.md L155 + read-context.ts: deterministic clock. The 3-day deadline
  was anchored to prior_state.last_run_at, which can be stale and produce
  past-dated deadlines. Moved both today and deadline_3d into the
  read-context.ts output (computed via sv-SE locale → ISO date in local
  time) and instructed the gate to use $read-context.output.deadline_3d
  directly. LLMs are unreliable at calendar arithmetic; this avoids it
  entirely. (CodeRabbit major)

- maintainer-review-pr.yaml fetch-diff: dropped 2>/dev/null on gh pr diff
  so auth / network / deleted-PR failures fail the node instead of
  feeding an empty diff to the gate. Empty-but-successful diff (PR has
  no changes) is now an explicit marker the gate can detect. (CodeRabbit
  minor)

- maintainer-review-pr.yaml approve-unclear: added capture_response: true
  so the maintainer's approve comment flows to the report node. Reject
  reasoning is already captured by Archon's run record. (CodeRabbit
  minor)

- maintainer-review-pr.yaml post-decline + report.md: the gh pr edit
  --add-label call previously swallowed all errors with || true and the
  report still claimed the label was applied. Now writes applied/skipped
  to $ARTIFACTS_DIR/.label-applied + the gh stderr to .label-error so
  the report can describe the actual outcome. (CodeRabbit nitpick)
…ume (#1435)

* fix(workflows): approval gate bypass after reject-with-redraft on resume

When an approval node was rejected with on_reject.prompt, the synthetic
PromptNode built to run the on_reject prompt reused the approval gate's
own node ID. executeNodeInternal then wrote a node_completed event with
that ID, causing getCompletedDagNodeOutputs to treat the gate as already
completed on the next resume — bypassing the human gate entirely.

Fix: give the synthetic node the ID `${node.id}:on_reject` so its
node_completed event has a distinct step_name that won't match the
approval gate slot in priorCompletedNodes.
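
A minimal sketch of why the distinct ID matters, assuming completed nodes are looked up by step name on resume. `DagNode`, `onRejectNodeId`, and the `completedSteps` set are hypothetical simplifications of the executor's bookkeeping.

```typescript
interface DagNode {
  id: string;
}

// The synthetic on_reject node gets its own ID, derived from the gate's.
function onRejectNodeId(node: DagNode): string {
  return `${node.id}:on_reject`;
}

// Hypothetical stand-in for the set of step names with a node_completed event.
const completedSteps = new Set<string>();

const gate: DagNode = { id: "review" };

// Running the on_reject prompt records completion under the synthetic ID...
completedSteps.add(onRejectNodeId(gate));

// ...so the gate itself is still pending when the run resumes.
const gateLooksCompleted = completedSteps.has(gate.id);
console.log(onRejectNodeId(gate), gateLooksCompleted);
```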

Adds a regression test asserting no node_completed event with the
approval gate's ID is written during on_reject execution.

Fixes #1429

* test(workflows): add positive assertion and SSE side-effect comment for on_reject synthetic node

Add complementary positive assertion to the regression test to verify that
node_completed is written exactly once with step_name 'review:on_reject',
ensuring future refactors that suppress the event entirely would be caught.

Add inline comment in executeApprovalNode documenting the known SSE side-effect:
node_started/node_completed events with nodeId='review:on_reject' flow through
the SSE pipeline into the web UI, resulting in a transient phantom node in the
execution view. This is cosmetic-only — the human gate contract is preserved.

* simplify: reduce duplicate cast pattern in on_reject test assertions
…e checkout (#1438)

* feat(workflows): add mutates_checkout field to skip path-lock for concurrent runs

Add `mutates_checkout: boolean` (optional, default true) to the workflow
schema. When set to false, the executor skips the path-exclusive lock
that serializes all runs on the same working path, allowing N concurrent
runs on the same live checkout.

The primary use case is `maintainer-review-pr`, which reads shared state
but writes only to per-run artifact paths and GitHub PR comments — two
parallel reviews of different PRs should not fail with "Workflow already
active on this path".

Changes:
- `schemas/workflow.ts`: add optional `mutates_checkout` field
- `loader.ts`: parse and propagate the field (warn-and-ignore on invalid values)
- `executor.ts`: wrap path-lock guard in `if (workflow.mutates_checkout !== false)`
- `executor.test.ts`: two new tests in the concurrent-run guard suite
- `maintainer-review-pr.yaml`: opt in with `mutates_checkout: false`
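
The guard can be sketched as follows. Only the `mutates_checkout !== false` condition and the error text come from this change; the in-memory `activePaths` set and the `acquirePath` function are hypothetical stand-ins for the executor's real path lock.

```typescript
interface WorkflowConfig {
  name: string;
  mutates_checkout?: boolean; // optional, treated as true when omitted
}

// Hypothetical in-memory lock table keyed by working path.
const activePaths = new Set<string>();

function acquirePath(workflow: WorkflowConfig, cwd: string): void {
  // Only workflows that may mutate the checkout take the exclusive lock.
  if (workflow.mutates_checkout !== false) {
    if (activePaths.has(cwd)) {
      throw new Error("Workflow already active on this path");
    }
    activePaths.add(cwd);
  }
}

// Two read-only reviews can share the same live checkout.
const review: WorkflowConfig = { name: "maintainer-review-pr", mutates_checkout: false };
acquirePath(review, "/repo");
acquirePath(review, "/repo");
console.log(activePaths.size);
```

Comparing against `false` rather than testing truthiness keeps the default (field omitted) on the locking path.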

* test(workflows): add loader tests for mutates_checkout parsing

- Add 5 tests covering false, true, omitted, and invalid (string "yes") values
- Invalid non-boolean values are dropped with a warning; this is now explicitly tested
- Remove the // end mutates_checkout guard trailing comment (no precedent in file)
- Clarify loader comment: "parse/warn pattern" not "warn-and-ignore pattern" to avoid implying the return style matches interactive

* simplify: collapse nodeType/aiFields pair into single nonAiNode object in parseDagNode
…es (#1434)

* docs: replace String.raw with direct assignment in script node examples

String.raw`$nodeId.output` fails silently when substituted output contains
a backtick, terminating the template literal early and producing cryptic parse
errors. JSON is valid JS expression syntax, so direct assignment is safe for
all valid JSON values including those with backticks.
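
The failure mode can be reproduced with a small sketch. It assumes substitution is a plain textual replacement of the placeholder before the script is parsed; `substitute` and `parses` are hypothetical helpers, and `new Function` is used only to check whether the resulting source parses.

```typescript
// Returns true when `source` is syntactically valid as a function body.
function parses(source: string): boolean {
  try {
    new Function(source);
    return true;
  } catch {
    return false;
  }
}

// Hypothetical substitution step: swap the placeholder for the upstream JSON text.
function substitute(script: string, output: string): string {
  return script.split("$node.output").join(output);
}

// Upstream node output: valid JSON whose string value contains backticks.
const upstreamJson = JSON.stringify({ note: "uses `backticks`" });

// String.raw pattern: the embedded backtick ends the template literal early.
const rawScript = substitute(
  "const data = String.raw`$node.output`; return data;",
  upstreamJson,
);

// Direct assignment: the JSON text becomes an object literal expression.
const directScript = substitute(
  "const data = $node.output; return data.note;",
  upstreamJson,
);

console.log(parses(rawScript), parses(directScript));
```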

- Replace String.raw pattern in dag-workflow.yaml example
- Replace String.raw pattern in archon-workflow-builder.yaml template
- Add CAUTION bullet in workflow-dag.md Script Node section
- Add Silent Failures item #14 in parameter-matrix.md
- Add Starlight caution aside in script-nodes.md
- Extend script bodies bullet in variables.md
- Regenerate bundled-defaults.generated.ts

Fixes #1427

* docs: fix Rule 6 in generate-yaml prompt to distinguish bun vs uv patterns

Rule 6 still referenced JSON.parse after the example was updated to direct
assignment, creating a contradiction for the AI code generator. Update the
prose to explicitly distinguish TypeScript/bun (direct assignment) from
Python/uv (json.loads), matching the updated embedded example.
…s/experimental/

Move two repo-scoped workflows that were sitting untracked at the workflow
root into a dedicated subfolder. Subfolder grouping is supported by the
loader (1 level deep, resolution by filename), so workflow names are
unchanged and the /release skill still resolves archon-release correctly.

Files moved:
- archon-fix-github-issue-experimental.yaml — Path-A variant of the
  issue-fix workflow used today to land #1434, #1435, #1438.
- archon-release.yaml — the live release workflow used by the /release
  skill end-to-end (validate -> binary smoke -> version bump -> changelog
  -> approval -> commit -> PR -> tag -> Homebrew formula update).
…des (#1387)

executeBashNode previously only merged explicit envVars on top of
process.env. The three well-known workflow directories (artifactsDir,
logDir, baseBranch) were passed as function parameters and used for
compile-time substitution of $ARTIFACTS_DIR / $LOG_DIR / $BASE_BRANCH
in the script body, but were never added to the subprocess environment.

As a result, any script that relied on shell-runtime expansion — e.g.
JSON_FILE="${ARTIFACTS_DIR}/foo.output.json" inside a heredoc, an
inherited helper script, or a `bash -c` subshell — saw the variable
unset and silently fell back to its default (typically an empty string
or "."), writing artifacts to the workflow cwd instead of the nominal
artifacts directory.

Always build subprocessEnv from process.env plus the three well-known
directories, then allow explicit envVars to override. Compile-time
substitution behavior is unchanged; existing scripts that do not
reference these variables are unaffected; user-supplied envVars still
win on conflict.
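
The merge order can be sketched as below. `buildSubprocessEnv` and `WellKnownDirs` are hypothetical names; only the three variable names and the precedence (base environment, then well-known values, then explicit envVars) come from the description above. The base environment is passed in rather than read from process.env so the sketch stays self-contained.

```typescript
interface WellKnownDirs {
  artifactsDir: string;
  logDir: string;
  baseBranch: string;
}

type Env = Record<string, string | undefined>;

// Later spreads win: inherited environment first, then the three
// well-known values, then user-supplied envVars on top.
function buildSubprocessEnv(
  baseEnv: Env,
  dirs: WellKnownDirs,
  envVars: Record<string, string> = {},
): Env {
  return {
    ...baseEnv,
    ARTIFACTS_DIR: dirs.artifactsDir,
    LOG_DIR: dirs.logDir,
    BASE_BRANCH: dirs.baseBranch,
    ...envVars,
  };
}

const env = buildSubprocessEnv(
  { PATH: "/usr/bin" },
  { artifactsDir: "/tmp/run-1/artifacts", logDir: "/tmp/run-1/logs", baseBranch: "dev" },
  { BASE_BRANCH: "main" },
);
console.log(env);
```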
…1426)

* fix(workflow): substitute $nodeId.output refs in approval messages

Approval node messages were emitted as raw strings, bypassing the
substituteNodeOutputRefs() pass that prompt/bash/loop/cancel nodes
all run. This made interactive workflows like atlas-onboard show
literal "$gather-context.output.repo_name" placeholders to humans
at HITL gates, leaving them unable to know what they were approving.

Fix: rendered the approval.message through substituteNodeOutputRefs
once at the top of the standard approval gate path, then used the
resolved string in all 4 emission sites (safeSendMessage,
createWorkflowEvent, pauseWorkflowRun, event-emitter).

Test: new dag-executor.test case wires a structured-output upstream
node into an approval node and asserts pauseWorkflowRun receives the
substituted message ("Repo: hcr-els | App: CCELS | Port: 3012")
rather than the literal placeholders.
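
The substitution pass can be approximated as below. This is not the real substituteNodeOutputRefs; `substituteRefs` is a hypothetical simplification that handles only `$nodeId.output` and `$nodeId.output.field`, and the `app_name` and `port` field names are invented for the example (only `repo_name` appears in the report above).

```typescript
type NodeOutputs = Record<string, unknown>;

// Hypothetical, simplified substitution of $nodeId.output[.field] references.
function substituteRefs(message: string, outputs: NodeOutputs): string {
  return message.replace(
    /\$([A-Za-z0-9_-]+)\.output(?:\.([A-Za-z0-9_]+))?/g,
    (match: string, nodeId: string, field: string | undefined): string => {
      if (!(nodeId in outputs)) return match; // unknown node: leave as-is
      const output = outputs[nodeId];
      if (field === undefined) {
        return typeof output === "string" ? output : JSON.stringify(output);
      }
      if (typeof output === "object" && output !== null && field in output) {
        return String((output as Record<string, unknown>)[field]);
      }
      return match; // unknown field: leave as-is
    },
  );
}

const resolved = substituteRefs(
  "Repo: $gather-context.output.repo_name | App: $gather-context.output.app_name | Port: $gather-context.output.port",
  { "gather-context": { repo_name: "hcr-els", app_name: "CCELS", port: 3012 } },
);
console.log(resolved);
```

Resolving the message once and passing the same string to every emission site keeps the chat prompt, the persisted event, and the pause record consistent.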

Repro: any workflow with an approval node whose message references
$nodeId.output[.field]. Observed in the wild on atlas-onboard's
confirm-context HITL gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(workflow): extend approval-substitution test to cover all 4 emission sites

Per CodeRabbit review: the original test only verified pauseWorkflowRun
received the substituted message, but the fix touches 4 emission sites.
A future regression at safeSendMessage / createWorkflowEvent / event-emitter
would silently leave the test passing while users still saw raw $node.output
placeholders.

Adds two additional assertions:
- platform.sendMessage prompt contains substituted message + does NOT
  contain literal $gather-context.output placeholders
- The persisted approval_requested workflow event's data.message is
  substituted

Event-emitter assertion deferred (no existing pattern for spying on the
global emitter in this test file). Covering two of the three secondary
surfaces closes the practical regression risk: both are user-visible
(chat prompt + audit-log event), while the emitter is internal only.

Test count: 7 pass / 22 expect() (was 18). Full suite 193 pass / 353
expect() — no regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#1367)

* feat(workflows): expose $LOOP_PREV_OUTPUT in loop node prompts (#1286)

Adds a new substitution variable that carries the previous loop iteration's
cleaned output into the next iteration's prompt. Empty on iteration 1; the
prior iteration's output (after stripCompletionTags) on iteration 2+.

Why: fresh_context: true loops have no way to reference what the previous
pass produced or why it failed without dragging the full session forward.
$LOOP_PREV_OUTPUT closes that gap with zero session-cost — same trust
boundary as $nodeId.output, no new external surface.
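
The iteration rule can be sketched as below. `stripTags` is a hypothetical stand-in for stripCompletionTags (assumed here to drop `<promise>...</promise>` blocks), and `buildIterationPrompt` is a hypothetical name; the real substitution happens inside substituteWorkflowVariables.

```typescript
// Hypothetical stand-in for stripCompletionTags.
function stripTags(output: string): string {
  return output.replace(/<promise>[\s\S]*?<\/promise>/g, "").trim();
}

// Empty on iteration 1; the previous iteration's cleaned output afterwards.
function buildIterationPrompt(
  template: string,
  iteration: number,
  lastIterationOutput: string,
): string {
  const prev = iteration === 1 ? "" : stripTags(lastIterationOutput);
  return template.split("$LOOP_PREV_OUTPUT").join(prev);
}

const template = "Fix the failing tests.\nPrevious attempt: $LOOP_PREV_OUTPUT";
console.log(buildIterationPrompt(template, 1, ""));
console.log(buildIterationPrompt(template, 2, "2 tests still failing <promise>RETRY</promise>"));
```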

Changes:
- packages/workflows/src/executor-shared.ts: substituteWorkflowVariables
  accepts a 10th positional loopPrevOutput arg and substitutes
  $LOOP_PREV_OUTPUT (defaults to '').
- packages/workflows/src/dag-executor.ts: executeLoopNode passes
  lastIterationOutput on iteration 2+ (and explicit '' on iteration 1 /
  the first iteration of an interactive resume, since lastIterationOutput
  is a per-call variable that does not survive resume metadata).
- Unit tests: 3 new cases in executor-shared.test.ts.
- Integration tests: 2 new cases in dag-executor.test.ts verifying the
  prompt sent to the AI on iter 1 vs iter 2, and that the value reflects
  cleaned output (no <promise> tags).
- Docs: variables.md, loop-nodes.md (new "Retry-on-failure" pattern),
  CLAUDE.md variable reference.

Backward compatibility: prompts that don't reference $LOOP_PREV_OUTPUT are
unaffected. All 843 workflow tests + type-check + lint + format:check +
bun run validate pass locally.

* docs: address coderabbit review on variables/loop-nodes

- variables.md: include $LOOP_PREV_OUTPUT in substitution-order list and
  availability table to match the new variable row at line 30
- loop-nodes.md: document the interactive-resume exception where the first
  iteration after an approval-gate resume still receives an empty
  $LOOP_PREV_OUTPUT regardless of iteration number (per dag-executor.ts
  L1781-1783 where i === startIteration always clears prev output)

* docs(changelog): add Unreleased entry for $LOOP_PREV_OUTPUT (#1367 review)

* test(loop): add resume-from-approval integration test for $LOOP_PREV_OUTPUT (#1367 review)

Per maintainer-review-pr suggestion (Wirasm): two-call integration test
covering the resume-from-approval scenario.

  - Call 1: fresh interactive loop pauses at the gate after iteration 1 and
    asserts $LOOP_PREV_OUTPUT substitutes to empty on iter 1 (no prior
    output) plus the gate pause is recorded.
  - Call 2: resumed run with metadata.approval populated. The first
    resumed iteration must substitute $LOOP_PREV_OUTPUT to '', NOT to the
    paused run's iter-1 output (which lived in a different process and is
    not persisted). $LOOP_USER_INPUT still flows through as normal.

Locks the documented invariant at dag-executor.ts:1769-1772.

---------

Co-authored-by: voidborne-d <DottyEstradalco@allergist.com>
…1457)

The brief was missing a key signal: when contributors reply on PRs or
issues, the maintainer wouldn't see it explicitly. In practice, replies
on reviewed PRs were buried under aggregate updatedAt timestamps with no
indication of WHO replied or WHAT they said.

This adds a new "Replies waiting on you" section to the daily brief,
sourced from two paginated GitHub API calls scoped by since=last_run_at:

  - /repos/{o}/{r}/issues/comments  PR + issue conversation comments
  - /repos/{o}/{r}/pulls/comments   inline code-review comments

Filters applied:
  - Skip the maintainer's own comments (gh_handle from profile.md)
  - Skip GitHub bot accounts (login ending in [bot]) — coderabbitai,
    chatgpt-codex-connector, dependabot, etc. They post a constant
    churn of automated review tooling that drowns out human replies;
    the maintainer wants the latter.

Output is grouped by PR/issue number with kind classification:
  - issue              comment on a non-PR issue
  - pr_conversation    PR conversation-level comment
  - pr_review          inline code-review comment (most actionable —
                       usually needs a code-level response, so kind
                       upgrades to pr_review whenever review comments
                       arrive on a PR that also has conversation ones)

Sorted by recency (newest reply first). Synthesizer reads
gh-data.output.replies_since_last_run and renders a section.
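
A minimal sketch of the filter, classification, and sort described above. The `RawComment` shape and the `groupReplies` name are assumptions made for illustration; the real script works on GitHub API payloads.

```typescript
type ReplyKind = "issue" | "pr_conversation" | "pr_review";

// Hypothetical, simplified comment shape.
interface RawComment {
  login: string;
  number: number; // PR or issue number
  isPullRequest: boolean;
  isReviewComment: boolean; // true for entries from /pulls/comments
  createdAt: string; // ISO 8601 in UTC, so string order equals time order
}

interface ReplyGroup {
  number: number;
  kind: ReplyKind;
  latestAt: string;
  comments: RawComment[];
}

function groupReplies(comments: RawComment[], maintainerLogin: string): ReplyGroup[] {
  const groups = new Map<number, ReplyGroup>();
  for (const c of comments) {
    if (c.login === maintainerLogin) continue; // skip the maintainer's own comments
    if (c.login.endsWith("[bot]")) continue; // skip GitHub bot accounts
    const kind: ReplyKind = c.isReviewComment
      ? "pr_review"
      : c.isPullRequest
        ? "pr_conversation"
        : "issue";
    const group = groups.get(c.number);
    if (group === undefined) {
      groups.set(c.number, { number: c.number, kind, latestAt: c.createdAt, comments: [c] });
      continue;
    }
    group.comments.push(c);
    if (kind === "pr_review") group.kind = "pr_review"; // review comments upgrade the kind
    if (c.createdAt > group.latestAt) group.latestAt = c.createdAt;
  }
  // Newest reply first.
  return Array.from(groups.values()).sort((a, b) => b.latestAt.localeCompare(a.latestAt));
}
```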

Verified on a backdated state.json (last_run_at = yesterday morning):
22 human replies on 22 PRs/issues, bot noise filtered (32 → 22 after
the [bot] filter). Surfaces exactly the contributor responses to
yesterday's review comments and direction questions.
The maintainer-standup brief had no signal for "I already triaged that
PR via maintainer-review-pr 2 days ago" — it just kept listing reviewed
PRs in P1-P4 with no acknowledgement of prior work. Result: maintainer
ends up re-skimming the same PR several mornings in a row.

This adds a shared persistent state file at:

  .archon/maintainer-standup/reviewed-prs.json (gitignored, per-maintainer)

shape:

  {
    "1338": {
      "reviewed_at": "2026-04-27T16:34:57Z",
      "gate_verdict": "review",     // review | decline | needs_split | unclear
      "run_id": "..."
    },
    ...
  }

Three pieces:

1. WRITER — new `record-review` script node in maintainer-review-pr.yaml,
   runs after whichever branch fired (post-review / post-decline /
   approve-unclear) with trigger_rule: one_success. Inline bun script;
   reads $gate.output.verdict, $ARTIFACTS_DIR/.pr-number, and
   $WORKFLOW_ID; appends/upserts the entry. report node now depends on
   record-review so the state write happens before the run completes.

2. READER — read-context.ts loads reviewed-prs.json into a new
   reviewed_prs field on the standup gather output. Same pattern as
   prior_state and recent_briefs.

3. SURFACE — maintainer-standup command file gets a Phase 2h rule:
   when listing PRs in P1-P4 / Polite-decline sections, append:
     - "✓ reviewed Nd ago" for review-branch entries
     - "✓ declined Nd ago" for decline / needs_split branches
     - "✓ triaged Nd ago (unclear)" for unclear branch
   and a STALENESS marker — compare reviewed_at to PR's updatedAt; if
   contributor pushed since the prior review, append
   "⚠ contributor pushed since" so the maintainer knows the prior pass
   may need to be re-run.
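
The annotation rule can be sketched as below, using the entry shape shown earlier. The `reviewAnnotation` function and its parameters are hypothetical; in the workflow this rule lives in the command file's instructions rather than in code.

```typescript
type GateVerdict = "review" | "decline" | "needs_split" | "unclear";

interface ReviewedPrEntry {
  reviewed_at: string; // ISO 8601
  gate_verdict: GateVerdict;
  run_id: string;
}

const VERBS: Record<GateVerdict, string> = {
  review: "reviewed",
  decline: "declined",
  needs_split: "declined",
  unclear: "triaged",
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function reviewAnnotation(entry: ReviewedPrEntry, prUpdatedAt: string, now: Date): string {
  const reviewedMs = Date.parse(entry.reviewed_at);
  const days = Math.floor((now.getTime() - reviewedMs) / MS_PER_DAY);
  const suffix = entry.gate_verdict === "unclear" ? " (unclear)" : "";
  const label = "✓ " + VERBS[entry.gate_verdict] + " " + days + "d ago" + suffix;
  // Staleness: the PR changed after the recorded review.
  const stale = Date.parse(prUpdatedAt) > reviewedMs;
  return stale ? label + " ⚠ contributor pushed since" : label;
}
```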

Plus a one-shot backfill script:

  .archon/scripts/maintainer-standup-backfill-reviews.ts

Scans the maintainer's gh comments in the last 7 days, pattern-matches
"## Review Summary" / direction-clause-citation / split-up wording, and
populates reviewed-prs.json. Idempotent; existing entries (from real
workflow runs) take precedence over backfilled ones (the writer-node
record is more authoritative than a body-pattern guess). Uses 64MB
maxBuffer on the gh exec because --paginate over 7 days of an active
repo's comments easily exceeds Node's default 1MB.

Backfill verified: 363 comments scanned, 18 matched, 17 unique PRs
populated — exactly the 17 PRs we reviewed via the workflow yesterday.

The new state file is gitignored alongside the existing per-maintainer
files (profile.md, state.json, briefs/).
…1460)

Both SDKs were ~30 patch releases behind. Validation suite passes
(type-check, lint, format, tests across all 10 packages) without code
changes. The only sustained Claude SDK behavior change in the range —
v0.2.111's options.env overlay/replace flap, since reverted to overlay —
is a no-op for Archon, which already passes { ...process.env } as the
SDK env.
Wirasm and others added 28 commits May 21, 2026 12:22
Capture the acceptance criteria and maintenance policy for community
providers in direction.md so PR triage stops devolving into ad-hoc
'should this match Pi or not' debates.

Policy in brief:
- Coding-agent SDK required (no raw chat.completions wrappers — Pi
  already covers ~20 LLM backends via one harness)
- Match the Pi pattern: provider class + options translator + event
  bridge + capability matrix, registered with builtIn: false, tests
  at parity with the Pi suite, docs page in ai-assistants.md
- No cap on acceptance
- Contributor + community maintain; non-functional providers get
  deprecated and removed in the next minor unless someone fixes them

Cite as direction.md §community-providers when triaging.
…shes after SDK cleanup (#1735) (#1739)

The codex-sdk's own finally calls child.removeAllListeners() + child.kill()
before Archon's retry-loop finally runs. The subsequent attemptController.abort()
fires Node's internal spawn-signal abort listener on the now-listenerless child,
surfacing an uncaught AbortError that bypasses try/catch.

The per-attempt AbortController is short-lived and goes out of scope at iteration
end — no explicit abort() cleanup is needed. Caller signal cancellation is
unaffected (removed via removeEventListener in the same finally block).
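
A minimal sketch of the resulting cleanup pattern, assuming a generic retry helper. `runWithRetries` is a hypothetical name, not the provider's actual function.

```typescript
async function runWithRetries<T>(
  attempt: (signal: AbortSignal) => Promise<T>,
  maxAttempts: number,
  callerSignal?: AbortSignal,
): Promise<T> {
  let lastError: unknown = new Error("no attempts were made");
  for (let i = 0; i < maxAttempts; i++) {
    if (callerSignal?.aborted) throw new Error("aborted by caller");
    // Short-lived controller: it goes out of scope at the end of the iteration.
    const attemptController = new AbortController();
    const onCallerAbort = (): void => attemptController.abort();
    callerSignal?.addEventListener("abort", onCallerAbort, { once: true });
    try {
      return await attempt(attemptController.signal);
    } catch (err) {
      lastError = err;
      if (callerSignal?.aborted) throw err; // do not retry after caller cancellation
    } finally {
      // Only detach the caller listener. Deliberately no attemptController.abort():
      // aborting after the SDK has torn down its child is what surfaced the crash.
      callerSignal?.removeEventListener("abort", onCallerAbort);
    }
  }
  throw lastError;
}
```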

Closes #1735
#1391) (#1730)

* feat(workflows): add always_run node opt-out for resume caching

Closes #1391.

Adds an optional `always_run: boolean` field on every DAG node. When
`true`, the node re-executes on resume even if it completed in the
prior run. The resume pre-populate filters out always_run node IDs,
and the per-node skip-check is gated by `!node.always_run`.

Use case: producers whose exit code does not validate their output
(bash that writes a file the consumer parses, code generators, fetch
scripts). Today a successful-but-garbage producer stays cached across
every resume; the only escape is renaming the node.

Default is unchanged. Normal cached nodes in the same run still skip.
Emits a new `dag.node_always_run_resume_forced` log event so operators
can see the flag firing.
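
The skip decision can be sketched like this. Only the `always_run` field comes from the change; `DagNode` is reduced to the fields needed here and `nodesToSkipOnResume` is a hypothetical helper.

```typescript
interface DagNode {
  id: string;
  always_run?: boolean;
}

function nodesToSkipOnResume(nodes: DagNode[], priorCompleted: Set<string>): Set<string> {
  const skip = new Set<string>();
  for (const node of nodes) {
    // A node is served from the prior run only if it completed there
    // AND has not opted out of resume caching.
    if (priorCompleted.has(node.id) && !node.always_run) skip.add(node.id);
  }
  return skip;
}
```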

* workflows: emit node_always_run_reset event on resume opt-out

The always_run resume-forced path only wrote a structured log line.
The prior_success skip path writes a DB workflow_event, so resume
forensics could see skipped nodes but not nodes that were reset from
the skip list. Add a symmetric node_always_run_reset event with the
prior output so operators can reconstruct resume decisions from the
workflow_events table.

Drop the trailing PR reference from the comment — surrounding text
explains intent.
…ed YAMLs (#1733)

Fixes #1535

The workflow-builder's generate-yaml node did not explicitly require
generated workflows to reference $ARGUMENTS (or $USER_MESSAGE). When
the AI generated single-node workflows that accept user input, it
described the input in prose but omitted the $ARGUMENTS substitution
variable. The harness captured the user's invocation message but never
injected it into the node's conversation.

Changes:
- Add rule 13 to generate-yaml prompt: every workflow that accepts user
  input MUST reference $ARGUMENTS in at least one node prompt
- Add validation warning in validate-yaml when neither $ARGUMENTS nor
  $USER_MESSAGE appears in the generated YAML
- Regenerate bundled defaults
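
The validation warning amounts to a substring check, roughly as below. The function name and warning text are illustrative assumptions, not the wording used by validate-yaml.

```typescript
function missingUserInputWarning(yamlText: string): string | undefined {
  const referencesInput =
    yamlText.includes("$ARGUMENTS") || yamlText.includes("$USER_MESSAGE");
  // Hypothetical message text.
  return referencesInput
    ? undefined
    : "Generated workflow references neither $ARGUMENTS nor $USER_MESSAGE; user input will not reach any node.";
}
```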
…chon-refactor-safely (#1734)

The analyze-impact and plan-refactor nodes are intentionally read-only
(denied_tools: [Write, Edit, Bash]) but their prompts instructed the AI
to write files. This caused the AI to waste turns searching for
unavailable tools, and the plan/analysis was never persisted to disk.

The execute-refactor node then failed to read the plan file, resulting
in zero work done despite the workflow reporting completed.

Changes:
- Update prompts to output analysis/plan directly (captured as node
  output) instead of attempting file writes
- Add persist-impact and persist-plan bash nodes to bridge the context
  boundary by writing node outputs to $ARTIFACTS_DIR files
- Update dependency chain: plan-refactor depends on persist-impact,
  execute-refactor depends on persist-plan

Closes #1477
#1728)

* fix(providers): expand ${VAR_NAME} brace syntax in MCP config env vars (fixes #1612)

Add two-group regex alternation to expandEnvVarsInRecord so both $VAR and
${VAR} forms are expanded in env/headers values. Add 5 tests for the new
brace-form behavior and update MCP servers docs.
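
A minimal sketch of a two-group alternation. The signature (explicit `env` parameter) and the choice to replace unset variables with an empty string are assumptions for illustration; the real `expandEnvVarsInRecord` may differ on both.

```typescript
// Group 1 captures the ${VAR} form, group 2 the bare $VAR form.
const ENV_VAR_PATTERN = /\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)/g;

function expandEnvVarsInRecord(
  record: Record<string, string>,
  env: Record<string, string | undefined>,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(record)) {
    out[key] = value.replace(
      ENV_VAR_PATTERN,
      (_match: string, braced: string | undefined, bare: string | undefined) =>
        // Assumption: unset variables expand to an empty string.
        env[braced ?? bare ?? ""] ?? "",
    );
  }
  return out;
}
```

The brace alternative is listed first so that `${VAR}` is consumed as a whole rather than leaving stray braces behind.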

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore(ai-layer): evolve AI Layer from PIV run

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…prove/reject (#1743)

* Fix: workflow approve/resume discovery for worktree runs (#1663)

When a workflow paused at an approval gate is resumed via `workflow approve` or
`workflow resume`, the CLI re-invoked `workflowRunCommand` with `run.working_path`
as the discovery cwd. If `working_path` is a worktree or workspace clone that
does not contain the user's local (often untracked) workflow YAML, discovery
failed with "Workflow 'foo' not found" before execution could begin.

Separate the discovery path from the execution path by adding an optional
`discoveryCwd` to `WorkflowRunOptions`. Resume, approve, and reject now look up
the codebase and pass `codebase.default_cwd` as `discoveryCwd`, so the source
repo is searched even when `working_path` lives elsewhere. The execution cwd
and the existing `findResumableRun` keying are unchanged.

Changes:
- Add `WorkflowRunOptions.discoveryCwd`; use it for `loadWorkflows` in
  `workflowRunCommand`
- `workflowResumeCommand`, `workflowApproveCommand`, and `workflowRejectCommand`
  resolve `codebase.default_cwd` (with graceful fallback) and pass it through
- Tests covering discovery from `codebase.default_cwd` and fallback to
  `working_path` when no codebase is available
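
The discovery-path fallback can be sketched as below. `resolveDiscoveryCwd`, its parameters, and the warning strings are hypothetical; only `default_cwd`, `working_path`, and the fallback behavior come from the description above.

```typescript
interface CodebaseRecord {
  default_cwd: string;
}

async function resolveDiscoveryCwd(
  codebaseId: string | null,
  workingPath: string,
  getCodebase: (id: string) => Promise<CodebaseRecord | null>,
  warn: (message: string) => void,
): Promise<string> {
  if (codebaseId === null) return workingPath;
  try {
    const codebase = await getCodebase(codebaseId);
    if (codebase !== null) return codebase.default_cwd; // search the source repo
    warn("codebase record not found; discovering workflows from working_path");
  } catch {
    warn("codebase lookup failed; discovering workflows from working_path");
  }
  return workingPath; // graceful fallback; the execution cwd is unaffected either way
}
```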

Fixes #1663

* chore(workflows): regenerate bundled defaults after default YAML updates

* fix: address review findings from PR #1743

- C1: Remove Write from denied_tools on analyze-impact and plan-refactor nodes
  in archon-refactor-safely.yaml — prompts write to $ARTIFACTS_DIR/*.md
- H1: Add else branch with warn log when codebase record not found (null return)
  at all three discoveryCwd sites (resume/approve/reject)
- H2: Log discovery path when discoveryCwd is set so the searched path is
  visible to users debugging workflow-not-found errors
- I1: Add two regression tests for workflowRejectCommand discoveryCwd path
  (codebase found and fallback-when-null), mirroring approve/resume parity
- Fix mock pollution: remove duplicate getWorkflowRun mockResolvedValueOnce
  in "throws when on_reject configured but working_path is null" test whose
  extra queued value leaked into subsequent tests
- L3: Drop caller enumeration from discoveryCwd JSDoc; keep only the why
- L4: Update codebaseId inline comment to include reject as a caller
- L6: Fix workflowRejectCommand JSDoc to describe the auto-resume branch
- M1: Add CHANGELOG entry for the #1663 fix under [Unreleased]
- M2: Rename stale test name "fall through to auto-registration" to accurately
  describe the warn-and-fallback behavior on getCodebase failure
- Regenerate bundled-defaults.generated.ts after YAML changes

* simplify: merge redundant priorCompletedNodes checks into single if/else
…ixes #1738) (#1742)

User-bubble <p> and the .chat-markdown typography rules had no
overflow-wrap, so long URLs and tokens broke out of the max-w-[70%]
container.

- MessageBubble: add break-words + min-w-0 to the flex-1 paragraph so
  it can shrink below intrinsic content width.
- index.css: add overflow-wrap: break-word to .chat-markdown p, li, td,
  and a. Code blocks already use overflow-x-auto and are excluded.
* docs(brand): add brand foundation page on archon.diy

- Mount the canonical Archon brand sheet at `/brand/` in the docs site
  (Penpot-exported standalone HTML, top-right "Console →" cross-link
  surgically removed via a re-runnable patch script).
- Add a Starlight overview page with a Quick reference (gradient,
  surface) and an embedded full brand sheet.
- Sidebar gains a "🎨 Brand" entry between Roadmap and The Book of Archon.
- Fix the dark-mode active sidebar link being unreadable
  (`color: var(--sl-color-white)`).
- Require future UI changes to align with the brand foundation
  (new "UI and Visual Design" section in root CLAUDE.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(brand): switch foundation.html to plain source files, drop decoder scripts

The brand sheet now ships as plain Penpot-exported source (Brand.html shell,
brand-app.jsx, logo.jsx, tweaks-panel.jsx, standalone-tweaks-toggle.jsx,
app.css, archon-logo.png) and is edited like any other code in the repo:
open the JSX, change it, refresh the page.

- public/brand/foundation.html now loads React + Babel from unpkg (with
  integrity hashes) and compiles the JSX in the browser. Adds one local
  override: hide the Penpot Tweaks toggle on the public site.
- brand-app.jsx carries our single local delta: the top-right "Console →"
  cross-link is removed (the sibling Archon Console doc isn't published).
- public/brand/README.md documents what each file owns and the local delta.
- The 1.5 MB self-extracting bundle and the scripts/brand/ decoder pipeline
  (_find-console.ts, _dump.ts, _patch.ts) are deleted. Net: the repo loses
  ~1.5 MB of opaque base64 + 4 maintenance scripts; gains ~85 KB of editable
  source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(web): recognize loop and approval node types in DAG builder

resolveNodeDisplay() fell through to the 'prompt' fallback for loop and
approval nodes, giving them nodeType='prompt' with no promptText.
useBuilderValidation then raised false-positive "prompt cannot be empty"
errors for both node types.

Changes:
- dag-layout.ts: add loop and approval cases to resolveNodeDisplay()
- DagNodeComponent.tsx: extend nodeType union; add TYPE_CONFIG entries
  and getContentPreview cases for loop and approval
- index.css: add --node-loop (teal) and --node-approval (amber) tokens
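
A rough sketch of the idea, with a simplified node shape. The `RawDagNode` fields and the labels chosen are assumptions; the real resolveNodeDisplay() in dag-layout.ts handles more node types and richer data.

```typescript
type NodeType = "command" | "bash" | "prompt" | "loop" | "approval";

interface NodeDisplay {
  label: string;
  nodeType: NodeType;
  promptText?: string;
}

// Hypothetical, reduced node shape.
interface RawDagNode {
  id: string;
  command?: string;
  bash?: string;
  prompt?: string;
  loop?: { prompt: string };
  approval?: { message: string };
}

function resolveNodeDisplay(node: RawDagNode): NodeDisplay {
  if (node.command !== undefined) return { label: node.command, nodeType: "command" };
  if (node.bash !== undefined) return { label: node.id, nodeType: "bash" };
  // Explicit cases so these no longer fall through to the prompt fallback.
  if (node.loop !== undefined) {
    return { label: node.id, nodeType: "loop", promptText: node.loop.prompt };
  }
  if (node.approval !== undefined) return { label: node.id, nodeType: "approval" };
  return { label: node.id, nodeType: "prompt", promptText: node.prompt ?? "" };
}
```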

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(web): add unit and integration tests for loop/approval DAG node types

Tests requested by Wirasm for PR #1722:
- resolveNodeDisplay(): loop node → { label, nodeType, promptText }, approval → { label, nodeType }
- dagNodesToReactFlow() integration: asserts loop and approval nodes have correct nodeType in output
- getContentPreview(): loop multi-line prompt returns first line; approval returns empty string
- Exports getContentPreview from DagNodeComponent.tsx to make it testable
- Extends test script to cover src/components/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: robby_kei <robby_kei@linecorp.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(providers): add GitHub Copilot provider configuration types
Define CopilotProviderDefaults with model, reasoning effort, and auth options
Include system message injection and CLI path configuration support

* feat(providers): add GitHub Copilot community provider integration
Implement full provider with session management, streaming, and binary resolution
Include comprehensive test coverage and lazy-load SDK pattern

* feat(providers): add Copilot provider registration and exports
Export CopilotProvider, config parser, and binary resolver utilities
Register Copilot provider in community providers initialization

* test(e2e): add GitHub Copilot provider smoke and abort tests
Include streaming verification, token validation, and interrupt handling
Verify connectivity, output plumbing, and session management

* feat(copilot): add reasoning effort alias and session timeout improvements
Map Archon `max` effort to SDK `xhigh` and extend sendAndWait timeout to 60min
Handle fork-session requests with fresh session creation fallback

* feat(copilot): add environment variable override support and auto model default
Add COPILOT_MODEL env var with envOverrides tracking across config system
Update provider to default model to 'auto' and enhance settings UI

* docs(copilot): clarify session option handling comment

* feat(copilot): add MCP, skills, agents, and structured output support

Implement full Copilot SDK feature translation including tool restrictions,
session config assembly, and best-effort JSON parsing for structured output

* feat(copilot): respect useLoggedInUser to override env token
test(copilot): cover env token precedence and override behavior

* refactor(copilot): remove isCopilotModelCompatible and model-ref
delete model-ref.ts and model-ref.test.ts
update copilot index and registration to drop isCopilotModelCompatible export

* fix(struct-out): enforce object requirement for structured output parsing
return undefined if parsed JSON is not an object
add tests covering non-object JSON in structured output parsing

* feat(copilot): add isExecutableFile check for Copilot binary
implement isExecutableFile using stat/access and use it in path resolution
update errors to reference executable file and chmod guidance

* feat(copilot): add PATH lookup for copilot binary resolution
export resolveFromPath and prefer PATH result when executable

* ci(workflows): migrate and add Copilot CI workflows
- rename e2e-copilot-abort.yaml to test-workflows/e2e-copilot-abort.yaml
- add e2e-copilot-all-features.yaml and relocate smoke workflow to test-workflows

* refactor(shared): centralize structured-output parsing and skills
update providers to re-export shared implementations
expose shared utilities: tryParseStructuredOutput, augmentPromptForJsonSchema

* feat(registry): register Copilot community provider
update registry tests to cover copilot provider registration
verify no collision with built-ins and copilot appears in lists

* feat(copilot): defer session error warning and harden abort flow
update event-bridge to emit no system chunk on session.error
add provider-hardening tests for abort, trim model config and cleanup

* ci(workflow): simplify output capture in e2e-copilot-smoke workflow

* ci(workflows): restructure Copilot e2e workflows for clarity
refactor multiple files into sections for fixtures, demos, and checks

* ci(workflow): remove e2e-copilot-all-features workflow

* feat(workflows): add e2e-copilot-all-nodes-smoke workflow
delete old e2e-copilot-smoke workflow
extend Copilot smoke tests to cover all node types and structured outputs

* refactor(config): remove envOverrides support and COPILOT_MODEL usage
use DEFAULT_AI_ASSISTANT env var to select default ai assistant
update tests and docs to reflect new default and env var usage

* docs: update Copilot docs and env sample

* feat(copilot): implement token precedence for Copilot auth
introduce COPILOT_GITHUB_TOKEN and generic GH tokens; track tokenSource
reorder provider registration to register Pi before Copilot

* feat(copilot): improve binary resolution and skill dir validation
use isExecutableFile for vendor and autodetect checks
validate skill names to reject absolute or traversal paths
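
A name-only check along these lines would reject absolute and traversal paths. `isValidSkillName` is a hypothetical helper, and the exact rules in the provider may be stricter.

```typescript
function isValidSkillName(name: unknown): name is string {
  if (typeof name !== "string" || name.length === 0) return false;
  if (name === "." || name === "..") return false; // current/parent directory
  // Any path separator means the value is a path, not a bare name.
  // This covers absolute paths, nested paths, and "../" traversal.
  return !name.includes("/") && !name.includes("\\");
}
```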

* fix: address review feedback on Copilot community provider

- Add packages/providers/src/shared/structured-output.test.ts covering
  augmentPromptForJsonSchema, the happy-path clean parse, fence stripping
  (both ```json and bare ```), the forward-brace scan recovery for
  reasoning-model prose preamble, fence + preamble combo, whitespace
  trimming, invalid JSON, empty input, and the bare-primitive rejection
  contract (null/number/string/boolean).
- Add packages/providers/src/shared/skills.test.ts covering empty/null
  inputs, non-string and empty-string skipping, missing skills, cwd vs
  home resolution order, cwd-shadows-home semantics, deduplication, and
  the name-only contract (rejection of absolute paths, nested paths,
  and parent traversal). Uses a staged temp HOME so reads are isolated.
- Wire both new test files into packages/providers/package.json so they
  run in CI as separate bun test invocations.
- Add `copilot` to the registered-providers list in the validation
  error example at guides/authoring-workflows.md, add a Copilot bullet
  to the Model strings section, and add an AI Providers -- Copilot
  env-var subsection plus DEFAULT_AI_ASSISTANT enumeration to
  reference/configuration.md.
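
The best-effort parsing contract exercised by structured-output.test.ts can be sketched as follows. This is an illustrative reimplementation, not the shared module's code; in particular, how the real `tryParseStructuredOutput` treats arrays is not stated above, and this sketch accepts any non-null JSON object value.

```typescript
const FENCE = "`".repeat(3); // markdown code-fence marker
const FENCED_BLOCK = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE);

function parseObject(candidate: string): object | undefined {
  try {
    const parsed: unknown = JSON.parse(candidate);
    // Bare primitives (null, number, string, boolean) are rejected.
    return typeof parsed === "object" && parsed !== null ? parsed : undefined;
  } catch {
    return undefined;
  }
}

function tryParseStructuredOutput(raw: string): object | undefined {
  let text = raw.trim();
  // Fence stripping: use the first fenced block's body, tagged json or bare.
  const fenced = FENCED_BLOCK.exec(text);
  if (fenced !== null) text = (fenced[1] ?? "").trim();

  const direct = parseObject(text);
  if (direct !== undefined) return direct;

  // Forward-brace scan: skip prose preamble by trying each "{" in turn.
  for (let i = text.indexOf("{"); i !== -1; i = text.indexOf("{", i + 1)) {
    const recovered = parseObject(text.slice(i));
    if (recovered !== undefined) return recovered;
  }
  return undefined;
}
```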

The two duplicate-import HIGH findings from the May 14 review were
hallucinations — the imports don't exist in the current branch — so
they need no fix.

* chore(rebase): resolve semantic conflicts from dev

- Update loadMcpConfig import to ../../mcp/config — #1459 (Codex MCP
  nodes) extracted it out of claude/provider.ts into its own module.
- Regenerate bun.lock from current dev (configVersion: 1). Old commits
  on this branch carried configVersion: 0; rebased forward unchanged
  but produced different transitive resolution on install (telegram
  markdown tests fail locally despite identical telegramify-markdown
  pin). bun install re-adds @github/copilot-sdk on top of the fresh
  lockfile.

* test(copilot): address CodeRabbit feedback on shared/skills tests

- Stage the home copy of `delta` in `.agents` (not `.claude`) so the
  "prefers cwd over home" precedence test actually verifies precedence
  within `.agents`. Previously the home copy was in `.claude`, which
  could not have beaten the cwd `.agents` copy regardless of the
  resolver's behavior.
- Add explicit return types on `makeFakeWorld` and the inner
  `stageSkill` to satisfy the project's strict TS annotation rule.

* fix(providers): address remaining Wirasm review items

- pi/event-bridge.ts: consolidate the `export-from` + `import-from`
  pair on shared/structured-output into the idiomatic
  `import { X }; export { X };` form. The preceding comment already
  promised "import once for local use and re-export" but the prior
  order said the opposite.
- authoring-workflows.md: add `copilot` to the prose listing of
  registered providers (the example validation error string below it
  already includes copilot).

* chore(copilot): drop stale "Claude's loadMcpConfig" attribution

#1459 (Codex MCP nodes) extracted loadMcpConfig out of
claude/provider.ts into a shared mcp/config.ts module. Update the
applyMcpServers docblock to reflect that the helper is shared, not
Claude-specific.

---------

Co-authored-by: Daniel Scholl <daniel.scholl@microsoft.com>
Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>
…1384)

* feat(providers): add OpenCode community provider with correct capabilities

- Add OpenCode provider using @opencode-ai/sdk
- Support both embedded server and external server modes
- Implement session resume, MCP, structured output, env injection
- Correctly declare capabilities: hooks, skills, agents, toolRestrictions,
  effortControl, thinkingControl all supported
- Add model/agent validation (one required)
- Include E2E smoke workflow and registry tests
- Update docs with auth guidance and feature table

* feat(providers/opencode): remove agent field - use Archon's own agent impl

Archon has its own agent implementation and should not delegate to
OpenCode's agent profiles. Removed the agent field from:
- OpencodeProviderDefaults interface
- parseOpencodeConfig parsing
- streamOpencodeSession function
- Updated capabilities to agents: false

Model is now required (no agent fallback).

* feat(providers/opencode): enable agents support with adaptation layer

- Flip agents capability from false to true
- Add agent adaptation layer that maps nodeConfig.agents to OpenCode API:
  - Agent selection by sorted key order
  - Model override from agent config
  - Tools permissions map (deny wins)
- Add 4 tests for agent adaptation behavior
- Update smoke test to verify agent field works

* fix(providers/opencode): address PR review feedback

- Fix assert node to fail with exit 1 when pattern not found
- Set effortControl/thinkingControl to false (not wired to SDK)
- Replace generic 'terminated' with specific crash patterns
- Add TODO for health endpoint (SDK limitation)
- Fix race condition in releaseEmbeddedRuntime
- Call iterator.return on abort in abortableStream
- Tighten isOpencodeModelCompatible validation
- Add agent field to OpencodeProviderDefaults type

* fix(providers/opencode): address Oracle validation issues

- Fix race condition: capture runtime instance at acquire time
- Add agent field parsing in parseOpencodeConfig
- Tighten isOpencodeModelCompatible to trim whitespace
- Update registry test for effortControl/thinkingControl

* fix(providers/opencode): address all CodeRabbit review feedback

- Replace session.create() health check with global.health() (stateless)
- Yield terminal result chunk when stream ends before session.idle
- Move comment under agent: field in ai-assistants.md
- Change 'Inline sub-agents' support to ⚠️ Partial
- Preserve insertion order in selectPrimaryAgent (remove .sort())
- Remove redundant nodeConfig argument from streamOpencodeSession
- Preserve error structure in session.error handler (err.cause)
- Consolidate model-ref validation (parseModelRef in registration.ts)
- Update test mocks to include global.health()

* fix(providers/opencode): address latest CodeRabbit review feedback

- Add warning when multiple agents configured (first wins)
- Add 2s timeout to global.health() probe
- Add TODO for skipped abort test
- Consolidate imports in registration.ts
- Fix TypeScript error: use deferred pattern for creationPromise

* fix(providers/opencode): address remaining PR review feedback

- Fix deferred pattern hang: wire both resolve and reject in deferred
  promise so startup errors propagate to callers (3137799074)
- Fix server close leak: decouple server.close() from cache identity
  check in releaseEmbeddedRuntime (3137799084)
- Update TODO reference to follow-up issue #1400 for abort test (3136883117)
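
The deferred fix can be illustrated with a small sketch. `createDeferred` and `startShared` are hypothetical names; the point is only that both callbacks are wired, so a startup failure rejects the shared promise instead of leaving waiters hanging.

```typescript
interface Deferred<T> {
  promise: Promise<T>;
  resolve: (value: T) => void;
  reject: (reason: unknown) => void;
}

function createDeferred<T>(): Deferred<T> {
  let resolve!: (value: T) => void;
  let reject!: (reason: unknown) => void;
  const promise = new Promise<T>((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}

function startShared<T>(start: () => Promise<T>): Promise<T> {
  const deferred = createDeferred<T>();
  // Wiring only resolve here would make a failed start hang forever.
  start().then(deferred.resolve, deferred.reject);
  return deferred.promise;
}
```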

* fix(providers/opencode): use direct HTTP fetch for health check

The SDK's global.health() method only exists in v2, but we import from
the root SDK which uses the old client. Switch to direct HTTP fetch to
/global/health endpoint for checking existing servers.

- Remove global.health from OpencodeClientLike interface
- Use fetch() directly with 2s timeout for health check
- Update tests to mock fetch for health check scenarios
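
A minimal sketch of such a probe, assuming the fetch implementation is injected so tests can mock it. `isServerHealthy` and `FetchLike` are hypothetical names; only the `/global/health` path and the 2s timeout come from the commit description.

```typescript
// Narrow fetch-like type; the global fetch is assignable to it.
type FetchLike = (url: string, init: { signal: AbortSignal }) => Promise<{ ok: boolean }>;

async function isServerHealthy(
  baseUrl: string,
  fetchImpl: FetchLike,
  timeoutMs: number = 2000,
): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const url = new URL("/global/health", baseUrl).toString();
    const response = await fetchImpl(url, { signal: controller.signal });
    return response.ok;
  } catch {
    // Connection refused, timeout, or abort all count as "not healthy".
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```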

* fix(workflows): bash quoting for linux compatibility

* refactor(providers/opencode): decompose provider into focused modules

Extract runtime, session, multi-agent, agent-config, agent-fs, and error
handling into separate files to reduce provider.ts complexity. Add inline
multi-agent e2e workflow and expand test coverage.

* Self AI Review suggestion.

* chore: update opencode e2e smoke test with hooks coverage + refresh docs

Add hook-node to e2e smoke workflow covering PreToolUse/PostToolUse hooks
(10 node types total). Switch smoke model to cpamc/minimax. Remove
deprecated baseUrl option and refresh feature support table in docs.

* chore(providers/opencode): improve abort error logging and multi-agent e2e workflow

* test(workflows): use default model for opencode e2e tests

Switch from cpamc/minimax to opencode/big-pickle (provider default)
for general e2e testing of OpenCode provider.

* fix: match homebrew formula to upstream/dev

* fix(providers/opencode): address code review findings

- Add CHANGELOG.md entry for assistants.opencode provider (#1703)
- Elevate silent debug catches to warn level with context (session, multi-agent, runtime)
- Preserve error cause chain in retry loop (provider.ts)
- Include retry count in final throw message
- Fix doc typo: cofnig -> config
- Update CLAUDE.md monorepo layout with community/opencode/

* chore: align SDK versions with origin/dev

* version downgrade fix.

* Add opencode-ai sdk

* fix: enable abort test and remove redundant isModelCompatible

- Enable skipped abort test with deterministic setTimeout timing
- Remove unused isOpencodeModelCompatible function from registration
- Remove isModelCompatible test from registry tests
- Update bundled defaults with archon-four-role-loop workflow

* chore: regenerate bun.lock to sync with package.json after rebase

CI was failing on 'lockfile had changes, but lockfile is frozen' — the
lockfile was missing the overrides entries (@hono/node-server, flatted,
follow-redirects, path-to-regexp, qs) and had a stale @archon/providers
version (0.3.9 → 0.3.12) after rebasing onto current dev.

Net diff: +11/-8 in bun.lock, no source changes.

* chore: regenerate bundled defaults to sync with current commands state

CI failed on 'bundled-defaults.generated.ts is stale' after the lockfile
fix unblocked the install step. The generated file was 1 line out of date
relative to current dev's command set (drift from rebases). Functional
diff is +1/-2 (a single trailing-newline difference in one embedded
command); full diff is large only because the file inlines all commands
as TypeScript strings.

This is mechanical — produced by 'bun run generate:bundled' with no other
changes.

---------

Co-authored-by: cropse <cropse0219@gmail.com>
Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>
* fix: coalesce transient chat status updates

* feat(web): improve streaming thinking and tool readability
…es (#1523)

When a workflow run is approved/rejected via the Web UI but
`tryAutoResumeAfterGate` cannot auto-resume — because there is no
`parent_conversation_id`, the parent conversation is gone, or the parent
sits on a non-web platform (Slack/Telegram/GitHub/CLI) — the success
message said only "Send a message to continue" / "On-reject prompt will
run on resume". A web-UI user whose run originated from a terminal has
no obvious next step from that text and the run sits in `failed` status.

Both approve and reject (on_reject branch) now include the exact
`archon workflow resume <runId>` command in the non-auto-resumed
response, so the web-UI surface always carries an actionable next step.

The auto-resume happy path and the no-on_reject cancellation path are
unchanged. The Resume endpoint's CLI hints (covered by #1329) are not
touched.

Closes #1522.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* experiment(console): scaffold primitives-first web UI spike at /console

Greenfield spike of Archon's web UI built around four primitives — Project,
Run, Workflow, Worktree — to validate a simpler mental model before any
migration. Lives under packages/web/src/experiments/console/, mounted at
/console/* outside the shared Layout so it does not inherit the production
TopNav. ESLint no-restricted-imports scope forbids coupling to @/components,
@/contexts, @/hooks, @/routes, @/stores, @tanstack/react-query so the spike
stays extractable or disposable.

Surface:
- Project rail (Discord-style 44x44 tiles with deterministic hashed colors)
  with ALL scope toggle, remove-project via right-click, Add Project dialog.
- Runs view split into Active (rich cards for running/paused, pulsing blue
  LiveDot for running + amber for paused) and Recent (compact monospace rows
  for completed/failed/cancelled). Attention model: running is attention,
  completed is audit trail.
- DraftRunCard — inline "start a run" primitive that lives at the top of
  the Active list. Collapsed = thin + Start a new run row; expanded = full
  card with workflow picker + context textarea. Same shape as a paused
  approval card; N keybind expands.
- ApprovalPanel with ApprovalContext preview — shows the actual last
  agent message so users see the question being asked, not just the gate
  label. Supports capture_response gates and traditional approve/reject.
- Run detail page — header with live-ticking elapsed, StreamToolbar with
  Tool calls / System / Graph toggles persisted to localStorage, stream of
  StreamCards (message / tool / artifact / node_transition), state-
  sensitive ActionBar (cancel/resume/abandon/re-run). Relative timestamps
  (+MM:SS from run start) via a small StreamContext provider.
- RunGraphPanel sidebar — dagre TB layout, parallel nodes side-by-side,
  loop/approval/bash/command/script/prompt glyphs, status-derived from
  node_transition events, click a node to scroll-into-view.

Skill API (packages/web/src/experiments/console/skills/) is the single
mutation surface: listProjects/getProject/addProjectBy{Url,Path}/
removeProject/listWorkflows/getWorkflowGraph/listWorktrees/listRuns/
getRun/startRun/cancelRun/approveRun/rejectRun/resumeRun/abandonRun/
listMessages. Every UI action calls exactly one verb; internal
orchestrators (CLI, Claude Code skill, future LLM driver) hit the same
contract. startRun hides the legacy conversation coupling as a two-call
createConversation -> runWorkflow sequence.

State layer (store/cache.ts) is a Map + subs + useEntity hook. No React
Query, no Zustand, ~100 LOC. Polling fallback every 3s until SSE lands.

Warm theme scoped to .console-root (theme.css) — espresso surfaces,
tangerine accent reserved for CTAs, ocean-blue running, teal-green
completed/approve, amber paused, warm rose-red failed. Production theme
untouched.

Preview route at /console/_preview renders every status, every origin,
swatches for each token.

Milestones done: M1 scaffold, M2 skill+store+populated feed, M3 run
detail + event stream, M5 DraftRunCard, M3 polish (sticky toolbar,
compact tool cards, relative timestamps, empty/system filter, compact
user chips, graph sidebar). Pending: M4 SSE live updates, M6 polish.

* experiment(console): widen project rail with editable title + locator, fix invalidate-without-reload

Rail goes from 44x44 abbreviation tiles to a 240px sidebar of two-line rows:
small color dot + title + monospace locator (owner/repo from a git URL, last
two path segments otherwise). Title is editable per-project — double-click to
rename, Enter saves, Esc cancels, blank reverts to the API name. Override
persists in localStorage (console:displayName:<id>) via a small useDisplayName
hook so the spike stays self-contained. Right-click still removes.

Also fixes a latent bug in store/cache.ts: invalidate() and refetch() cleared
the cache and notified subscribers but never re-ran the loader, so add/remove
project and the run-action / approval flows all required a page reload to
reflect new state. useEntity now registers its loader and ensureLoad() refires
it on any cleared key that still has an active subscriber.
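
A minimal, framework-free sketch of the invalidate fix described above, assuming a Map-backed cache with per-key loaders and subscribers. The class and method shapes are hypothetical; the real store/cache.ts and useEntity hook may differ.

```typescript
type Loader<T> = () => Promise<T>;
type Listener = () => void;

// Hypothetical stand-in for the spike's cache: values, loaders, and subscribers by key.
class EntityCache {
  private values = new Map<string, unknown>();
  private errors = new Map<string, unknown>();
  private loaders = new Map<string, Loader<unknown>>();
  private subscribers = new Map<string, Set<Listener>>();

  // Registers the loader alongside the listener so invalidate() can re-run it later.
  subscribe<T>(key: string, loader: Loader<T>, listener: Listener): () => void {
    this.loaders.set(key, loader);
    const set = this.subscribers.get(key) ?? new Set<Listener>();
    this.subscribers.set(key, set);
    set.add(listener);
    void this.ensureLoad(key);
    return () => {
      set.delete(listener);
    };
  }

  get<T>(key: string): T | undefined {
    return this.values.get(key) as T | undefined;
  }

  // Clearing alone was the bug: the loader must be refired for keys that are still watched.
  invalidate(key: string): void {
    this.values.delete(key);
    this.errors.delete(key);
    this.notify(key);
    void this.ensureLoad(key);
  }

  private notify(key: string): void {
    this.subscribers.get(key)?.forEach((listener) => listener());
  }

  private async ensureLoad(key: string): Promise<void> {
    const loader = this.loaders.get(key);
    const subs = this.subscribers.get(key);
    if (!loader || !subs || subs.size === 0) return; // nobody is watching this key
    if (this.values.has(key)) return; // already populated
    try {
      this.values.set(key, await loader());
    } catch (err) {
      this.errors.set(key, err);
    }
    this.notify(key);
  }
}
```

This sketch omits in-flight deduplication, so two quick invalidations start two loads; whichever resolves last wins.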

ProjectTile is left in place — still used by /console/_preview.

* experiment(console): fix startRun — pass platform id to dispatch, recover run id by polling

Two bugs were preventing workflows from launching from the spike:

1. The dispatch call was sending conv.id (DB UUID) where the route looks the
   conversation up via findConversationByPlatformId. The lookup silently
   returned null, the orchestrator dispatched against an unknown reference,
   and no workflow_run was ever created. Fix: pass conv.conversationId (the
   web-<ts>-<rand> platform id) to /api/workflows/:name/run. Keep conv.id (the
   DB UUID) for the parent-conversation match in the recovery step.

2. POST /api/workflows/:name/run returns { accepted, status } — never a run
   id, since the workflow_run row is written asynchronously inside the
   orchestrator after the HTTP response returns. The old extractRunId() always
   threw. Replace with pollForRun(): fetches /api/dashboard/runs filtered by
   codebaseId, matches on parent_conversation_id === conv.id, returns the
   first hit. Bound at 30s / 400ms interval to absorb cold-start worktree and
   isolation-env setup; timeout message points users to the active list since
   the run is almost certainly already running by then.
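
The recovery step in item 2 might look roughly like this. `fetchRuns` is a hypothetical stand-in for the /api/dashboard/runs request filtered by codebaseId, and the `DashboardRun` shape is an assumption; only `parent_conversation_id`, the 30s bound, and the 400ms interval come from the description above.

```typescript
// Assumed minimal shape of a dashboard run row.
interface DashboardRun {
  id: string;
  parent_conversation_id: string | null;
}

// Matches on the conversation's DB UUID (conv.id), not the platform id.
function findRunForConversation(
  runs: DashboardRun[],
  conversationDbId: string,
): DashboardRun | undefined {
  return runs.find((run) => run.parent_conversation_id === conversationDbId);
}

// Polls until a matching run appears or the deadline passes.
async function pollForRun(
  fetchRuns: () => Promise<DashboardRun[]>,
  conversationDbId: string,
  timeoutMs = 30000,
  intervalMs = 400,
): Promise<DashboardRun> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const hit = findRunForConversation(await fetchRuns(), conversationDbId);
    if (hit) return hit;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for the run to appear; check the active list.");
}
```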

* experiment(console): make startRun optimistic — dispatch and let the runs feed surface the new run

Submit-button no longer blocks for up to 30s while the orchestrator spins up
worktrees and isolation envs. startRun now does just the two dispatch calls
and returns; the workflow_run row appears in the active list as ambient runs
polling picks it up. DraftRunCard fires an immediate invalidate('runs') after
dispatch to nudge the next refetch instead of waiting up to 3s for the next
poll tick.

Drops pollForRun + the runId return value — callers were the navigate-to-run-
detail path only, which traded one bad UX (30s spinner) for another (forced
context switch away from the runs list right after starting). The active card
that appears within a few seconds is a better affordance.

* experiment(console): port to Archon brand foundation — duotone gradient + Geist

Replace the warm espresso/tangerine palette with the cool charcoal +
brand-magenta-to-teal duotone from the Archon brand standalone. All changes
remain scoped under `.console-root` so the production /app surface is
untouched.

theme.css
  - Surfaces shift hue 40° → 265° (warm → cool charcoal)
  - Accent tokens point at --brand-magenta; --success uses --brand-teal so
    affirmative reads as brand
  - --brand-gradient + .brand-text / .brand-bar / .brand-bar-soft utilities
    added (gradient-soft is the translucent wash used for selected states)
  - --accent-ring set to 30% alpha magenta, matching the brand spec
  - Geist + Geist Mono loaded from Google Fonts on console route mount
    only; .console-root font-family override + higher-specificity .font-mono
    rule beat Tailwind v4's @theme inline literal

Components
  - ConsoleApp wordmark: .brand-text on "Archon"
  - DraftRunCard: 4px gradient strip as an absolute child (keeps the card's
    overflow:visible so the workflow picker dropdown can escape); Start run
    button background is the duotone bar
  - FilterChips: active filter shows a 2px gradient underline pill
  - ProjectRail: ALL projects pill now uses brand-bar-soft instead of the
    chunky 2px ring with offset

* experiment(console): brand the run detail page

The first brand pass cascaded surfaces + accents into the detail view via
tokens but never threaded the gradient itself through, leaving the timeline
visually flat. This adds three brand moments:

RunDetailHeader
  - 1px brand-gradient strip along the bottom edge (replaces the flat
    border-border line) anchors the detail view in the same way the
    DraftRunCard strip anchors the runs feed
  - Run id renders with .brand-text so the focal piece of mono data carries
    the duotone

StreamCard
  - YOU pill: accent-soft background + brand-magenta text
  - AGENT pill: success-soft background + brand-teal text
  Role pills now read as the brand duotone across every exchange — magenta
  for user (presence/authorship), teal for agent (execution/affirmative)

RunGraphPanel
  - Same 1px gradient strip under the GRAPH label; bumps the label color
    from tertiary to secondary so the panel header doesn't disappear

theme.css
  - Adds --running-soft / --success-soft / --warning-soft / --error-soft
    translucent companions for status colors (StreamCard now consumes
    --success-soft; the others are there for symmetry)

* fix(experiment/console): surface tool calls from workflow_events

Two independent bugs caused tool calls to never render on the run detail
page despite the toggle being on.

primitives/event.ts
  - Server emits `tool_called` / `tool_completed`; the normalizer matched
    `tool_started` (a name that's never written). Result: 43 tool_called
    events fell through to the text-fallback branch and rendered as
    junk-string placeholders elsewhere
  - Field names were also wrong: read `toolName` / `args` / `durationMs`
    instead of the snake_case `tool_name` / `tool_input` / `duration_ms`
    actually present in the JSONB payload, so the few tool_completed
    events that did match the branch produced empty entries that
    downstream filters dropped

components/RunStream.tsx
  - Even with the normalizer fixed, RunStream explicitly skipped
    `tool_call` events under the assumption that conversation metadata
    is canonical. That's true for Claude (the SDK persists into
    message.metadata.toolCalls) but false for Pi / Codex / bash nodes,
    which only emit workflow events. Now: if no message carries inline
    tool calls, the paired workflow tool events are surfaced instead.
    Pairing matches each tool_called to the next unclaimed tool_completed
    in the same step so the duration shows correctly.

routes/RunDetailPage.tsx
  - Toolbar `toolCallCount` mirrors the same source-of-truth rule so the
    "X tool calls" header counts the rendered events, not just the
    (empty) inline metadata
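
The pairing rule described for RunStream (each tool_called claims the next unclaimed tool_completed in the same step) can be sketched as below. The event shapes are assumptions: the snake_case payload fields come from the text above, while the `step` field and the `pairToolEvents` name are hypothetical.

```typescript
interface ToolCalled {
  type: "tool_called";
  step: string; // assumed step identifier
  tool_name: string;
  tool_input: unknown;
}

interface ToolCompleted {
  type: "tool_completed";
  step: string;
  tool_name: string;
  duration_ms: number;
}

type ToolEvent = ToolCalled | ToolCompleted;

interface PairedToolCall {
  step: string;
  toolName: string;
  input: unknown;
  durationMs: number | null; // null when no completion was found
}

function pairToolEvents(events: ToolEvent[]): PairedToolCall[] {
  const claimed = new Set<number>(); // indexes of completions already paired
  const paired: PairedToolCall[] = [];
  events.forEach((event, i) => {
    if (event.type !== "tool_called") return;
    let durationMs: number | null = null;
    for (let j = i + 1; j < events.length; j++) {
      const candidate = events[j];
      if (!candidate || candidate.type !== "tool_completed") continue;
      if (candidate.step !== event.step || claimed.has(j)) continue;
      claimed.add(j);
      durationMs = candidate.duration_ms;
      break;
    }
    paired.push({
      step: event.step,
      toolName: event.tool_name,
      input: event.tool_input,
      durationMs,
    });
  });
  return paired;
}
```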

* fix(experiment/console): tab bar for Log/Graph, wire System toggle, fix subfoldered workflow 404

Detail page now has a Log / Graph tab pair instead of a fixed-width log
with an optional right-rail graph. Both views get the full main content
area (next to the project rail); switching between them is a toggle, not
a side-by-side compromise.

StreamToolbar
  - Hosts the tab pair (Log / Graph) on the left with the gradient
    underline indicating the active tab
  - "X messages · Y tool calls" + Tool calls / System checkboxes only
    render when the Log tab is active — irrelevant in Graph view

RunGraphPanel
  - Drops the fixed 420px aside chrome; renders as full-width content
  - Bigger node dimensions (160×40, 56/20 sep) for the larger canvas
  - Returns to centered overflow-auto when content exceeds viewport

RunDetailPage
  - `view: 'log' | 'graph'` state persisted to localStorage
  - Layout switches single-view; Log view drops the 820px max-width so
    the stream uses the full main area
  - Clicking a graph node switches to Log and scrolls to that node's
    transition

System toggle (the second half of the fix)
  - workflow_started / workflow_completed / workflow_failed were
    falling through to the text-fallback branch, rendering as junk
    `workflow_started — {payload}` strings
  - Added SystemEvent kind + explicit branch in `toRunEvent`; surfaced
    in RunStream as compact rows behind the System toggle
  - Error events also flow into the same system bucket

Graph 404 fix
  - The single-fetch `/api/workflows/:name` endpoint doesn't recurse
    into `.archon/workflows/<subdir>/`; subfoldered workflows like
    `maintainer/maintainer-review-pr.yaml` were unreachable
  - `getWorkflowGraph` now goes through the list endpoint (which does
    recurse) and filters by name. One extra row of JSON, but the graph
    now resolves for every workflow Archon knows about

* feat(experiment/console): live updates via SSE, drop 3s polling

Replaces the per-page 3s setInterval polling loops in RunsPage and
RunDetailPage with subscriptions to the server's existing SSE streams.
Events flow through the existing cache: an SSE message invalidates the
relevant cache keys, useEntity refetches authoritative state, the UI
re-renders. No partial in-memory event-payload merging — keeps the wire
shape decoupled from React state.

lib/sse.ts
  - useDashboardSSE subscribes to /api/stream/__dashboard__ and
    invalidates runs:* (and run:<id> if the event carries a runId) on
    workflow_status / dag_node events. Mounted from RunsPage.
  - useRunStreamSSE subscribes to /api/stream/<conversationPlatformId>
    and invalidates run:<id> + messages:<convId> on text / tool_call /
    tool_result / workflow_* events. A 100ms coalesce timer dedupes
    bursts from streamed text. No-ops while the conversation id is
    still null (e.g. before the run detail loads).

RunsPage
  - Drops the 3s setInterval that re-fetched listRuns; calls
    useDashboardSSE instead.

RunDetailPage
  - Drops the 3s setInterval that re-fetched getRun + listMessages;
    calls useRunStreamSSE with the platform conversation id.

EventSource auto-reconnects on transient failures, so no explicit
recovery logic is needed; permanent close happens at unmount.
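
A rough sketch of the 100ms coalesce timer mentioned for useRunStreamSSE: bursts of events collapse into one invalidation pass per window. `createCoalescer` and the injectable `schedule` parameter are hypothetical, added here so the behaviour can be exercised without real timers.

```typescript
type Schedule = (fn: () => void, delayMs: number) => void;

// Collects keys during the window and flushes them once, deduplicated.
function createCoalescer(
  flush: (keys: string[]) => void,
  delayMs = 100,
  schedule: Schedule = (fn, ms) => {
    setTimeout(fn, ms);
  },
): (key: string) => void {
  const pending = new Set<string>();
  let scheduled = false;
  return (key: string): void => {
    pending.add(key);
    if (scheduled) return; // a flush is already queued for this window
    scheduled = true;
    schedule(() => {
      scheduled = false;
      const keys = Array.from(pending);
      pending.clear();
      flush(keys);
    }, delayMs);
  };
}
```

In the SSE handler, each incoming text or tool event would call the returned function with the cache keys to invalidate (for example run:<id> and messages:<convId>), and `flush` would forward them to the cache.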

* feat(experiment/console): make System toggle reveal real diagnostic content

The toggle was technically working but only added two thin rows
(workflow_started / workflow_completed) for Pi-driven runs that lack
system-role messages. Functional but invisible. This pass turns it into
the framework-chatter view it should always have been.

What System now reveals
  - Workflow lifecycle: workflow_started / workflow_completed /
    workflow_failed (existing, now styled to stand out)
  - Skipped-node reasons: when a node is skipped, an inline second line
    on the NodeDivider shows `reason when_condition · expr ...` — catches
    DAG-branching surprises without making the user open the YAML
  - Workflow dispatch metadata: assistant messages with
    `category: workflow_dispatch_status` (carrying a workflowDispatch
    blob) now collapse into a compact 'Workflow dispatch' system row
    displaying the workflow name, instead of being rendered as agent
    prose. Same for any message whose metadata.category starts with
    workflow_ or system_
  - Empty / no-signal messages: previously dropped by isMeaningful();
    now surface as 'Noise' rows so the timeline is gap-less and SDK
    plumbing chatter is visible

Styling
  - System rows now use brand-teal for the pill label + a translucent
    teal top hairline (instead of a flat charcoal border on all sides).
    Border colors land via inline style because the console's
    .console-root * { border-color: var(--border) } rule outweighs
    Tailwind utility-class color in the cascade; this finally makes
    border-success/30 and friends paint the intended hue too

Cleanup
  - StreamCard kind styles now own their full border (width + sides +
    color) rather than splitting between the base class and a partial
    override
  - message.ts exports isSystemCategory + WorkflowDispatchMeta so
    RunStream can keep the rendering decision local
  - event.ts NodeTransitionEvent carries skipReason + skipExpr;
    NodeDivider accepts them and renders only when showDetail is true

* feat(experiment/console): cost on cards + reject-with-reason expander

Cost on cards
  - Read `metadata.total_cost_usd` into a typed `Run.costUsd: number | null`
  - formatCost picks precision by magnitude: $24.35 / $0.023 / $0.0082
  - Surfaces on RecentRunRow (between elapsed and origin badge), on
    ActiveRunCard (between origin and elapsed), and on the run detail
    header (between origin and elapsed). Hidden when null
  - typeof === 'number' guard so demo runs without the field don't blow
    up at .toFixed()
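
One way the magnitude-based precision could work, as a sketch: the thresholds (1 and 0.01) are assumptions chosen to reproduce the three sample outputs above, not necessarily the cut-offs the real formatCost uses.

```typescript
// Returns null when the field is missing or not a finite number, so callers can hide it.
function formatCost(costUsd: unknown): string | null {
  if (typeof costUsd !== "number" || !Number.isFinite(costUsd)) return null;
  if (costUsd >= 1) return `$${costUsd.toFixed(2)}`; // e.g. $24.35
  if (costUsd >= 0.01) return `$${costUsd.toFixed(3)}`; // e.g. $0.023
  return `$${costUsd.toFixed(4)}`; // e.g. $0.0082
}
```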

Reject-with-reason
  - ApprovalPanel now has two distinct flows instead of one shared field
    + Approve / Continue: one click, single-line input above for an
      optional comment captured as $<node-id>.output
    + Reject: two-step. First click reveals a 3-row textarea with a red
      "REASON FOR REJECTING · REQUIRED" label; confirm only enables
      with non-empty text
  - Cmd+Enter confirms reject, Esc cancels back to idle
  - Reduces accidental rejects (which previously fired on any click of
    a single button when the input happened to be non-empty) and makes
    the reviewer's reasoning explicit and unavoidable

* feat(experiment/console): per-project env vars dialog

A gear icon on each project row in the rail (visible on hover / always
on the selected row) opens an EnvVarsDialog modal that lists, adds, and
removes per-project environment variables. Wires straight into the
existing GET/PUT/DELETE /api/codebases/:id/env endpoints.

Design notes
  - The server never returns values, only keys — the UI mirrors that
    constraint (no "reveal" affordance, no edit-in-place). To rotate a
    secret the user adds a new value at the same key; the server
    overwrites
  - Key input auto-uppercases for the conventional ENV_VAR_NAME look;
    value input uses type=password so it doesn't shoulder-surf

  - Cache invalidates on every dialog open so external edits (CLI, other
    web sessions) show up — without it the in-memory cache pinned the
    stale empty list across close/reopen
  - skill.listEnvVarKeys / setEnvVar / deleteEnvVar live in a new
    skills/envVars.ts module, exported through skills/index.ts to match
    the existing skill-verb surface

* feat(experiment/console): artifact tab with sidebar + viewer

Adds a third tab on the run detail page that lets you browse and read the
files a run wrote to disk — the new go-to surface for plans, reports, PR
diffs, and synthesis docs that workflows produce as their actual output.

Server: GET /api/runs/:runId/artifacts
  - Walks the run's artifact directory (recursively, dotfiles skipped)
  - Returns { files: [{ path, size, modifiedAt }] }
  - Needed because workflow_artifact events are empty for nearly every
    run we have — bash/script nodes write straight to $ARTIFACTS_DIR
    without emitting an event, so an event-driven file list shows nothing
  - Reuses the same owner/repo derivation + path-escape guards the
    existing /api/artifacts/:runId/* handler uses

Client: ArtifactPanel
  - 260px sidebar lists every file with size + parent-dir hint; clicking
    a row loads it into the main viewer
  - Viewer renders .md / .mdx through react-markdown + GFM + rehype-
    highlight (same stack the old UI used), everything else as
    pre-formatted monospace text
  - Auto-selects the first file on mount so the tab isn't empty
  - "open raw ↗" link in the file header for downloads or PR pasting
  - Empty-state copy points at $ARTIFACTS_DIR so users understand what
    fills the panel

StreamToolbar
  - Tabs now accept an optional count; Artifacts shows it ("ARTIFACTS 7")
    so users can tell at a glance whether a run produced anything

RunDetailPage
  - The artifact-list useEntity is hoisted above the early returns so
    React's hook order stays stable (the obvious-in-retrospect bug that
    hit the first attempt — early returns after running detail-related
    hooks meant the artifacts hook didn't fire on the loading render)
  - Cache key is K.artifacts(runId), shared between the tab badge and
    the panel so navigating to the tab doesn't refetch

* feat(console + server): file upload on DraftRunCard

Server
  - /api/workflows/:name/run now accepts multipart/form-data alongside the
    existing application/json. conversationId + message + files[] (max 5,
    ≤10 MB each). Body schema dropped from the OpenAPI route config so
    @hono/zod-openapi doesn't try to validate multipart against the JSON
    shape — same pattern sendMessageRoute uses. Handler manually branches
    on content-type
  - persistUploadedFiles helper lifted out of sendMessageRoute so both
    routes go through the same validate-write-rollback logic. Returns
    either { ok: true, savedFiles, uploadDir } or a structured error the
    caller forwards via apiError. sendMessageRoute is untouched for this
    pass; could be refactored to use the helper later
  - extraContext.attachedFiles + filesToCleanup are passed straight to
    dispatchToOrchestrator so cleanup happens inside the lock handler,
    after handleMessage completes — matches the freeform-message flow

Client
  - skill.startRun gains an optional files: File[]. With files, posts
    multipart (browser-set boundary); without, keeps the JSON path
  - DraftRunCard handles three input paths the chat input has always
    handled: drag-and-drop on the whole card, paste of clipboard images
    inside the textarea, and a paperclip button that opens the file
    picker. Same MAX_FILES=5 and MAX_FILE_BYTES=10 MB caps the server
    enforces, surfaced as inline errors
  - File chips render above the start row with name + size + remove (X).
    Drag-over shows a brand-gradient-soft overlay with a "drop files to
    attach" pill so the affordance is obvious without persistent chrome
  - Collapse / submit both clear the file list so reopening the card
    starts clean
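
The client-side caps could be checked along these lines before building the multipart body. This is a sketch: `validateAttachments`, the `FileLike` shape, the error wording, and the reading of "10 MB" as 10 * 1024 * 1024 bytes are all assumptions.

```typescript
const MAX_FILES = 5;
const MAX_FILE_BYTES = 10 * 1024 * 1024; // assumes 10 MB means 10 MiB

// Structural subset of the browser File type, so the check runs anywhere.
interface FileLike {
  name: string;
  size: number;
}

type Validation = { ok: true } | { ok: false; error: string };

function validateAttachments(files: FileLike[]): Validation {
  if (files.length > MAX_FILES) {
    return { ok: false, error: `At most ${MAX_FILES} files can be attached.` };
  }
  const tooLarge = files.find((file) => file.size > MAX_FILE_BYTES);
  if (tooLarge) {
    return { ok: false, error: `${tooLarge.name} exceeds the 10 MB limit.` };
  }
  return { ok: true };
}
```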

* feat(experiment/console): open-in-IDE, rerun, SSE-drop safety net

Three tier-2 affordances that each accelerate the iteration loop without
adding chrome.

Open in IDE
  - vscode://file/<workingPath> button on ActiveRunCard (hover), every
    RecentRunRow (hover), and RunDetailHeader (always visible)
  - Hidden when /api/health reports is_docker=true. The first request
    defaults isDocker to true so a flash of broken links inside Docker
    never happens — matches the old UI's safer default
  - new lib/health.ts exposes useIsDocker() (cached via useEntity on the
    'health' key so all callers share one fetch) and openInIde(path)
    which normalises backslashes on Windows paths the same way the old
    Header.tsx did

Rerun
  - ↻ button on completed/failed/cancelled RecentRunRows. Navigates to
    /console/p/<id>?rerun=1&workflow=<name>&message=<userMessage> with
    URLSearchParams so spaces / unicode survive
  - DraftRunCard watches searchParams: when rerun=1 arrives (whether by
    fresh mount or by within-component navigation) it expands the card,
    fills the workflow picker + textarea, then strips the params via
    setSearchParams(..., { replace: true }) so a reload doesn't re-fire
  - Deliberately depends on [searchParams] not [] — the rerun click
    typically lands while DraftRunCard is already mounted (same project
    route, search-param-only change). The empty-deps version was the
    bug that made the first attempt look like nothing happened
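
The rerun round-trip through the query string can be sketched with plain URLSearchParams. Both helper names are hypothetical; the route shape and parameter names follow the description above.

```typescript
// Builds the rerun link; URLSearchParams handles spaces and unicode.
function buildRerunUrl(projectId: string, workflow: string, message: string): string {
  const params = new URLSearchParams({ rerun: "1", workflow, message });
  return `/console/p/${encodeURIComponent(projectId)}?${params.toString()}`;
}

// Reads the params back; returns null unless rerun=1 is present.
function readRerunParams(search: string): { workflow: string; message: string } | null {
  const params = new URLSearchParams(search);
  if (params.get("rerun") !== "1") return null;
  return {
    workflow: params.get("workflow") ?? "",
    message: params.get("message") ?? "",
  };
}
```

In the component, the read side would run in an effect keyed on searchParams and then strip the params with a replace navigation, as the bullets above describe.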

SSE-drop safety net
  - 30s setInterval on RunDetailPage that invalidates K.run(runId) +
    K.messages(convId) while status is running or paused
  - Stops automatically the moment status flips terminal, so it's not
    polling proper — just a heartbeat refetch that catches dropped SSE
    streams (network hiccup, mobile sleep/wake) that would otherwise go unnoticed
  - Replaces nothing — the existing useRunStreamSSE keeps streaming
    when the connection is alive; this is purely a "if we missed the
    terminal event, find it within 30s" insurance

* fix(experiment/console): project rail — selection visible, identity vs status, real path locator

Five compounding issues in the project rail, all addressed.

1. Routing param read (the load-bearing bug)
   ProjectRail mounts outside the inner <Routes> (it's sibling to <main>
   in ConsoleApp), so useParams() returns {} for it. `scope` was
   always 'all'; the ALL PROJECTS button was always aria-pressed=true;
   the selected ProjectRow never received selected:true and therefore
   never showed the ring or background. Fix: useLocation() + a regex
   pull on `/console/p/:id`.

2. Selection is now unmistakable
   Each row paints a 4px brand-gradient left strip + bg-surface-elevated +
   brighter title when selected. Replaces the magenta ring (which was
   invisible against the dark inset background even when it did fire).
   The gradient strip rounds at the corners via rounded-l-md so we
   don't need overflow-hidden on the row — which had been clipping the
   ⋯ menu dropdown.

3. Identity vs status disambiguated
   The hash-coloured dot was identity (project tile color) but read as
   a status indicator. Replaced with a 20×20 rounded square showing the
   project's first letter on the hash-coloured background — clearly a
   "this is which project" affordance, can't be confused with status.

4. Activity status, when it exists
   Right-side dot is now real: pulsing blue when the project has a
   running run, pulsing amber when paused, solid red when only failed
   runs are recent. Idle projects show nothing. Sources data from the
   shared K.runs('all') cache (so the dashboard SSE invalidation we
   already have keeps it live; no extra fetch). Priority: running >
   paused > failed-only, so a project with one running and one failed
   run reads as "running", not "broken".

5. Locator below the name = the actual local path
   formatProjectLocator now returns `~/path/to/project` (homedir
   shortened). The old `owner/repo` derivation was identical to the
   project name for github projects, so the row read as duplicated
   text. After rename, the path stays as a stable identity anchor —
   which is what the user wanted: "rename a project but still show the
   path below."

Bonus fixes
   ALL PROJECTS button: same selection treatment as project rows
   (strip + elevated bg), sentence case label ("All projects"), uses
   an `∗` avatar in a small square — visually consistent with rows.

   Remove project is now discoverable: ⋯ menu button on hover (always
   visible on the selected row), opens a small dropdown with "Remove
   project". Right-click still works for power users and now also
   opens the same menu.

   Add project hover treatment normalised to border-bright/surface-hover
   to match the rest of the rail (used to be magenta).
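
The routing fix in item 1 boils down to deriving the scope from the pathname instead of useParams(). A sketch, assuming a hypothetical pure helper that the rail would call with useLocation().pathname:

```typescript
// Extracts the project id from /console/p/:id (with or without a trailing sub-route).
// Returns null for any other console path, which the rail treats as the ALL scope.
function projectIdFromPathname(pathname: string): string | null {
  const match = /^\/console\/p\/([^/]+)/.exec(pathname);
  const id = match?.[1];
  return id ? id : null;
}
```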

* refactor(experiment/console): drop avatar + activity dot from project rail

Both added more noise than signal:

  - The first-letter avatar carried no information for owner/repo names
    (we were rendering the owner's first letter). Removed it entirely
    rather than try to derive something cleverer
  - The right-side activity dot lit up red for any project with a
    failed run in recent history. That's a thing that happened, not
    something the user needs to act on from the rail. Removed

The rail row is now: optional gradient strip when selected, title,
path subtitle, hover actions (gear + ⋯). Selection is still
unmistakable via the brand strip + elevated background + brighter
title color. Width is reclaimed for the path (Widinglabs/sasha-demo's
full ~/Projects/mine/sasha now fits where it was truncated before).

Also drops the matching ∗ avatar from the "All projects" row for
consistency, and the K.runs('all') fetch + deriveActivityByProject
helper that only existed to feed the now-gone status dots.

* feat(console + old ui): real logo, drop spike chrome, cross-UI switch buttons

Console header
  - Replace text-only "Archon" with the actual shield mark from
    packages/web/public/favicon.png (the existing brand mark) +
    gradient wordmark
  - Drop the "spike" badge — the experiment is real enough now; the
    "console" tag stays as a "this is a separate surface" hint
  - Drop the stray "m2 populated" telemetry text in the right slot;
    replaced with a small "← Old UI" link so users always have an
    escape hatch back to the classic chrome

Old UI TopNav
  - Add a gradient "Try the new console →" CTA between the last tab
    and the version readout. Inline-styled with the brand
    magenta → violet → teal gradient because the old UI's token set
    doesn't include the brand-gradient variables (those live in the
    console-scoped theme.css)
  - Sized to read as a primary CTA without dominating the nav. Arrow
    nudges 2px on hover for an inviting affordance

* tweak(old ui): rename console CTA to 'Try the new console UI'

* fix(experiment/console + server): satisfy validate suite after rebase

Type-check
  - Demo run factories in RunsPage and PreviewPage now include
    costUsd: null so the test fixtures match the Run type that was
    extended with the new cost field
  - startRun's HttpError throw on multipart failure now passes the
    URL path as the 2nd arg (HttpError takes status/path/body) so
    the upload-error path constructs correctly

Server test
  - /api/workflows/:name/run only forwards the message metadata 4th
    arg to addMessage when files are present, so the JSON path keeps
    the 3-arg signature the existing api.workflow-runs.test asserted

Format
  - prettier --write on eslint.config.mjs and theme.css

Telegram-markdown blockquote tests are 3 pre-existing failures on dev
(verified by checking out dev's adapters/ before the run) — unrelated
to this PR's scope.

* fix(console): correct silent invalidate + recover errored entries (C1+C2)

The cache's invalidate(prefix) checked `key === prefix || key.startsWith(`${prefix}:`)`
so passing 'runs:' looked for 'runs::' — three callers (ApprovalPanel
approve/reject, RunActionBar cancel/resume/abandon) silently did nothing,
and the runs feed only refreshed on the next SSE event. Drop the trailing
colon at the three sites.

Separately, errored cache entries lived only in the `errors` Map, but
invalidate() walked `cache.keys()` only — so a failed fetch was stuck
until full page reload. Extend the walk to both maps so recovery works.
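
A small sketch of both fixes, using the prefix check quoted above. `matchesPrefix` and `keysToInvalidate` are hypothetical names; the real cache presumably inlines this logic.

```typescript
// The check from the commit message: exact key, or prefix followed by a colon.
function matchesPrefix(key: string, prefix: string): boolean {
  return key === prefix || key.startsWith(`${prefix}:`);
}

// Walks both maps so errored entries can be cleared and retried too.
function keysToInvalidate(
  prefix: string,
  cache: Map<string, unknown>,
  errors: Map<string, unknown>,
): string[] {
  const allKeys = Array.from(cache.keys()).concat(Array.from(errors.keys()));
  return Array.from(new Set(allKeys)).filter((key) => matchesPrefix(key, prefix));
}
```

Passing 'runs:' makes the second clause look for 'runs::', which is why the three callers matched nothing until the trailing colon was dropped.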

* fix(server): guard new artifacts route + register OpenAPI (C3+I1+I3+I4)

Convert GET /api/runs/:runId/artifacts from raw app.get() to
registerOpenApiRoute against a typed schema (ArtifactFile +
ListArtifactsResponse in workflow.schemas.ts). The route was the only
recently-added endpoint bypassing the project's OpenAPI rule
(CLAUDE.md L25) without a constraint that justifies it — the response
is plain JSON of a fixed shape. Generated types now include it, so
skills/runs.ts re-exports the schema type instead of maintaining a
parallel hand-written interface (I3).

Other guards on the same handler:
  - I1: defense-in-depth path-containment check on the resolved
    artifact directory. A maliciously crafted codebase name (`..` in
    owner/repo) would have escaped ARCHON_HOME; now blocked with a
    400 + artifacts.path_escape_blocked log
  - I4: getCodebase() now wrapped in try/catch, mirroring the
    getWorkflowRun() block above it. DB errors produce a logged 500
    instead of an unlogged crash
  - I3: stat() error swallow narrowed — ENOENT/EACCES are skipped
    (file deleted mid-walk, permission flip) but unknown errors now
    propagate to the outer artifacts.walk_failed log + 500 response,
    so we never return a half-list silently
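
The I1 containment check can be sketched with Node's path module. `isInsideDirectory` is a hypothetical name, and the handler's actual guard may be written differently; the idea is to resolve both paths and reject anything whose relative path climbs out of the root.

```typescript
import * as path from "node:path";

// True when candidate resolves to root itself or to something beneath it.
function isInsideDirectory(root: string, candidate: string): boolean {
  const relative = path.relative(path.resolve(root), path.resolve(candidate));
  if (relative === "") return true; // same directory
  if (relative === ".." || relative.startsWith(`..${path.sep}`)) return false; // climbs out
  return !path.isAbsolute(relative); // different drive on Windows
}
```

Comparing via path.relative rather than a raw string prefix avoids treating a sibling such as `.archon-evil` as being inside `.archon`.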

* fix(console): real defects from review (CR-1..CR-5, CR-7, CR-9, I2, I5)

- AddProjectDialog: import FormEvent from 'react' instead of relying on
  the ambient React namespace which isn't actually imported here. Real
  type bug in strict-mode setups (CR-1)
- lib/sse: route EventSource opens through SSE_BASE_URL so dev bypasses
  the Vite proxy. The proxy buffers SSE; bare paths reintroduce the
  buffering useSSE already worked around in the old UI (CR-2)
- DraftRunCard: guard Enter-submit during IME composition. Without the
  e.nativeEvent.isComposing check, Japanese/Chinese/Korean candidate
  selection dispatches the run prematurely (CR-3)
- display-name: wrap localStorage in try/catch. Private-browsing modes
  throw SecurityError and crashed the rail row on mount (CR-4)
- ActiveRunCard: add role/tabIndex/onKeyDown so the card is operable
  with Enter/Space, matching RecentRunRow which already had this (CR-5)
- eslint.config: harden import-restriction patterns. * → ** so nested
  paths (@/components/layout/foo) can't slip past, and the @/lib/api
  restriction now applies to all named imports rather than only the
  default. Generated types from @/lib/api.generated are still allowed
  via a different module path (CR-7)
- NodeDivider: only emit the scroll-anchor id on 'started' transitions
  so multiple transitions for the same node don't produce duplicate
  ids in the DOM. The graph 'jump to node' still works (it lands on
  the entry point, which is the right target anyway) (CR-9)
- primitives/workflow: toWorkflow now preserves 'global' as a distinct
  source. Previously `raw.source === 'project' ? 'project' : 'bundled'`
  silently demoted home-scoped (~/.archon/workflows) workflows to the
  bundled badge + sort rank (I2)
- lib/sse: SSE onerror logs at console.warn when readyState is CLOSED,
  so dropped streams aren't completely silent (I5)
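
For the I2 fix above, the source mapping might be expressed like this. The helper name is hypothetical; the three source values come from the bullet text.

```typescript
type WorkflowSource = "project" | "global" | "bundled";

// Keeps 'global' distinct instead of folding everything non-project into 'bundled'.
function toWorkflowSource(rawSource: string | undefined): WorkflowSource {
  if (rawSource === "project" || rawSource === "global") return rawSource;
  return "bundled";
}
```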

* fix(console): SPA nav + nullable project type + truncate multipart errors (CR-6, CR-8, S4)

- TopNav and ConsoleApp: swap <a href> for <Link to> on the cross-UI
  switch buttons. Same React app, same DOM tree, no need to trigger a
  full reload (CR-6)
- RunsPage and RunDetailPage: useEntity<Project | null> instead of
  useEntity<Project> with a Promise.resolve(null as unknown as Project)
  loader. Removes the type cast and keeps downstream readers honest
  about nullability — added explicit `if (detail === null)` guard in
  RunDetailPage where the type narrowed (CR-8)
- skills/startRun: multipart error path now truncates to 200 chars
  matching requestJson, so an HTML 502 body doesn't land in the error
  toast as raw markup (S4 from multi-agent review)

* test(server): 6 tests for GET /api/runs/:runId/artifacts (I6)

Cover the branches that can be tested without mocking fs/promises:
  - 400 for invalid run ids that fail the [A-Za-z0-9_-] regex guard
  - 404 when the workflow run does not exist
  - 200 + empty files when run has no codebase_id (orphan)
  - 200 + empty files when codebase name lacks owner/repo shape
  - 500 when the codebase DB lookup throws
  - 400 when the resolved artifact dir escapes ARCHON_HOME
    (defense-in-depth path-containment guard)
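
The two guards exercised by the 400 cases can be sketched as pure functions. Assumptions: the regex is anchored over the whole id, and the containment check receives paths that are already resolved to absolute, normalized POSIX form. Both function names are hypothetical:

```typescript
// Assumed anchoring: the whole id must consist of the allowed characters.
const RUN_ID_PATTERN = /^[A-Za-z0-9_-]+$/;

function isValidRunId(runId: string): boolean {
  return RUN_ID_PATTERN.test(runId);
}

// Containment check for paths that are ALREADY absolute and normalized
// (no "..", no duplicate slashes). Comparing against root + "/" avoids
// accepting sibling directories that merely share a prefix.
function isInsideRoot(root: string, resolved: string): boolean {
  const base = root.endsWith("/") ? root.slice(0, -1) : root;
  return resolved === base || resolved.startsWith(base + "/");
}
```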

Multipart-dispatch unit testing would require mocking c.req.parseBody —
deferring; the end-to-end multipart round-trip was verified during
development against a real workflow with server-side
`run_workflow.files_uploaded` log + upload dir written under
~/.archon/artifacts/uploads/. The existing JSON-path tests continue
to assert addMessage is called with 3 args (not 4) for the JSON branch.

Tweaks to the test harness:
  - paths mock now exports getArchonHome and getRunArtifactsPath so the
    new handler can resolve a deterministic test path
  - getCodebase is now a top-level mockGetCodebase that supports
    .mockImplementationOnce per-test

* docs: register new artifacts endpoint + clean stale references (I7+C4+S5)

CLAUDE.md
  - Add GET /api/runs/:runId/artifacts to the API Endpoints section
  - Extend the directory tree to mention packages/web/src/experiments/
    (lint-guarded in-repo spike directory, currently hosting /console)
  - Update the registerOpenApiRoute rule to enumerate the two narrow
    exceptions: raw-content wildcard routes (e.g.
    /api/artifacts/:runId/*) and multipart-or-JSON routes (drop
    request.body from the route config; handler parses both)

docs-web/reference/api.md
  - Add the artifacts row to the Runs table + a 'List Run Artifacts'
    section with curl
  - Expand the 'Run a Workflow' example to show the new multipart
    branch alongside the existing JSON one

packages/web/src/experiments/console/README.md
  - Replace the dead /Users/rasmus/.claude/plans/quiet-twirling-bentley.md
    link with a Status section noting that milestone planning has been
    superseded by PR-template-driven feedback

packages/web/src/experiments/console/lib/format.ts
  - Drop the orphan JSDoc that described formatProjectLocator above the
    formatCost function

packages/web/src/experiments/console/theme.css
  - The 'maps --color-* to --base vars' line invented terminology that
    doesn't exist in Tailwind. Replace with the accurate version:
    @theme inline defines color tokens that reference plain CSS vars,
    redefining those vars inside .console-root cascades through every
    utility that reads them

packages/server/src/routes/api.ts
  - persistUploadedFiles docstring no longer claims to be shared by
    both message + workflow routes (only run uses it today;
    sendMessageRoute still inlines the same logic and could migrate
    in a separate pass)

store/cache.ts and routes/RunDetailPage.tsx
  - Drop the (M4) milestone references — the SSE wiring landed weeks
    ago; the comments now describe the actual lib/sse.ts coupling

* feat(console): neovim-style keymap for project / workflow / run selection

Adds a light-modal keymap so picking a project, picking a workflow, and
starting a run can all be driven from the keyboard:

- p anywhere: full-screen project palette (subsequence fuzzy match,
  ↑↓/Enter/Esc, listbox + combobox a11y)
- n in a project: opens the draft card and auto-summons the workflow
  picker; closing the picker hands focus to the context textarea
- ? anywhere: keyboard shortcuts overlay (esc/? to dismiss)
- runs feed: j/k move, gg/G jump, Enter open, Esc clear, / focus search,
  1-5 filter by status (with magenta selection ring + scroll-into-view)
- run detail: 1/2/3 tabs, t/s toggle tool / system rows, a/r approve /
  reject (paused only), Esc/h back to runs
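
For reference, a subsequence fuzzy match of the kind the project palette uses can be sketched as below. Case-insensitive comparison is an assumption, and the function name is hypothetical:

```typescript
// True when every character of `query` appears in `candidate` in order,
// not necessarily adjacent. An empty query matches everything.
function isSubsequence(query: string, candidate: string): boolean {
  const q = query.toLowerCase();
  const c = candidate.toLowerCase();
  let qi = 0;
  for (let ci = 0; ci < c.length && qi < q.length; ci++) {
    if (c[ci] === q[qi]) qi++;
  }
  return qi === q.length;
}
```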

Shared infrastructure in lib/keymap.ts: chord buffer with 500ms window,
input + modal-dialog guards so route bindings don't leak through when a
palette is open. Help catalogue lives in lib/shortcuts.ts and is kept
in sync per-page.
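
A rough sketch of a chord buffer with a 500ms window, assuming chords are plain key strings such as "gg". The class shape and the injected `now` timestamp are illustrative choices, not the actual lib/keymap.ts API:

```typescript
class ChordBuffer {
  private buffer = "";
  private lastAt = Number.NEGATIVE_INFINITY;
  private readonly chords: readonly string[];
  private readonly windowMs: number;

  constructor(chords: readonly string[], windowMs: number = 500) {
    this.chords = chords;
    this.windowMs = windowMs;
  }

  private isPrefix(seq: string): boolean {
    return this.chords.some((c) => c.startsWith(seq));
  }

  // Feed one key; returns the completed chord or null while pending.
  push(key: string, now: number): string | null {
    // A pause longer than the window discards any partial chord.
    if (now - this.lastAt > this.windowMs) this.buffer = "";
    this.lastAt = now;
    let seq = this.buffer + key;
    // If the extended sequence leads nowhere, restart from this key.
    if (!this.isPrefix(seq)) seq = key;
    if (this.chords.indexOf(seq) !== -1) {
      this.buffer = "";
      return seq;
    }
    this.buffer = this.isPrefix(seq) ? seq : "";
    return null;
  }
}
```

Passing the timestamp in keeps the buffer testable without timers. Note that a chord which is also a prefix of a longer chord fires immediately in this sketch.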
…slash commands (#1757)

* feat(slack): umbrella Slack UX upgrade — buttons, status, reactions, slash commands

Single Slack adapter PR pulling together the in-thread interactivity primitives
the team will need on a shared instance:

- Interactive Block Kit Approve/Reject buttons on approval gates
- Cancel button on a per-run status message edited in place as DAG nodes progress
- Lifecycle reactions on the triggering message (🔄 → ✅ / ❌)
- Native `/archon` and `/archon-workflow` slash commands (Socket Mode, no URL needed)
- `_part i/n_` annotations on long replies split across multiple messages
- Italic cost/token footer after direct-chat replies and on terminal workflow status

Approve/Reject/Cancel buttons call existing platform-agnostic operations
(approveWorkflow / rejectWorkflow / abandonWorkflow); no schema or workflow
engine changes. Authorization re-uses the existing SLACK_ALLOWED_USER_IDS
whitelist for button clicks and slash commands.

Per-user attribution in thread context is intentionally deferred to a separate
PR — it needs a user_id column on conversations/messages/workflow_runs and
orchestrator plumbing.

* fix(adapters): declare @archon/providers as workspace dep

CI's stricter package-resolution caught that @archon/adapters imports
@archon/providers/types (TokenUsage) without declaring the workspace
dependency. Locally bun resolved it transitively via @archon/core; CI's
clean install does not.

* fix(slack): address coderabbit review

- Drop ephemeral denial from slash command auth path so unauthorized
  users are silently rejected, matching the existing app_mention /
  message.im pattern. Posting a denial leaks that a bot is listening.
- Surface failureReason on cancelled runs too, not just failed. The
  type already documents this for both terminal states.
- Stop forwarding raw error messages to Slack when a cancel click
  fails. Backend / DB errors stay in server logs; user sees a generic
  "check the server logs or try again" line.

Adds a test for the cancelled-with-reason rendering.

* fix(slack): address 6-agent PR review

Critical:
- Declare @archon/workflows as an explicit workspace dep on @archon/adapters
  (same class of fix as the providers one). Resolves today via hoisting but
  breaks under stricter installs.
- Split workflow-bridge.test.ts into its own bun test invocation so its
  irreversible mock.module() calls on @archon/core and
  @archon/workflows/event-emitter cannot leak into the slack/telegram
  batch.
- Fix "trailing-edge" debounce comments — the implementation is
  leading-edge. Document the Slack chat.update rate limit as the 500ms
  rationale.
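
To make the leading-edge distinction concrete, here is a small sketch. It assumes the classic debounce shape (fire immediately, then stay quiet until `waitMs` has passed without any call); the real implementation may instead behave as a throttle, and the function name and injected clock are hypothetical:

```typescript
// Leading-edge debounce: the first call in a burst fires at once,
// later calls inside the quiet window are dropped and extend it.
// A trailing-edge debounce would instead fire once AFTER the burst ends.
function leadingEdgeDebounce<T>(
  fn: (arg: T) => void,
  waitMs: number,
  now: () => number,
): (arg: T) => boolean {
  let lastCall = Number.NEGATIVE_INFINITY;
  return (arg: T): boolean => {
    const t = now();
    const quiet = t - lastCall >= waitMs;
    lastCall = t;
    if (quiet) fn(arg);
    return quiet;
  };
}
```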

Important:
- Wire slackBridge.detach() into the server graceful shutdown path so the
  event subscription doesn't leak and a pending chat.update can't fire
  against a closed Bolt socket.
- Drop dead `comment` plumbing through handleApprovalDecision /
  applyResolutionEdit / buildApprovalResolutionBlocks — Block Kit buttons
  have no UI to capture it.
- Widen the action-handler try/catch to also cover applyResolutionEdit so
  block-builder or chat.update failures don't bubble as unhandled
  rejections.
- Cancel-click with missing run state now logs and posts an ephemeral
  acknowledgement (using the button message's channel/ts) so the user
  isn't left wondering whether the click registered.
- Use Bolt's BlockButtonAction / ButtonAction types directly on the
  app.action() registrations instead of the ad-hoc ActionBody /
  ActionElement aliases.

Test coverage:
- Slash command silent-rejection of unauthorized users.
- triggeringMessages 1000-entry FIFO eviction at the cap boundary.
- Slash command seed-post failure → ephemeral error + handler not called.
- Single-chunk message path skips the _part i/n_ footer.
- rejectWorkflow → { cancelled: true, maxAttemptsReached: true } branch.
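
The FIFO eviction behaviour covered by the cap-boundary test can be sketched with a `Map`, which iterates in insertion order. The class name is hypothetical, and keeping an existing key in its original position on re-set is an assumption:

```typescript
class FifoMap<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly cap: number;

  constructor(cap: number) {
    this.cap = cap;
  }

  set(key: K, value: V): void {
    // Only a NEW key at capacity evicts; Map keeps insertion order,
    // so the first key returned is the oldest entry.
    if (!this.entries.has(key) && this.entries.size >= this.cap) {
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
    this.entries.set(key, value);
  }

  get(key: K): V | undefined {
    return this.entries.get(key);
  }

  get size(): number {
    return this.entries.size;
  }
}
```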

Docs:
- architecture.md IPlatformAdapter listing includes sendResultFooter.
- approval-nodes.md mentions the Slack in-thread Approve button.
- CLAUDE.md test-isolation batch count for @archon/adapters updated to 6
  (was 3 — pre-existing drift, now also accounts for workflow-bridge).

Polish:
- removeReactionSafe gets the same intentional-fallback comment as
  addReactionSafe (no_reaction is a normal terminal-state interleave).
- IPlatformAdapter.sendResultFooter signature uses TokenUsage directly.
- Drop "for v1" tag on the unhandled-event comment.
- Remove what-comments from blocks.ts / blocks.test.ts / adapter.ts.

* fix(orchestrator): resume interactive workflows on chat platforms (#1741)

Interactive approval-gate and interactive-loop workflows started from
Slack, Telegram, Discord, or GitHub never resumed after the user
provided their answer — each approval response triggered a brand-new
workflow run from node 0 in a fresh worktree, re-asking the same
questions indefinitely. The cause was a `platform.getPlatformType() ===
'web'` gate that wrapped the entire resume-detection block in
`dispatchOrchestratorWorkflow`, leaving all chat platforms to
unconditionally fall through to a fresh `executeWorkflow`. The chat-side
`resumeRun` mechanism that previously handled this was removed in
#915 (natural-language approval routing) without lifting the resume
lookup out of the web branch.

Changes:
- Restructure dispatchOrchestratorWorkflow so resume detection
  (findResumableRunByParentConversation + hydrateResumableRun) runs for
  every platform; only the background-dispatch branch remains web-only
- Add codebaseId parameter to findResumableRunByParentConversation so
  persistent chat conversation IDs (Telegram chat_id, Slack thread)
  cannot resume a stale run from a different project
- Add tests for chat resume, codebase scoping, and fresh-run fallback
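
The restructured decision can be summarised in a sketch like the following. All names and types here are hypothetical simplifications of dispatchOrchestratorWorkflow, and the treatment of interactive web workflows as a fresh foreground run is an assumption:

```typescript
type Platform = "web" | "slack" | "telegram" | "discord" | "github";
type Dispatch = "resume" | "background" | "fresh";

interface ResumableRun {
  id: string;
  parentConversationId: string;
  codebaseId: string;
}

// Scoped lookup: a persistent chat id alone is not enough, the run
// must also belong to the same codebase.
function findResumable(
  runs: readonly ResumableRun[],
  conversationId: string,
  codebaseId: string,
): ResumableRun | undefined {
  return runs.find(
    (r) => r.parentConversationId === conversationId && r.codebaseId === codebaseId,
  );
}

// Resume detection comes first for every platform; only the
// background-dispatch branch stays web-only.
function chooseDispatch(
  platform: Platform,
  interactive: boolean,
  resumable: ResumableRun | undefined,
): Dispatch {
  if (resumable) return "resume";
  if (platform === "web" && !interactive) return "background";
  return "fresh";
}
```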

Fixes #1741

* test(orchestrator): strengthen mock coverage and add web non-interactive resume test

- Add hydrateResumableRun to executor mock in orchestrator.test.ts to
  mirror the real module exports and prevent opaque TypeErrors for future
  test contributors
- Add test asserting that a web non-interactive workflow with a resumable
  run resumes foreground rather than dispatching a fresh background run,
  pinning the priority order of the if/else if dispatch block

* simplify: inline single-use mock vars in orchestrator.test.ts
…1703) (#1746)

createCodebase() hardcoded 'claude' as the fallback when ai_assistant_type
was not provided. Now checks process.env.DEFAULT_AI_ASSISTANT first,
consistent with how getOrCreateConversation() resolves the default.

Falls back to 'claude' only when both the parameter and env var are unset.
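
The resolution order can be sketched as a small pure function. The helper name is hypothetical, the environment is passed in rather than read from process.env to keep the sketch self-contained, and treating an empty string as unset is an assumption:

```typescript
// Order: explicit parameter, then DEFAULT_AI_ASSISTANT, then 'claude'.
function resolveAssistantType(
  explicit: string | undefined,
  env: Record<string, string | undefined>,
): string {
  return explicit || env.DEFAULT_AI_ASSISTANT || "claude";
}
```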
…or and workflow runs (#1783)

* feat(core): plumb user_id from chat/forge adapters through orchestrator and workflow runs

Adds remote_agent_users + remote_agent_user_identities tables (Archon
identity + per-platform mapping, UNIQUE(platform, platform_user_id))
and threads a resolved user_id through HandleMessageContext into the
orchestrator, workflow executor, and isolation resolver. Every new
conversation, message, workflow_run, and isolation_environment row
created from Slack/Telegram/Discord/GitHub now carries attribution.

Slack additionally enriches first-sight users with their real name via
users.info (requires bot scope users:read — reinstall the app to grant).
Telegram/Discord derive display name from the inbound event payload.
GitHub resolves event.comment.user.login or event.sender.login on each
webhook. Resolution failure warn-logs and continues — never drops a
message.

Schema is additive and nullable everywhere: existing rows remain valid
with NULL, ON DELETE SET NULL on every new FK. Web POST /api/conversations
and the CLI continue to write NULL user_id; those surfaces become
attributed in a follow-up PR. Solo installs with GITHUB_TOKEN are
unchanged.

Race-safe create-on-first-sight: UNIQUE(platform, platform_user_id)
trips on concurrent first-sight webhooks; the losing transaction rolls
back and we re-SELECT the winner's identity. Orphaned identities (user
row deleted out from under them) are auto-repaired.

Foundation for the small-team Archon initiative. Follow-ups will swap
the shared GITHUB_TOKEN for a GitHub App and wire per-user GitHub
tokens via device flow.

* fix(core): address PR review — FK semantics, narrowed error handling, identity type union

Critical fix: SQLite migrateColumns ALTERs now include ON DELETE SET NULL on
all four new user_id / created_by_user_id FK columns. Upgraded SQLite DBs
previously inherited the default NO ACTION ≈ RESTRICT semantic, contradicting
the PR's documented "no destructive cascade on user deletion" guarantee.

Hardening on the new user-identity surface:

  - findOrCreateUserByPlatformIdentity narrows its race-recovery catch to true
    UNIQUE-constraint violations (PG sqlstate 23505 or SQLite "UNIQUE
    constraint failed" message). Any other error logs as user.create_failed
    and propagates — no more masking generic DB failures as recoverable races.

  - backfillDisplayName wraps its UPDATE in try/catch with a dedicated warn
    event. A failed opportunistic backfill must not silently fail the entire
    user resolution path; the caller already has the resolved user row.

  - repairOrphanedIdentity now logs user.identity_orphan_repair_failed on
    transaction failure (previously surfaced only as a generic resolve_failed
    upstream).

  - New IdentityPlatform literal union ('slack' | 'telegram' | 'discord' |
    'github' | 'web' | 'cli') replaces the unconstrained `platform: string`
    on UserIdentity and the findOrCreate signature. Typos now fail at compile
    time rather than silently breaking the UNIQUE(platform, platform_user_id)
    invariant.

  - user.create_started/_completed/_failed are now properly paired per the
    project event-naming convention.
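
The narrowed race-recovery check could be sketched like this, assuming PostgreSQL errors expose the sqlstate as a `code` string and SQLite errors carry the text in `message`. The helper name is hypothetical:

```typescript
// True only for UNIQUE-constraint violations; every other error
// should propagate instead of being treated as a lost race.
function isUniqueViolation(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { code?: unknown; message?: unknown };
  if (e.code === "23505") return true; // PG unique_violation
  return typeof e.message === "string" && e.message.indexOf("UNIQUE constraint failed") !== -1;
}
```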

Slack adapter:

  - users.info missing_scope WARN now gated by an instance flag so it fires
    once per adapter lifetime instead of once per unknown user. The
    misconfiguration is permanent — flooding logs after every restart in a
    100-user workspace was the wrong shape.

  - users_info_failed log strips err (which can include err.data with
    workspace metadata) in favor of structured errMessage / slackErrorCode
    / slackUserId fields. No PII through the log pipeline.

Server resolver:

  - resolveUserId exported for testability and now logs as a single static
    event server.user_resolve_failed (platform in structured fields) instead
    of the templated ${platform}.user_resolve_failed which collided with the
    GitHub adapter's own event name.

  - Dead `=== null` branch removed (TypeScript already narrows the type).

GitHub adapter:

  - User-identity resolution moved up to immediately after self-filtering
    + @mention checks. Now runs before the codebase-ensure and comment-
    history Octokit calls so resolution can't be silently skipped by an
    upstream Octokit failure (which was masking a missing-mock bug in the
    existing test suite).

Tests:

  - New packages/server/src/resolve-user-id.test.ts covers the never-throws
    contract that three adapter handlers depend on. 6 cases including the
    static-event-name regression.

  - GitHub adapter test now mocks @archon/core/db/users and covers the
    comment.user.login ?? sender.login attribution fallback in both
    directions, plus a never-throws case for resolution failure.

  - users.test.ts gains the asymmetric-backfill case, backfill-failure-
    does-not-block-resolution case, both PG and SQLite UNIQUE-error shapes
    for race recovery, and a non-UNIQUE-rethrows-without-recovery case
    that explicitly counts the query calls.

  - isolation-environments.test.ts adds a "ON CONFLICT does NOT update
    created_by_user_id" regression guard so a copy-paste in the SET clause
    can't silently transfer ownership across re-activations.

Comment cleanup: stripped the (PR-A) / (until PR-C) / pre-PR-A history
labels from production types, migrations, and source files. They were PR-
state markers that would rot on merge; the substantive WHY content stays.

Docs:

  - CLAUDE.md table count: 8 → 10; users and user_identities documented.
  - docs-web/reference/database.md: 8 → 10 with explicit ON DELETE semantics
    and a note that re-running 000_combined.sql is idempotent and picks up
    the new ALTERs.
  - docs-web/reference/architecture.md: 7-table diagram → 10-table; full
    schema block extended with the new tables and user_id columns.
  - docs-web/adapters/slack.md: users:read scope added to the Bot Token
    Scopes setup with a note about graceful degradation if omitted.

Skipped (with reason):

  - Converting User/UserIdentity to z.infer<typeof schema>: all sibling row
    interfaces in types/index.ts are hand-crafted; doing this for just the
    two new types creates inconsistency. A separate consistency pass should
    convert the whole file, not selectively.

  - Threading userId into the four web/CLI addMessage callsites: those
    surfaces don't have an auth flow yet, so threading now means passing
    `undefined` from every caller. Added explicit TODOs at each callsite
    pointing at the upcoming web/CLI auth work instead.
…tion routing (#1788)

Phase 2 of the team-foundation PRD. Replaces the bot's single shared GITHUB_TOKEN PAT with a registered GitHub App that supports multi-installation token routing from day one.

New @archon/core/github-auth/ module wrapping @octokit/auth-app with a three-level cache:

  - lookupCache:  owner/repo → installationId (1h TTL; evicted on 401)
  - tokenCache:   installationId → access token (1h GitHub TTL, refreshed 5min before expiry)
  - octokitCache: installationId → Octokit (per-installation auth strategy; evicted on 401 so the SDK's hidden internal token state can't keep serving the dead token)
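
As an illustration of the token-cache level, a sketch under stated assumptions: timestamps are epoch milliseconds, the class and method names are hypothetical, and a token inside the 5-minute refresh margin is treated as a miss so the caller fetches a new one:

```typescript
interface CachedToken {
  token: string;
  expiresAt: number; // epoch ms
}

const REFRESH_MARGIN_MS = 5 * 60 * 1000;

class InstallationTokenCache {
  private readonly tokens = new Map<number, CachedToken>();

  set(installationId: number, token: string, expiresAt: number): void {
    this.tokens.set(installationId, { token, expiresAt });
  }

  // Returns undefined when missing or within the refresh margin.
  get(installationId: number, now: number): string | undefined {
    const hit = this.tokens.get(installationId);
    if (!hit) return undefined;
    if (now >= hit.expiresAt - REFRESH_MARGIN_MS) {
      this.tokens.delete(installationId);
      return undefined;
    }
    return hit.token;
  }

  // Called on a 401 so a dead token is never served again.
  invalidate(installationId: number): void {
    this.tokens.delete(installationId);
  }
}
```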

GitHubAdapter takes a `GitHubAuth` discriminated union at construction. All 4 Octokit callsites (postComment / listComments / repos.get / pulls.get) plus the clone path route through resolveOctokit + a withTokenRefresh wrapper that calls invalidateRepo and retries once on 401. Webhook event.installation.id primes the lookup cache to skip a round-trip. Secondary self-filter compares against `<slug>[bot]` in App mode (via a botLogin getter, distinct from botMention) so PR-C's per-user tokens won't trip it.

Server bootstrap detects App vs PAT mode via env and fails fast if both are configured. In App mode it registers the provider on a module singleton consumed by createWorkflowDeps(), so the workflow executor's bash/script subprocesses inherit a fresh GH_TOKEN/GITHUB_TOKEN. New POST /internal/git-credential endpoint (App mode only) backs a POSIX git credential helper installed at clone time, covering workflows that outlive the 1h installation-token expiry. The public-bind guard runs BEFORE Bun.serve so a rejected config never opens the listening socket — opt-out via ARCHON_ALLOW_INTERNAL_ON_PUBLIC_BIND=1 for deployments where the reverse proxy already drops /internal/*.

Extracts helpers into server/src/github-auth-bootstrap.ts (selectGitHubAuthMode + parseGitCredentialPath) so the security-critical decisions are testable in isolation without spinning up Hono.
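
A sketch of the dual-mode fail-fast decision, for illustration only. GITHUB_TOKEN comes from the text above; the App-mode variable names (GITHUB_APP_ID, GITHUB_APP_PRIVATE_KEY), the `none` result, and the function signature are assumptions and may not match selectGitHubAuthMode:

```typescript
type AuthMode =
  | { kind: "pat"; token: string }
  | { kind: "app"; appId: string; privateKey: string }
  | { kind: "none" };

function selectAuthModeSketch(env: Record<string, string | undefined>): AuthMode {
  const pat = env.GITHUB_TOKEN;
  const appId = env.GITHUB_APP_ID; // hypothetical variable name
  const privateKey = env.GITHUB_APP_PRIVATE_KEY; // hypothetical variable name

  if (appId && privateKey) {
    // Fail fast: an ambiguous config is rejected rather than guessed at.
    if (pat) throw new Error("Both PAT and GitHub App credentials are configured");
    return { kind: "app", appId, privateKey };
  }
  if (pat) return { kind: "pat", token: pat };
  return { kind: "none" };
}
```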

Backwards compat: solo installs that use only GITHUB_TOKEN see zero functional change. All 54 existing PAT-mode adapter tests pass unchanged.

Tests added: 23 strictly-mocked auth-module tests (PRD Q7 — no live api.github.com in CI); 10 new App-mode adapter tests (multi-install routing, payload short-circuit, 401 retry + retry-on-retry propagation, AppNotInstalledError surfacing, clone-token resolution, post-clone credential helper install); 20 server-bootstrap unit tests (dual-mode fail-fast, /internal path validation incl. traversal + null bytes).

Closes phase 2 of .claude/PRPs/prds/github-app-and-user-identity.prd.md. Depends on #1783 (PR-A user-identity foundation).
…#1792)

createSchema() ran two CREATE INDEX statements referencing user_id on
remote_agent_conversations and remote_agent_workflow_runs. On databases
created before v0.4.0 those columns don't exist yet — they're added by
migrateColumns(), which runs AFTER createSchema(). The index creation
aborted the entire createSchema() exec block, the constructor threw, and
every subsequent operation failed with "no such column: user_id". New
installs were unaffected because the columns exist in the same schema.

Moves both CREATE INDEX statements into migrateColumns() so they run
after the matching ALTER TABLE. idx_user_identities_user_id stays in
createSchema() because user_identities is a new table whose user_id
column always exists.
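
The ordering fix can be sketched against a minimal database interface. Everything below is illustrative: the `Db` interface, the index names, and the bare `TEXT` column definition are assumptions (the real ALTER also carries the FK with ON DELETE SET NULL):

```typescript
interface Db {
  exec(sql: string): void;
  hasColumn(table: string, column: string): boolean;
}

// Hypothetical index names; only the table names come from the text above.
const USER_ID_TARGETS = [
  { table: "remote_agent_conversations", index: "idx_conversations_user_id" },
  { table: "remote_agent_workflow_runs", index: "idx_workflow_runs_user_id" },
];

function migrateUserIdColumns(db: Db): void {
  for (const { table, index } of USER_ID_TARGETS) {
    // 1. Add the column on databases created before it existed.
    if (!db.hasColumn(table, "user_id")) {
      db.exec(`ALTER TABLE ${table} ADD COLUMN user_id TEXT`);
    }
    // 2. Only now is it safe to index the column.
    db.exec(`CREATE INDEX IF NOT EXISTS ${index} ON ${table}(user_id)`);
  }
}
```

Because the index statement uses IF NOT EXISTS and the ALTER is guarded, running the migration on a fresh database is a no-op apart from the index check.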

Adds a regression test that seeds a pre-0.4.0 schema (no user_id columns
on conversations/workflow_runs/messages, no created_by_user_id on
isolation_environments) and asserts SqliteAdapter construction completes,
migrates the columns, and creates both indexes.

Caught by /test-release brew 0.4.0.
@coderabbitai

coderabbitai Bot commented May 28, 2026


Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2132172c-1bd6-4133-a3d8-cec90939deae

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@Wirasm Wirasm merged commit d83593a into main May 28, 2026
10 of 11 checks passed