feat(console): console promotion — pending-input banner, project-scoped agent chat, manage_run native tool (Claude + Pi) by Wirasm · Pull Request #1819 · coleam00/Archon

Wirasm · 2026-05-29T13:51:45Z

Summary

Problem: the console experiment is run-centric and read-only — paused runs awaiting human input are buried in the feed, there's no conversational surface under a project, and the agent has no way to operate runs (only the legacy /invoke-workflow text protocol). Live updates also flickered/never settled.
Why it matters: these are the steps that promote the console from experiment toward product. Gates must be unmissable, a project-scoped agent chat is the substrate for agent-driven run management, and the agent needs a real tool surface to list/launch/operate runs across every SDK (Claude now, Pi as the ~June-2026 primary).
What changed: (1) a pinned pending-input banner; (2) a project-scoped agent chat (Runs | Chat tab swap) with a docked workflow tray (1-vs-N collapse) that renders inline approvals, a "working" pill, and run-detail lifecycle lines; (3) a foundational live-cache fix (stale-while-revalidate + useSyncExternalStore); (4) the manage_run native tool — one tool with progressive disclosure (help), project scoping, and the full lifecycle surface (list/get/start/resume/cancel/abandon/approve/reject), provider-neutral and wired for Claude (Zod/MCP) + Pi (TypeBox/customTools); (5) design/spike docs.
What did NOT change (scope boundary): no DB schema/migration; no shipped-product web surface (everything UI lives in packages/web/src/experiments/console/). manage_run writes do state mutation only (CLI-equivalent) — no new auto-resume-after-gate machinery. No project memory, no multi-conversation sidebar, no file-attach composer. Codex/OpenCode get nativeTools: false (served-MCP is a later wave).

`manage_run` native tool

Provider-neutral NativeTool { name, description, inputSchema (JSON Schema), handler } rides on SendQueryOptions.nativeTools; the handler closes over live core context so @archon/providers never imports @archon/core. Per-provider adapters convert the canonical JSON Schema: Claude → Zod via createSdkMcpServer/tool() (mcp__archon__*, alwaysLoad); Pi → TypeBox via customTools (re-supplying Pi's default tools). One tool, an action discriminator, and a help action for progressive disclosure. Destructive actions are gated on confirm: true (tiered-confirmation policy). Every by-id action is project-scoped via getScopedRun.

chat agent ──calls──▶ manage_run(action, …)        [Claude: mcp__archon__*  | Pi: customTools]
                          │
              buildManageRunTool(ctx) handler  (packages/core/src/orchestrator/manage-run-tool.ts)
                          ├─ list/get      → db/workflows (project-scoped)
                          ├─ start         → dispatchBackgroundWorkflow
                          └─ resume/cancel/abandon/approve/reject → operations/workflow-operations

UX Journey

Before

/console/p/:projectId
  ProjectRail │ Heading · search · FilterChips
              │ Active runs / Recent runs        ← paused runs sit inline; easy to miss
              │ (no chat surface; AI chat lives only in legacy /chat)

After

/console/p/:projectId                 /console/p/:projectId/chat
  ProjectRail │ Heading  [ *Runs | Chat* ]  ← tab swap
              │ [ ⚠ NEEDS YOU banner ]      ← paused runs pinned above the fold
              │ Active / Recent runs        │  message stream (declutter + "working" pill)
                                            │  [ WorkflowDock: live runs, inline approve/reject ]
                                            │  composer  →  agent can call manage_run to
                                            │              list / start / operate runs

Architecture Diagram

After (delta)

RunsPage [~] ─ PendingInputBanner [+] + ProjectViewTabs [+]
ChatPage [+] === useEntity(K.messages) + useConversationSSE [+] + ChatStream [+] + ChatComposer [+]
            └─ WorkflowDock [+] === ApprovalContext + ApprovalPanel (reused, inline) + RunLifecycle [+]
store/cache.ts [~] ─ revalidate() stale-while-revalidate + useEntity via useSyncExternalStore
providers/types.ts [~] ─ NativeTool [+], capabilities.nativeTools [+]
claude/native-tools.ts [+]  pi/native-tools.ts [+]  (JSON-Schema → Zod / TypeBox)
orchestrator-agent.ts [~] ─ injects buildManageRunTool when codebase-scoped + provider supports it

Label Snapshot

Risk: risk: medium
Size: size: XL
Scope: web, core, providers, tests
Module: web:console, core:orchestrator, providers:claude, providers:pi

Change Metadata

Change type: feature
Primary scope: multi (web + core + providers)

Linked Issue

Related: console promotion effort (no single tracking issue)

Validation Evidence (required)

bun run type-check   # ✅ all 10 packages
bun run lint         # ✅ clean (--max-warnings 0)
bun run test         # ✅ providers suite green; core orchestrator + manage_run green

New unit tests: manage-run-tool.test.ts (18 — help/scope/confirm-gate/dispatch/error), claude/native-tools.test.ts (4), pi/native-tools.test.ts (4 — fail-fast + boolean acceptance).
Live-verified end-to-end: Claude invokes manage_run for list/get/start (dispatched archon-assist, ran to completion). Pi (minimax MiniMax-M2.7) invokes manage_run, runs help, and the boolean confirm round-trips through TypeBox as a real boolean (typeof === 'boolean', value === true).
Also repaired pre-existing branch test breakage (capability assertions + Pi defineTool mock) — see the fix(providers) commit.

Security Impact (required)

New permissions/capabilities? Yes — chat agents in a project-scoped conversation can now operate that project's workflow runs (list/start/resume/cancel/abandon/approve/reject). Mitigations: tool only attaches when a codebase is scoped and the provider supports native tools; every by-id action is project-scoped (getScopedRun blocks cross-project access); destructive actions require explicit confirm: true; writes use the same core ops as the existing CLI/REST surfaces (no new privilege).
New external network calls? No.
Secrets/tokens handling changed? No.
File system access scope changed? No.

Compatibility / Migration

Backward compatible? Yes.
Config/env changes? No.
Database migration needed? No.

Human Verification (required)

Verified scenarios: live console chat (Claude) calling manage_run list/start; Pi (minimax) tool round-trip incl. boolean confirm; live-update fix (reply renders, composer re-enables); inline dock approvals reuse the shared approval panel.
Edge cases checked: confirm-preview vs confirm-execute; cross-project run access blocked; unknown action/subtool; start with no dispatch context; converter fail-fast on non-object / number / empty-enum schemas.
What was not verified: Codex/OpenCode (intentionally nativeTools: false); agent-initiated approve does not auto-resume the run (state-only by design — the dock's human-approve path handles auto-resume).

Side Effects / Blast Radius (required)

Affected subsystems: @archon/web console experiment, @archon/core orchestrator (tool injection), @archon/providers (NativeTool contract + Claude/Pi adapters).
Potential unintended effects: console isolation is preserved (no prod-UI imports). The provider nativeTools capability + NativeTool contract are additive; existing providers unaffected.
Guardrails: tool gated on capability + codebase scope; per-action project scoping; confirm gate on destructive ops; handler catches all errors and returns text (never throws into the agent loop).

Rollback Plan (required)

Fast rollback: revert the branch (no migration, no persisted state). The manage_run injection is a single guarded block in orchestrator-agent.ts; the console UI is fully contained in experiments/console/.
Observable failure symptoms: agent reports manage_run error: …; console live updates stalling.

Risks and Mitigations

Risk: agent operating runs without a human in the loop → Mitigation: tiered confirm gate on destructive actions; approve/reject are state-only (no auto-resume); project scoping.
Risk: Pi customTools path is newer than Claude's → Mitigation: live-verified against minimax incl. the boolean round-trip; unit-tested converter; graceful fallback (providers without nativeTools simply don't receive the tool).

coderabbitai · 2026-05-29T13:52:00Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a project-scoped console chat (conversation primitives, HTTP skills, per-conversation SSE), new chat UI (composer, stream, tabs, ChatPage route), cache revalidation (stale-while-revalidate via revalidate/useSyncExternalStore), pending-input promotion/banner and run-card flag, plus native-tool support with a manage_run tool and provider MCP wiring.

Changes

Console Agent Chat Feature

Layer / File(s)	Summary
Design documentation and conversation data model `packages/web/src/experiments/console/agent-chat-spike.md`, `packages/web/src/experiments/console/primitives/conversation.ts`, `packages/web/src/experiments/console/store/keys.ts`	Spike doc and architecture notes; adds `ConversationSummary` and `toConversationSummary`; introduces `K.conversations(projectId)` cache key.
Conversation API skills and SSE infrastructure `packages/web/src/experiments/console/skills/conversations.ts`, `packages/web/src/experiments/console/skills/index.ts`, `packages/web/src/experiments/console/lib/sse.ts`	Adds `createConversation`, `listConversations`, `sendMessage` (JSON and multipart handling, structured error extraction) and `useConversationSSE` hook with debounced invalidation and `conversation_lock` handling; re-exports conversations.
Chat UI components (composer, stream, tabs, pending banner) `packages/web/src/experiments/console/components/ChatComposer.tsx`, `packages/web/src/experiments/console/components/ChatStream.tsx`, `packages/web/src/experiments/console/components/ProjectViewTabs.tsx`, `packages/web/src/experiments/console/components/PendingInputBanner.tsx`	`ChatComposer` (auto-grow, Enter-send, Shift+Enter newline, Escape blur), `ChatStream` (message filtering + optional tool traces), `ProjectViewTabs` (Runs/Chat links), and `PendingInputBanner`/`PendingInputCard` for promoted paused runs.
Chat page implementation and console route registration `packages/web/src/experiments/console/routes/ChatPage.tsx`, `packages/web/src/experiments/console/ConsoleApp.tsx`	`ChatPage` loads project/conversations/messages, registers `useConversationSSE`, manages `busy`/settle/recovery polling, auto-scroll, and send logic (creates conversation on first send); `ConsoleApp` mounts `/console/p/:projectId/chat`.
Pending input promotion (banner, active card, runs integration) `packages/web/src/experiments/console/components/PendingInputBanner.tsx`, `packages/web/src/experiments/console/components/ActiveRunCard.tsx`, `packages/web/src/experiments/console/routes/RunsPage.tsx`	Adds pinned pending-input banner and cards, extends `ActiveRunCard` with `inputPromoted?: boolean`, updates `RunsPage` to compute pending/dismissed/visible sets and derive `promotedRunIds` passed into `RunsFeed`.
Cache revalidation and external-store integration `packages/web/src/experiments/console/store/cache.ts`	Adds internal `revalidate(key)` with inflight de-duplication and error retention; `invalidate` now calls `revalidate`; `useEntity` refactored to `useSyncExternalStore` and `refetch()` triggers `revalidate`.
UI augmentations: lifecycle markers, workflow dock, working indicator, run detail wiring `packages/web/src/experiments/console/components/RunLifecycle.tsx`, `packages/web/src/experiments/console/components/WorkflowDock.tsx`, `packages/web/src/experiments/console/components/WorkingIndicator.tsx`, `packages/web/src/experiments/console/routes/RunDetailPage.tsx`	Adds `RunStartedLine` and `RunFinishedLine` markers, `WorkflowDock` tray for running/paused runs, `WorkingIndicator` component, and wraps `RunStream` with lifecycle markers.

Native Tooling and Manage-Run

Layer / File(s)	Summary
manage_run native tool `packages/core/src/orchestrator/manage-run-tool.ts`, `packages/core/src/orchestrator/orchestrator-agent.ts`	Adds `ManageRunContext` and `buildManageRunTool` (read-only `list`/`get` actions) and injects a per-project `manage_run` native tool into agent requests when provider `nativeTools` is supported.
Provider native tool types and Claude MCP builder `packages/providers/src/types.ts`, `packages/providers/src/claude/native-tools.ts`, `packages/providers/src/claude/provider.ts`, `packages/providers/*/capabilities.ts`, `packages/providers/package.json`	Adds `NativeTool` interface and `AgentRequestOptions.nativeTools`, introduces `buildArchonMcpServer` converting tool schemas to Zod shapes and registering MCP server for Claude, wires MCP server into Claude `sendQuery`, updates provider capability flags (`nativeTools` true/false), and adds `zod` dependency.

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs:

coleam00/Archon#1747: Prior chat work and console groundwork that this change builds on (chat routing, SSE, and console components).

"🐰 In the console's quiet code-lit glade,
I stitched a chat so runs needn't fade,
Tabs hop in place, messages flow,
Paused runs now politely ask and show,
A cheerful spike — the agent's parade."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 58.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Title check	✅ Passed	The PR title clearly and specifically summarizes the main changes: adding a pending-input banner, project-scoped agent chat, and the manage_run native tool for Claude and Pi.
Description check	✅ Passed	The PR description is comprehensive and well-structured, covering all key sections: problem statement, why it matters, what changed/didn't change, detailed architecture diagrams, UX journeys, validation evidence, security impact, compatibility, human verification, side effects, and rollback plans.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/console-promotion

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

packages/web/src/experiments/console/skills/conversations.ts (1)
37-41: ⚡ Quick win

Replace Parameters<...> indirection with an explicit raw type.

The Parameters<typeof toConversationSummary>[0][] annotation hides the actual shape behind type meta-programming. Exporting RawConversation and referencing it directly reads clearer and keeps the data contract explicit.
♻️ Proposed refactor

In primitives/conversation.ts:
-interface RawConversation {
+export interface RawConversation {
In this file:
-import { toConversationSummary, type ConversationSummary } from '../primitives/conversation';
+import {
+  toConversationSummary,
+  type ConversationSummary,
+  type RawConversation,
+} from '../primitives/conversation';
@@
-  const raw = await requestJson<Parameters<typeof toConversationSummary>[0][]>(
+  const raw = await requestJson<RawConversation[]>(
     `/api/conversations?codebaseId=${encodeURIComponent(projectId)}`
   );
As per coding guidelines: "prefer explicit branches and typed interfaces over hidden dynamic behavior."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/experiments/console/skills/conversations.ts` around lines 37
- 41, The listConversations function uses Parameters<typeof
toConversationSummary>[0][] which hides the incoming shape; instead export and
use an explicit RawConversation type from primitives/conversation.ts and update
listConversations to call
requestJson<RawConversation[]>('/api/conversations?codebaseId=...') and then
map(raw => toConversationSummary(raw)) — ensure primitives/conversation.ts
exports RawConversation and update the import in this file so the function
signature and requestJson generic use RawConversation[] while keeping the return
type Promise<ConversationSummary[]>.
packages/web/src/experiments/console/lib/sse.ts (1)
170-226: ⚡ Quick win

onLockChange dependency in useConversationSSE—current usage is stable, so churn is unlikely
ChatPage passes setLocked directly (useState setter), which is stable; so including onLockChange in the effect deps won’t cause the “reconnect on every render” churn for this caller. The useRef-stabilization approach is only worth it if other callers ever pass a non-memoized inline callback.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/experiments/console/lib/sse.ts` around lines 170 - 226, The
change that stabilizes onLockChange via a useRef is unnecessary because useState
setters (e.g., setLocked) are already stable; revert any refactor and keep the
effect dependency array as [conversationPlatformId, onLockChange] inside
useConversationSSE, ensuring es.onmessage still calls onLockChange?.(ev.locked)
directly and no useRef indirection is introduced.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/web/src/experiments/console/components/ChatComposer.tsx`:
- Around line 59-71: The textarea in ChatComposer.tsx never receives the
disabled prop, so when the component is locked or sending the field remains
editable; update the <textarea> to forward the disabled boolean (the same prop
used for the placeholder) so the DOM element truly becomes readonly and the CSS
class disabled:opacity-50 takes effect; locate the JSX using textareaRef, value,
setValue, grow, onKeyDown and MAX_HEIGHT and add disabled={disabled} to that
element (ensuring existing behavior for placeholder using disabledReason remains
unchanged).

In `@packages/web/src/experiments/console/skills/conversations.ts`:
- Around line 59-78: The multipart fetch call in conversations.ts (the fetch
invocation that builds a FormData and posts it) omits credentials, causing
parity mismatch with requestJson which sets credentials: 'same-origin'; update
the fetch call options for the POST (the fetch(url, { method: 'POST', body: form
})) to include credentials: 'same-origin' so same-origin cookies/CSRF are
preserved; apply the same change in the analogous startRun.ts multipart upload
code to keep behavior consistent and future-proof absolute/CORS URL changes.

---

Nitpick comments:
In `@packages/web/src/experiments/console/lib/sse.ts`:
- Around line 170-226: The change that stabilizes onLockChange via a useRef is
unnecessary because useState setters (e.g., setLocked) are already stable;
revert any refactor and keep the effect dependency array as
[conversationPlatformId, onLockChange] inside useConversationSSE, ensuring
es.onmessage still calls onLockChange?.(ev.locked) directly and no useRef
indirection is introduced.

In `@packages/web/src/experiments/console/skills/conversations.ts`:
- Around line 37-41: The listConversations function uses Parameters<typeof
toConversationSummary>[0][] which hides the incoming shape; instead export and
use an explicit RawConversation type from primitives/conversation.ts and update
listConversations to call
requestJson<RawConversation[]>('/api/conversations?codebaseId=...') and then
map(raw => toConversationSummary(raw)) — ensure primitives/conversation.ts
exports RawConversation and update the import in this file so the function
signature and requestJson generic use RawConversation[] while keeping the return
type Promise<ConversationSummary[]>.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0435aa53-8c38-4e66-9dfd-0e3a3ab58559

📥 Commits

Reviewing files that changed from the base of the PR and between ab1bee7 and b4de6a5.

📒 Files selected for processing (14)

packages/web/src/experiments/console/ConsoleApp.tsx
packages/web/src/experiments/console/agent-chat-spike.md
packages/web/src/experiments/console/components/ActiveRunCard.tsx
packages/web/src/experiments/console/components/ChatComposer.tsx
packages/web/src/experiments/console/components/ChatStream.tsx
packages/web/src/experiments/console/components/PendingInputBanner.tsx
packages/web/src/experiments/console/components/ProjectViewTabs.tsx
packages/web/src/experiments/console/lib/sse.ts
packages/web/src/experiments/console/primitives/conversation.ts
packages/web/src/experiments/console/routes/ChatPage.tsx
packages/web/src/experiments/console/routes/RunsPage.tsx
packages/web/src/experiments/console/skills/conversations.ts
packages/web/src/experiments/console/skills/index.ts
packages/web/src/experiments/console/store/keys.ts

coderabbitai · 2026-05-29T13:59:19Z

+        <textarea
+          ref={textareaRef}
+          value={value}
+          onChange={e => {
+            setValue(e.target.value);
+            grow(e.target);
+          }}
+          onKeyDown={onKeyDown}
+          rows={1}
+          placeholder={disabled ? (disabledReason ?? 'Waiting…') : 'Message the agent…'}
+          className="min-h-[40px] flex-1 resize-none rounded border border-border bg-surface-inset px-3 py-2 text-[13px] text-text-primary placeholder:text-text-tertiary focus:border-border-bright focus:outline-none disabled:opacity-50"
+          style={{ maxHeight: `${MAX_HEIGHT.toString()}px` }}
+        />


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Apply the disabled prop to the textarea.

The disabled prop is never forwarded to the <textarea>, so the disabled:opacity-50 class never activates and the field stays editable while locked || sending. Submission is blocked by submit(), but the user can still type into a "locked" composer with no visual feedback.

🛠️ Proposed fix

<textarea ref={textareaRef} value={value} + disabled={disabled} onChange={e => { setValue(e.target.value); grow(e.target); }}

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

<textarea

ref={textareaRef}

value={value}

onChange={e => {

setValue(e.target.value);

grow(e.target);

}}

onKeyDown={onKeyDown}

rows={1}

placeholder={disabled ? (disabledReason ?? 'Waiting…') : 'Message the agent…'}

className="min-h-[40px] flex-1 resize-none rounded border border-border bg-surface-inset px-3 py-2 text-[13px] text-text-primary placeholder:text-text-tertiary focus:border-border-bright focus:outline-none disabled:opacity-50"

style={{ maxHeight: `${MAX_HEIGHT.toString()}px` }}

/>

<textarea

ref={textareaRef}

value={value}

disabled={disabled}

onChange={e => {

setValue(e.target.value);

grow(e.target);

}}

onKeyDown={onKeyDown}

rows={1}

placeholder={disabled ? (disabledReason ?? 'Waiting…') : 'Message the agent…'}

className="min-h-[40px] flex-1 resize-none rounded border border-border bg-surface-inset px-3 py-2 text-[13px] text-text-primary placeholder:text-text-tertiary focus:border-border-bright focus:outline-none disabled:opacity-50"

style={{ maxHeight: `${MAX_HEIGHT.toString()}px` }}

/>

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/web/src/experiments/console/components/ChatComposer.tsx` around lines 59 - 71, The textarea in ChatComposer.tsx never receives the disabled prop, so when the component is locked or sending the field remains editable; update the <textarea> to forward the disabled boolean (the same prop used for the placeholder) so the DOM element truly becomes readonly and the CSS class disabled:opacity-50 takes effect; locate the JSX using textareaRef, value, setValue, grow, onKeyDown and MAX_HEIGHT and add disabled={disabled} to that element (ensuring existing behavior for placeholder using disabledReason remains unchanged).

coderabbitai · 2026-05-29T13:59:19Z

+  // Multipart path: don't set Content-Type — the browser adds the boundary.
+  const form = new FormData();
+  form.append('message', message);
+  for (const file of files) {
+    form.append('files', file, file.name);
+  }
+  const res = await fetch(url, { method: 'POST', body: form });
+  if (!res.ok) {
+    const text = await res.text().catch(() => '');
+    let parsed: { error?: string } = {};
+    try {
+      parsed = JSON.parse(text) as { error?: string };
+    } catch {
+      /* not JSON */
+    }
+    const raw = parsed.error ?? (text.length > 0 ? text : `HTTP ${res.status.toString()}`);
+    const msg = raw.length > 200 ? `${raw.slice(0, 200)}...` : raw;
+    const path = new URL(url, window.location.origin).pathname;
+    throw new HttpError(res.status, path, msg);
+  }


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Inspect requestJson to see what defaults the multipart fetch must replicate. fd 'http.ts' --type f | xargs rg -n -A40 'export (async )?function requestJson'

Repository: coleam00/Archon

Length of output: 814

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect the exact multipart fetch code + nearby context. sed -n '1,200p' packages/web/src/experiments/console/skills/conversations.ts | nl -ba | sed -n '40,120p' # Find any other upload/multipart patterns in the same file. rg -n "new FormData\(|multipart|files'" packages/web/src/experiments/console/skills/conversations.ts -n # Inspect requestJson usage in nearby code for how headers/auth are typically handled. rg -n "requestJson\(" packages/web/src/experiments/console/skills/conversations.ts -n # Locate mergeHeaders usage/imports to see if any auth/CSRF defaults are expected. rg -n "mergeHeaders|`@archon`|CSRF|credentials" packages/web/src/experiments/console/skills/conversations.ts packages/web/src/experiments/console/skills -S || true

Repository: coleam00/Archon

Length of output: 102

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect the exact multipart fetch code + nearby context with line numbers. cat -n packages/web/src/experiments/console/skills/conversations.ts | sed -n '40,140p' # Find any other multipart/FormData usage in the same file. rg -n "new FormData\(|files'" packages/web/src/experiments/console/skills/conversations.ts # Locate requestJson usage in nearby code for how it handles defaults. rg -n "requestJson\(" packages/web/src/experiments/console/skills/conversations.ts # Look for credentials/auth/CSRF patterns in this area. rg -n "credentials|CSRF|X-CSRF|Authorization|mergeHeaders" packages/web/src/experiments/console/skills/conversations.ts

Repository: coleam00/Archon

Length of output: 1667

🏁 Script executed:

#!/bin/bash set -euo pipefail # Dump the whole file (it appears small) with line numbers. wc -l packages/web/src/experiments/console/skills/conversations.ts cat -n packages/web/src/experiments/console/skills/conversations.ts # Find where requestJson is defined (from earlier evidence, likely in packages/.../http.ts). # Search for "mergeHeaders" and "export async function requestJson" definitions. rg -n "export async function requestJson" packages -S rg -n "function mergeHeaders|mergeHeaders" packages -S | head -n 50

Repository: coleam00/Archon

Length of output: 3877

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect requestJson + helpers fully cat -n packages/web/src/experiments/console/lib/http.ts # Find "startRun" multipart path reference and inspect it. fd -t f 'startRun' packages/web/src/experiments/console -x echo fd -t f 'startRun\.ts' packages/web/src/experiments/console | head -n 20 rg -n "Multipart|new FormData$|files'" packages/web/src/experiments/console -S rg -n "startRun" packages/web/src/experiments/console/skills -S || true # Specifically locate any other multipart fetch in the console experiments. rg -n "fetch\\(.*body: form\\b|new FormData\\(\$" packages/web/src/experiments/console -S

Repository: coleam00/Archon

Length of output: 4311

🏁 Script executed:

#!/bin/bash set -euo pipefail cat -n packages/web/src/experiments/console/skills/startRun.ts | sed -n '1,120p'

Repository: coleam00/Archon

Length of output: 3252

Multipart fetch parity with requestJson: only credentials differs (and it’s likely redundant for same-origin relative URLs).

requestJson injects only credentials: 'same-origin' (and Content-Type: application/json for non-FormData bodies). The multipart code in conversations.ts (and similarly in startRun.ts) omits credentials, but both use relative /api/... URLs, so the browser’s default should already be same-origin; auth/CSRF headers aren’t being dropped here.

For consistency and future safety (if URLs become absolute/CORS), set credentials: 'same-origin' on the multipart fetch.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/web/src/experiments/console/skills/conversations.ts` around lines 59 - 78, The multipart fetch call in conversations.ts (the fetch invocation that builds a FormData and posts it) omits credentials, causing parity mismatch with requestJson which sets credentials: 'same-origin'; update the fetch call options for the POST (the fetch(url, { method: 'POST', body: form })) to include credentials: 'same-origin' so same-origin cookies/CSRF are preserved; apply the same change in the analogous startRun.ts multipart upload code to keep behavior consistent and future-proof absolute/CORS URL changes.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/web/src/experiments/console/routes/ChatPage.tsx`:
- Around line 75-103: The settle timer is currently tied only to the trailing
assistant message and a weak sig (list.length:last.id), which lets stale timers
cancel active turns; fix by gating the timer to the active turn (derive and
store an activeTurnId when a trailing user message appears) and only arm/reset
the timer for that same activeTurnId; also strengthen settleSigRef to include a
content/version signal from the assistant message (e.g., last.content or
last.updatedAt/streamVersion along with last.id) so streamed updates advance the
sig and re-arm the timer correctly; update references: settleTimerRef,
settleSigRef, messages, setBusy, SETTLE_MS to use the activeTurnId guard and the
richer sig.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e11dfe71-9020-4405-922c-b2dced25ee33

📥 Commits

Reviewing files that changed from the base of the PR and between b4de6a5 and d272044.

📒 Files selected for processing (1)

packages/web/src/experiments/console/routes/ChatPage.tsx

coderabbitai · 2026-06-01T08:29:28Z

+  // Derive turn state from the trailing message: a user message means a reply
+  // is pending; once an assistant reply lands and stays stable for SETTLE_MS the
+  // turn is done. This also recovers a reload mid-turn (trailing user message).
+  const settleTimerRef = useRef<ReturnType<typeof setTimeout> | null>(null);
+  const settleSigRef = useRef('');
+  useEffect(() => {
+    const list = messages ?? [];
+    const last = list[list.length - 1];
+    if (last === undefined) return;
+    if (last.role === 'user') {
+      settleSigRef.current = '';
+      if (settleTimerRef.current !== null) {
+        clearTimeout(settleTimerRef.current);
+        settleTimerRef.current = null;
+      }
+      setBusy(true);
+      return;
+    }
+    // Trailing message is an assistant/system reply. Arm the settle timer once;
+    // re-arm only on real content change so identical poll refetches (same sig)
+    // don't reset it forever.
+    const sig = `${list.length}:${last.id}`;
+    if (sig === settleSigRef.current) return;
+    settleSigRef.current = sig;
+    if (settleTimerRef.current !== null) clearTimeout(settleTimerRef.current);
+    settleTimerRef.current = setTimeout(() => {
+      setBusy(false);
+    }, SETTLE_MS);
+  }, [messages]);


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Tie the settle timer to the active turn, not just the trailing assistant message.

This timeout is armed for any trailing assistant/system message, even when the chat is already idle. If the page is sitting on an old assistant reply and the user sends again before that timeout fires, the stale timer can flip busy back to false mid-request. Also, sig only uses list.length and last.id, so streamed updates to the same assistant message never reset the timer. Please gate the settle timer on the current pending turn and base sig on a real content/version signal from last.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/web/src/experiments/console/routes/ChatPage.tsx` around lines 75 - 103, The settle timer is currently tied only to the trailing assistant message and a weak sig (list.length:last.id), which lets stale timers cancel active turns; fix by gating the timer to the active turn (derive and store an activeTurnId when a trailing user message appears) and only arm/reset the timer for that same activeTurnId; also strengthen settleSigRef to include a content/version signal from the assistant message (e.g., last.content or last.updatedAt/streamVersion along with last.id) so streamed updates advance the sig and re-arm the timer correctly; update references: settleTimerRef, settleSigRef, messages, setBusy, SETTLE_MS to use the activeTurnId guard and the richer sig.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/web/src/experiments/console/store/cache.ts`:
- Around line 182-195: The hook's useSyncExternalStore subscription only returns
cache.get(key) so changes to errors or inflight don't change the snapshot and
React won't re-render; update the snapshot to include a version or a tuple so
any mutation causes a new reference. Specifically, modify the getSnapshot used
in useSyncExternalStore (and the value your subscribe invalidates) to return
either a { data, error, loadingVersion } or a small epoch/version number that
you increment whenever cache, errors, or inflight change (referencing
useSyncExternalStore, subscribe, cache.get, errors.get, inflight, and
revalidate); increment that version on mutations and read it in getSnapshot so
error/loading changes produce new object identities and trigger re-renders.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6042461e-d7c7-40fe-9278-583635a4f1c5

📥 Commits

Reviewing files that changed from the base of the PR and between d272044 and 9904a42.

📒 Files selected for processing (1)

packages/web/src/experiments/console/store/cache.ts

coderabbitai · 2026-06-01T09:30:58Z

+  const data = useSyncExternalStore(
+    subscribe,
+    () => cache.get(key) as T | undefined,
+    () => cache.get(key) as T | undefined
+  );

  return {
-    data: cache.get(key) as T | undefined,
+    data,
    error: errors.get(key),
    loading: !cache.has(key) && inflight.has(key),
    refetch: (): void => {
-      cache.delete(key);
-      errors.delete(key);
-      notify(key);
-      ensureLoad(key);
+      revalidate(key);
    },
  };


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Error and loading state changes do not trigger re-renders.

useSyncExternalStore only re-renders when getSnapshot returns a different value (by Object.is). Since the snapshot reads only cache.get(key):

Initial load failure: cache stays undefined, snapshot unchanged → no re-render → stuck at loading: true forever

Revalidate failure: stale cache reference unchanged → no re-render → error never shown

error and loading are computed during render but never re-read if React doesn't re-render.

Consider tracking a version/epoch that increments on any mutation (cache, error, or inflight change), or store a { data, error } tuple per key so either change produces a new reference.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/web/src/experiments/console/store/cache.ts` around lines 182 - 195, The hook's useSyncExternalStore subscription only returns cache.get(key) so changes to errors or inflight don't change the snapshot and React won't re-render; update the snapshot to include a version or a tuple so any mutation causes a new reference. Specifically, modify the getSnapshot used in useSyncExternalStore (and the value your subscribe invalidates) to return either a { data, error, loadingVersion } or a small epoch/version number that you increment whenever cache, errors, or inflight change (referencing useSyncExternalStore, subscribe, cache.get, errors.get, inflight, and revalidate); increment that version on mutations and read it in getSnapshot so error/loading changes produce new object identities and trigger re-renders.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

packages/web/src/experiments/console/routes/ChatPage.tsx (1)

172-190: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Hoist currentActivity’s useMemo above the projectId early return to keep hook order stable.

packages/web/src/experiments/console/routes/ChatPage.tsx calls useMemo after if (projectId === undefined) return ...;. If projectId can be undefined on one render and defined on another, React can throw a hooks-order error. Move messageList/useMemo above the early return (or otherwise ensure the component never returns before the hook).

🐛 Proposed fix — hoist above the early return

+  const messageList = messages ?? [];
+
+  // Current activity for the working indicator: the latest tool the agent
+  // invoked in the in-flight turn (walk back to the last user message).
+  const currentActivity = useMemo<string | null>(() => {
+    for (let i = messageList.length - 1; i >= 0; i--) {
+      const m = messageList[i];
+      if (m === undefined) continue;
+      if (m.role === 'user') break;
+      if (m.role === 'assistant' && m.toolCalls.length > 0) {
+        return m.toolCalls[m.toolCalls.length - 1]?.name ?? null;
+      }
+    }
+    return null;
+  }, [messageList]);
+
   if (projectId === undefined) {
     return <EmptyState title="No project selected." />;
   }
-
-  const messageList = messages ?? [];
-
-  // Current activity for the working indicator: the latest tool the agent
-  // invoked in the in-flight turn (walk back to the last user message).
-  const currentActivity = useMemo<string | null>(() => {
-    for (let i = messageList.length - 1; i >= 0; i--) {
-      const m = messageList[i];
-      if (m === undefined) continue;
-      if (m.role === 'user') break;
-      if (m.role === 'assistant' && m.toolCalls.length > 0) {
-        return m.toolCalls[m.toolCalls.length - 1]?.name ?? null;
-      }
-    }
-    return null;
-  }, [messageList]);

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/experiments/console/routes/ChatPage.tsx` around lines 172 -
190, The useMemo that computes currentActivity (using messageList and useMemo)
is declared after the early return for projectId and can break hook order; move
the messageList and the useMemo/currentActivity declaration above the if
(projectId === undefined) return ... check in ChatPage.tsx so the hook always
runs in the same order, or ensure the component never returns before invoking
useMemo—update references to projectId as needed so the early-return remains
correct after hoisting.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@packages/web/src/experiments/console/routes/ChatPage.tsx`:
- Around line 172-190: The useMemo that computes currentActivity (using
messageList and useMemo) is declared after the early return for projectId and
can break hook order; move the messageList and the useMemo/currentActivity
declaration above the if (projectId === undefined) return ... check in
ChatPage.tsx so the hook always runs in the same order, or ensure the component
never returns before invoking useMemo—update references to projectId as needed
so the early-return remains correct after hoisting.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3f31941b-6fc2-4024-a66c-db83347f5f07

📥 Commits

Reviewing files that changed from the base of the PR and between 9904a42 and b867a34.

📒 Files selected for processing (7)

packages/web/src/experiments/console/components/ChatStream.tsx
packages/web/src/experiments/console/components/RunLifecycle.tsx
packages/web/src/experiments/console/components/WorkflowDock.tsx
packages/web/src/experiments/console/components/WorkingIndicator.tsx
packages/web/src/experiments/console/console-open-questions.md
packages/web/src/experiments/console/routes/ChatPage.tsx
packages/web/src/experiments/console/routes/RunDetailPage.tsx

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (4)

packages/core/src/orchestrator/manage-run-tool.ts (1)

60-60: 💤 Low value

Logging emits terminal states without a paired _started.

manage_run.list_completed, manage_run.get_completed, and manage_run.failed are logged, but there is no corresponding _started event, so a failure can't be correlated to an initiation in the logs.

As per coding guidelines: event naming {domain}.{action}_{state} and "always pair _started with _completed or _failed".

Also applies to: 69-69, 94-94
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/src/orchestrator/manage-run-tool.ts` at line 60, The code logs
terminal events like "manage_run.list_completed", "manage_run.get_completed" and
"manage_run.failed" without emitting matching "manage_run.*_started" events; add
paired "_started" logs immediately before the corresponding operations so every
completed/failed log has a matching started log. Locate the places that
currently call log.info/log.error for "manage_run.list_completed",
"manage_run.get_completed", and "manage_run.failed" in manage-run-tool.ts and
insert corresponding log.info calls for "manage_run.list_started",
"manage_run.get_started", and "manage_run.started" (or
"manage_run.<action>_started") with the same metadata object (e.g., { action,
codebaseId: ctx.codebaseId }) so states can be correlated.

packages/web/src/experiments/console/components/WorkflowDock.tsx (2)

124-132: ⚡ Quick win

Button is interactive when navigation target is unavailable.

The button remains clickable even when run.projectId is null, resulting in a no-op click. The hover states still apply, creating a misleading affordance. Consider disabling the button or omitting it when projectId is unavailable.

Disable button when projectId is null

 <button
   type="button"
+  disabled={run.projectId === null}
   onClick={() => {
     if (run.projectId !== null) navigate(`/console/p/${run.projectId}/r/${run.id}`);
   }}
-  className="ml-auto shrink-0 font-mono text-[11px] text-text-tertiary transition-colors hover:text-text-primary"
+  className="ml-auto shrink-0 font-mono text-[11px] text-text-tertiary transition-colors hover:text-text-primary disabled:cursor-not-allowed disabled:opacity-50"
 >
   Open logs →
 </button>

Apply the same fix to the DockCard button on line 147.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/experiments/console/components/WorkflowDock.tsx` around
lines 124 - 132, The "Open logs →" button in WorkflowDock is interactive even
when run.projectId is null; update the button (component WorkflowDock) to be
disabled when run.projectId === null (add the disabled attribute and
aria-disabled) and conditionally remove/replace hover/cursor classes so hover
states and pointer don't apply when disabled; keep the onClick logic using
navigate(`/console/p/${run.projectId}/r/${run.id}`) but it will no longer be
reachable when disabled. Apply the identical change to the DockCard "Open logs"
button so both buttons use the same disabled logic and class toggling based on
run.projectId.

147-153: ⚡ Quick win

Same navigation affordance issue in DockCard.

This button has the same pattern as ApprovalDockCard above — interactive when projectId is null. Apply the same disabled-state fix here for consistency.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/experiments/console/components/WorkflowDock.tsx` around
lines 147 - 153, The button in WorkflowDock (same pattern as ApprovalDockCard)
is still clickable when run.projectId is null; update the button element so it
is disabled when run.projectId === null (add the disabled attribute and
aria-disabled), guard or remove the onClick handler when disabled (only call
navigate(`/console/p/${run.projectId}/r/${run.id}`) if run.projectId != null),
and apply the same visual disabled classes used in ApprovalDockCard (e.g.,
cursor-not-allowed/opacity styles) so the component (WorkflowDock / DockCard)
has consistent disabled-state affordance.

packages/core/src/orchestrator/orchestrator-agent.ts (1)

989-994: ⚡ Quick win

Use truthy check for codebase_id to match file patterns.

Line 992 uses conversation.codebase_id !== null, but the rest of this file consistently uses truthy checks for the same field (lines 542, 758, 952). A truthy check would handle both null and undefined consistently and align with the established pattern.

Align with existing pattern

-    if (conversation.codebase_id !== null && getProviderCapabilities(providerKey).nativeTools) {
+    if (conversation.codebase_id && getProviderCapabilities(providerKey).nativeTools) {
       requestOptions.nativeTools = [buildManageRunTool({ codebaseId: conversation.codebase_id })];
     }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/src/orchestrator/orchestrator-agent.ts` around lines 989 - 994,
The condition uses a strict null check for conversation.codebase_id but the file
consistently uses truthy checks; change the guard to use a truthy check (e.g.,
if (conversation.codebase_id) ) so undefined values are handled the same way as
null and keep behavior aligned with other checks (affecting the block that sets
requestOptions.nativeTools using buildManageRunTool and
getProviderCapabilities(providerKey)).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/providers/package.json`:
- Line 36: Verify end-to-end that the Zod v4 schemas defined in native-tools.ts
produce the expected tool input schemas when passed to tool(): run the runtime
path that constructs each tool (calls tool()) and assert the resulting
inputSchema/shape matches expectations; if you observe mismatches or known Zod‑4
conversion edge cases, either adjust the native-tools.ts definitions to a
compatible form or switch to the SDK’s recommended helper (or pin Zod to the
SDK‑compatible version) so tool() receives a supported schema shape. Ensure you
exercise every schema used by native-tools.ts and document which fix (helper vs.
downgrade) you applied.

---

Nitpick comments:
In `@packages/core/src/orchestrator/manage-run-tool.ts`:
- Line 60: The code logs terminal events like "manage_run.list_completed",
"manage_run.get_completed" and "manage_run.failed" without emitting matching
"manage_run.*_started" events; add paired "_started" logs immediately before the
corresponding operations so every completed/failed log has a matching started
log. Locate the places that currently call log.info/log.error for
"manage_run.list_completed", "manage_run.get_completed", and "manage_run.failed"
in manage-run-tool.ts and insert corresponding log.info calls for
"manage_run.list_started", "manage_run.get_started", and "manage_run.started"
(or "manage_run.<action>_started") with the same metadata object (e.g., {
action, codebaseId: ctx.codebaseId }) so states can be correlated.

In `@packages/core/src/orchestrator/orchestrator-agent.ts`:
- Around line 989-994: The condition uses a strict null check for
conversation.codebase_id but the file consistently uses truthy checks; change
the guard to use a truthy check (e.g., if (conversation.codebase_id) ) so
undefined values are handled the same way as null and keep behavior aligned with
other checks (affecting the block that sets requestOptions.nativeTools using
buildManageRunTool and getProviderCapabilities(providerKey)).

In `@packages/web/src/experiments/console/components/WorkflowDock.tsx`:
- Around line 124-132: The "Open logs →" button in WorkflowDock is interactive
even when run.projectId is null; update the button (component WorkflowDock) to
be disabled when run.projectId === null (add the disabled attribute and
aria-disabled) and conditionally remove/replace hover/cursor classes so hover
states and pointer don't apply when disabled; keep the onClick logic using
navigate(`/console/p/${run.projectId}/r/${run.id}`) but it will no longer be
reachable when disabled. Apply the identical change to the DockCard "Open logs"
button so both buttons use the same disabled logic and class toggling based on
run.projectId.
- Around line 147-153: The button in WorkflowDock (same pattern as
ApprovalDockCard) is still clickable when run.projectId is null; update the
button element so it is disabled when run.projectId === null (add the disabled
attribute and aria-disabled), guard or remove the onClick handler when disabled
(only call navigate(`/console/p/${run.projectId}/r/${run.id}`) if run.projectId
!= null), and apply the same visual disabled classes used in ApprovalDockCard
(e.g., cursor-not-allowed/opacity styles) so the component (WorkflowDock /
DockCard) has consistent disabled-state affordance.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f7bc8be6-5b82-4a74-9d30-09ee185ccb28

📥 Commits

Reviewing files that changed from the base of the PR and between b867a34 and a86ed2d.

⛔ Files ignored due to path filters (1)

bun.lock is excluded by !**/*.lock

📒 Files selected for processing (12)

packages/core/src/orchestrator/manage-run-tool.ts
packages/core/src/orchestrator/orchestrator-agent.ts
packages/providers/package.json
packages/providers/src/claude/capabilities.ts
packages/providers/src/claude/native-tools.ts
packages/providers/src/claude/provider.ts
packages/providers/src/codex/capabilities.ts
packages/providers/src/community/copilot/capabilities.ts
packages/providers/src/community/opencode/capabilities.ts
packages/providers/src/community/pi/capabilities.ts
packages/providers/src/types.ts
packages/web/src/experiments/console/components/WorkflowDock.tsx

Wirasm · 2026-06-02T06:30:09Z

PR Review Summary — Multi-Agent (#1819)

7 specialized agents reviewed the diff (44 files, +2,667 / −56): code-reviewer, docs-impact, pr-test-analyzer, comment-analyzer, silent-failure-hunter, type-design-analyzer, code-simplifier. Issue IDs use C/I/S prefixes (not bare #N) to avoid spurious issue auto-links.

Merge blocker (mechanical): the branch is currently CONFLICTING with base dev. Rebase/resolve before merge regardless of the findings below.

Critical Issues (1)

ID	Agent	Issue	Location
C1	code-reviewer	`runId` schema description promises "Accepts the short (8-char) or full id" and `list` output prints only 8-char prefixes — but `getScopedRun` → `getWorkflowRun` does an exact UUID match (`WHERE id = $1`). The agent's natural `list → get/resume/cancel` flow with the prefix it just saw always returns "no run found". This breaks the tool's primary loop on the first interaction. Fix: either (a) print full IDs in `list` + correct the description, or (b) add prefix matching (`WHERE id LIKE $1

Important Issues (13)

ID	Agent	Issue	Location
I1	silent-failure	`startWorkflow` calls `dispatchBackgroundWorkflow` with no try/catch. Dispatch throws on codebase-not-found / isolation failure; the error reaches the agent via the generic `manage_run error:` outer catch, and the structured log omits `workflowName` — undiagnosable in prod. Wrap in its own try/catch, `log.error({ workflowName, codebaseId }, 'manage_run.start_dispatch_failed')`, return a distinct message.	`orchestrator-agent.ts:1008-1022`
I2	silent-failure	`buildArchonMcpServer` (Claude) / `buildPiNativeToolDefinitions` (Pi) convert schemas synchronously and can throw, but sit outside the retry try-block. A schema bug surfaces as a generic `query_error` with no tool name, and (Claude) is fed into retry classification that can't succeed. Wrap with a dedicated try/catch that logs `{ toolNames }` and rethrows without retry.	`claude/provider.ts:924-931`, `pi/provider.ts:473-476`
I3	code-reviewer	`import { z } from 'zod'` violates the CLAUDE.md convention ("import `z` from `@hono/zod-openapi`, not `zod` directly") and adds a direct `zod` dep to `@archon/providers`.	`claude/native-tools.ts:1` + `packages/providers/package.json`
I4	code-reviewer, comment, simplifier, type-design	Pi adapter hardcodes `label: 'Manage runs'` for every `NativeTool` it converts — wrong the moment a second tool is registered. Derive from `spec.name` (or add `label?: string` to `NativeTool`). (flagged independently by 4 agents)	`pi/native-tools.ts:57`
I5	silent-failure	`WorkflowDock` destructures only `data` from `useEntity` and discards `error` → on fetch failure the dock renders `null`, identical to "no active runs". No log, no indicator. `RunsPage` already handles this; the dock is the outlier.	`components/WorkflowDock.tsx`
I6	silent-failure	`ChatPage` ignores `error` for all three `useEntity` calls (`project`, `conversations`, `messages`). A failed `listConversations` looks like a fresh, empty project. Surface an error state like `RunsPage` does.	`routes/ChatPage.tsx`
I7	comment	Comment "setting `customTools` forces `noTools:'builtin'`" is factually wrong — per Pi SDK, `customTools` is additive; Archon pairs it with `noTools:'builtin'` explicitly. Misleading for future maintainers (appears twice).	`pi/options-translator.ts` (`buildDefaultPiTools` JSDoc), `pi/provider.ts` (inline)
I8	comment	`resume` help text says "mark it resumable" but `resumeWorkflow()` performs no state mutation (validation only). Model-visible text shapes agent behavior — say "validate eligibility; does not change state or trigger execution." Same wording leaks into the test description.	`manage-run-tool.ts` (`HELP_OVERVIEW`, `HELP_BY_ACTION.resume`) + `manage-run-tool.test.ts`
I9	docs, comment	`agent-chat-spike.md` header still says "exploration / mapping only — not building yet … no chat code is written yet" while this same PR ships ChatPage, WorkflowDock, the `manage_run` tool, etc. Update status to "shipped".	`experiments/console/agent-chat-spike.md`
I10	docs	`adding-a-community-provider.md` `YOUR_CAPABILITIES` template is missing `nativeTools: false` (new) and `agents: false` — both required by `ProviderCapabilities`; a provider scaffolded from it won't compile.	`docs-web/.../contributing/adding-a-community-provider.md:~64`
I11	docs	`agent-chat-spike.md` claims `manage_run` has "MCP (Claude/Codex/OpenCode) + Pi" backends, but Codex/OpenCode are `nativeTools: false` (deferred to Wave 2). Correct the registration-backends bullet.	`experiments/console/agent-chat-spike.md` §6
I12	pr-test	No provider-level wiring test that `nativeTools` in `SendQueryOptions` actually reaches the SDK call site (Claude `mcpServers.archon` / Pi `customTools`). The capability flag and the converter are each tested, but the seam between them is assumed. Highest-value missing test (1 per provider).	`claude/provider.test.ts`, `pi/provider.test.ts`
I13	pr-test, silent-failure	Write-path cross-project blocking is tested only for `get`, not any destructive action (Gap 1); and the error-handling test asserts the return string but not that `log.error` fired (H1) — a refactor dropping the log would stay green.	`manage-run-tool.test.ts`

Suggestions (S)

Types (type-design-analyzer — verdict ADEQUATE, 6/10):

S1 inputSchema: Record<string, unknown> erases the "must be an object schema" invariant — malformed schemas compile and only fail at registration time. Introduce a narrow NativeToolInputSchema ({ type:'object', properties, required? }) and apply with satisfies. (types.ts:1207)
S2 ACTIONS const array and the INPUT_SCHEMA.action.enum are two sources of truth that must be hand-synced. Derive the enum from ACTIONS. (manage-run-tool.ts)
S3 getScopedRun returns WorkflowRun | string (error-as-string sentinel). A { ok: true; run } | { ok: false; message } discriminant is safer. (low priority)

Tests (pr-test-analyzer — verdict ADEQUATE):

S4 approve interactive_loop branch and reject rework (cancelled:false / maxAttemptsReached) branches untested — they own distinct user-visible messages (Gaps 2,3, rating 7).
S5 cache.ts revalidate/invalidate stale-while-revalidate invariant has zero tests, despite being the PR's other core correctness claim. Pure-TS (non-hook) exports are unit-testable without React (Gap 7).
S6 Orchestrator injection guard negative case (codebase_id === null → not injected) untested (Gap 5).

Error handling / UX (silent-failure-hunter):

S7 SSE onerror in all three hooks uses console.warn (invisible in prod) and never surfaces a permanently-closed stream to the user — stale-while-revalidate then serves the last value forever with no indicator. (lib/sse.ts)
S8 getScopedRun "no run found … in this project" is indistinguishable from cross-project denial; consider "not found (or not accessible in this project)". (M1)

Simplifications (code-simplifier — 6 found, all functionality-preserving):

S9 Extract the 7× repeated typeof input.x === 'string' ? … : '' into a private str(input, key) helper. (manage-run-tool.ts)
S10 ensureLoad + revalidate share an identical .then/.catch/.finally chain → extract runLoader(key, loader). (cache.ts:32-101)
S11 Duplicated isString guard across both converters → package-internal shared helper (stays within @archon/providers, no boundary break).
S12 WorkingIndicator activity !== null && !== undefined && !== '' → truthiness check. Note: simplifier suggested extracting the common converter skeleton is not advised — different packages/libraries; current parallel-but-explicit code is the right KISS tradeoff.

Docs (docs-impact — low urgency):

S13 CLAUDE.md orchestrator + @archon/providers sections don't mention manage_run / NativeTool / SendQueryOptions.nativeTools. Add a line each.
S14 ai-assistants.md capability matrices for Pi and OpenCode lack a nativeTools row.

Strengths

Project scoping is correct and complete — getScopedRun blocks cross-project read and write access, with a log.warn breadcrumb; not silently swallowed. The cross-project test is behavioral, not implementation-coupled.
Confirm-gate design is sound — destructive actions gated on confirm: true; resume correctly excluded per the "recoverable operations" carve-out.
Error containment is right — the handler's outer catch logs { err, action, codebaseId } via Pino then returns text; never throws into the agent loop.
Converters fail-fast (throw on non-object / empty-enum / unsupported types) rather than silently degrading — matches the project's Fail-Fast principle; the tests assert throws, not no-ops.
Pi customTools correctly re-supplies the default tool set so the agent doesn't lose bash/read/edit/write.
Console isolation preserved — no new imports of production web modules; stale-while-revalidate fix correctly eliminates the flicker.
Excellent design JSDoc on buildManageRunTool, the NativeTool contract, and getScopedRun — captures the "providers never import core" intent.
Test isolation handled — manage-run-tool.test.ts runs in its own bun test invocation per the documented mock-pollution rule.

Documentation Issues (consolidated)

agent-chat-spike.md — stale "not building yet" status + wrong Codex/OpenCode backend claim (I9, I11)
adding-a-community-provider.md — template missing nativeTools/agents, won't compile (I10)
CLAUDE.md + ai-assistants.md — no mention of the new native-tool surface (S13, S14)

Verdict: NEEDS FIXES

Resolve the merge conflict, fix C1 (the short-ID bug breaks the tool's core loop), and address the error-handling/observability items I1, I2, I5, I6 plus the doc contradictions I9, I10, I11 and the cheap consensus fix I4. The type/test suggestions (S1–S6) are worthwhile follow-ups but needn't block. No security regressions found — scoping, confirm-gate, and capability-gated injection are correctly implemented.

Recommended order

Rebase onto dev (resolve conflict).
C1 — full-ID or prefix-match (blocks the agent loop).
I1, I2 — error context/logging at the dispatch & schema-conversion seams.
I4, I7, I8, I9, I10, I11 — one-line correctness + doc fixes.
I5, I6 — surface useEntity errors in the console UI.
I12, I13 — close the test seams.
Consider S1/S2 (type tightening) and S5 (cache.ts tests) as fast-follows.

_{Advisory review — no files were modified.}

Runs paused on a human gate (approval node or an agent question) now surface in an attention banner pinned above the runs feed, on both the All-projects and project-scoped views. It stays visible regardless of the active filter or scroll position until the user acts on it or dismisses it. The banner reuses the existing ApprovalContext + ApprovalPanel surfaces, so the action flow is identical to the inline paused card. While a run is shown in the banner its feed card collapses to a pointer to avoid a duplicate live panel; dismissing the banner restores the inline panel so the run can still be actioned from the feed. Pending runs are derived from the unfiltered run set so they surface even when the feed is filtered or searched. Dismiss is session-only — the run is still paused, so it re-surfaces on reload — and dismissed ids are pruned once a run leaves the paused state so it re-surfaces if it pauses again.

Design/exploration doc for promoting the console toward a project-scoped agent chat (tab swap with the Run view). Captures the reuse map vs. the old chat UI, the project-scoped chat assembly (mostly existing console pieces + 3 new conversation skills), and the run-management tooling direction. Key finding: no single tool mechanism spans all providers. MCP covers Claude/Codex/OpenCode; in-process custom tools cover Claude + Pi only. Since Pi becomes the main runner ~June 2026 and has no MCP, the tool handler is built once behind a provider-neutral spec with two registration backends (MCP + Pi customTools). Rollout leads with the in-process providers (Claude + Pi), then served MCP for Codex/OpenCode. Mapping only — no chat code yet. Project memory deferred.

Add a Runs | Chat tab swap under each project. The chat is plain conversational AI assembled from existing console pieces: messages load via useEntity(K.messages), a new useConversationSSE keeps them live by invalidating on text/tool events (and surfaces conversation_lock to disable the composer), and rendering reuses MessageItem/ToolCallItem/StreamCard. A new skills/conversations.ts (create/list/send) is the sanctioned home for the conversation concept the console previously hid in startRun.ts. The console's invalidate-refetch SSE model removes the need for the old chat's client-side streaming state machine. No backend changes; scoped entirely to packages/web/src/experiments/console. Run management by the agent (manage_run tool) is the planned follow-on. Also records the resolved tooling decisions (tiered confirm, JSON-Schema canonical schema, deferred Wave-2 transport) in agent-chat-spike.md §10.

Drive the "awaiting reply" state from message content + the send action instead of the per-conversation SSE lock event. The recovery poll is no longer gated on a `conversation_lock` event that the cross-origin dev SSE can drop or never deliver — so the composer can't stick on "Waiting…" and replies are polled for regardless of SSE health. Adds a settle window (re-enable once the trailing message is a stable assistant reply) and a hard cap so a turn that never produces a reply can't disable the composer forever. SSE remains a best-effort accelerator for snappy live updates. Note: live updates still depend on the shared useEntity refetch path, which has a separate reliability issue under investigation (cache refetch can stall after the first invalidation while mounted). Persisted state + on-mount rendering are correct; navigating/reloading always shows the latest.

…ernalStore) Two compounding bugs made the console's live updates fail — a refetched value landed in the cache but never appeared on screen until a remount (navigation / reload), so the chat reply never streamed in and the composer stuck on "Waiting…". 1. invalidate() deleted the cache key before refetching, so every SSE/poll tick flashed the subscriber to undefined/empty before the data returned. At poll cadence the list flickered between full and empty and the derived busy/settle/scroll logic thrashed on the empty frames. Replace the delete-then-load with revalidate(): refetch in place and swap on resolve, keeping the previous value visible (stale-while-revalidate). Unsubscribed keys still drop so a later mount fetches fresh. 2. useEntity subscribed with a manual useState(n => n + 1), which can commit a stale render against an externally-mutated store — the render function ran with the fresh value but the DOM kept an older tree. Switch to useSyncExternalStore for tear-free, consistent reads. Verified end-to-end in the browser: agent replies now render live and the composer re-enables without navigating. Fixes the foundation the chat working indicator and workflow dock depend on.

Three UX passes now that live updates work: 1. Chat declutter — ChatStream hides raw tool-call cards and framework/empty rows by default; a single "Agent is working · <tool>" pill stands in while a turn is active (WorkingIndicator), toggling the inline trace on click. 2. Workflow dock — a pinned tray below the chat stream listing the project's in-progress runs (WorkflowDock): hidden at 0, the richer card at 1, a collapsible "Running workflows (N) ▸" with status dots at 2+. Each card links to the run-detail logs. Live via the dashboard SSE. 3. Run-detail lifecycle lines — the log view now bookends the stream with "▶ Workflow <name> started" and a terminal summary "✓ Completed in <duration> · <cost>" (RunLifecycle), so the logs read as a complete story rather than a tool dump. Node transitions already render. Verified live: chat shows clean prose with the working pill; run log shows the started + finished lines.

… analysis

Paused-on-approval runs in the dock now render the agent's question + Approve/Reject inline (reusing ApprovalContext + ApprovalPanel, same approve/reject + comment-injection as elsewhere), so you can resolve a gate without leaving the chat. Approval cards are always shown (a paused run needs you); plain running runs keep the 1-vs-N collapse. Resolving invalidates the runs cache so the card clears live.

…nly) Give the project-scoped chat agent a native tool to see this project's workflow runs, the first step of agent-as-operator. Cross-provider plumbing: - New `NativeTool` on the provider contract ({name, description, JSON-Schema inputSchema, in-process handler}) + a `nativeTools` capability flag. The handler closes over live context, so @archon/providers never imports core — the tool crosses the boundary as data on SendQueryOptions. - Claude adapter (native-tools.ts): a JSON-Schema→Zod-shape converter (narrow, fail-fast) feeding createSdkMcpServer()/tool(); registered in sendQuery as an in-process `archon` MCP server (mcp__archon__*, alwaysLoad for Haiku), mirroring the file-based mcp branch. Adds zod to @archon/providers. - Capabilities: nativeTools = true for claude + pi, false for codex/opencode/ copilot (no in-process tool path). The tool (core/orchestrator/manage-run-tool.ts): - manage_run with read actions list/get — list shows recent runs (id, workflow, status, step); get shows one run's detail. Errors caught, returned as text. - Injected on the direct-chat path only when a codebase is scoped and the provider supports native tools. Verified live: a project chat asks "list the runs" → agent calls mcp__archon__manage_run → handler lists 20 runs → agent answers with the count and the most-recent run's status. Start + the Pi adapter + destructive writes are follow-ups.

Adds the `start` action to manage_run: the agent resolves a workflow by name (4-tier resolveWorkflowName) and dispatches it via dispatchBackgroundWorkflow (its own isolated worktree, fire-and-forget). The dispatch context (platform, conversation, cwd, codebase, available workflows, userId) is assembled in the orchestrator and passed as a `startWorkflow` callback so the tool stays decoupled from the dispatch machinery. Unknown/ambiguous names return a friendly text result; start is rejected when the dispatch context is absent. Verified live: a project chat asks the agent to start archon-assist → agent calls manage_run({action:'start', workflow, message}) → workflow dispatches, runs in a worktree, and completes; the agent reports it started.

Pi adapter for the provider-neutral NativeTool, completing Wave 1 across both in-process providers (Claude + Pi, the current and ~June-2026 primary). - native-tools.ts: JSON-Schema→TypeBox converter (same narrow string/enum subset, fail-fast) feeding `defineTool`; each tool's text handler result becomes the ToolDefinition's content. - provider.ts: merge the native ToolDefinitions into `customTools`. Because setting customTools forces noTools:'builtin' (which drops Pi's defaults), the base tool set is re-supplied via the new buildDefaultPiTools() helper, so the agent keeps bash/read/edit/write alongside manage_run. - Lazy-loaded via the existing dynamic-import block (avoids Pi's config.js eager-load). Type-checks across the repo; the tool + core handler are identical to the verified Claude path. Live Pi verification needs Pi configured as the assistant (not set up in this dev env).

…e, lifecycle writes, Pi Expand the manage_run native tool from read-only (list/get/start) to the full run-management surface, with progressive disclosure and project scoping. - help: action returns the action catalog; help + subtool returns per-action detail, so the model's tool surface stays small (progressive disclosure). - Lifecycle writes resume/cancel/abandon/approve/reject go through the shared core workflow-operations functions — identical semantics to the CLI and command-handler (state mutation; no server coupling). - Tiered confirmation: destructive actions (cancel/abandon/approve/reject) require confirm:true; without it the tool returns a preview and asks the agent to confirm with the user (no mid-turn UI primitive required). - getScopedRun: every by-id action verifies the run belongs to the chat's project, so an agent scoped to project A cannot read or mutate project B's runs (generalizes the earlier get-only scope concern). - Schema converters (Claude Zod + Pi TypeBox) gain boolean support for the confirm param. Verified live on Claude and Pi (minimax MiniMax-M2.7): both invoke the tool, help works, and the boolean confirm round-trips through TypeBox as a real boolean. Unit tests: handler dispatch/scope/confirm-gate (18) plus converter fail-fast and boolean-acceptance (Claude 4, Pi 4).

…ility These failed at the branch HEAD; the prior commits were only validated with type-check and lint, not the test suite. - claude/provider.test.ts, codex/provider.test.ts, registry.test.ts asserted the full ProviderCapabilities set via toEqual but were never updated when nativeTools was added to the interface — add nativeTools to the expectations (claude true, codex false, mock false). - pi/provider.test.ts: the @earendil-works/pi-coding-agent mock was missing defineTool, a value import pulled in by ./native-tools (added when manage_run was wired into Pi). Bun enforces the binding for value imports, so every Pi sendQuery threw "Export named 'defineTool' not found" — 56 tests failing. Add a defineTool stub to the mock.

Wirasm · 2026-06-03T11:00:57Z

PR Review Summary — multi-agent pass

Six specialized agents reviewed the diff (base origin/dev): code quality, docs impact, tests, comments, error handling, type design. Findings below use C#/I#/S# prefixes (not #N, to avoid GitHub auto-linking unrelated issues). The two Critical items were hand-verified against the source.

Branch state: MERGEABLE, 1 commit behind dev (the Better Auth feature 69bf4d00). That commit touches auth/, db/users, and unrelated schemas — zero file overlap with this PR — so no rebase is required for a clean merge.

Critical Issues (2) — block merge

ID	Issue	Location
C1	By-id actions silently fail on short IDs. `getScopedRun` → `getWorkflowRun` does `WHERE id = $1` (exact UUID match), but `list`/`get` output only ever prints `r.id.slice(0, 8)` and the input schema + help advertise "Accepts the short (8-char) or full id." The full UUID is never surfaced to the model, so every by-id action (`get`/`resume`/`cancel`/`abandon`/`approve`/`reject`) driven from `list` output returns "no run found." Verified.	`packages/core/src/orchestrator/manage-run-tool.ts` (`getScopedRun`, `INPUT_SCHEMA`, `handleList`) + `db/workflows.ts:165`
C2	`import { z } from 'zod'` violates the repo-wide Zod convention ("Import `z` from `@hono/zod-openapi`, not from `zod` directly") and adds a second direct `"zod": "^4.0.0"` dep to `packages/providers/package.json`, risking zod-instance drift. Verified.	`packages/providers/src/claude/native-tools.ts:1`, `packages/providers/package.json:36`

C1 fix: add prefix matching to getScopedRun (WHERE id LIKE $1 || '%' with a length guard + ambiguity check), or surface full IDs and drop the "short (8-char)" claim from the schema/help.
C2 fix: import z/ZodTypeAny from @hono/zod-openapi and drop the direct dep, or record an explicit @archon/providers exception in CLAUDE.md (the zod here only feeds the Claude SDK tool() builder, never OpenAPI generation — but the convention is currently absolute).

Important Issues (5) — address before merge

ID	Issue	Location
I1	`resume` action never resumes. `resumeWorkflow` only validates eligibility + returns the run (no mutation, no re-exec), yet help says "mark a failed/paused run resumable" and the response says "ready to resume." The CLI `workflow resume` does re-run — so the agent action diverges from the CLI semantics it claims to mirror. Reword to "validate eligibility," and confirm the no-op-resume is the intended scope.	`manage-run-tool.ts` `handleWrite` case `resume`, `HELP_*`
I2	`cancel` help overpromises. `cancel`/`abandon` both call `abandonWorkflow` (DB status → `cancelled`); a live worktree process is not stopped. Help says "stop a running run." Reword to "marks the run cancelled — a running process may continue until its current step completes."	`manage-run-tool.ts` `HELP_*`, `handleWrite`
I3	`ChatPage` ignores `error` from all three `useEntity` calls (`conversations`/`messages`/`project`). Revalidation failures (network blip, server restart) leave stale/empty data with no user feedback, and can strand the composer's `busy`/settle state. Surface the error.	`packages/web/src/experiments/console/routes/ChatPage.tsx`
I4	`startWorkflow` dispatch failure logs without context. When `dispatchBackgroundWorkflow` throws, the outer `manage_run` catch returns it as text (good) but logs only `{action, codebaseId}` — no `workflow` name. Add a local catch with `workflowName` in the log context.	`packages/core/src/orchestrator/orchestrator-agent.ts` `startWorkflow` closure
I5	Confirm-gate test coverage is partial. Only `cancel`-without-confirm is tested; `abandon`/`approve`/`reject` previews are untested, as are the `approve` `interactive_loop` branch, the `reject` `cancelled:false` rework branch, and the orchestrator injection guard (`codebase_id != null && capabilities.nativeTools`). A future edit dropping an action from `DESTRUCTIVE_ACTIONS` would pass CI. Add the three preview tests + two branch tests.	`manage-run-tool.test.ts`, `orchestrator-agent.test.ts`

Suggestions (6)

ID	Suggestion	Location
S1	Confirm preview says "This is not undoable" for all destructive actions, including `approve`/`reject` — misleading, since approve is the routine continuation path. Split into cancel/abandon ("irreversible") vs approve/reject ("a human gate; confirm with the user").	`manage-run-tool.ts` `handleWrite` preview
S2	Pass the verified `run.id` (full UUID from `getScopedRun`) into `resumeWorkflow`/`abandonWorkflow`/`approveWorkflow`/`rejectWorkflow` instead of the raw `runId` — removes a redundant re-fetch and the small TOCTOU window.	`manage-run-tool.ts` `handleWrite`
S3	Pi `defineTool` hardcodes `label: 'Manage runs'` for every native tool — a latent bug when a 2nd tool is added. Derive from `spec.name` or add an optional `label` to `NativeTool`.	`packages/providers/src/community/pi/native-tools.ts`
S4	Type design: `NativeTool.inputSchema: Record<string, unknown>` is structurally untyped (converters throw at `sendQuery` time, not construction); `(input.action as Action)` is an unsound cast where a `const`-array `isAction()` guard would suffice. Introduce a minimal `NativeToolInputSchema` (`type:'object'`, restricted leaf types).	`packages/providers/src/types.ts`, `manage-run-tool.ts`
S5	Add `runId` to the `manage_run.failed` catch-log context (currently `{action, codebaseId}` only — by-id failures have no run trace).	`manage-run-tool.ts` handler catch
S6	Stale comments: `cache.ts` says "~120 LOC" (now 196 and growing — drop the number); `useDashboardSSE` says "Mount once at a high level (RunsPage)" but `WorkflowDock` (via `ChatPage`) now also mounts it; `NativeTool` JSDoc says "handler must catch its own errors" when the `buildManageRunTool` wrapper is the actual safety net.	`console/store/cache.ts`, `console/lib/sse.ts`, `providers/src/types.ts`

Documentation

console-open-questions.md is stale against its own PR: §2 says manage_run is "still unbuilt / highest-value next build" (this PR builds Wave 1), and §5's "approvals from chat" question is answered by the inline-dock-approvals commit. Update both.
experiments/console/README.md routes list omits the new /console/p/:projectId/chat.
CLAUDE.md / ai-assistants.md: GET /api/providers capabilities now includes nativeTools; the Pi capability table has no row for it. Worth a one-line note that nativeTools: true (Claude + Pi) gates the manage_run tool. (Also pre-existing: the @archon/providers description omits community/opencode/ and community/copilot/.)

Strengths

Cross-project isolation is correct and tested — getScopedRun blocks codebase_id mismatch, logs at warn with both ids, and has a real negative-case test.
Confirm-gate is structurally sound — fires for all four DESTRUCTIVE_ACTIONS before any mutation; start correctly omits it.
Schema converters fail fast — both Claude (Zod) and Pi (TypeBox) throw on non-object / unsupported-type / empty-enum schemas, with symmetric tests.
Pi default-tool re-supply is correct — avoids silently stripping Pi's built-in tools when customTools is set.
"Catch-all → return text" is a documented, intentional, tested contract (the dedicated "thrown DB error returned as text" test), not an accidental swallow.
useSyncExternalStore + stale-while-revalidate is a proper fix for the flicker/stale-render bug, with a comment explaining the prior failure mode.
The fix(providers) commit's Pi defineTool mock repair fixed ~56 pre-existing failing tests.

Verdict: NEEDS FIXES

C1 (broken by-id action surface) and C2 (Zod convention) before merge; I1–I5 are quick and high-value (mostly wording + a few tests + two error-surfacing fixes). Architecture (provider-neutral NativeTool, dependency-inversion via closure, capability gating, project scoping) is sound — the blockers are an ID-matching gap and a convention violation, both small.

Note: the silent-failure-hunter's flagged empty catch {} in orchestrator-agent.ts handleUpdateProject is pre-existing code outside this diff — worth a follow-up but not a blocker here.

🤖 Multi-agent review (code · docs · tests · comments · errors · types)

…anage_run + chat Critical: - manage_run by-id actions now resolve the short ids the tool actually shows. getScopedRun looks runs up by id prefix scoped to the codebase (new findWorkflowRunsByIdPrefix), so the 8-char ids in list/get output work for get/resume/cancel/abandon/approve/reject; ambiguous prefixes are reported and project scoping is enforced in the query. Writes pass the verified full id to the operations. - Document the @archon/providers direct-zod import exception in CLAUDE.md and at the import site (Claude SDK tool() needs a Zod shape; this SDK-deps-only leaf package must not pull in Hono via @hono/zod-openapi). Important: - resume help/response no longer claim it "marks resumable" — it only validates eligibility and does not restart the run. - cancel help no longer claims it "stops" a running process — it marks the run cancelled; an executing step may finish first. - ChatPage surfaces conversation/message (re)load errors instead of silently showing stale or empty data. - startWorkflow logs dispatch failures with the workflow name and returns a clear error to the agent. - Completed confirm-gate test coverage: preview-without-confirm for abandon/approve/reject, approve interactive-loop branch, reject rework branch, and an ambiguous-prefix case. Suggestions: - Differentiated confirm-preview wording: cancel/abandon (irreversible) vs approve/reject (human gate). - Dropped the unsound `as Action` cast; added runId to the failure log context. - Pi native-tool label derived from the tool name, not hardcoded "Manage runs". - Corrected NativeTool JSDoc, cache.ts LOC note, and useDashboardSSE mount note. Docs: - console-open-questions.md §2/§5 + priority list (manage_run Wave 1 and inline approvals shipped in this PR); console README /chat route; ai-assistants Pi capability row for in-process native tools. Deferred (proportionality): a typed NativeToolInputSchema (runtime validation already covers the single-tool surface) and an orchestrator-level injection integration test (two-clause guard whose halves are already covered).

Wirasm · 2026-06-03T11:37:42Z

Review findings addressed — pushed `a2b21898`

Worked through the multi-agent review; everything actionable is fixed except two deliberate deferrals (below). Validation: type-check ✅ (10 pkgs), lint ✅, providers suite ✅, core suite ✅, manage-run-tool.test.ts 24/24 ✅.

Critical

C1 — manage_run by-id actions now resolve the short ids the tool actually shows. New findWorkflowRunsByIdPrefix(prefix, codebaseId) does a project-scoped prefix lookup; getScopedRun uses it (ambiguous prefixes reported, project isolation enforced in the query), and writes pass the verified full id to the operations.
C2 — Documented the @archon/providers direct-zod exception in CLAUDE.md + at the import site (Claude SDK tool() needs a Zod shape; this SDK-deps-only leaf must not pull in Hono). Kept the import rather than dragging @hono/zod-openapi into the leaf package.

Important

I1 — resume help/response no longer claim "marks resumable"; it validates eligibility and does not restart the run.
I2 — cancel help no longer claims it "stops" a process; it marks the run cancelled (an executing step may finish first).
I3 — ChatPage surfaces conversation/message (re)load errors instead of silently showing stale/empty data.
I4 — startWorkflow logs dispatch failures with the workflow name and returns a clear error.
I5 — Confirm-gate coverage completed: preview-without-confirm for abandon/approve/reject, approve interactive-loop branch, reject rework branch, ambiguous-prefix case.

Suggestions

S1 confirm-preview wording differentiated (cancel/abandon irreversible vs approve/reject human gate) · S2 writes use verified run.id · S3 Pi label derived from tool name · S4 dropped the unsound as Action cast · S5 runId added to the failure log · S6 corrected NativeTool JSDoc, cache.ts LOC note, useDashboardSSE mount note.

Docs — console-open-questions.md §2/§5 + priority list (manage_run Wave 1 + inline approvals shipped here); console README /chat route; ai-assistants.md Pi row for in-process native tools; CLAUDE.md GET /api/providers nativeTools note.

Deferred (proportionality)

A typed NativeToolInputSchema (S4 type-design) — runtime validation already covers the single-tool surface; a partially-constraining type adds churn for marginal gain.
An orchestrator-level injection integration test (I5 Gap 4) — the guard is two trivial clauses whose halves are already covered (buildManageRunTool unit-tested; capability flags tested in provider tests); a full orchestrator integration test is high-cost/brittle for the value.

Note: the handleUpdateProject empty-catch the error hunter flagged is pre-existing code outside this diff — left for a separate follow-up.

…l providers Now that #1819's native manage_run tool is in dev, the CLI run-management pointer (buildRunManagementSection) is redundant for Claude/Pi and would steer them onto a bash path needing `archon` on PATH. Gate it: the orchestrator appends it ONLY for project-scoped chats on providers WITHOUT nativeTools (Codex/OpenCode/Copilot); Claude/Pi get the native tool and are nudged to it. - Move the append out of the pure buildOrchestratorSystemAppend into orchestrator-agent.ts next to the nativeTools check; look caps up lazily (scoped chats only). - prompt-builder tests: assert buildOrchestratorSystemAppend does NOT include the section (orchestrator gates it); keep the buildRunManagementSection content test. - orchestrator-agent test: add buildRunManagementSection to the ./prompt-builder mock (the new export it now calls — otherwise the full --filter run replaces the module and the call is undefined). - Document the console-dock limitation: detached CLI runs appear in the dock (listed by project) but don't stream live and post no chat card, because they run out-of-process — use the native tool for the live experience.

coderabbitai Bot reviewed May 29, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 1, 2026

View reviewed changes

Comment thread packages/providers/package.json

Wirasm changed the title ~~feat(console): pending-input banner + project-scoped agent chat view~~ feat(console): console promotion — pending-input banner, project-scoped agent chat, manage_run native tool (Claude + Pi) Jun 1, 2026

Wirasm added 13 commits June 2, 2026 13:18

docs(console): open questions — compaction/new-conversation + UI gaps…

3f3e89f

… analysis

Wirasm force-pushed the feat/console-promotion branch from 5f890d8 to 0217bdf Compare June 2, 2026 10:21

Wirasm merged commit c92cbef into dev Jun 4, 2026
4 checks passed

Wirasm mentioned this pull request Jun 5, 2026

fix(web/console): gate /console behind auth + correct run-event normalizer #1878

Merged

coderabbitai Bot mentioned this pull request Jun 5, 2026

feat(web/console): chat workflow-completion card + SSE-lock + jump-to-bottom #1885

Merged

This was referenced Jun 7, 2026

feat(web/console): chat surface restyle to Direction B (Modern Console) #1903

Merged

feat(web): make /console the default UI, re-root classic UI under /legacy #1915

Merged

Conversation

Wirasm commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

manage_run native tool

UX Journey

Before

After

Architecture Diagram

After (delta)

Label Snapshot

Change Metadata

Linked Issue

Validation Evidence (required)

Security Impact (required)

Compatibility / Migration

Human Verification (required)

Side Effects / Blast Radius (required)

Rollback Plan (required)

Risks and Mitigations

Uh oh!

coderabbitai Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

❌ Failed checks (1 warning)

Review ran into problems

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Wirasm commented Jun 2, 2026

PR Review Summary — Multi-Agent (#1819)

Critical Issues (1)

Important Issues (13)

Suggestions (S)

Strengths

Documentation Issues (consolidated)

Verdict: NEEDS FIXES

Recommended order

Uh oh!

Wirasm commented Jun 3, 2026

PR Review Summary — multi-agent pass

Critical Issues (2) — block merge

Important Issues (5) — address before merge

Suggestions (6)

Documentation

Strengths

Verdict: NEEDS FIXES

Uh oh!

Wirasm commented Jun 3, 2026

Review findings addressed — pushed a2b21898

Uh oh!

Uh oh!

Wirasm commented May 29, 2026 •

edited

Loading

`manage_run` native tool

coderabbitai Bot commented May 29, 2026 •

edited

Loading

Review findings addressed — pushed `a2b21898`