Skip to content

feat(console): console promotion — pending-input banner, project-scoped agent chat, manage_run native tool (Claude + Pi)#1819

Merged
Wirasm merged 14 commits into
devfrom
feat/console-promotion
Jun 4, 2026
Merged

feat(console): console promotion — pending-input banner, project-scoped agent chat, manage_run native tool (Claude + Pi)#1819
Wirasm merged 14 commits into
devfrom
feat/console-promotion

Conversation

@Wirasm

@Wirasm Wirasm commented May 29, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Problem: the console experiment is run-centric and read-only — paused runs awaiting human input are buried in the feed, there's no conversational surface under a project, and the agent has no way to operate runs (only the legacy /invoke-workflow text protocol). Live updates also flickered/never settled.
  • Why it matters: these are the steps that promote the console from experiment toward product. Gates must be unmissable, a project-scoped agent chat is the substrate for agent-driven run management, and the agent needs a real tool surface to list/launch/operate runs across every SDK (Claude now, Pi as the ~June-2026 primary).
  • What changed: (1) a pinned pending-input banner; (2) a project-scoped agent chat (Runs | Chat tab swap) with a docked workflow tray (1-vs-N collapse) that renders inline approvals, a "working" pill, and run-detail lifecycle lines; (3) a foundational live-cache fix (stale-while-revalidate + useSyncExternalStore); (4) the manage_run native tool — one tool with progressive disclosure (help), project scoping, and the full lifecycle surface (list/get/start/resume/cancel/abandon/approve/reject), provider-neutral and wired for Claude (Zod/MCP) + Pi (TypeBox/customTools); (5) design/spike docs.
  • What did NOT change (scope boundary): no DB schema/migration; no shipped-product web surface (everything UI lives in packages/web/src/experiments/console/). manage_run writes do state mutation only (CLI-equivalent) — no new auto-resume-after-gate machinery. No project memory, no multi-conversation sidebar, no file-attach composer. Codex/OpenCode get nativeTools: false (served-MCP is a later wave).

manage_run native tool

Provider-neutral NativeTool { name, description, inputSchema (JSON Schema), handler } rides on SendQueryOptions.nativeTools; the handler closes over live core context so @archon/providers never imports @archon/core. Per-provider adapters convert the canonical JSON Schema: Claude → Zod via createSdkMcpServer/tool() (mcp__archon__*, alwaysLoad); Pi → TypeBox via customTools (re-supplying Pi's default tools). One tool, an action discriminator, and a help action for progressive disclosure. Destructive actions are gated on confirm: true (tiered-confirmation policy). Every by-id action is project-scoped via getScopedRun.

chat agent ──calls──▶ manage_run(action, …)        [Claude: mcp__archon__*  | Pi: customTools]
                          │
              buildManageRunTool(ctx) handler  (packages/core/src/orchestrator/manage-run-tool.ts)
                          ├─ list/get      → db/workflows (project-scoped)
                          ├─ start         → dispatchBackgroundWorkflow
                          └─ resume/cancel/abandon/approve/reject → operations/workflow-operations

UX Journey

Before

/console/p/:projectId
  ProjectRail │ Heading · search · FilterChips
              │ Active runs / Recent runs        ← paused runs sit inline; easy to miss
              │ (no chat surface; AI chat lives only in legacy /chat)

After

/console/p/:projectId                 /console/p/:projectId/chat
  ProjectRail │ Heading  [ *Runs | Chat* ]  ← tab swap
              │ [ ⚠ NEEDS YOU banner ]      ← paused runs pinned above the fold
              │ Active / Recent runs        │  message stream (declutter + "working" pill)
                                            │  [ WorkflowDock: live runs, inline approve/reject ]
                                            │  composer  →  agent can call manage_run to
                                            │              list / start / operate runs

Architecture Diagram

After (delta)

RunsPage [~] ─ PendingInputBanner [+] + ProjectViewTabs [+]
ChatPage [+] === useEntity(K.messages) + useConversationSSE [+] + ChatStream [+] + ChatComposer [+]
            └─ WorkflowDock [+] === ApprovalContext + ApprovalPanel (reused, inline) + RunLifecycle [+]
store/cache.ts [~] ─ revalidate() stale-while-revalidate + useEntity via useSyncExternalStore
providers/types.ts [~] ─ NativeTool [+], capabilities.nativeTools [+]
claude/native-tools.ts [+]  pi/native-tools.ts [+]  (JSON-Schema → Zod / TypeBox)
orchestrator-agent.ts [~] ─ injects buildManageRunTool when codebase-scoped + provider supports it

Label Snapshot

  • Risk: risk: medium
  • Size: size: XL
  • Scope: web, core, providers, tests
  • Module: web:console, core:orchestrator, providers:claude, providers:pi

Change Metadata

  • Change type: feature
  • Primary scope: multi (web + core + providers)

Linked Issue

  • Related: console promotion effort (no single tracking issue)

Validation Evidence (required)

bun run type-check   # ✅ all 10 packages
bun run lint         # ✅ clean (--max-warnings 0)
bun run test         # ✅ providers suite green; core orchestrator + manage_run green
  • New unit tests: manage-run-tool.test.ts (18 — help/scope/confirm-gate/dispatch/error), claude/native-tools.test.ts (4), pi/native-tools.test.ts (4 — fail-fast + boolean acceptance).
  • Live-verified end-to-end: Claude invokes manage_run for list/get/start (dispatched archon-assist, ran to completion). Pi (minimax MiniMax-M2.7) invokes manage_run, runs help, and the boolean confirm round-trips through TypeBox as a real boolean (typeof === 'boolean', value === true).
  • Also repaired pre-existing branch test breakage (capability assertions + Pi defineTool mock) — see the fix(providers) commit.

Security Impact (required)

  • New permissions/capabilities? Yes — chat agents in a project-scoped conversation can now operate that project's workflow runs (list/start/resume/cancel/abandon/approve/reject). Mitigations: tool only attaches when a codebase is scoped and the provider supports native tools; every by-id action is project-scoped (getScopedRun blocks cross-project access); destructive actions require explicit confirm: true; writes use the same core ops as the existing CLI/REST surfaces (no new privilege).
  • New external network calls? No.
  • Secrets/tokens handling changed? No.
  • File system access scope changed? No.

Compatibility / Migration

  • Backward compatible? Yes.
  • Config/env changes? No.
  • Database migration needed? No.

Human Verification (required)

  • Verified scenarios: live console chat (Claude) calling manage_run list/start; Pi (minimax) tool round-trip incl. boolean confirm; live-update fix (reply renders, composer re-enables); inline dock approvals reuse the shared approval panel.
  • Edge cases checked: confirm-preview vs confirm-execute; cross-project run access blocked; unknown action/subtool; start with no dispatch context; converter fail-fast on non-object / number / empty-enum schemas.
  • What was not verified: Codex/OpenCode (intentionally nativeTools: false); agent-initiated approve does not auto-resume the run (state-only by design — the dock's human-approve path handles auto-resume).

Side Effects / Blast Radius (required)

  • Affected subsystems: @archon/web console experiment, @archon/core orchestrator (tool injection), @archon/providers (NativeTool contract + Claude/Pi adapters).
  • Potential unintended effects: console isolation is preserved (no prod-UI imports). The provider nativeTools capability + NativeTool contract are additive; existing providers unaffected.
  • Guardrails: tool gated on capability + codebase scope; per-action project scoping; confirm gate on destructive ops; handler catches all errors and returns text (never throws into the agent loop).

Rollback Plan (required)

  • Fast rollback: revert the branch (no migration, no persisted state). The manage_run injection is a single guarded block in orchestrator-agent.ts; the console UI is fully contained in experiments/console/.
  • Observable failure symptoms: agent reports manage_run error: …; console live updates stalling.

Risks and Mitigations

  • Risk: agent operating runs without a human in the loop → Mitigation: tiered confirm gate on destructive actions; approve/reject are state-only (no auto-resume); project scoping.
  • Risk: Pi customTools path is newer than Claude's → Mitigation: live-verified against minimax incl. the boolean round-trip; unit-tested converter; graceful fallback (providers without nativeTools simply don't receive the tool).

@coderabbitai

coderabbitai Bot commented May 29, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a project-scoped console chat (conversation primitives, HTTP skills, per-conversation SSE), new chat UI (composer, stream, tabs, ChatPage route), cache revalidation (stale-while-revalidate via revalidate/useSyncExternalStore), pending-input promotion/banner and run-card flag, plus native-tool support with a manage_run tool and provider MCP wiring.

Changes

Console Agent Chat Feature

Layer / File(s) Summary
Design documentation and conversation data model
packages/web/src/experiments/console/agent-chat-spike.md, packages/web/src/experiments/console/primitives/conversation.ts, packages/web/src/experiments/console/store/keys.ts
Spike doc and architecture notes; adds ConversationSummary and toConversationSummary; introduces K.conversations(projectId) cache key.
Conversation API skills and SSE infrastructure
packages/web/src/experiments/console/skills/conversations.ts, packages/web/src/experiments/console/skills/index.ts, packages/web/src/experiments/console/lib/sse.ts
Adds createConversation, listConversations, sendMessage (JSON and multipart handling, structured error extraction) and useConversationSSE hook with debounced invalidation and conversation_lock handling; re-exports conversations.
Chat UI components (composer, stream, tabs, pending banner)
packages/web/src/experiments/console/components/ChatComposer.tsx, packages/web/src/experiments/console/components/ChatStream.tsx, packages/web/src/experiments/console/components/ProjectViewTabs.tsx, packages/web/src/experiments/console/components/PendingInputBanner.tsx
ChatComposer (auto-grow, Enter-send, Shift+Enter newline, Escape blur), ChatStream (message filtering + optional tool traces), ProjectViewTabs (Runs/Chat links), and PendingInputBanner/PendingInputCard for promoted paused runs.
Chat page implementation and console route registration
packages/web/src/experiments/console/routes/ChatPage.tsx, packages/web/src/experiments/console/ConsoleApp.tsx
ChatPage loads project/conversations/messages, registers useConversationSSE, manages busy/settle/recovery polling, auto-scroll, and send logic (creates conversation on first send); ConsoleApp mounts /console/p/:projectId/chat.
Pending input promotion (banner, active card, runs integration)
packages/web/src/experiments/console/components/PendingInputBanner.tsx, packages/web/src/experiments/console/components/ActiveRunCard.tsx, packages/web/src/experiments/console/routes/RunsPage.tsx
Adds pinned pending-input banner and cards, extends ActiveRunCard with inputPromoted?: boolean, updates RunsPage to compute pending/dismissed/visible sets and derive promotedRunIds passed into RunsFeed.
Cache revalidation and external-store integration
packages/web/src/experiments/console/store/cache.ts
Adds internal revalidate(key) with inflight de-duplication and error retention; invalidate now calls revalidate; useEntity refactored to useSyncExternalStore and refetch() triggers revalidate.
UI augmentations: lifecycle markers, workflow dock, working indicator, run detail wiring
packages/web/src/experiments/console/components/RunLifecycle.tsx, packages/web/src/experiments/console/components/WorkflowDock.tsx, packages/web/src/experiments/console/components/WorkingIndicator.tsx, packages/web/src/experiments/console/routes/RunDetailPage.tsx
Adds RunStartedLine and RunFinishedLine markers, WorkflowDock tray for running/paused runs, WorkingIndicator component, and wraps RunStream with lifecycle markers.

Native Tooling and Manage-Run

Layer / File(s) Summary
manage_run native tool
packages/core/src/orchestrator/manage-run-tool.ts, packages/core/src/orchestrator/orchestrator-agent.ts
Adds ManageRunContext and buildManageRunTool (read-only list/get actions) and injects a per-project manage_run native tool into agent requests when provider nativeTools is supported.
Provider native tool types and Claude MCP builder
packages/providers/src/types.ts, packages/providers/src/claude/native-tools.ts, packages/providers/src/claude/provider.ts, packages/providers/*/capabilities.ts, packages/providers/package.json
Adds NativeTool interface and AgentRequestOptions.nativeTools, introduces buildArchonMcpServer converting tool schemas to Zod shapes and registering MCP server for Claude, wires MCP server into Claude sendQuery, updates provider capability flags (nativeTools true/false), and adds zod dependency.

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs:

  • coleam00/Archon#1747: Prior chat work and console groundwork that this change builds on (chat routing, SSE, and console components).

"🐰 In the console's quiet code-lit glade,
I stitched a chat so runs needn't fade,
Tabs hop in place, messages flow,
Paused runs now politely ask and show,
A cheerful spike — the agent's parade."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 58.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The PR title clearly and specifically summarizes the main changes: adding a pending-input banner, project-scoped agent chat, and the manage_run native tool for Claude and Pi.
Description check ✅ Passed The PR description is comprehensive and well-structured, covering all key sections: problem statement, why it matters, what changed/didn't change, detailed architecture diagrams, UX journeys, validation evidence, security impact, compatibility, human verification, side effects, and rollback plans.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/console-promotion

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
packages/web/src/experiments/console/skills/conversations.ts (1)

37-41: ⚡ Quick win

Replace Parameters<...> indirection with an explicit raw type.

The Parameters<typeof toConversationSummary>[0][] annotation hides the actual shape behind type meta-programming. Exporting RawConversation and referencing it directly reads clearer and keeps the data contract explicit.

♻️ Proposed refactor

In primitives/conversation.ts:

-interface RawConversation {
+export interface RawConversation {

In this file:

-import { toConversationSummary, type ConversationSummary } from '../primitives/conversation';
+import {
+  toConversationSummary,
+  type ConversationSummary,
+  type RawConversation,
+} from '../primitives/conversation';
@@
-  const raw = await requestJson<Parameters<typeof toConversationSummary>[0][]>(
+  const raw = await requestJson<RawConversation[]>(
     `/api/conversations?codebaseId=${encodeURIComponent(projectId)}`
   );

As per coding guidelines: "prefer explicit branches and typed interfaces over hidden dynamic behavior."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/experiments/console/skills/conversations.ts` around lines 37
- 41, The listConversations function uses Parameters<typeof
toConversationSummary>[0][] which hides the incoming shape; instead export and
use an explicit RawConversation type from primitives/conversation.ts and update
listConversations to call
requestJson<RawConversation[]>('/api/conversations?codebaseId=...') and then
map(raw => toConversationSummary(raw)) — ensure primitives/conversation.ts
exports RawConversation and update the import in this file so the function
signature and requestJson generic use RawConversation[] while keeping the return
type Promise<ConversationSummary[]>.
packages/web/src/experiments/console/lib/sse.ts (1)

170-226: ⚡ Quick win

onLockChange dependency in useConversationSSE—current usage is stable, so churn is unlikely
ChatPage passes setLocked directly (useState setter), which is stable; so including onLockChange in the effect deps won’t cause the “reconnect on every render” churn for this caller. The useRef-stabilization approach is only worth it if other callers ever pass a non-memoized inline callback.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/experiments/console/lib/sse.ts` around lines 170 - 226, The
change that stabilizes onLockChange via a useRef is unnecessary because useState
setters (e.g., setLocked) are already stable; revert any refactor and keep the
effect dependency array as [conversationPlatformId, onLockChange] inside
useConversationSSE, ensuring es.onmessage still calls onLockChange?.(ev.locked)
directly and no useRef indirection is introduced.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/web/src/experiments/console/components/ChatComposer.tsx`:
- Around line 59-71: The textarea in ChatComposer.tsx never receives the
disabled prop, so when the component is locked or sending the field remains
editable; update the <textarea> to forward the disabled boolean (the same prop
used for the placeholder) so the DOM element truly becomes readonly and the CSS
class disabled:opacity-50 takes effect; locate the JSX using textareaRef, value,
setValue, grow, onKeyDown and MAX_HEIGHT and add disabled={disabled} to that
element (ensuring existing behavior for placeholder using disabledReason remains
unchanged).

In `@packages/web/src/experiments/console/skills/conversations.ts`:
- Around line 59-78: The multipart fetch call in conversations.ts (the fetch
invocation that builds a FormData and posts it) omits credentials, causing
parity mismatch with requestJson which sets credentials: 'same-origin'; update
the fetch call options for the POST (the fetch(url, { method: 'POST', body: form
})) to include credentials: 'same-origin' so same-origin cookies/CSRF are
preserved; apply the same change in the analogous startRun.ts multipart upload
code to keep behavior consistent and future-proof absolute/CORS URL changes.

---

Nitpick comments:
In `@packages/web/src/experiments/console/lib/sse.ts`:
- Around line 170-226: The change that stabilizes onLockChange via a useRef is
unnecessary because useState setters (e.g., setLocked) are already stable;
revert any refactor and keep the effect dependency array as
[conversationPlatformId, onLockChange] inside useConversationSSE, ensuring
es.onmessage still calls onLockChange?.(ev.locked) directly and no useRef
indirection is introduced.

In `@packages/web/src/experiments/console/skills/conversations.ts`:
- Around line 37-41: The listConversations function uses Parameters<typeof
toConversationSummary>[0][] which hides the incoming shape; instead export and
use an explicit RawConversation type from primitives/conversation.ts and update
listConversations to call
requestJson<RawConversation[]>('/api/conversations?codebaseId=...') and then
map(raw => toConversationSummary(raw)) — ensure primitives/conversation.ts
exports RawConversation and update the import in this file so the function
signature and requestJson generic use RawConversation[] while keeping the return
type Promise<ConversationSummary[]>.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0435aa53-8c38-4e66-9dfd-0e3a3ab58559

📥 Commits

Reviewing files that changed from the base of the PR and between ab1bee7 and b4de6a5.

📒 Files selected for processing (14)
  • packages/web/src/experiments/console/ConsoleApp.tsx
  • packages/web/src/experiments/console/agent-chat-spike.md
  • packages/web/src/experiments/console/components/ActiveRunCard.tsx
  • packages/web/src/experiments/console/components/ChatComposer.tsx
  • packages/web/src/experiments/console/components/ChatStream.tsx
  • packages/web/src/experiments/console/components/PendingInputBanner.tsx
  • packages/web/src/experiments/console/components/ProjectViewTabs.tsx
  • packages/web/src/experiments/console/lib/sse.ts
  • packages/web/src/experiments/console/primitives/conversation.ts
  • packages/web/src/experiments/console/routes/ChatPage.tsx
  • packages/web/src/experiments/console/routes/RunsPage.tsx
  • packages/web/src/experiments/console/skills/conversations.ts
  • packages/web/src/experiments/console/skills/index.ts
  • packages/web/src/experiments/console/store/keys.ts

Comment on lines +59 to +71
<textarea
ref={textareaRef}
value={value}
onChange={e => {
setValue(e.target.value);
grow(e.target);
}}
onKeyDown={onKeyDown}
rows={1}
placeholder={disabled ? (disabledReason ?? 'Waiting…') : 'Message the agent…'}
className="min-h-[40px] flex-1 resize-none rounded border border-border bg-surface-inset px-3 py-2 text-[13px] text-text-primary placeholder:text-text-tertiary focus:border-border-bright focus:outline-none disabled:opacity-50"
style={{ maxHeight: `${MAX_HEIGHT.toString()}px` }}
/>

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Apply the disabled prop to the textarea.

The disabled prop is never forwarded to the <textarea>, so the disabled:opacity-50 class never activates and the field stays editable while locked || sending. Submission is blocked by submit(), but the user can still type into a "locked" composer with no visual feedback.

🛠️ Proposed fix
         <textarea
           ref={textareaRef}
           value={value}
+          disabled={disabled}
           onChange={e => {
             setValue(e.target.value);
             grow(e.target);
           }}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
<textarea
ref={textareaRef}
value={value}
onChange={e => {
setValue(e.target.value);
grow(e.target);
}}
onKeyDown={onKeyDown}
rows={1}
placeholder={disabled ? (disabledReason ?? 'Waiting…') : 'Message the agent…'}
className="min-h-[40px] flex-1 resize-none rounded border border-border bg-surface-inset px-3 py-2 text-[13px] text-text-primary placeholder:text-text-tertiary focus:border-border-bright focus:outline-none disabled:opacity-50"
style={{ maxHeight: `${MAX_HEIGHT.toString()}px` }}
/>
<textarea
ref={textareaRef}
value={value}
disabled={disabled}
onChange={e => {
setValue(e.target.value);
grow(e.target);
}}
onKeyDown={onKeyDown}
rows={1}
placeholder={disabled ? (disabledReason ?? 'Waiting…') : 'Message the agent…'}
className="min-h-[40px] flex-1 resize-none rounded border border-border bg-surface-inset px-3 py-2 text-[13px] text-text-primary placeholder:text-text-tertiary focus:border-border-bright focus:outline-none disabled:opacity-50"
style={{ maxHeight: `${MAX_HEIGHT.toString()}px` }}
/>
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/experiments/console/components/ChatComposer.tsx` around
lines 59 - 71, The textarea in ChatComposer.tsx never receives the disabled
prop, so when the component is locked or sending the field remains editable;
update the <textarea> to forward the disabled boolean (the same prop used for
the placeholder) so the DOM element truly becomes readonly and the CSS class
disabled:opacity-50 takes effect; locate the JSX using textareaRef, value,
setValue, grow, onKeyDown and MAX_HEIGHT and add disabled={disabled} to that
element (ensuring existing behavior for placeholder using disabledReason remains
unchanged).

Comment on lines +59 to +78
// Multipart path: don't set Content-Type — the browser adds the boundary.
const form = new FormData();
form.append('message', message);
for (const file of files) {
form.append('files', file, file.name);
}
const res = await fetch(url, { method: 'POST', body: form });
if (!res.ok) {
const text = await res.text().catch(() => '');
let parsed: { error?: string } = {};
try {
parsed = JSON.parse(text) as { error?: string };
} catch {
/* not JSON */
}
const raw = parsed.error ?? (text.length > 0 ? text : `HTTP ${res.status.toString()}`);
const msg = raw.length > 200 ? `${raw.slice(0, 200)}...` : raw;
const path = new URL(url, window.location.origin).pathname;
throw new HttpError(res.status, path, msg);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect requestJson to see what defaults the multipart fetch must replicate.
fd 'http.ts' --type f | xargs rg -n -A40 'export (async )?function requestJson'

Repository: coleam00/Archon

Length of output: 814


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the exact multipart fetch code + nearby context.
sed -n '1,200p' packages/web/src/experiments/console/skills/conversations.ts | nl -ba | sed -n '40,120p'

# Find any other upload/multipart patterns in the same file.
rg -n "new FormData\(|multipart|files'" packages/web/src/experiments/console/skills/conversations.ts -n

# Inspect requestJson usage in nearby code for how headers/auth are typically handled.
rg -n "requestJson\(" packages/web/src/experiments/console/skills/conversations.ts -n

# Locate mergeHeaders usage/imports to see if any auth/CSRF defaults are expected.
rg -n "mergeHeaders|`@archon`|CSRF|credentials" packages/web/src/experiments/console/skills/conversations.ts packages/web/src/experiments/console/skills -S || true

Repository: coleam00/Archon

Length of output: 102


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the exact multipart fetch code + nearby context with line numbers.
cat -n packages/web/src/experiments/console/skills/conversations.ts | sed -n '40,140p'

# Find any other multipart/FormData usage in the same file.
rg -n "new FormData\(|files'" packages/web/src/experiments/console/skills/conversations.ts

# Locate requestJson usage in nearby code for how it handles defaults.
rg -n "requestJson\(" packages/web/src/experiments/console/skills/conversations.ts

# Look for credentials/auth/CSRF patterns in this area.
rg -n "credentials|CSRF|X-CSRF|Authorization|mergeHeaders" packages/web/src/experiments/console/skills/conversations.ts

Repository: coleam00/Archon

Length of output: 1667


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Dump the whole file (it appears small) with line numbers.
wc -l packages/web/src/experiments/console/skills/conversations.ts
cat -n packages/web/src/experiments/console/skills/conversations.ts

# Find where requestJson is defined (from earlier evidence, likely in packages/.../http.ts).
# Search for "mergeHeaders" and "export async function requestJson" definitions.
rg -n "export async function requestJson" packages -S
rg -n "function mergeHeaders|mergeHeaders" packages -S | head -n 50

Repository: coleam00/Archon

Length of output: 3877


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect requestJson + helpers fully
cat -n packages/web/src/experiments/console/lib/http.ts

# Find "startRun" multipart path reference and inspect it.
fd -t f 'startRun' packages/web/src/experiments/console -x echo
fd -t f 'startRun\.ts' packages/web/src/experiments/console | head -n 20
rg -n "Multipart|new FormData\(|files'" packages/web/src/experiments/console -S
rg -n "startRun" packages/web/src/experiments/console/skills -S || true

# Specifically locate any other multipart fetch in the console experiments.
rg -n "fetch\\(.*body: form\\b|new FormData\\(\\)" packages/web/src/experiments/console -S

Repository: coleam00/Archon

Length of output: 4311


🏁 Script executed:

#!/bin/bash
set -euo pipefail
cat -n packages/web/src/experiments/console/skills/startRun.ts | sed -n '1,120p'

Repository: coleam00/Archon

Length of output: 3252


Multipart fetch parity with requestJson: only credentials differs (and it’s likely redundant for same-origin relative URLs).

requestJson injects only credentials: 'same-origin' (and Content-Type: application/json for non-FormData bodies). The multipart code in conversations.ts (and similarly in startRun.ts) omits credentials, but both use relative /api/... URLs, so the browser’s default should already be same-origin; auth/CSRF headers aren’t being dropped here.

For consistency and future safety (if URLs become absolute/CORS), set credentials: 'same-origin' on the multipart fetch.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/experiments/console/skills/conversations.ts` around lines 59
- 78, The multipart fetch call in conversations.ts (the fetch invocation that
builds a FormData and posts it) omits credentials, causing parity mismatch with
requestJson which sets credentials: 'same-origin'; update the fetch call options
for the POST (the fetch(url, { method: 'POST', body: form })) to include
credentials: 'same-origin' so same-origin cookies/CSRF are preserved; apply the
same change in the analogous startRun.ts multipart upload code to keep behavior
consistent and future-proof absolute/CORS URL changes.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/web/src/experiments/console/routes/ChatPage.tsx`:
- Around line 75-103: The settle timer is currently tied only to the trailing
assistant message and a weak sig (list.length:last.id), which lets stale timers
cancel active turns; fix by gating the timer to the active turn (derive and
store an activeTurnId when a trailing user message appears) and only arm/reset
the timer for that same activeTurnId; also strengthen settleSigRef to include a
content/version signal from the assistant message (e.g., last.content or
last.updatedAt/streamVersion along with last.id) so streamed updates advance the
sig and re-arm the timer correctly; update references: settleTimerRef,
settleSigRef, messages, setBusy, SETTLE_MS to use the activeTurnId guard and the
richer sig.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e11dfe71-9020-4405-922c-b2dced25ee33

📥 Commits

Reviewing files that changed from the base of the PR and between b4de6a5 and d272044.

📒 Files selected for processing (1)
  • packages/web/src/experiments/console/routes/ChatPage.tsx

Comment on lines +75 to +103
// Derive turn state from the trailing message: a user message means a reply
// is pending; once an assistant reply lands and stays stable for SETTLE_MS the
// turn is done. This also recovers a reload mid-turn (trailing user message).
const settleTimerRef = useRef<ReturnType<typeof setTimeout> | null>(null);
const settleSigRef = useRef('');
useEffect(() => {
const list = messages ?? [];
const last = list[list.length - 1];
if (last === undefined) return;
if (last.role === 'user') {
settleSigRef.current = '';
if (settleTimerRef.current !== null) {
clearTimeout(settleTimerRef.current);
settleTimerRef.current = null;
}
setBusy(true);
return;
}
// Trailing message is an assistant/system reply. Arm the settle timer once;
// re-arm only on real content change so identical poll refetches (same sig)
// don't reset it forever.
const sig = `${list.length}:${last.id}`;
if (sig === settleSigRef.current) return;
settleSigRef.current = sig;
if (settleTimerRef.current !== null) clearTimeout(settleTimerRef.current);
settleTimerRef.current = setTimeout(() => {
setBusy(false);
}, SETTLE_MS);
}, [messages]);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Tie the settle timer to the active turn, not just the trailing assistant message.

This timeout is armed for any trailing assistant/system message, even when the chat is already idle. If the page is sitting on an old assistant reply and the user sends again before that timeout fires, the stale timer can flip busy back to false mid-request. Also, sig only uses list.length and last.id, so streamed updates to the same assistant message never reset the timer. Please gate the settle timer on the current pending turn and base sig on a real content/version signal from last.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/experiments/console/routes/ChatPage.tsx` around lines 75 -
103, The settle timer is currently tied only to the trailing assistant message
and a weak sig (list.length:last.id), which lets stale timers cancel active
turns; fix by gating the timer to the active turn (derive and store an
activeTurnId when a trailing user message appears) and only arm/reset the timer
for that same activeTurnId; also strengthen settleSigRef to include a
content/version signal from the assistant message (e.g., last.content or
last.updatedAt/streamVersion along with last.id) so streamed updates advance the
sig and re-arm the timer correctly; update references: settleTimerRef,
settleSigRef, messages, setBusy, SETTLE_MS to use the activeTurnId guard and the
richer sig.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/web/src/experiments/console/store/cache.ts`:
- Around line 182-195: The hook's useSyncExternalStore subscription only returns
cache.get(key) so changes to errors or inflight don't change the snapshot and
React won't re-render; update the snapshot to include a version or a tuple so
any mutation causes a new reference. Specifically, modify the getSnapshot used
in useSyncExternalStore (and the value your subscribe invalidates) to return
either a { data, error, loadingVersion } or a small epoch/version number that
you increment whenever cache, errors, or inflight change (referencing
useSyncExternalStore, subscribe, cache.get, errors.get, inflight, and
revalidate); increment that version on mutations and read it in getSnapshot so
error/loading changes produce new object identities and trigger re-renders.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6042461e-d7c7-40fe-9278-583635a4f1c5

📥 Commits

Reviewing files that changed from the base of the PR and between d272044 and 9904a42.

📒 Files selected for processing (1)
  • packages/web/src/experiments/console/store/cache.ts

Comment on lines +182 to 195
const data = useSyncExternalStore(
subscribe,
() => cache.get(key) as T | undefined,
() => cache.get(key) as T | undefined
);

return {
data: cache.get(key) as T | undefined,
data,
error: errors.get(key),
loading: !cache.has(key) && inflight.has(key),
refetch: (): void => {
cache.delete(key);
errors.delete(key);
notify(key);
ensureLoad(key);
revalidate(key);
},
};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Error and loading state changes do not trigger re-renders.

useSyncExternalStore only re-renders when getSnapshot returns a different value (by Object.is). Since the snapshot reads only cache.get(key):

  1. Initial load failure: cache stays undefined, snapshot unchanged → no re-render → stuck at loading: true forever
  2. Revalidate failure: stale cache reference unchanged → no re-render → error never shown

error and loading are computed during render but never re-read if React doesn't re-render.

Consider tracking a version/epoch that increments on any mutation (cache, error, or inflight change), or store a { data, error } tuple per key so either change produces a new reference.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/experiments/console/store/cache.ts` around lines 182 - 195,
The hook's useSyncExternalStore subscription only returns cache.get(key) so
changes to errors or inflight don't change the snapshot and React won't
re-render; update the snapshot to include a version or a tuple so any mutation
causes a new reference. Specifically, modify the getSnapshot used in
useSyncExternalStore (and the value your subscribe invalidates) to return either
a { data, error, loadingVersion } or a small epoch/version number that you
increment whenever cache, errors, or inflight change (referencing
useSyncExternalStore, subscribe, cache.get, errors.get, inflight, and
revalidate); increment that version on mutations and read it in getSnapshot so
error/loading changes produce new object identities and trigger re-renders.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/web/src/experiments/console/routes/ChatPage.tsx (1)

172-190: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Hoist currentActivity’s useMemo above the projectId early return to keep hook order stable.

packages/web/src/experiments/console/routes/ChatPage.tsx calls useMemo after if (projectId === undefined) return ...;. If projectId can be undefined on one render and defined on another, React can throw a hooks-order error. Move messageList/useMemo above the early return (or otherwise ensure the component never returns before the hook).

🐛 Proposed fix — hoist above the early return
+  const messageList = messages ?? [];
+
+  // Current activity for the working indicator: the latest tool the agent
+  // invoked in the in-flight turn (walk back to the last user message).
+  const currentActivity = useMemo<string | null>(() => {
+    for (let i = messageList.length - 1; i >= 0; i--) {
+      const m = messageList[i];
+      if (m === undefined) continue;
+      if (m.role === 'user') break;
+      if (m.role === 'assistant' && m.toolCalls.length > 0) {
+        return m.toolCalls[m.toolCalls.length - 1]?.name ?? null;
+      }
+    }
+    return null;
+  }, [messageList]);
+
   if (projectId === undefined) {
     return <EmptyState title="No project selected." />;
   }
-
-  const messageList = messages ?? [];
-
-  // Current activity for the working indicator: the latest tool the agent
-  // invoked in the in-flight turn (walk back to the last user message).
-  const currentActivity = useMemo<string | null>(() => {
-    for (let i = messageList.length - 1; i >= 0; i--) {
-      const m = messageList[i];
-      if (m === undefined) continue;
-      if (m.role === 'user') break;
-      if (m.role === 'assistant' && m.toolCalls.length > 0) {
-        return m.toolCalls[m.toolCalls.length - 1]?.name ?? null;
-      }
-    }
-    return null;
-  }, [messageList]);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/experiments/console/routes/ChatPage.tsx` around lines 172 -
190, The useMemo that computes currentActivity (using messageList and useMemo)
is declared after the early return for projectId and can break hook order; move
the messageList and the useMemo/currentActivity declaration above the if
(projectId === undefined) return ... check in ChatPage.tsx so the hook always
runs in the same order, or ensure the component never returns before invoking
useMemo—update references to projectId as needed so the early-return remains
correct after hoisting.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@packages/web/src/experiments/console/routes/ChatPage.tsx`:
- Around line 172-190: The useMemo that computes currentActivity (using
messageList and useMemo) is declared after the early return for projectId and
can break hook order; move the messageList and the useMemo/currentActivity
declaration above the if (projectId === undefined) return ... check in
ChatPage.tsx so the hook always runs in the same order, or ensure the component
never returns before invoking useMemo—update references to projectId as needed
so the early-return remains correct after hoisting.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3f31941b-6fc2-4024-a66c-db83347f5f07

📥 Commits

Reviewing files that changed from the base of the PR and between 9904a42 and b867a34.

📒 Files selected for processing (7)
  • packages/web/src/experiments/console/components/ChatStream.tsx
  • packages/web/src/experiments/console/components/RunLifecycle.tsx
  • packages/web/src/experiments/console/components/WorkflowDock.tsx
  • packages/web/src/experiments/console/components/WorkingIndicator.tsx
  • packages/web/src/experiments/console/console-open-questions.md
  • packages/web/src/experiments/console/routes/ChatPage.tsx
  • packages/web/src/experiments/console/routes/RunDetailPage.tsx

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
packages/core/src/orchestrator/manage-run-tool.ts (1)

60-60: 💤 Low value

Logging emits terminal states without a paired _started.

manage_run.list_completed, manage_run.get_completed, and manage_run.failed are logged, but there is no corresponding _started event, so a failure can't be correlated to an initiation in the logs.

As per coding guidelines: event naming {domain}.{action}_{state} and "always pair _started with _completed or _failed".

Also applies to: 69-69, 94-94

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/src/orchestrator/manage-run-tool.ts` at line 60, The code logs
terminal events like "manage_run.list_completed", "manage_run.get_completed" and
"manage_run.failed" without emitting matching "manage_run.*_started" events; add
paired "_started" logs immediately before the corresponding operations so every
completed/failed log has a matching started log. Locate the places that
currently call log.info/log.error for "manage_run.list_completed",
"manage_run.get_completed", and "manage_run.failed" in manage-run-tool.ts and
insert corresponding log.info calls for "manage_run.list_started",
"manage_run.get_started", and "manage_run.started" (or
"manage_run.<action>_started") with the same metadata object (e.g., { action,
codebaseId: ctx.codebaseId }) so states can be correlated.
packages/web/src/experiments/console/components/WorkflowDock.tsx (2)

124-132: ⚡ Quick win

Button is interactive when navigation target is unavailable.

The button remains clickable even when run.projectId is null, resulting in a no-op click. The hover states still apply, creating a misleading affordance. Consider disabling the button or omitting it when projectId is unavailable.

Disable button when projectId is null
 <button
   type="button"
+  disabled={run.projectId === null}
   onClick={() => {
     if (run.projectId !== null) navigate(`/console/p/${run.projectId}/r/${run.id}`);
   }}
-  className="ml-auto shrink-0 font-mono text-[11px] text-text-tertiary transition-colors hover:text-text-primary"
+  className="ml-auto shrink-0 font-mono text-[11px] text-text-tertiary transition-colors hover:text-text-primary disabled:cursor-not-allowed disabled:opacity-50"
 >
   Open logs →
 </button>

Apply the same fix to the DockCard button on line 147.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/experiments/console/components/WorkflowDock.tsx` around
lines 124 - 132, The "Open logs →" button in WorkflowDock is interactive even
when run.projectId is null; update the button (component WorkflowDock) to be
disabled when run.projectId === null (add the disabled attribute and
aria-disabled) and conditionally remove/replace hover/cursor classes so hover
states and pointer don't apply when disabled; keep the onClick logic using
navigate(`/console/p/${run.projectId}/r/${run.id}`) but it will no longer be
reachable when disabled. Apply the identical change to the DockCard "Open logs"
button so both buttons use the same disabled logic and class toggling based on
run.projectId.

147-153: ⚡ Quick win

Same navigation affordance issue in DockCard.

This button has the same pattern as ApprovalDockCard above — interactive when projectId is null. Apply the same disabled-state fix here for consistency.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/experiments/console/components/WorkflowDock.tsx` around
lines 147 - 153, The button in WorkflowDock (same pattern as ApprovalDockCard)
is still clickable when run.projectId is null; update the button element so it
is disabled when run.projectId === null (add the disabled attribute and
aria-disabled), guard or remove the onClick handler when disabled (only call
navigate(`/console/p/${run.projectId}/r/${run.id}`) if run.projectId != null),
and apply the same visual disabled classes used in ApprovalDockCard (e.g.,
cursor-not-allowed/opacity styles) so the component (WorkflowDock / DockCard)
has consistent disabled-state affordance.
packages/core/src/orchestrator/orchestrator-agent.ts (1)

989-994: ⚡ Quick win

Use truthy check for codebase_id to match file patterns.

Line 992 uses conversation.codebase_id !== null, but the rest of this file consistently uses truthy checks for the same field (lines 542, 758, 952). A truthy check would handle both null and undefined consistently and align with the established pattern.

Align with existing pattern
-    if (conversation.codebase_id !== null && getProviderCapabilities(providerKey).nativeTools) {
+    if (conversation.codebase_id && getProviderCapabilities(providerKey).nativeTools) {
       requestOptions.nativeTools = [buildManageRunTool({ codebaseId: conversation.codebase_id })];
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/src/orchestrator/orchestrator-agent.ts` around lines 989 - 994,
The condition uses a strict null check for conversation.codebase_id but the file
consistently uses truthy checks; change the guard to use a truthy check (e.g.,
if (conversation.codebase_id) ) so undefined values are handled the same way as
null and keep behavior aligned with other checks (affecting the block that sets
requestOptions.nativeTools using buildManageRunTool and
getProviderCapabilities(providerKey)).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/providers/package.json`:
- Line 36: Verify end-to-end that the Zod v4 schemas defined in native-tools.ts
produce the expected tool input schemas when passed to tool(): run the runtime
path that constructs each tool (calls tool()) and assert the resulting
inputSchema/shape matches expectations; if you observe mismatches or known Zod‑4
conversion edge cases, either adjust the native-tools.ts definitions to a
compatible form or switch to the SDK’s recommended helper (or pin Zod to the
SDK‑compatible version) so tool() receives a supported schema shape. Ensure you
exercise every schema used by native-tools.ts and document which fix (helper vs.
downgrade) you applied.

---

Nitpick comments:
In `@packages/core/src/orchestrator/manage-run-tool.ts`:
- Line 60: The code logs terminal events like "manage_run.list_completed",
"manage_run.get_completed" and "manage_run.failed" without emitting matching
"manage_run.*_started" events; add paired "_started" logs immediately before the
corresponding operations so every completed/failed log has a matching started
log. Locate the places that currently call log.info/log.error for
"manage_run.list_completed", "manage_run.get_completed", and "manage_run.failed"
in manage-run-tool.ts and insert corresponding log.info calls for
"manage_run.list_started", "manage_run.get_started", and "manage_run.started"
(or "manage_run.<action>_started") with the same metadata object (e.g., {
action, codebaseId: ctx.codebaseId }) so states can be correlated.

In `@packages/core/src/orchestrator/orchestrator-agent.ts`:
- Around line 989-994: The condition uses a strict null check for
conversation.codebase_id but the file consistently uses truthy checks; change
the guard to use a truthy check (e.g., if (conversation.codebase_id) ) so
undefined values are handled the same way as null and keep behavior aligned with
other checks (affecting the block that sets requestOptions.nativeTools using
buildManageRunTool and getProviderCapabilities(providerKey)).

In `@packages/web/src/experiments/console/components/WorkflowDock.tsx`:
- Around line 124-132: The "Open logs →" button in WorkflowDock is interactive
even when run.projectId is null; update the button (component WorkflowDock) to
be disabled when run.projectId === null (add the disabled attribute and
aria-disabled) and conditionally remove/replace hover/cursor classes so hover
states and pointer don't apply when disabled; keep the onClick logic using
navigate(`/console/p/${run.projectId}/r/${run.id}`) but it will no longer be
reachable when disabled. Apply the identical change to the DockCard "Open logs"
button so both buttons use the same disabled logic and class toggling based on
run.projectId.
- Around line 147-153: The button in WorkflowDock (same pattern as
ApprovalDockCard) is still clickable when run.projectId is null; update the
button element so it is disabled when run.projectId === null (add the disabled
attribute and aria-disabled), guard or remove the onClick handler when disabled
(only call navigate(`/console/p/${run.projectId}/r/${run.id}`) if run.projectId
!= null), and apply the same visual disabled classes used in ApprovalDockCard
(e.g., cursor-not-allowed/opacity styles) so the component (WorkflowDock /
DockCard) has consistent disabled-state affordance.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f7bc8be6-5b82-4a74-9d30-09ee185ccb28

📥 Commits

Reviewing files that changed from the base of the PR and between b867a34 and a86ed2d.

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • packages/core/src/orchestrator/manage-run-tool.ts
  • packages/core/src/orchestrator/orchestrator-agent.ts
  • packages/providers/package.json
  • packages/providers/src/claude/capabilities.ts
  • packages/providers/src/claude/native-tools.ts
  • packages/providers/src/claude/provider.ts
  • packages/providers/src/codex/capabilities.ts
  • packages/providers/src/community/copilot/capabilities.ts
  • packages/providers/src/community/opencode/capabilities.ts
  • packages/providers/src/community/pi/capabilities.ts
  • packages/providers/src/types.ts
  • packages/web/src/experiments/console/components/WorkflowDock.tsx

Comment thread packages/providers/package.json
@Wirasm Wirasm changed the title feat(console): pending-input banner + project-scoped agent chat view feat(console): console promotion — pending-input banner, project-scoped agent chat, manage_run native tool (Claude + Pi) Jun 1, 2026
@Wirasm

Wirasm commented Jun 2, 2026

Copy link
Copy Markdown
Collaborator Author

PR Review Summary — Multi-Agent (#1819)

7 specialized agents reviewed the diff (44 files, +2,667 / −56): code-reviewer, docs-impact, pr-test-analyzer, comment-analyzer, silent-failure-hunter, type-design-analyzer, code-simplifier. Issue IDs use C/I/S prefixes (not bare #N) to avoid spurious issue auto-links.

Merge blocker (mechanical): the branch is currently CONFLICTING with base dev. Rebase/resolve before merge regardless of the findings below.


Critical Issues (1)

ID Agent Issue Location
C1 code-reviewer runId schema description promises "Accepts the short (8-char) or full id" and list output prints only 8-char prefixes — but getScopedRungetWorkflowRun does an exact UUID match (WHERE id = $1). The agent's natural list → get/resume/cancel flow with the prefix it just saw always returns "no run found". This breaks the tool's primary loop on the first interaction. Fix: either (a) print full IDs in list + correct the description, or (b) add prefix matching (`WHERE id LIKE $1

Important Issues (13)

ID Agent Issue Location
I1 silent-failure startWorkflow calls dispatchBackgroundWorkflow with no try/catch. Dispatch throws on codebase-not-found / isolation failure; the error reaches the agent via the generic manage_run error: outer catch, and the structured log omits workflowName — undiagnosable in prod. Wrap in its own try/catch, log.error({ workflowName, codebaseId }, 'manage_run.start_dispatch_failed'), return a distinct message. orchestrator-agent.ts:1008-1022
I2 silent-failure buildArchonMcpServer (Claude) / buildPiNativeToolDefinitions (Pi) convert schemas synchronously and can throw, but sit outside the retry try-block. A schema bug surfaces as a generic query_error with no tool name, and (Claude) is fed into retry classification that can't succeed. Wrap with a dedicated try/catch that logs { toolNames } and rethrows without retry. claude/provider.ts:924-931, pi/provider.ts:473-476
I3 code-reviewer import { z } from 'zod' violates the CLAUDE.md convention ("import z from @hono/zod-openapi, not zod directly") and adds a direct zod dep to @archon/providers. claude/native-tools.ts:1 + packages/providers/package.json
I4 code-reviewer, comment, simplifier, type-design Pi adapter hardcodes label: 'Manage runs' for every NativeTool it converts — wrong the moment a second tool is registered. Derive from spec.name (or add label?: string to NativeTool). (flagged independently by 4 agents) pi/native-tools.ts:57
I5 silent-failure WorkflowDock destructures only data from useEntity and discards error → on fetch failure the dock renders null, identical to "no active runs". No log, no indicator. RunsPage already handles this; the dock is the outlier. components/WorkflowDock.tsx
I6 silent-failure ChatPage ignores error for all three useEntity calls (project, conversations, messages). A failed listConversations looks like a fresh, empty project. Surface an error state like RunsPage does. routes/ChatPage.tsx
I7 comment Comment "setting customTools forces noTools:'builtin'" is factually wrong — per Pi SDK, customTools is additive; Archon pairs it with noTools:'builtin' explicitly. Misleading for future maintainers (appears twice). pi/options-translator.ts (buildDefaultPiTools JSDoc), pi/provider.ts (inline)
I8 comment resume help text says "mark it resumable" but resumeWorkflow() performs no state mutation (validation only). Model-visible text shapes agent behavior — say "validate eligibility; does not change state or trigger execution." Same wording leaks into the test description. manage-run-tool.ts (HELP_OVERVIEW, HELP_BY_ACTION.resume) + manage-run-tool.test.ts
I9 docs, comment agent-chat-spike.md header still says "exploration / mapping only — not building yet … no chat code is written yet" while this same PR ships ChatPage, WorkflowDock, the manage_run tool, etc. Update status to "shipped". experiments/console/agent-chat-spike.md
I10 docs adding-a-community-provider.md YOUR_CAPABILITIES template is missing nativeTools: false (new) and agents: false — both required by ProviderCapabilities; a provider scaffolded from it won't compile. docs-web/.../contributing/adding-a-community-provider.md:~64
I11 docs agent-chat-spike.md claims manage_run has "MCP (Claude/Codex/OpenCode) + Pi" backends, but Codex/OpenCode are nativeTools: false (deferred to Wave 2). Correct the registration-backends bullet. experiments/console/agent-chat-spike.md §6
I12 pr-test No provider-level wiring test that nativeTools in SendQueryOptions actually reaches the SDK call site (Claude mcpServers.archon / Pi customTools). The capability flag and the converter are each tested, but the seam between them is assumed. Highest-value missing test (1 per provider). claude/provider.test.ts, pi/provider.test.ts
I13 pr-test, silent-failure Write-path cross-project blocking is tested only for get, not any destructive action (Gap 1); and the error-handling test asserts the return string but not that log.error fired (H1) — a refactor dropping the log would stay green. manage-run-tool.test.ts

Suggestions (S)

Types (type-design-analyzer — verdict ADEQUATE, 6/10):

  • S1 inputSchema: Record<string, unknown> erases the "must be an object schema" invariant — malformed schemas compile and only fail at registration time. Introduce a narrow NativeToolInputSchema ({ type:'object', properties, required? }) and apply with satisfies. (types.ts:1207)
  • S2 ACTIONS const array and the INPUT_SCHEMA.action.enum are two sources of truth that must be hand-synced. Derive the enum from ACTIONS. (manage-run-tool.ts)
  • S3 getScopedRun returns WorkflowRun | string (error-as-string sentinel). A { ok: true; run } | { ok: false; message } discriminant is safer. (low priority)

Tests (pr-test-analyzer — verdict ADEQUATE):

  • S4 approve interactive_loop branch and reject rework (cancelled:false / maxAttemptsReached) branches untested — they own distinct user-visible messages (Gaps 2,3, rating 7).
  • S5 cache.ts revalidate/invalidate stale-while-revalidate invariant has zero tests, despite being the PR's other core correctness claim. Pure-TS (non-hook) exports are unit-testable without React (Gap 7).
  • S6 Orchestrator injection guard negative case (codebase_id === null → not injected) untested (Gap 5).

Error handling / UX (silent-failure-hunter):

  • S7 SSE onerror in all three hooks uses console.warn (invisible in prod) and never surfaces a permanently-closed stream to the user — stale-while-revalidate then serves the last value forever with no indicator. (lib/sse.ts)
  • S8 getScopedRun "no run found … in this project" is indistinguishable from cross-project denial; consider "not found (or not accessible in this project)". (M1)

Simplifications (code-simplifier — 6 found, all functionality-preserving):

  • S9 Extract the 7× repeated typeof input.x === 'string' ? … : '' into a private str(input, key) helper. (manage-run-tool.ts)
  • S10 ensureLoad + revalidate share an identical .then/.catch/.finally chain → extract runLoader(key, loader). (cache.ts:32-101)
  • S11 Duplicated isString guard across both converters → package-internal shared helper (stays within @archon/providers, no boundary break).
  • S12 WorkingIndicator activity !== null && !== undefined && !== '' → truthiness check. Note: simplifier suggested extracting the common converter skeleton is not advised — different packages/libraries; current parallel-but-explicit code is the right KISS tradeoff.

Docs (docs-impact — low urgency):

  • S13 CLAUDE.md orchestrator + @archon/providers sections don't mention manage_run / NativeTool / SendQueryOptions.nativeTools. Add a line each.
  • S14 ai-assistants.md capability matrices for Pi and OpenCode lack a nativeTools row.

Strengths

  • Project scoping is correct and completegetScopedRun blocks cross-project read and write access, with a log.warn breadcrumb; not silently swallowed. The cross-project test is behavioral, not implementation-coupled.
  • Confirm-gate design is sound — destructive actions gated on confirm: true; resume correctly excluded per the "recoverable operations" carve-out.
  • Error containment is right — the handler's outer catch logs { err, action, codebaseId } via Pino then returns text; never throws into the agent loop.
  • Converters fail-fast (throw on non-object / empty-enum / unsupported types) rather than silently degrading — matches the project's Fail-Fast principle; the tests assert throws, not no-ops.
  • Pi customTools correctly re-supplies the default tool set so the agent doesn't lose bash/read/edit/write.
  • Console isolation preserved — no new imports of production web modules; stale-while-revalidate fix correctly eliminates the flicker.
  • Excellent design JSDoc on buildManageRunTool, the NativeTool contract, and getScopedRun — captures the "providers never import core" intent.
  • Test isolation handledmanage-run-tool.test.ts runs in its own bun test invocation per the documented mock-pollution rule.

Documentation Issues (consolidated)

  • agent-chat-spike.md — stale "not building yet" status + wrong Codex/OpenCode backend claim (I9, I11)
  • adding-a-community-provider.md — template missing nativeTools/agents, won't compile (I10)
  • CLAUDE.md + ai-assistants.md — no mention of the new native-tool surface (S13, S14)

Verdict: NEEDS FIXES

Resolve the merge conflict, fix C1 (the short-ID bug breaks the tool's core loop), and address the error-handling/observability items I1, I2, I5, I6 plus the doc contradictions I9, I10, I11 and the cheap consensus fix I4. The type/test suggestions (S1–S6) are worthwhile follow-ups but needn't block. No security regressions found — scoping, confirm-gate, and capability-gated injection are correctly implemented.

Recommended order

  1. Rebase onto dev (resolve conflict).
  2. C1 — full-ID or prefix-match (blocks the agent loop).
  3. I1, I2 — error context/logging at the dispatch & schema-conversion seams.
  4. I4, I7, I8, I9, I10, I11 — one-line correctness + doc fixes.
  5. I5, I6 — surface useEntity errors in the console UI.
  6. I12, I13 — close the test seams.
  7. Consider S1/S2 (type tightening) and S5 (cache.ts tests) as fast-follows.

Advisory review — no files were modified.

Wirasm added 13 commits June 2, 2026 13:18
Runs paused on a human gate (approval node or an agent question) now
surface in an attention banner pinned above the runs feed, on both the
All-projects and project-scoped views. It stays visible regardless of the
active filter or scroll position until the user acts on it or dismisses it.

The banner reuses the existing ApprovalContext + ApprovalPanel surfaces, so
the action flow is identical to the inline paused card. While a run is shown
in the banner its feed card collapses to a pointer to avoid a duplicate live
panel; dismissing the banner restores the inline panel so the run can still
be actioned from the feed.

Pending runs are derived from the unfiltered run set so they surface even
when the feed is filtered or searched. Dismiss is session-only — the run is
still paused, so it re-surfaces on reload — and dismissed ids are pruned once
a run leaves the paused state so it re-surfaces if it pauses again.
Design/exploration doc for promoting the console toward a project-scoped
agent chat (tab swap with the Run view). Captures the reuse map vs. the old
chat UI, the project-scoped chat assembly (mostly existing console pieces +
3 new conversation skills), and the run-management tooling direction.

Key finding: no single tool mechanism spans all providers. MCP covers
Claude/Codex/OpenCode; in-process custom tools cover Claude + Pi only. Since
Pi becomes the main runner ~June 2026 and has no MCP, the tool handler is
built once behind a provider-neutral spec with two registration backends
(MCP + Pi customTools). Rollout leads with the in-process providers
(Claude + Pi), then served MCP for Codex/OpenCode.

Mapping only — no chat code yet. Project memory deferred.
Add a Runs | Chat tab swap under each project. The chat is plain
conversational AI assembled from existing console pieces: messages load via
useEntity(K.messages), a new useConversationSSE keeps them live by
invalidating on text/tool events (and surfaces conversation_lock to disable
the composer), and rendering reuses MessageItem/ToolCallItem/StreamCard. A new
skills/conversations.ts (create/list/send) is the sanctioned home for the
conversation concept the console previously hid in startRun.ts.

The console's invalidate-refetch SSE model removes the need for the old chat's
client-side streaming state machine. No backend changes; scoped entirely to
packages/web/src/experiments/console. Run management by the agent (manage_run
tool) is the planned follow-on.

Also records the resolved tooling decisions (tiered confirm, JSON-Schema
canonical schema, deferred Wave-2 transport) in agent-chat-spike.md §10.
Drive the "awaiting reply" state from message content + the send action
instead of the per-conversation SSE lock event. The recovery poll is no
longer gated on a `conversation_lock` event that the cross-origin dev SSE can
drop or never deliver — so the composer can't stick on "Waiting…" and replies
are polled for regardless of SSE health. Adds a settle window (re-enable once
the trailing message is a stable assistant reply) and a hard cap so a turn
that never produces a reply can't disable the composer forever. SSE remains a
best-effort accelerator for snappy live updates.

Note: live updates still depend on the shared useEntity refetch path, which has
a separate reliability issue under investigation (cache refetch can stall after
the first invalidation while mounted). Persisted state + on-mount rendering are
correct; navigating/reloading always shows the latest.
…ernalStore)

Two compounding bugs made the console's live updates fail — a refetched value
landed in the cache but never appeared on screen until a remount (navigation /
reload), so the chat reply never streamed in and the composer stuck on
"Waiting…".

1. invalidate() deleted the cache key before refetching, so every SSE/poll tick
   flashed the subscriber to undefined/empty before the data returned. At poll
   cadence the list flickered between full and empty and the derived
   busy/settle/scroll logic thrashed on the empty frames. Replace the
   delete-then-load with revalidate(): refetch in place and swap on resolve,
   keeping the previous value visible (stale-while-revalidate). Unsubscribed
   keys still drop so a later mount fetches fresh.

2. useEntity subscribed with a manual useState(n => n + 1), which can commit a
   stale render against an externally-mutated store — the render function ran
   with the fresh value but the DOM kept an older tree. Switch to
   useSyncExternalStore for tear-free, consistent reads.

Verified end-to-end in the browser: agent replies now render live and the
composer re-enables without navigating. Fixes the foundation the chat working
indicator and workflow dock depend on.
Three UX passes now that live updates work:

1. Chat declutter — ChatStream hides raw tool-call cards and framework/empty
   rows by default; a single "Agent is working · <tool>" pill stands in while a
   turn is active (WorkingIndicator), toggling the inline trace on click.

2. Workflow dock — a pinned tray below the chat stream listing the project's
   in-progress runs (WorkflowDock): hidden at 0, the richer card at 1, a
   collapsible "Running workflows (N) ▸" with status dots at 2+. Each card links
   to the run-detail logs. Live via the dashboard SSE.

3. Run-detail lifecycle lines — the log view now bookends the stream with
   "▶ Workflow <name> started" and a terminal summary
   "✓ Completed in <duration> · <cost>" (RunLifecycle), so the logs read as a
   complete story rather than a tool dump. Node transitions already render.

Verified live: chat shows clean prose with the working pill; run log shows the
started + finished lines.
Paused-on-approval runs in the dock now render the agent's question +
Approve/Reject inline (reusing ApprovalContext + ApprovalPanel, same
approve/reject + comment-injection as elsewhere), so you can resolve a gate
without leaving the chat. Approval cards are always shown (a paused run needs
you); plain running runs keep the 1-vs-N collapse. Resolving invalidates the
runs cache so the card clears live.
…nly)

Give the project-scoped chat agent a native tool to see this project's
workflow runs, the first step of agent-as-operator.

Cross-provider plumbing:
- New `NativeTool` on the provider contract ({name, description, JSON-Schema
  inputSchema, in-process handler}) + a `nativeTools` capability flag. The
  handler closes over live context, so @archon/providers never imports core —
  the tool crosses the boundary as data on SendQueryOptions.
- Claude adapter (native-tools.ts): a JSON-Schema→Zod-shape converter (narrow,
  fail-fast) feeding createSdkMcpServer()/tool(); registered in sendQuery as an
  in-process `archon` MCP server (mcp__archon__*, alwaysLoad for Haiku),
  mirroring the file-based mcp branch. Adds zod to @archon/providers.
- Capabilities: nativeTools = true for claude + pi, false for codex/opencode/
  copilot (no in-process tool path).

The tool (core/orchestrator/manage-run-tool.ts):
- manage_run with read actions list/get — list shows recent runs (id, workflow,
  status, step); get shows one run's detail. Errors caught, returned as text.
- Injected on the direct-chat path only when a codebase is scoped and the
  provider supports native tools.

Verified live: a project chat asks "list the runs" → agent calls
mcp__archon__manage_run → handler lists 20 runs → agent answers with the count
and the most-recent run's status. Start + the Pi adapter + destructive writes
are follow-ups.
Adds the `start` action to manage_run: the agent resolves a workflow by name
(4-tier resolveWorkflowName) and dispatches it via dispatchBackgroundWorkflow
(its own isolated worktree, fire-and-forget). The dispatch context (platform,
conversation, cwd, codebase, available workflows, userId) is assembled in the
orchestrator and passed as a `startWorkflow` callback so the tool stays
decoupled from the dispatch machinery. Unknown/ambiguous names return a
friendly text result; start is rejected when the dispatch context is absent.

Verified live: a project chat asks the agent to start archon-assist → agent
calls manage_run({action:'start', workflow, message}) → workflow dispatches,
runs in a worktree, and completes; the agent reports it started.
Pi adapter for the provider-neutral NativeTool, completing Wave 1 across both
in-process providers (Claude + Pi, the current and ~June-2026 primary).

- native-tools.ts: JSON-Schema→TypeBox converter (same narrow string/enum
  subset, fail-fast) feeding `defineTool`; each tool's text handler result
  becomes the ToolDefinition's content.
- provider.ts: merge the native ToolDefinitions into `customTools`. Because
  setting customTools forces noTools:'builtin' (which drops Pi's defaults), the
  base tool set is re-supplied via the new buildDefaultPiTools() helper, so the
  agent keeps bash/read/edit/write alongside manage_run.
- Lazy-loaded via the existing dynamic-import block (avoids Pi's config.js
  eager-load).

Type-checks across the repo; the tool + core handler are identical to the
verified Claude path. Live Pi verification needs Pi configured as the assistant
(not set up in this dev env).
…e, lifecycle writes, Pi

Expand the manage_run native tool from read-only (list/get/start) to the full
run-management surface, with progressive disclosure and project scoping.

- help: action returns the action catalog; help + subtool returns per-action
  detail, so the model's tool surface stays small (progressive disclosure).
- Lifecycle writes resume/cancel/abandon/approve/reject go through the shared
  core workflow-operations functions — identical semantics to the CLI and
  command-handler (state mutation; no server coupling).
- Tiered confirmation: destructive actions (cancel/abandon/approve/reject)
  require confirm:true; without it the tool returns a preview and asks the
  agent to confirm with the user (no mid-turn UI primitive required).
- getScopedRun: every by-id action verifies the run belongs to the chat's
  project, so an agent scoped to project A cannot read or mutate project B's
  runs (generalizes the earlier get-only scope concern).
- Schema converters (Claude Zod + Pi TypeBox) gain boolean support for the
  confirm param.

Verified live on Claude and Pi (minimax MiniMax-M2.7): both invoke the tool,
help works, and the boolean confirm round-trips through TypeBox as a real
boolean. Unit tests: handler dispatch/scope/confirm-gate (18) plus converter
fail-fast and boolean-acceptance (Claude 4, Pi 4).
…ility

These failed at the branch HEAD; the prior commits were only validated with
type-check and lint, not the test suite.

- claude/provider.test.ts, codex/provider.test.ts, registry.test.ts asserted
  the full ProviderCapabilities set via toEqual but were never updated when
  nativeTools was added to the interface — add nativeTools to the expectations
  (claude true, codex false, mock false).
- pi/provider.test.ts: the @earendil-works/pi-coding-agent mock was missing
  defineTool, a value import pulled in by ./native-tools (added when manage_run
  was wired into Pi). Bun enforces the binding for value imports, so every Pi
  sendQuery threw "Export named 'defineTool' not found" — 56 tests failing.
  Add a defineTool stub to the mock.
@Wirasm Wirasm force-pushed the feat/console-promotion branch from 5f890d8 to 0217bdf Compare June 2, 2026 10:21
@Wirasm

Wirasm commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator Author

PR Review Summary — multi-agent pass

Six specialized agents reviewed the diff (base origin/dev): code quality, docs impact, tests, comments, error handling, type design. Findings below use C#/I#/S# prefixes (not #N, to avoid GitHub auto-linking unrelated issues). The two Critical items were hand-verified against the source.

Branch state: MERGEABLE, 1 commit behind dev (the Better Auth feature 69bf4d00). That commit touches auth/, db/users, and unrelated schemas — zero file overlap with this PR — so no rebase is required for a clean merge.


Critical Issues (2) — block merge

ID Issue Location
C1 By-id actions silently fail on short IDs. getScopedRungetWorkflowRun does WHERE id = $1 (exact UUID match), but list/get output only ever prints r.id.slice(0, 8) and the input schema + help advertise "Accepts the short (8-char) or full id." The full UUID is never surfaced to the model, so every by-id action (get/resume/cancel/abandon/approve/reject) driven from list output returns "no run found." Verified. packages/core/src/orchestrator/manage-run-tool.ts (getScopedRun, INPUT_SCHEMA, handleList) + db/workflows.ts:165
C2 import { z } from 'zod' violates the repo-wide Zod convention ("Import z from @hono/zod-openapi, not from zod directly") and adds a second direct "zod": "^4.0.0" dep to packages/providers/package.json, risking zod-instance drift. Verified. packages/providers/src/claude/native-tools.ts:1, packages/providers/package.json:36

C1 fix: add prefix matching to getScopedRun (WHERE id LIKE $1 || '%' with a length guard + ambiguity check), or surface full IDs and drop the "short (8-char)" claim from the schema/help.
C2 fix: import z/ZodTypeAny from @hono/zod-openapi and drop the direct dep, or record an explicit @archon/providers exception in CLAUDE.md (the zod here only feeds the Claude SDK tool() builder, never OpenAPI generation — but the convention is currently absolute).


Important Issues (5) — address before merge

ID Issue Location
I1 resume action never resumes. resumeWorkflow only validates eligibility + returns the run (no mutation, no re-exec), yet help says "mark a failed/paused run resumable" and the response says "ready to resume." The CLI workflow resume does re-run — so the agent action diverges from the CLI semantics it claims to mirror. Reword to "validate eligibility," and confirm the no-op-resume is the intended scope. manage-run-tool.ts handleWrite case resume, HELP_*
I2 cancel help overpromises. cancel/abandon both call abandonWorkflow (DB status → cancelled); a live worktree process is not stopped. Help says "stop a running run." Reword to "marks the run cancelled — a running process may continue until its current step completes." manage-run-tool.ts HELP_*, handleWrite
I3 ChatPage ignores error from all three useEntity calls (conversations/messages/project). Revalidation failures (network blip, server restart) leave stale/empty data with no user feedback, and can strand the composer's busy/settle state. Surface the error. packages/web/src/experiments/console/routes/ChatPage.tsx
I4 startWorkflow dispatch failure logs without context. When dispatchBackgroundWorkflow throws, the outer manage_run catch returns it as text (good) but logs only {action, codebaseId} — no workflow name. Add a local catch with workflowName in the log context. packages/core/src/orchestrator/orchestrator-agent.ts startWorkflow closure
I5 Confirm-gate test coverage is partial. Only cancel-without-confirm is tested; abandon/approve/reject previews are untested, as are the approve interactive_loop branch, the reject cancelled:false rework branch, and the orchestrator injection guard (codebase_id != null && capabilities.nativeTools). A future edit dropping an action from DESTRUCTIVE_ACTIONS would pass CI. Add the three preview tests + two branch tests. manage-run-tool.test.ts, orchestrator-agent.test.ts

Suggestions (6)

ID Suggestion Location
S1 Confirm preview says "This is not undoable" for all destructive actions, including approve/reject — misleading, since approve is the routine continuation path. Split into cancel/abandon ("irreversible") vs approve/reject ("a human gate; confirm with the user"). manage-run-tool.ts handleWrite preview
S2 Pass the verified run.id (full UUID from getScopedRun) into resumeWorkflow/abandonWorkflow/approveWorkflow/rejectWorkflow instead of the raw runId — removes a redundant re-fetch and the small TOCTOU window. manage-run-tool.ts handleWrite
S3 Pi defineTool hardcodes label: 'Manage runs' for every native tool — a latent bug when a 2nd tool is added. Derive from spec.name or add an optional label to NativeTool. packages/providers/src/community/pi/native-tools.ts
S4 Type design: NativeTool.inputSchema: Record<string, unknown> is structurally untyped (converters throw at sendQuery time, not construction); (input.action as Action) is an unsound cast where a const-array isAction() guard would suffice. Introduce a minimal NativeToolInputSchema (type:'object', restricted leaf types). packages/providers/src/types.ts, manage-run-tool.ts
S5 Add runId to the manage_run.failed catch-log context (currently {action, codebaseId} only — by-id failures have no run trace). manage-run-tool.ts handler catch
S6 Stale comments: cache.ts says "~120 LOC" (now 196 and growing — drop the number); useDashboardSSE says "Mount once at a high level (RunsPage)" but WorkflowDock (via ChatPage) now also mounts it; NativeTool JSDoc says "handler must catch its own errors" when the buildManageRunTool wrapper is the actual safety net. console/store/cache.ts, console/lib/sse.ts, providers/src/types.ts

Documentation

  • console-open-questions.md is stale against its own PR: §2 says manage_run is "still unbuilt / highest-value next build" (this PR builds Wave 1), and §5's "approvals from chat" question is answered by the inline-dock-approvals commit. Update both.
  • experiments/console/README.md routes list omits the new /console/p/:projectId/chat.
  • CLAUDE.md / ai-assistants.md: GET /api/providers capabilities now includes nativeTools; the Pi capability table has no row for it. Worth a one-line note that nativeTools: true (Claude + Pi) gates the manage_run tool. (Also pre-existing: the @archon/providers description omits community/opencode/ and community/copilot/.)

Strengths

  • Cross-project isolation is correct and testedgetScopedRun blocks codebase_id mismatch, logs at warn with both ids, and has a real negative-case test.
  • Confirm-gate is structurally sound — fires for all four DESTRUCTIVE_ACTIONS before any mutation; start correctly omits it.
  • Schema converters fail fast — both Claude (Zod) and Pi (TypeBox) throw on non-object / unsupported-type / empty-enum schemas, with symmetric tests.
  • Pi default-tool re-supply is correct — avoids silently stripping Pi's built-in tools when customTools is set.
  • "Catch-all → return text" is a documented, intentional, tested contract (the dedicated "thrown DB error returned as text" test), not an accidental swallow.
  • useSyncExternalStore + stale-while-revalidate is a proper fix for the flicker/stale-render bug, with a comment explaining the prior failure mode.
  • The fix(providers) commit's Pi defineTool mock repair fixed ~56 pre-existing failing tests.

Verdict: NEEDS FIXES

C1 (broken by-id action surface) and C2 (Zod convention) before merge; I1–I5 are quick and high-value (mostly wording + a few tests + two error-surfacing fixes). Architecture (provider-neutral NativeTool, dependency-inversion via closure, capability gating, project scoping) is sound — the blockers are an ID-matching gap and a convention violation, both small.

Note: the silent-failure-hunter's flagged empty catch {} in orchestrator-agent.ts handleUpdateProject is pre-existing code outside this diff — worth a follow-up but not a blocker here.

🤖 Multi-agent review (code · docs · tests · comments · errors · types)

…anage_run + chat

Critical:
- manage_run by-id actions now resolve the short ids the tool actually shows.
  getScopedRun looks runs up by id prefix scoped to the codebase (new
  findWorkflowRunsByIdPrefix), so the 8-char ids in list/get output work for
  get/resume/cancel/abandon/approve/reject; ambiguous prefixes are reported and
  project scoping is enforced in the query. Writes pass the verified full id to
  the operations.
- Document the @archon/providers direct-zod import exception in CLAUDE.md and at
  the import site (Claude SDK tool() needs a Zod shape; this SDK-deps-only leaf
  package must not pull in Hono via @hono/zod-openapi).

Important:
- resume help/response no longer claim it "marks resumable" — it only validates
  eligibility and does not restart the run.
- cancel help no longer claims it "stops" a running process — it marks the run
  cancelled; an executing step may finish first.
- ChatPage surfaces conversation/message (re)load errors instead of silently
  showing stale or empty data.
- startWorkflow logs dispatch failures with the workflow name and returns a
  clear error to the agent.
- Completed confirm-gate test coverage: preview-without-confirm for
  abandon/approve/reject, approve interactive-loop branch, reject rework branch,
  and an ambiguous-prefix case.

Suggestions:
- Differentiated confirm-preview wording: cancel/abandon (irreversible) vs
  approve/reject (human gate).
- Dropped the unsound `as Action` cast; added runId to the failure log context.
- Pi native-tool label derived from the tool name, not hardcoded "Manage runs".
- Corrected NativeTool JSDoc, cache.ts LOC note, and useDashboardSSE mount note.

Docs:
- console-open-questions.md §2/§5 + priority list (manage_run Wave 1 and inline
  approvals shipped in this PR); console README /chat route; ai-assistants Pi
  capability row for in-process native tools.

Deferred (proportionality): a typed NativeToolInputSchema (runtime validation
already covers the single-tool surface) and an orchestrator-level injection
integration test (two-clause guard whose halves are already covered).
@Wirasm

Wirasm commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator Author

Review findings addressed — pushed a2b21898

Worked through the multi-agent review; everything actionable is fixed except two deliberate deferrals (below). Validation: type-check ✅ (10 pkgs), lint ✅, providers suite ✅, core suite ✅, manage-run-tool.test.ts 24/24 ✅.

Critical

  • C1manage_run by-id actions now resolve the short ids the tool actually shows. New findWorkflowRunsByIdPrefix(prefix, codebaseId) does a project-scoped prefix lookup; getScopedRun uses it (ambiguous prefixes reported, project isolation enforced in the query), and writes pass the verified full id to the operations.
  • C2 — Documented the @archon/providers direct-zod exception in CLAUDE.md + at the import site (Claude SDK tool() needs a Zod shape; this SDK-deps-only leaf must not pull in Hono). Kept the import rather than dragging @hono/zod-openapi into the leaf package.

Important

  • I1resume help/response no longer claim "marks resumable"; it validates eligibility and does not restart the run.
  • I2cancel help no longer claims it "stops" a process; it marks the run cancelled (an executing step may finish first).
  • I3ChatPage surfaces conversation/message (re)load errors instead of silently showing stale/empty data.
  • I4startWorkflow logs dispatch failures with the workflow name and returns a clear error.
  • I5 — Confirm-gate coverage completed: preview-without-confirm for abandon/approve/reject, approve interactive-loop branch, reject rework branch, ambiguous-prefix case.

Suggestions

  • S1 confirm-preview wording differentiated (cancel/abandon irreversible vs approve/reject human gate) · S2 writes use verified run.id · S3 Pi label derived from tool name · S4 dropped the unsound as Action cast · S5 runId added to the failure log · S6 corrected NativeTool JSDoc, cache.ts LOC note, useDashboardSSE mount note.

Docsconsole-open-questions.md §2/§5 + priority list (manage_run Wave 1 + inline approvals shipped here); console README /chat route; ai-assistants.md Pi row for in-process native tools; CLAUDE.md GET /api/providers nativeTools note.

Deferred (proportionality)

  • A typed NativeToolInputSchema (S4 type-design) — runtime validation already covers the single-tool surface; a partially-constraining type adds churn for marginal gain.
  • An orchestrator-level injection integration test (I5 Gap 4) — the guard is two trivial clauses whose halves are already covered (buildManageRunTool unit-tested; capability flags tested in provider tests); a full orchestrator integration test is high-cost/brittle for the value.

Note: the handleUpdateProject empty-catch the error hunter flagged is pre-existing code outside this diff — left for a separate follow-up.

@Wirasm Wirasm merged commit c92cbef into dev Jun 4, 2026
4 checks passed
Wirasm added a commit that referenced this pull request Jun 4, 2026
…l providers

Now that #1819's native manage_run tool is in dev, the CLI run-management
pointer (buildRunManagementSection) is redundant for Claude/Pi and would steer
them onto a bash path needing `archon` on PATH. Gate it: the orchestrator
appends it ONLY for project-scoped chats on providers WITHOUT nativeTools
(Codex/OpenCode/Copilot); Claude/Pi get the native tool and are nudged to it.

- Move the append out of the pure buildOrchestratorSystemAppend into
  orchestrator-agent.ts next to the nativeTools check; look caps up lazily
  (scoped chats only).
- prompt-builder tests: assert buildOrchestratorSystemAppend does NOT include
  the section (orchestrator gates it); keep the buildRunManagementSection
  content test.
- orchestrator-agent test: add buildRunManagementSection to the ./prompt-builder
  mock (the new export it now calls — otherwise the full --filter run replaces
  the module and the call is undefined).
- Document the console-dock limitation: detached CLI runs appear in the dock
  (listed by project) but don't stream live and post no chat card, because they
  run out-of-process — use the native tool for the live experience.
Wirasm added a commit that referenced this pull request Jun 4, 2026
…l providers

Now that #1819's native manage_run tool is in dev, the CLI run-management
pointer (buildRunManagementSection) is redundant for Claude/Pi and would steer
them onto a bash path needing `archon` on PATH. Gate it: the orchestrator
appends it ONLY for project-scoped chats on providers WITHOUT nativeTools
(Codex/OpenCode/Copilot); Claude/Pi get the native tool and are nudged to it.

- Move the append out of the pure buildOrchestratorSystemAppend into
  orchestrator-agent.ts next to the nativeTools check; look caps up lazily
  (scoped chats only).
- prompt-builder tests: assert buildOrchestratorSystemAppend does NOT include
  the section (orchestrator gates it); keep the buildRunManagementSection
  content test.
- orchestrator-agent test: add buildRunManagementSection to the ./prompt-builder
  mock (the new export it now calls — otherwise the full --filter run replaces
  the module and the call is undefined).
- Document the console-dock limitation: detached CLI runs appear in the dock
  (listed by project) but don't stream live and post no chat card, because they
  run out-of-process — use the native tool for the live experience.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant