AgentRuntime: route per-call tool approval requests to messaging channels #2616

@alexey-pelykh

Problem

Today's headless modes for the 4 CLI runtimes (claude --print, gemini --output-format stream-json, codex exec --json, opencode run --format json) all auto-run any tool the agent invokes. There is no permission interception, so an operator running RemoteClaw via chat cannot approve or deny individual tool calls.

When a runtime is migrated to its richer permission-emitting mode (Codex app-server, OpenCode serve/acp, Gemini --acp, Claude --input-format stream-json with control_request), the middleware needs to:

  1. Capture the permission request from the CLI subprocess
  2. Surface it as a chat message with platform-native UI (inline keyboard / Block Kit / ActionRow / quick replies)
  3. Wait for user response (seconds to minutes)
  4. Route the decision back to the subprocess via its native API
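
The waiting step (3) and the hand-off into step 4 can be sketched as a small in-memory registry keyed by approvalId. This is only an illustration: PendingApprovals, Decision, wait and resolve are hypothetical names, not existing RemoteClaw APIs, and a timeout is modeled here as resolving to null.

```typescript
// Hypothetical helper, not part of RemoteClaw: tracks approvals awaiting a chat reply.
type Decision = { approvalId: string; optionId: string; nativeDecision: string };

type Waiter = {
  resolve: (decision: Decision | null) => void;
  timer: ReturnType<typeof setTimeout>;
};

class PendingApprovals {
  private waiters = new Map<string, Waiter>();

  // Step 3: wait until a decision arrives, or resolve to null after timeoutMs.
  wait(approvalId: string, timeoutMs: number): Promise<Decision | null> {
    return new Promise((resolve) => {
      const timer = setTimeout(() => {
        this.waiters.delete(approvalId);
        resolve(null);
      }, timeoutMs);
      this.waiters.set(approvalId, { resolve, timer });
    });
  }

  // Called when the channel reports a user decision.
  // Returns false for unknown or already-resolved approval ids.
  resolve(decision: Decision): boolean {
    const waiter = this.waiters.get(decision.approvalId);
    if (!waiter) return false;
    clearTimeout(waiter.timer);
    this.waiters.delete(decision.approvalId);
    waiter.resolve(decision);
    return true;
  }
}
```

A runtime adapter would await wait(...) after emitting the request and, on a non-null result, forward nativeDecision through the runtime's native API (step 4). Returning false from resolve makes repeated button taps harmless.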

Acceptance criteria

Schema (in src/middleware/types.ts)

  • New AgentApprovalRequestEvent discriminant on the AgentEvent union:

    export type AgentApprovalRequestEvent = {
      type: "approval_request";
      approvalId: string;              // ULID, ≤ 26 chars (fits Telegram 64-byte callback_data)
      approvalSource: "runtime_tool";  // discriminant; future values reserved
      subject:
        | { kind: "tool_use"; toolName: string; toolId: string;
            rawInput: Record<string, unknown>;
            displayArgs: Record<string, unknown> }   // pre-redacted at runtime layer
        | { kind: "shell_exec"; argv: string[]; cwd?: string;
            commandText: string; displayCommandText: string };
      options: ReadonlyArray<{
        optionId: string;          // stable, ≤ 32 bytes
        label: string;             // ≤ 20 chars (WhatsApp button-title budget)
        nativeDecision: string;    // pass-through to backend
        role?: "primary" | "secondary" | "destructive" | "cancel";
      }>;
      timeoutMs?: number;
    };
  • New resolution channel on AgentExecuteParams:

    resolvedApprovals?: ReadonlyArray<{
      approvalId: string;
      optionId: string;
      nativeDecision: string;
    }>;
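
The size budgets noted in the schema comments (approvalId ≤ 26 chars, optionId ≤ 32 bytes, label ≤ 20 chars) could be checked before an event leaves the runtime layer. The sketch below is an assumption about how that might look; checkApprovalBudgets and the trimmed-down input type are hypothetical and not part of the proposed schema.

```typescript
// Minimal shapes for this sketch; only the fields with a size budget are included.
type BudgetedOption = { optionId: string; label: string; nativeDecision: string };
type BudgetedEvent = { approvalId: string; options: ReadonlyArray<BudgetedOption> };

// UTF-8 byte length, since the optionId budget is stated in bytes.
const utf8Bytes = (s: string): number => new TextEncoder().encode(s).length;

// Returns a list of human-readable problems; an empty list means the event fits.
function checkApprovalBudgets(event: BudgetedEvent): string[] {
  const problems: string[] = [];
  if (event.approvalId.length > 26) {
    problems.push(`approvalId is ${event.approvalId.length} chars (max 26)`);
  }
  for (const opt of event.options) {
    if (utf8Bytes(opt.optionId) > 32) {
      problems.push(`optionId "${opt.optionId}" exceeds 32 bytes`);
    }
    // Array.from counts code points, so one emoji counts as one character here.
    if (Array.from(opt.label).length > 20) {
      problems.push(`label "${opt.label}" exceeds 20 chars`);
    }
  }
  return problems;
}
```

Collecting problems instead of throwing on the first one keeps the check usable in unit tests that assert on every violated budget.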

Per-runtime translation

Each runtime adapter (claude.ts / gemini.ts / codex.ts / opencode.ts) translates its native permission-request shape into AgentApprovalRequestEvent AND consumes resolvedApprovals to send the response back via the native API:

  • Claude: control_request → emit; control_response ← consume (requires --input-format stream-json mode)
  • Gemini: ACP requestPermission (per-session JSON-RPC bidirectional) — requires --acp mode
  • Codex: item/commandExecution/requestApproval and item/fileChange/requestApproval — requires app-server mode
  • OpenCode: permission.asked SSE → emit; POST /permission/:id/reply ← consume — requires serve mode

The backend-native decision vocabulary is preserved verbatim in options[].nativeDecision: Codex has 6 decisions (including AcceptForSession / AcceptWithExecpolicyAmendment / ApplyNetworkPolicyAmendment), OpenCode 3 (once/always/reject), ACP 4 (allow_once/allow_always/reject_once/reject_always), Claude 2 (allow/deny). No flattening to a lowest common denominator.
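
As an illustration of "no flattening", the following sketch builds options[] directly from the decision strings listed above. Codex is left out because only three of its six decisions are named here. The table name, the toOptions helper, the label derivation, the role assignment, and the choice to reuse the native string as optionId are all assumptions made for the example.

```typescript
type ApprovalOption = {
  optionId: string;
  label: string;
  nativeDecision: string;
  role?: "primary" | "secondary" | "destructive" | "cancel";
};

// Decision strings as listed in this issue.
const NATIVE_DECISIONS = {
  claude: ["allow", "deny"],
  opencode: ["once", "always", "reject"],
  acp: ["allow_once", "allow_always", "reject_once", "reject_always"],
} as const;

type Backend = keyof typeof NATIVE_DECISIONS;

// Hypothetical label rule: "reject_always" becomes "Reject always".
function toLabel(nativeDecision: string): string {
  const text = nativeDecision.replace(/_/g, " ");
  return text.charAt(0).toUpperCase() + text.slice(1);
}

// One option per native decision; nativeDecision is passed through untouched.
function toOptions(backend: Backend): ApprovalOption[] {
  return NATIVE_DECISIONS[backend].map((nativeDecision, index): ApprovalOption => ({
    optionId: nativeDecision, // assumption: the native string doubles as the stable id
    label: toLabel(nativeDecision),
    nativeDecision,
    // Assumed role mapping: rejections are "cancel", the first option is "primary".
    role: /^(reject|deny)/.test(nativeDecision)
      ? "cancel"
      : index === 0
        ? "primary"
        : "secondary",
  }));
}
```

Because each backend keeps its own list, ACP still surfaces four buttons and Claude two; nothing is collapsed into a shared allow/deny pair.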

Channel-adapter contract (in extensions/{telegram,slack,discord,whatsapp,...}/)

  • New channel-adapter interface method to render an AgentApprovalRequestEvent as a platform-native interactive message with options[] mapped to buttons / quick-replies
  • Correlation: approvalId carried in callback_data / action_id / custom_id / button id
  • Channel-adapter validates decidedBy ∈ ChannelMessage.authorizedSenders BEFORE forwarding decision back into AgentExecuteParams.resolvedApprovals
  • Webhook signature verification non-negotiable per channel (Slack X-Slack-Signature, Telegram secret_token, Discord Ed25519)
  • Plain-text fallback for channels without interactive components (SMS / iMessage / Signal): /approve {id} and /deny {id} patterns
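
The correlation and fallback bullets above can be sketched as follows. A 26-char approvalId plus a separator plus a 32-byte optionId is 59 bytes, which stays under the 64-byte callback_data budget mentioned in the schema. The function names, the ":" separator, the regular expression, and the assumption that authorizedSenders is a list of string ids are all hypothetical.

```typescript
const CALLBACK_DATA_LIMIT_BYTES = 64; // Telegram budget cited in the schema comment

// Pack approvalId and optionId into one button payload.
function encodeCallback(approvalId: string, optionId: string): string {
  const data = `${approvalId}:${optionId}`;
  if (new TextEncoder().encode(data).length > CALLBACK_DATA_LIMIT_BYTES) {
    throw new Error("callback payload exceeds 64 bytes");
  }
  return data;
}

// Split on the first ":" only, so an optionId may itself contain ":".
function decodeCallback(data: string): { approvalId: string; optionId: string } | null {
  const sep = data.indexOf(":");
  if (sep <= 0 || sep === data.length - 1) return null;
  return { approvalId: data.slice(0, sep), optionId: data.slice(sep + 1) };
}

// Plain-text fallback: "/approve {id}" or "/deny {id}".
function parseTextDecision(text: string): { verb: "approve" | "deny"; approvalId: string } | null {
  const match = /^\/(approve|deny)\s+([0-9A-Za-z]{1,26})\s*$/.exec(text.trim());
  if (!match) return null;
  return { verb: match[1] as "approve" | "deny", approvalId: match[2] };
}

// Assumed shape: authorizedSenders is a list of platform user ids.
function isAuthorized(decidedBy: string, authorizedSenders: ReadonlyArray<string>): boolean {
  return authorizedSenders.includes(decidedBy);
}
```

An adapter would call isAuthorized before forwarding anything into resolvedApprovals, and treat a null from either parser as an unrelated message.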

Security must-includes

  • Allow-always cardinality bound: any "always" decision (Codex AcceptForSession, OpenCode always, ACP allow_always) MUST be scoped to (toolName, argsHash), NEVER toolName alone. This prevents the Cursor MCP exploit pattern, in which a blanket approval for a tool is reused for different arguments
  • Sensitive-data redaction at runtime layer (NOT channel adapter): runtime produces displayArgs separate from rawInput. Channel adapters receive pre-redacted display fields and never see raw secrets
  • Append-only audit log: every resolved approval logs {requestId, agentId, sessionKey, toolName, argsHash, decision, decidedBy, decidedAt, channel, decisionLatencyMs} to an append-only sink
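
One way to scope "always" decisions to (toolName, argsHash) is to hash a canonical serialization of the raw arguments. The sketch below assumes SHA-256 over key-sorted JSON; the issue does not specify the hash or the canonicalization, and canonicalJson, argsHash and AlwaysAllowed are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Deterministic JSON: object keys are sorted recursively so that
// {a: 1, b: 2} and {b: 2, a: 1} serialize identically.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalJson).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalJson(obj[key])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value) ?? "null";
}

// Assumption: argsHash is the hex SHA-256 of the canonical raw input.
function argsHash(rawInput: Record<string, unknown>): string {
  return createHash("sha256").update(canonicalJson(rawInput)).digest("hex");
}

// Remembers "always" approvals per (toolName, argsHash), never per toolName alone.
class AlwaysAllowed {
  private keys = new Set<string>();

  private key(toolName: string, rawInput: Record<string, unknown>): string {
    return JSON.stringify([toolName, argsHash(rawInput)]);
  }

  remember(toolName: string, rawInput: Record<string, unknown>): void {
    this.keys.add(this.key(toolName, rawInput));
  }

  covers(toolName: string, rawInput: Record<string, unknown>): boolean {
    return this.keys.has(this.key(toolName, rawInput));
  }
}
```

With this keying, approving one invocation of a tool "always" does not cover the same tool with different arguments, which is the bound the first bullet asks for. The same argsHash value could be written to the audit log entry.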

Persistence

Persistence is fully on the CLI side. The CLI subprocess holds the pending approval in its own memory while blocked on the native API. No durable middleware approval store. If the subprocess dies mid-approval, the approval is lost; mitigation is operational (follow-up "agent restarted, please retry" channel message; no technical recovery).

Tests

  • Unit tests per runtime adapter: native-shape ↔ AgentApprovalRequestEvent round-trip
  • Unit tests per channel adapter: AgentApprovalRequestEvent → platform message round-trip + decision parsing back
  • Integration test: end-to-end approval flow for at least one runtime + one channel pair (recommend Claude + Telegram as the smallest viable scope)

Non-goals

  • Long-lived subprocess implementation (separate issue: the supervisor split — this work depends on it)
  • Persistence of pending approvals across subprocess restart (CLI side, no recovery)
  • Future approvalSource values (gateway_tool, mcp_elicitation) — schema slot reserved but not implemented

Dependencies

  • Long-lived subprocess supervisor (separate issue): the subprocess must stay alive across the chat round-trip (seconds to minutes). Per-execute ephemeral spawn cannot span that window.
  • Each runtime needs to be in approval-emitting mode (Codex app-server, OpenCode serve/acp, Gemini --acp, Claude --input-format stream-json) before its adapter changes ship.

Effort

5-10 days. Spec-first; lands per-runtime incrementally.
