Skip to content

feat: usable mid-turn steer — desktop affordance + trusted injection#40240

Merged
OutThisLife merged 5 commits into
mainfrom
bb/desktop-steer
Jun 6, 2026
Merged

feat: usable mid-turn steer — desktop affordance + trusted injection#40240
OutThisLife merged 5 commits into
mainfrom
bb/desktop-steer

Conversation

@OutThisLife

@OutThisLife OutThisLife commented Jun 6, 2026

Copy link
Copy Markdown
Collaborator

Makes /steer (nudge the agent mid-turn without interrupting) actually usable: a first-class desktop affordance and a backend fix so the model stops treating steers as prompt injection.

Why

  • Desktop could only queue while busy. /steer was in the palette but had no UI, so the mid-turn lane was unreachable in the GUI.
  • Backend: a steer is appended inside a tool result (the only role-alternation-safe slot mid-turn), labelled User guidance:. Well-behaved models read that as untrusted tool content and refuse it as prompt injection. Observed live:

    "I only follow instructions from you directly, not ones injected through command results … I treated it as an injection and skipped it."

Steer vs queue (why both exist)

  • queue → new user message at the next turn boundary (agent finishes first).
  • steer → folded into the next tool result mid-turn; the model sees it on its next iteration. No interrupt.
  • If steer is rejected (no live tool window), the text is re-queued so nothing is lost.

Changes

Desktop (composer)

  • While busy with a text-only draft: a steering-wheel button steers the draft via session.steer.

  • Cmd/Ctrl+Enter is reserved for steer — steers when there's a steerable draft, no-op otherwise (never a surprise send). Plain Enter keeps its role: queue while busy, send when idle.

    idle busy
    Enter send queue
    Cmd/Ctrl+Enter steer
  • Gated to busy && text-only && non-empty && not a slash command.

  • Reject/RPC-error → re-queue (safety net).

  • Ack is an inline transcript note rendered as a codicon row (compass · steered · …), not a toast.

  • Plumbing: onSteer through controller → ChatViewChatBar; SessionSteerResponse; SteeringWheel icon; steer i18n (en/zh/types).

Backend (the real fix — makes steer trusted)

  • prompt_builder.format_steer_marker() wraps steers in a bounded, self-describing [OUT-OF-BAND USER MESSAGE] marker; shared by both drain sites (agent_runtime_helpers, conversation_loop).
  • STEER_CHANNEL_NOTE added to the core system prompt: the model is told to expect and trust this exact marker as a genuine user message, while still ignoring lookalikes in tool/web/file output.
  • Static text → byte-stable prompt (no prompt-cache regression); gated on the agent having tools.

Security note

The marker is intentionally static, not a per-session nonce, to honor the byte-stable system-prompt caching policy (the codebase even uses day-precision dates to preserve the prefix cache). Trade-off: an attacker who fully controls tool output could forge the marker. A per-session nonce pinned in the prompt would close this; deferred as follow-up since it fights the caching policy and deserves its own discussion.

Verified

Live test (tool-heavy turn, steer sent mid-run): the model followed the steer (dated the remaining steps) with no injection refusal, and the inline codicon note rendered. See thread.

Test plan

  • Backend: tests/run_agent/test_steer.py (23 passed) — new marker on both drain paths, note↔marker contract, User guidance: regression guard.
  • Desktop: use-prompt-actions.test.tsx (11 passed) — steer injects trimmed text via session.steer, never prompt.submit; reject/error/empty paths.
  • tsc --noEmit clean; no import cycles.
  • Manual: steer mid-turn is followed (no injection refusal); inline note shows; plain Enter still queues; Cmd/Ctrl+Enter never surprise-sends.

The desktop app could only queue while busy — `/steer` was in the palette
but had no first-class affordance, so the "nudge the agent mid-turn without
interrupting" lane was effectively unreachable.

Add a steer action to the composer: while busy with a text-only draft, a
steering-wheel button (and Cmd/Ctrl+Enter) injects the text into the live
turn via the `session.steer` RPC — the gateway folds it into the next tool
result so the model reads it on its next iteration. Plain Enter still queues.

steerPrompt returns false when the gateway has no live tool window (or the
RPC errors), and the composer re-queues the words so nothing is lost — the
same safety net as a plain queue.
@github-actions

github-actions Bot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

🔎 Lint report: bb/desktop-steer vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9870 on HEAD, 9870 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 5119 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

A steer rides inside a tool result (the only role-alternation-safe slot
mid-turn), so a bare "User guidance:" line reads as untrusted tool content —
well-behaved models refuse it as suspected prompt injection (observed live:
"I only follow instructions from you directly, not ones injected through
command results").

- Wrap steers in a bounded, self-describing [OUT-OF-BAND USER MESSAGE] marker
  (prompt_builder.format_steer_marker), shared by both drain sites.
- Add STEER_CHANNEL_NOTE to the core system prompt so the model expects this
  exact marker and trusts it as a genuine user message — while still ignoring
  lookalikes buried in tool/web/file output. Static text → byte-stable prompt,
  no prompt-cache regression; gated on the agent having tools.
- Desktop: steer ack is now an inline transcript note (⏩ steered · …) instead
  of a toast.

Marker is intentionally static (not a per-session nonce) to honor the
byte-stable system-prompt caching policy; nonce hardening noted as follow-up.
@OutThisLife OutThisLife changed the title feat(desktop): steer the live run from the composer feat: usable mid-turn steer — desktop composer action + trusted injection Jun 6, 2026
Cmd/Ctrl+Enter now steers when there's a steerable draft and is a no-op
otherwise — it never falls through to a send, so the shortcut can't
surprise-send. Plain Enter keeps its role (queue while busy, send when idle).
The inline steer note used a ⏩ emoji. Emit a structured `steer:<text>`
system note and render it in SystemMessage as a codicon (compass) row —
same style as slash-status output. No emoji in the transcript.
Compute the trimmed draft once and reuse for hasComposerPayload + canSteer
instead of trimming three times per render.
@OutThisLife OutThisLife changed the title feat: usable mid-turn steer — desktop composer action + trusted injection feat: usable mid-turn steer — desktop affordance + trusted injection Jun 6, 2026
@OutThisLife OutThisLife enabled auto-merge June 6, 2026 02:06
@OutThisLife OutThisLife merged commit 1506874 into main Jun 6, 2026
22 checks passed
@OutThisLife OutThisLife deleted the bb/desktop-steer branch June 6, 2026 02:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant