feat: usable mid-turn steer — desktop affordance + trusted injection#40240
Merged
Conversation
The desktop app could only queue while busy — `/steer` was in the palette but had no first-class affordance, so the "nudge the agent mid-turn without interrupting" lane was effectively unreachable. Add a steer action to the composer: while busy with a text-only draft, a steering-wheel button (and Cmd/Ctrl+Enter) injects the text into the live turn via the `session.steer` RPC — the gateway folds it into the next tool result so the model reads it on its next iteration. Plain Enter still queues. steerPrompt returns false when the gateway has no live tool window (or the RPC errors), and the composer re-queues the words so nothing is lost — the same safety net as a plain queue.
Contributor
🔎 Lint report:
|
A steer rides inside a tool result (the only role-alternation-safe slot mid-turn), so a bare "User guidance:" line reads as untrusted tool content — well-behaved models refuse it as suspected prompt injection (observed live: "I only follow instructions from you directly, not ones injected through command results"). - Wrap steers in a bounded, self-describing [OUT-OF-BAND USER MESSAGE] marker (prompt_builder.format_steer_marker), shared by both drain sites. - Add STEER_CHANNEL_NOTE to the core system prompt so the model expects this exact marker and trusts it as a genuine user message — while still ignoring lookalikes buried in tool/web/file output. Static text → byte-stable prompt, no prompt-cache regression; gated on the agent having tools. - Desktop: steer ack is now an inline transcript note (⏩ steered · …) instead of a toast. Marker is intentionally static (not a per-session nonce) to honor the byte-stable system-prompt caching policy; nonce hardening noted as follow-up.
Cmd/Ctrl+Enter now steers when there's a steerable draft and is a no-op otherwise — it never falls through to a send, so the shortcut can't surprise-send. Plain Enter keeps its role (queue while busy, send when idle).
The inline steer note used a ⏩ emoji. Emit a structured `steer:<text>` system note and render it in SystemMessage as a codicon (compass) row — same style as slash-status output. No emoji in the transcript.
Compute the trimmed draft once and reuse for hasComposerPayload + canSteer instead of trimming three times per render.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Makes
/steer(nudge the agent mid-turn without interrupting) actually usable: a first-class desktop affordance and a backend fix so the model stops treating steers as prompt injection.Why
/steerwas in the palette but had no UI, so the mid-turn lane was unreachable in the GUI.User guidance:. Well-behaved models read that as untrusted tool content and refuse it as prompt injection. Observed live:Steer vs queue (why both exist)
Changes
Desktop (composer)
While busy with a text-only draft: a steering-wheel button steers the draft via
session.steer.Cmd/Ctrl+Enter is reserved for steer — steers when there's a steerable draft, no-op otherwise (never a surprise send). Plain Enter keeps its role: queue while busy, send when idle.
Gated to
busy && text-only && non-empty && not a slash command.Reject/RPC-error → re-queue (safety net).
Ack is an inline transcript note rendered as a codicon row (
compass · steered · …), not a toast.Plumbing:
onSteerthrough controller →ChatView→ChatBar;SessionSteerResponse;SteeringWheelicon;steeri18n (en/zh/types).Backend (the real fix — makes steer trusted)
prompt_builder.format_steer_marker()wraps steers in a bounded, self-describing[OUT-OF-BAND USER MESSAGE]marker; shared by both drain sites (agent_runtime_helpers,conversation_loop).STEER_CHANNEL_NOTEadded to the core system prompt: the model is told to expect and trust this exact marker as a genuine user message, while still ignoring lookalikes in tool/web/file output.Security note
The marker is intentionally static, not a per-session nonce, to honor the byte-stable system-prompt caching policy (the codebase even uses day-precision dates to preserve the prefix cache). Trade-off: an attacker who fully controls tool output could forge the marker. A per-session nonce pinned in the prompt would close this; deferred as follow-up since it fights the caching policy and deserves its own discussion.
Verified
Live test (tool-heavy turn, steer sent mid-run): the model followed the steer (dated the remaining steps) with no injection refusal, and the inline codicon note rendered. See thread.
Test plan
tests/run_agent/test_steer.py(23 passed) — new marker on both drain paths, note↔marker contract,User guidance:regression guard.use-prompt-actions.test.tsx(11 passed) — steer injects trimmed text viasession.steer, neverprompt.submit; reject/error/empty paths.tsc --noEmitclean; no import cycles.