Add <steer> metadata so model knows what they are#1
Open
Conversation
Steered messages currently look identical to ordinary user messages in the model-visible transcript. That leaves the model guessing whether a mid-turn input should hand control back to the user or continue the active task, which is exactly the failure described in openai#11062. Emit a model-visible <steer> fragment immediately before steered input, exclude that marker from memory generation, and add a terse base-prompt rule that treats the following user message as same-task steering. Keep the actual user text as a normal user message so the metadata stays separate from user content. Add coverage for marker detection, pending-input ordering, and mailbox delivery while steered input is in flight.
Use a bare <steer></steer> marker instead of a nested scope payload. The steer fragment only needs to tell the model that the following user message is same-task steering. Keeping it empty makes the marker more direct, keeps it in-family with the existing fragment system, and avoids pretending there is a richer steer schema than the implementation actually uses.
Keep the steer follow-up diff tight. Use a local import so the pending-input call stays on one simple line after rustfmt, and restore the nearby memory-exclusion comment to track upstream wording more closely instead of carrying avoidable prose churn.
Move the steer rule under the task-execution criteria list and tighten the wording to the final form. The new text is shorter, more direct, and matches the intended behavior: handle the steer, continue the current task, and do not end the turn unless explicitly requested.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Mark steers with
<steer>Providing the model with the same language as the user improves communication and increases the chance of contextually appropriate responses. Models without contextual information cannot apply their intelligence.
Codex currently sees "steers" and ordinary user messages as the same thing in the model-visible transcript, with no way to differentiate.
When it currently receives a steer, its not contextually clear to it that it should address the steer and KEEP WORKING. As a result, it frequently will address the steer (reply to user, change slightly) and then end its turn. This is frustrating if its something you want it to keep working on for say the next hour.
As a user, this makes steers feel "risky", because N% of the time they interrupt the current work by accident. You end up mentally weighing if the risk of it stopping work is worth the nit.
This PR fixes openai#11062.
This PR
<steer></steer>control message fragment prior to sending the user's messagesteer_input_queues_model_visible_marker_before_user_messageWhen a <steer> block appears, treat the next user message as steering for your current task. Handle it, then continue your current task. Do not end your turn unless explicitly requested.Other approaches taken to workaround
I've tried addressing this via AGENTS.md instructions, but the fundamental issue is that Codex can't tell the different between what's a steer and what's a regular prompt (where it should return control back to dialog after a response).
I've had some success, but it increases the likelihood that a non-steer will launch codex into a long running task. Fundamentally, codex has no real way of guessing which is which (a human fed the same info would struggle too).