feat(grok): apply OpenAI execution guidance to xAI Grok / xai-oauth models#27797
Merged
Conversation
…odels
Grok models hit the same failure modes that OPENAI_MODEL_EXECUTION_GUIDANCE
addresses for GPT/Codex: claiming completion without tool calls
('to be honest, I didn't create the file yet'), suggesting workarounds
instead of using existing tools (proposing a folder-based memory system
when the memory tool exists), replying with plans instead of executing.
TOOL_USE_ENFORCEMENT_GUIDANCE was already injected for any model whose
name contains 'grok' (TOOL_USE_ENFORCEMENT_MODELS). This extends the
follow-on family-specific block — OPENAI_MODEL_EXECUTION_GUIDANCE
(tool_persistence / mandatory_tool_use / act_dont_ask / prerequisite_checks
/ verification / missing_context) — to grok-named models too.
The OPENAI_ prefix is retained for backwards compat with imports/tests;
docstring + inline comment now note that the body is family-agnostic and
the prefix reflects origin, not exclusivity.
Tests cover the OpenRouter slug (x-ai/grok-4.3) and the xai-oauth bare
name (grok-4.3), plus a negative control on claude.
E2E verified against a real AIAgent build of the system prompt for both
xai-oauth and openrouter grok models.
Contributor
🔎 Lint report:
|
19 tasks
Lillard01
pushed a commit
to Lillard01/hermes-agent
that referenced
this pull request
May 21, 2026
…odels (NousResearch#27797) Grok models hit the same failure modes that OPENAI_MODEL_EXECUTION_GUIDANCE addresses for GPT/Codex: claiming completion without tool calls ('to be honest, I didn't create the file yet'), suggesting workarounds instead of using existing tools (proposing a folder-based memory system when the memory tool exists), replying with plans instead of executing. TOOL_USE_ENFORCEMENT_GUIDANCE was already injected for any model whose name contains 'grok' (TOOL_USE_ENFORCEMENT_MODELS). This extends the follow-on family-specific block — OPENAI_MODEL_EXECUTION_GUIDANCE (tool_persistence / mandatory_tool_use / act_dont_ask / prerequisite_checks / verification / missing_context) — to grok-named models too. The OPENAI_ prefix is retained for backwards compat with imports/tests; docstring + inline comment now note that the body is family-agnostic and the prefix reflects origin, not exclusivity. Tests cover the OpenRouter slug (x-ai/grok-4.3) and the xai-oauth bare name (grok-4.3), plus a negative control on claude. E2E verified against a real AIAgent build of the system prompt for both xai-oauth and openrouter grok models.
1 task
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
…odels (NousResearch#27797) Grok models hit the same failure modes that OPENAI_MODEL_EXECUTION_GUIDANCE addresses for GPT/Codex: claiming completion without tool calls ('to be honest, I didn't create the file yet'), suggesting workarounds instead of using existing tools (proposing a folder-based memory system when the memory tool exists), replying with plans instead of executing. TOOL_USE_ENFORCEMENT_GUIDANCE was already injected for any model whose name contains 'grok' (TOOL_USE_ENFORCEMENT_MODELS). This extends the follow-on family-specific block — OPENAI_MODEL_EXECUTION_GUIDANCE (tool_persistence / mandatory_tool_use / act_dont_ask / prerequisite_checks / verification / missing_context) — to grok-named models too. The OPENAI_ prefix is retained for backwards compat with imports/tests; docstring + inline comment now note that the body is family-agnostic and the prefix reflects origin, not exclusivity. Tests cover the OpenRouter slug (x-ai/grok-4.3) and the xai-oauth bare name (grok-4.3), plus a negative control on claude. E2E verified against a real AIAgent build of the system prompt for both xai-oauth and openrouter grok models.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
xAI Grok models (xai-oauth + OpenRouter grok-*) now get the same family-specific execution discipline block (
OPENAI_MODEL_EXECUTION_GUIDANCE) that GPT/Codex get. Same failure modes in practice: claims completion without tool calls ("to be honest, I didn't create the file yet"), suggests workarounds instead of using existing tools (proposing a folder-based memory system when the memory tool exists), replies with plans instead of executing.The base
TOOL_USE_ENFORCEMENT_GUIDANCEwas already firing for grok ("grok" is inTOOL_USE_ENFORCEMENT_MODELS). This was just the family-specific second tier that GPT/Codex got and Grok didn't.Changes
agent/system_prompt.py: gate at L159 also matches"grok" in _model_loweragent/prompt_builder.py: docstring note thatOPENAI_prefix reflects origin, not exclusivity (body is family-agnostic — tool_persistence / mandatory_tool_use / act_dont_ask / prerequisite_checks / verification / missing_context)tests/run_agent/test_run_agent.py: 4 new tests covering OpenRouter slug, xai-oauth bare name, and a claude negative controlValidation
TestToolUseEnforcementConfig: 16/16 pass (12 pre-existing + 4 new)test_prompt_builder.py: 122/122 passAIAgent._build_system_prompt():grok-4.3(xai-oauth) andx-ai/grok-4.20(openrouter) both inject the full block including<verification>,<mandatory_tool_use>,<act_dont_ask>;claude-sonnet-4does not.