feat(grok): apply OpenAI execution guidance to xAI Grok / xai-oauth models by teknium1 · Pull Request #27797 · NousResearch/hermes-agent

teknium1 · 2026-05-18T06:00:28Z

Summary

xAI Grok models (xai-oauth + OpenRouter grok-*) now get the same family-specific execution discipline block (OPENAI_MODEL_EXECUTION_GUIDANCE) that GPT/Codex get. Grok shows the same failure modes in practice: it claims completion without tool calls ("to be honest, I didn't create the file yet"), suggests workarounds instead of using existing tools (proposing a folder-based memory system when the memory tool exists), and replies with plans instead of executing.

The base TOOL_USE_ENFORCEMENT_GUIDANCE was already firing for grok ("grok" is in TOOL_USE_ENFORCEMENT_MODELS). What was missing was the family-specific second tier that GPT/Codex got and Grok didn't.

Changes

agent/system_prompt.py: gate at L159 also matches "grok" in _model_lower
agent/prompt_builder.py: docstring note that OPENAI_ prefix reflects origin, not exclusivity (body is family-agnostic — tool_persistence / mandatory_tool_use / act_dont_ask / prerequisite_checks / verification / missing_context)
tests/run_agent/test_run_agent.py: 4 new tests covering OpenRouter slug, xai-oauth bare name, and a claude negative control
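A minimal sketch of the two-tier gating described above, not the actual code in agent/system_prompt.py. The function name `select_guidance_blocks`, the tuple `EXECUTION_GUIDANCE_FAMILIES`, the placeholder block bodies, and the "gpt" / "codex" substrings are assumptions for illustration; the PR only confirms that "grok" is in TOOL_USE_ENFORCEMENT_MODELS and that the second-tier gate now also matches "grok" in the lowercased model name.

```python
# Placeholder bodies: the real guidance text lives in agent/prompt_builder.py
# and is not reproduced here.
TOOL_USE_ENFORCEMENT_GUIDANCE = "<tool_use_enforcement>...</tool_use_enforcement>"
OPENAI_MODEL_EXECUTION_GUIDANCE = "<verification>...</verification>"

# Assumed contents; the PR only confirms that "grok" is a member.
TOOL_USE_ENFORCEMENT_MODELS = ("gpt", "codex", "grok")

# Hypothetical name. Assumed substrings for the second-tier gate;
# "grok" is the addition this PR describes.
EXECUTION_GUIDANCE_FAMILIES = ("gpt", "codex", "grok")


def select_guidance_blocks(model: str) -> list[str]:
    """Return the guidance blocks a model name would receive (illustrative)."""
    model_lower = model.lower()
    blocks = []
    # Tier 1: base tool-use enforcement, matched by substring.
    if any(name in model_lower for name in TOOL_USE_ENFORCEMENT_MODELS):
        blocks.append(TOOL_USE_ENFORCEMENT_GUIDANCE)
    # Tier 2: family-specific execution discipline, also matched by substring.
    if any(name in model_lower for name in EXECUTION_GUIDANCE_FAMILIES):
        blocks.append(OPENAI_MODEL_EXECUTION_GUIDANCE)
    return blocks
```

Substring matching on the lowercased name is what lets one check cover both the OpenRouter slug (x-ai/grok-4.3) and the xai-oauth bare name (grok-4.3).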

Validation

| | Before | After |
|---|---|---|
| Grok base enforcement block | injected | injected |
| Grok exec discipline (verification, mandatory_tool_use, act_dont_ask) | missing | injected |
| Claude exec discipline | not injected | not injected |

TestToolUseEnforcementConfig: 16/16 pass (12 pre-existing + 4 new)
test_prompt_builder.py: 122/122 pass
E2E with real AIAgent._build_system_prompt(): grok-4.3 (xai-oauth) and x-ai/grok-4.20 (openrouter) both inject the full block including <verification>, <mandatory_tool_use>, <act_dont_ask>; claude-sonnet-4 does not.
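The E2E check above amounts to looking for the discipline tags in the built system prompt. A rough sketch of such a check, assuming the prompt is available as a plain string; `missing_exec_tags` and `REQUIRED_TAGS` are hypothetical names, not helpers from the repository, and the sample prompts are stand-ins rather than real output of AIAgent._build_system_prompt().

```python
# Tags the PR description names as evidence that the block was injected.
REQUIRED_TAGS = ("<verification>", "<mandatory_tool_use>", "<act_dont_ask>")


def missing_exec_tags(system_prompt: str) -> list[str]:
    """Return the required tags that do not appear in the prompt text."""
    return [tag for tag in REQUIRED_TAGS if tag not in system_prompt]


# Stand-in prompts for illustration only.
grok_like_prompt = (
    "<verification>...</verification>"
    "<mandatory_tool_use>...</mandatory_tool_use>"
    "<act_dont_ask>...</act_dont_ask>"
)
claude_like_prompt = "You are a helpful agent."

assert missing_exec_tags(grok_like_prompt) == []
assert missing_exec_tags(claude_like_prompt) == list(REQUIRED_TAGS)
```

Returning the list of missing tags, rather than a bare boolean, makes a failing assertion say which part of the block was not injected.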

…odels

Grok models hit the same failure modes that OPENAI_MODEL_EXECUTION_GUIDANCE addresses for GPT/Codex: claiming completion without tool calls ('to be honest, I didn't create the file yet'), suggesting workarounds instead of using existing tools (proposing a folder-based memory system when the memory tool exists), replying with plans instead of executing.

TOOL_USE_ENFORCEMENT_GUIDANCE was already injected for any model whose name contains 'grok' (TOOL_USE_ENFORCEMENT_MODELS). This extends the follow-on family-specific block, OPENAI_MODEL_EXECUTION_GUIDANCE (tool_persistence / mandatory_tool_use / act_dont_ask / prerequisite_checks / verification / missing_context), to grok-named models too.

The OPENAI_ prefix is retained for backwards compat with imports/tests; docstring + inline comment now note that the body is family-agnostic and the prefix reflects origin, not exclusivity.

Tests cover the OpenRouter slug (x-ai/grok-4.3) and the xai-oauth bare name (grok-4.3), plus a negative control on claude. E2E verified against a real AIAgent build of the system prompt for both xai-oauth and openrouter grok models.

github-actions · 2026-05-18T06:01:11Z

🔎 Lint report: `hermes/hermes-e3fd584d` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8749 on HEAD, 8749 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 4608 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.


teknium1 merged commit 9b91377 into main May 18, 2026
16 of 17 checks passed

teknium1 deleted the hermes/hermes-e3fd584d branch May 18, 2026 06:00

alt-glitch added labels on May 18, 2026: type/feature (New feature or request), comp/agent (Core agent loop, run_agent.py, prompt builder), provider/xai (xAI (Grok)), P3 (Low: cosmetic, nice to have)

briandevans mentioned this pull request May 18, 2026: fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS #28195 (Closed, 19 tasks)

BrewTestBot mentioned this pull request May 28, 2026: hermes-agent 2026.5.28 Homebrew/homebrew-core#285115 (Merged, 1 task)

intelac mentioned this pull request May 30, 2026: feat(local-models): apply OpenAI execution guidance to Qwen / DeepSeek / GLM families #35087 (Open)

teknium1 merged 1 commit into main from hermes/hermes-e3fd584d
