
feat: add grok to TOOL_USE_ENFORCEMENT_MODELS for direct xAI usage #5595

Merged
teknium1 merged 1 commit into main from hermes/hermes-ba679ba8 on Apr 6, 2026

teknium1 commented on Apr 6, 2026

Contributor

Summary

Adds "grok" to the TOOL_USE_ENFORCEMENT_MODELS tuple so Grok models receive tool-use enforcement guidance in the system prompt.

Closes #5531

What changed

  • agent/prompt_builder.py: Added "grok" to TOOL_USE_ENFORCEMENT_MODELS
  • tests/agent/test_prompt_builder.py: Added assertion test for grok inclusion

Why

Grok models (x-ai/grok-4.20-beta, grok-code-fast-1) accessed via OpenRouter or direct xAI API were not getting the tool-use enforcement guidance that steers models to actually call tools instead of describing intended actions. The substring match on "grok" covers both routing paths.
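
As a rough illustration of the substring match described above, the sketch below checks a lowercased model name against the tuple. Only the tuple name and the "grok" entry come from this PR; the "gpt" entry and the helper name `needs_tool_use_enforcement` are hypothetical stand-ins, and the actual check in agent/prompt_builder.py may be written differently.

```python
# Hypothetical stand-in for the tuple in agent/prompt_builder.py.
# Only "grok" is confirmed by this PR; "gpt" is a placeholder entry.
TOOL_USE_ENFORCEMENT_MODELS = ("gpt", "grok")


def needs_tool_use_enforcement(model_name: str) -> bool:
    """Return True if any enforcement substring occurs in the model name."""
    name = model_name.lower()
    return any(fragment in name for fragment in TOOL_USE_ENFORCEMENT_MODELS)


# OpenRouter-style and direct xAI-style names both contain "grok".
print(needs_tool_use_enforcement("x-ai/grok-4.20-beta"))  # True
print(needs_tool_use_enforcement("grok-code-fast-1"))     # True
print(needs_tool_use_enforcement("some-other-model"))     # False
```

A substring test is deliberately loose: it needs no per-provider prefix handling, at the cost of also matching any future model whose name happens to contain "grok".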

Test plan

  • python -m pytest tests/agent/test_prompt_builder.py -n0 -q — 119 passed

Grok models (x-ai/grok-4.20-beta, grok-code-fast-1) now receive tool-use
enforcement guidance, steering them to actually call tools instead of
describing intended actions. Matches both OpenRouter (x-ai/grok-*) and
direct xAI API usage.

github-actions Bot commented Apr 6, 2026

Contributor

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: Install hook files modified

These files can execute code during package installation or interpreter startup.

Files:

hermes_cli/setup.py

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

teknium1 merged commit 582dbbb into main on Apr 6, 2026
3 of 4 checks passed
Julientalbot pushed a commit to Julientalbot/hermes-agent that referenced this pull request Apr 10, 2026
Grok reasoning models have a failure mode where they describe planned
actions in text ("I will check X", "Je vais lancer Y") without
actually calling the corresponding tools. The existing
TOOL_USE_ENFORCEMENT_GUIDANCE mitigates the "action reflex" trait
(NousResearch#5595) but doesn't address the narration-vs-execution split that is
specific to reasoning architectures.

Add GROK_EXECUTION_GUIDANCE — a targeted system prompt block injected
alongside TOOL_USE_ENFORCEMENT_GUIDANCE when the model name contains
"grok". Three XML-tagged sections:

- <no_intent_phrases>: explicit list of forbidden phrases in English
  and French ("I will...", "Let me...", "Je vais...", etc.) with
  the rule: if you need to act, call the tool now; do not narrate
  the intent.
- <execute_first>: mandate that the first response to any work-implying
  request contain a tool call, not a plan. Chain multiple tool calls
  in the same turn without intermediate prose.
- <no_analysis_hallucination>: forbid structured analyses, diagnosis
  lists, or recommendations produced from pure reasoning without tool
  calls to verify the claims.

Injected in run_agent.py next to the existing provider-specific guidance
blocks (OPENAI_MODEL_EXECUTION_GUIDANCE, GOOGLE_MODEL_OPERATIONAL_GUIDANCE).
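
The injection described above might look roughly like the following sketch. The constant names come from this commit message, but the guidance texts are shortened placeholders, and `build_system_prompt` is a hypothetical helper, not the actual code path in run_agent.py.

```python
# Placeholder texts: the real guidance strings are not shown in this thread.
TOOL_USE_ENFORCEMENT_GUIDANCE = "Call tools instead of describing intended actions."
GROK_EXECUTION_GUIDANCE = (
    "<no_intent_phrases>Do not narrate intent; call the tool now.</no_intent_phrases>\n"
    "<execute_first>Start with a tool call, not a plan.</execute_first>\n"
    "<no_analysis_hallucination>Verify claims with tool calls.</no_analysis_hallucination>"
)


def build_system_prompt(base_prompt: str, model_name: str) -> str:
    """Append the Grok-specific blocks when the model name contains "grok".

    Hypothetical helper: the real injection point is in run_agent.py.
    """
    parts = [base_prompt]
    if "grok" in model_name.lower():
        # The Grok block is added alongside, not instead of, the general guidance.
        parts.append(TOOL_USE_ENFORCEMENT_GUIDANCE)
        parts.append(GROK_EXECUTION_GUIDANCE)
    return "\n\n".join(parts)


prompt = build_system_prompt("You are an agent.", "x-ai/grok-4.20-beta")
print("<execute_first>" in prompt)  # True
```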

Tests (6 new in TestGrokExecutionGuidance):
- Verifies XML tag structure
- Asserts intent-phrase examples are present in both English and French
- Asserts the execute-first mandate is documented
- Asserts the no-analysis-hallucination rule is present
- Size and type checks
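
The first two checks in the list above could be sketched as follows. The guidance string is a hypothetical miniature of the real block, and `has_paired_tag` and the test function names are illustrative, not the names used in TestGrokExecutionGuidance.

```python
import re

# Hypothetical miniature guidance block; the real text lives in the fork's commit.
GROK_EXECUTION_GUIDANCE = """\
<no_intent_phrases>
Do not write "I will..." or "Je vais..."; call the tool now.
</no_intent_phrases>
<execute_first>
The first response to a work request must contain a tool call.
</execute_first>
<no_analysis_hallucination>
Do not present analyses that no tool call has verified.
</no_analysis_hallucination>
"""

EXPECTED_TAGS = ("no_intent_phrases", "execute_first", "no_analysis_hallucination")


def has_paired_tag(text: str, tag: str) -> bool:
    """Return True if exactly one <tag>...</tag> pair is found in the text."""
    pattern = re.compile(rf"<{tag}>.*?</{tag}>", re.DOTALL)
    return len(pattern.findall(text)) == 1


def test_xml_tag_structure():
    for tag in EXPECTED_TAGS:
        assert has_paired_tag(GROK_EXECUTION_GUIDANCE, tag)


def test_intent_phrases_in_both_languages():
    assert "I will" in GROK_EXECUTION_GUIDANCE
    assert "Je vais" in GROK_EXECUTION_GUIDANCE
```
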

124 passed, 1 skipped in tests/agent/test_prompt_builder.py (no regression).

NOT YET PUSHED as a PR. To be dogfooded on the author's production
instance on xAI before upstream submission, given the precedent of
'behavioral' patches being classified as prostheses in prior work.
Tommyeds pushed a commit to Tommyeds/hermes-agent that referenced this pull request Apr 12, 2026
…ousResearch#5595)

angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026
…ousResearch#5595)

02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
…ousResearch#5595)

olympus-terminal pushed a commit to olympus-terminal/hermes-agent that referenced this pull request May 16, 2026
…ousResearch#5595)

gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…ousResearch#5595)

Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…ousResearch#5595)
