feat: tool-use enforcement + strip budget warnings from history#3528
Merged
Conversation
Cherry-pick of feat/gpt-tool-steering with modifications:
1. Tool-use enforcement prompt (refactored from GPT-specific):
- Renamed GPT_TOOL_USE_GUIDANCE -> TOOL_USE_ENFORCEMENT_GUIDANCE
- Added TOOL_USE_ENFORCEMENT_MODELS tuple: ('gpt', 'codex')
- Injection logic now checks against the tuple instead of hardcoding
'gpt' — adding new model families is a one-line change
- Addresses models describing actions instead of making tool calls
2. Budget warning history stripping:
- _strip_budget_warnings_from_history() strips _budget_warning JSON
keys and [BUDGET WARNING: ...] text from tool results at the start
of run_conversation()
- Prevents old budget warnings from poisoning subsequent turns
Based on PR #3479 by teknium1.
teknium1
added a commit
that referenced
this pull request
Mar 28, 2026
…cement The TOOL_USE_ENFORCEMENT_GUIDANCE injection (added in #3528) was hardcoded to only match gpt/codex model names. This makes it a config option so users can turn it on for any model family. New config key: agent.tool_use_enforcement - "auto" (default): matches gpt/codex (existing behavior) - true: inject for all models - false: never inject - list of strings: custom model-name substrings to match e.g. ["gpt", "codex", "deepseek", "qwen"] No version bump needed — deep merge provides the default automatically for existing installs. 12 new tests covering all config modes.
teknium1
added a commit
that referenced
this pull request
Mar 28, 2026
…cement (#3551) The TOOL_USE_ENFORCEMENT_GUIDANCE injection (added in #3528) was hardcoded to only match gpt/codex model names. This makes it a config option so users can turn it on for any model family. New config key: agent.tool_use_enforcement - "auto" (default): matches gpt/codex (existing behavior) - true: inject for all models - false: never inject - list of strings: custom model-name substrings to match e.g. ["gpt", "codex", "deepseek", "qwen"] No version bump needed — deep merge provides the default automatically for existing installs. 12 new tests covering all config modes.
teknium1
added a commit
that referenced
this pull request
Mar 28, 2026
… pages Fixes found by auditing docs against recent PRs/commits: Critical (misleading): - hooks.md: Remove stale 'planned — not yet wired' markers for 4 hooks that are now active (#3542). Add correct callback signatures. - security.md: Update tirith verdict behavior — block verdicts now go through approval flow instead of hard-blocking (#3428). Add pkill/killall self-termination guard and gateway-run backgrounding patterns (#3593). New feature docs: - configuration.md: Add tool_use_enforcement section with value table (auto/true/false/list) from #3551/#3528. - configuration.md: Expand auxiliary config with per-task timeouts (compression 120s, web_extract 30s, approval 30s) from #3597. - api-server.md: Add /v1/health alias, Security Headers section, CORS details (Max-Age, SSE headers, Idempotency-Key) from #3572/#3573/#3576/#3580/#3530. Stale/incomplete: - configuration.md: Fix Alibaba model name qwen-plus -> qwen3.5-plus (#3484). - environment-variables.md: Specify actual DashScope default URL. - cli-commands.md: Add alibaba to --provider list. - fallback-providers.md: Add Alibaba/DashScope to provider table. - email.md: Document noreply/automated sender filtering (#3606). - toolsets-reference.md: Add 4 missing platform toolsets — matrix, mattermost, dingtalk, api-server (#3583). - skills.md: List default GitHub taps including garrytan/gstack (#3605).
teknium1
added a commit
that referenced
this pull request
Mar 28, 2026
… pages (#3618) Fixes found by auditing docs against recent PRs/commits: Critical (misleading): - hooks.md: Remove stale 'planned — not yet wired' markers for 4 hooks that are now active (#3542). Add correct callback signatures. - security.md: Update tirith verdict behavior — block verdicts now go through approval flow instead of hard-blocking (#3428). Add pkill/killall self-termination guard and gateway-run backgrounding patterns (#3593). New feature docs: - configuration.md: Add tool_use_enforcement section with value table (auto/true/false/list) from #3551/#3528. - configuration.md: Expand auxiliary config with per-task timeouts (compression 120s, web_extract 30s, approval 30s) from #3597. - api-server.md: Add /v1/health alias, Security Headers section, CORS details (Max-Age, SSE headers, Idempotency-Key) from #3572/#3573/#3576/#3580/#3530. Stale/incomplete: - configuration.md: Fix Alibaba model name qwen-plus -> qwen3.5-plus (#3484). - environment-variables.md: Specify actual DashScope default URL. - cli-commands.md: Add alibaba to --provider list. - fallback-providers.md: Add Alibaba/DashScope to provider table. - email.md: Document noreply/automated sender filtering (#3606). - toolsets-reference.md: Add 4 missing platform toolsets — matrix, mattermost, dingtalk, api-server (#3583). - skills.md: List default GitHub taps including garrytan/gstack (#3605).
angelburgosrosado
pushed a commit
to angelburgosrosado/hermes-agent
that referenced
this pull request
Apr 27, 2026
…Research#3528) Cherry-pick of feat/gpt-tool-steering with modifications: 1. Tool-use enforcement prompt (refactored from GPT-specific): - Renamed GPT_TOOL_USE_GUIDANCE -> TOOL_USE_ENFORCEMENT_GUIDANCE - Added TOOL_USE_ENFORCEMENT_MODELS tuple: ('gpt', 'codex') - Injection logic now checks against the tuple instead of hardcoding 'gpt' — adding new model families is a one-line change - Addresses models describing actions instead of making tool calls 2. Budget warning history stripping: - _strip_budget_warnings_from_history() strips _budget_warning JSON keys and [BUDGET WARNING: ...] text from tool results at the start of run_conversation() - Prevents old budget warnings from poisoning subsequent turns Based on PR NousResearch#3479 by teknium1.
angelburgosrosado
pushed a commit
to angelburgosrosado/hermes-agent
that referenced
this pull request
Apr 27, 2026
…cement (NousResearch#3551) The TOOL_USE_ENFORCEMENT_GUIDANCE injection (added in NousResearch#3528) was hardcoded to only match gpt/codex model names. This makes it a config option so users can turn it on for any model family. New config key: agent.tool_use_enforcement - "auto" (default): matches gpt/codex (existing behavior) - true: inject for all models - false: never inject - list of strings: custom model-name substrings to match e.g. ["gpt", "codex", "deepseek", "qwen"] No version bump needed — deep merge provides the default automatically for existing installs. 12 new tests covering all config modes.
angelburgosrosado
pushed a commit
to angelburgosrosado/hermes-agent
that referenced
this pull request
Apr 27, 2026
… pages (NousResearch#3618) Fixes found by auditing docs against recent PRs/commits: Critical (misleading): - hooks.md: Remove stale 'planned — not yet wired' markers for 4 hooks that are now active (NousResearch#3542). Add correct callback signatures. - security.md: Update tirith verdict behavior — block verdicts now go through approval flow instead of hard-blocking (NousResearch#3428). Add pkill/killall self-termination guard and gateway-run backgrounding patterns (NousResearch#3593). New feature docs: - configuration.md: Add tool_use_enforcement section with value table (auto/true/false/list) from NousResearch#3551/NousResearch#3528. - configuration.md: Expand auxiliary config with per-task timeouts (compression 120s, web_extract 30s, approval 30s) from NousResearch#3597. - api-server.md: Add /v1/health alias, Security Headers section, CORS details (Max-Age, SSE headers, Idempotency-Key) from NousResearch#3572/NousResearch#3573/NousResearch#3576/NousResearch#3580/NousResearch#3530. Stale/incomplete: - configuration.md: Fix Alibaba model name qwen-plus -> qwen3.5-plus (NousResearch#3484). - environment-variables.md: Specify actual DashScope default URL. - cli-commands.md: Add alibaba to --provider list. - fallback-providers.md: Add Alibaba/DashScope to provider table. - email.md: Document noreply/automated sender filtering (NousResearch#3606). - toolsets-reference.md: Add 4 missing platform toolsets — matrix, mattermost, dingtalk, api-server (NousResearch#3583). - skills.md: List default GitHub taps including garrytan/gstack (NousResearch#3605).
angelburgosrosado
pushed a commit
to angelburgosrosado/hermes-agent
that referenced
this pull request
Apr 28, 2026
…cement The TOOL_USE_ENFORCEMENT_GUIDANCE injection (added in NousResearch#3528) was hardcoded to only match gpt/codex model names. This makes it a config option so users can turn it on for any model family. New config key: agent.tool_use_enforcement - "auto" (default): matches gpt/codex (existing behavior) - true: inject for all models - false: never inject - list of strings: custom model-name substrings to match e.g. ["gpt", "codex", "deepseek", "qwen"] No version bump needed — deep merge provides the default automatically for existing installs. 12 new tests covering all config modes.
02356abc
pushed a commit
to 02356abc/hermes-agent
that referenced
this pull request
May 14, 2026
…Research#3528) Cherry-pick of feat/gpt-tool-steering with modifications: 1. Tool-use enforcement prompt (refactored from GPT-specific): - Renamed GPT_TOOL_USE_GUIDANCE -> TOOL_USE_ENFORCEMENT_GUIDANCE - Added TOOL_USE_ENFORCEMENT_MODELS tuple: ('gpt', 'codex') - Injection logic now checks against the tuple instead of hardcoding 'gpt' — adding new model families is a one-line change - Addresses models describing actions instead of making tool calls 2. Budget warning history stripping: - _strip_budget_warnings_from_history() strips _budget_warning JSON keys and [BUDGET WARNING: ...] text from tool results at the start of run_conversation() - Prevents old budget warnings from poisoning subsequent turns Based on PR NousResearch#3479 by teknium1.
02356abc
pushed a commit
to 02356abc/hermes-agent
that referenced
this pull request
May 14, 2026
…cement (NousResearch#3551) The TOOL_USE_ENFORCEMENT_GUIDANCE injection (added in NousResearch#3528) was hardcoded to only match gpt/codex model names. This makes it a config option so users can turn it on for any model family. New config key: agent.tool_use_enforcement - "auto" (default): matches gpt/codex (existing behavior) - true: inject for all models - false: never inject - list of strings: custom model-name substrings to match e.g. ["gpt", "codex", "deepseek", "qwen"] No version bump needed — deep merge provides the default automatically for existing installs. 12 new tests covering all config modes.
02356abc
pushed a commit
to 02356abc/hermes-agent
that referenced
this pull request
May 14, 2026
… pages (NousResearch#3618) Fixes found by auditing docs against recent PRs/commits: Critical (misleading): - hooks.md: Remove stale 'planned — not yet wired' markers for 4 hooks that are now active (NousResearch#3542). Add correct callback signatures. - security.md: Update tirith verdict behavior — block verdicts now go through approval flow instead of hard-blocking (NousResearch#3428). Add pkill/killall self-termination guard and gateway-run backgrounding patterns (NousResearch#3593). New feature docs: - configuration.md: Add tool_use_enforcement section with value table (auto/true/false/list) from NousResearch#3551/NousResearch#3528. - configuration.md: Expand auxiliary config with per-task timeouts (compression 120s, web_extract 30s, approval 30s) from NousResearch#3597. - api-server.md: Add /v1/health alias, Security Headers section, CORS details (Max-Age, SSE headers, Idempotency-Key) from NousResearch#3572/NousResearch#3573/NousResearch#3576/NousResearch#3580/NousResearch#3530. Stale/incomplete: - configuration.md: Fix Alibaba model name qwen-plus -> qwen3.5-plus (NousResearch#3484). - environment-variables.md: Specify actual DashScope default URL. - cli-commands.md: Add alibaba to --provider list. - fallback-providers.md: Add Alibaba/DashScope to provider table. - email.md: Document noreply/automated sender filtering (NousResearch#3606). - toolsets-reference.md: Add 4 missing platform toolsets — matrix, mattermost, dingtalk, api-server (NousResearch#3583). - skills.md: List default GitHub taps including garrytan/gstack (NousResearch#3605).
olympus-terminal
pushed a commit
to olympus-terminal/hermes-agent
that referenced
this pull request
May 16, 2026
…Research#3528) Cherry-pick of feat/gpt-tool-steering with modifications: 1. Tool-use enforcement prompt (refactored from GPT-specific): - Renamed GPT_TOOL_USE_GUIDANCE -> TOOL_USE_ENFORCEMENT_GUIDANCE - Added TOOL_USE_ENFORCEMENT_MODELS tuple: ('gpt', 'codex') - Injection logic now checks against the tuple instead of hardcoding 'gpt' — adding new model families is a one-line change - Addresses models describing actions instead of making tool calls 2. Budget warning history stripping: - _strip_budget_warnings_from_history() strips _budget_warning JSON keys and [BUDGET WARNING: ...] text from tool results at the start of run_conversation() - Prevents old budget warnings from poisoning subsequent turns Based on PR NousResearch#3479 by teknium1.
olympus-terminal
pushed a commit
to olympus-terminal/hermes-agent
that referenced
this pull request
May 16, 2026
…cement (NousResearch#3551) The TOOL_USE_ENFORCEMENT_GUIDANCE injection (added in NousResearch#3528) was hardcoded to only match gpt/codex model names. This makes it a config option so users can turn it on for any model family. New config key: agent.tool_use_enforcement - "auto" (default): matches gpt/codex (existing behavior) - true: inject for all models - false: never inject - list of strings: custom model-name substrings to match e.g. ["gpt", "codex", "deepseek", "qwen"] No version bump needed — deep merge provides the default automatically for existing installs. 12 new tests covering all config modes.
olympus-terminal
pushed a commit
to olympus-terminal/hermes-agent
that referenced
this pull request
May 16, 2026
… pages (NousResearch#3618) Fixes found by auditing docs against recent PRs/commits: Critical (misleading): - hooks.md: Remove stale 'planned — not yet wired' markers for 4 hooks that are now active (NousResearch#3542). Add correct callback signatures. - security.md: Update tirith verdict behavior — block verdicts now go through approval flow instead of hard-blocking (NousResearch#3428). Add pkill/killall self-termination guard and gateway-run backgrounding patterns (NousResearch#3593). New feature docs: - configuration.md: Add tool_use_enforcement section with value table (auto/true/false/list) from NousResearch#3551/NousResearch#3528. - configuration.md: Expand auxiliary config with per-task timeouts (compression 120s, web_extract 30s, approval 30s) from NousResearch#3597. - api-server.md: Add /v1/health alias, Security Headers section, CORS details (Max-Age, SSE headers, Idempotency-Key) from NousResearch#3572/NousResearch#3573/NousResearch#3576/NousResearch#3580/NousResearch#3530. Stale/incomplete: - configuration.md: Fix Alibaba model name qwen-plus -> qwen3.5-plus (NousResearch#3484). - environment-variables.md: Specify actual DashScope default URL. - cli-commands.md: Add alibaba to --provider list. - fallback-providers.md: Add Alibaba/DashScope to provider table. - email.md: Document noreply/automated sender filtering (NousResearch#3606). - toolsets-reference.md: Add 4 missing platform toolsets — matrix, mattermost, dingtalk, api-server (NousResearch#3583). - skills.md: List default GitHub taps including garrytan/gstack (NousResearch#3605).
CumulusService
pushed a commit
to Cumulus-Service-GmbH/hermes-agent
that referenced
this pull request
May 30, 2026
…cement The TOOL_USE_ENFORCEMENT_GUIDANCE injection (added in NousResearch#3528) was hardcoded to only match gpt/codex model names. This makes it a config option so users can turn it on for any model family. New config key: agent.tool_use_enforcement - "auto" (default): matches gpt/codex (existing behavior) - true: inject for all models - false: never inject - list of strings: custom model-name substrings to match e.g. ["gpt", "codex", "deepseek", "qwen"] No version bump needed — deep merge provides the default automatically for existing installs. 12 new tests covering all config modes.
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
…Research#3528) Cherry-pick of feat/gpt-tool-steering with modifications: 1. Tool-use enforcement prompt (refactored from GPT-specific): - Renamed GPT_TOOL_USE_GUIDANCE -> TOOL_USE_ENFORCEMENT_GUIDANCE - Added TOOL_USE_ENFORCEMENT_MODELS tuple: ('gpt', 'codex') - Injection logic now checks against the tuple instead of hardcoding 'gpt' — adding new model families is a one-line change - Addresses models describing actions instead of making tool calls 2. Budget warning history stripping: - _strip_budget_warnings_from_history() strips _budget_warning JSON keys and [BUDGET WARNING: ...] text from tool results at the start of run_conversation() - Prevents old budget warnings from poisoning subsequent turns Based on PR NousResearch#3479 by teknium1.
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
…cement (NousResearch#3551) The TOOL_USE_ENFORCEMENT_GUIDANCE injection (added in NousResearch#3528) was hardcoded to only match gpt/codex model names. This makes it a config option so users can turn it on for any model family. New config key: agent.tool_use_enforcement - "auto" (default): matches gpt/codex (existing behavior) - true: inject for all models - false: never inject - list of strings: custom model-name substrings to match e.g. ["gpt", "codex", "deepseek", "qwen"] No version bump needed — deep merge provides the default automatically for existing installs. 12 new tests covering all config modes.
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
… pages (NousResearch#3618) Fixes found by auditing docs against recent PRs/commits: Critical (misleading): - hooks.md: Remove stale 'planned — not yet wired' markers for 4 hooks that are now active (NousResearch#3542). Add correct callback signatures. - security.md: Update tirith verdict behavior — block verdicts now go through approval flow instead of hard-blocking (NousResearch#3428). Add pkill/killall self-termination guard and gateway-run backgrounding patterns (NousResearch#3593). New feature docs: - configuration.md: Add tool_use_enforcement section with value table (auto/true/false/list) from NousResearch#3551/NousResearch#3528. - configuration.md: Expand auxiliary config with per-task timeouts (compression 120s, web_extract 30s, approval 30s) from NousResearch#3597. - api-server.md: Add /v1/health alias, Security Headers section, CORS details (Max-Age, SSE headers, Idempotency-Key) from NousResearch#3572/NousResearch#3573/NousResearch#3576/NousResearch#3580/NousResearch#3530. Stale/incomplete: - configuration.md: Fix Alibaba model name qwen-plus -> qwen3.5-plus (NousResearch#3484). - environment-variables.md: Specify actual DashScope default URL. - cli-commands.md: Add alibaba to --provider list. - fallback-providers.md: Add Alibaba/DashScope to provider table. - email.md: Document noreply/automated sender filtering (NousResearch#3606). - toolsets-reference.md: Add 4 missing platform toolsets — matrix, mattermost, dingtalk, api-server (NousResearch#3583). - skills.md: List default GitHub taps including garrytan/gstack (NousResearch#3605).
Egavasyug
pushed a commit
to Egavasyug/hermes-agent
that referenced
this pull request
Jun 10, 2026
…Research#3528) Cherry-pick of feat/gpt-tool-steering with modifications: 1. Tool-use enforcement prompt (refactored from GPT-specific): - Renamed GPT_TOOL_USE_GUIDANCE -> TOOL_USE_ENFORCEMENT_GUIDANCE - Added TOOL_USE_ENFORCEMENT_MODELS tuple: ('gpt', 'codex') - Injection logic now checks against the tuple instead of hardcoding 'gpt' — adding new model families is a one-line change - Addresses models describing actions instead of making tool calls 2. Budget warning history stripping: - _strip_budget_warnings_from_history() strips _budget_warning JSON keys and [BUDGET WARNING: ...] text from tool results at the start of run_conversation() - Prevents old budget warnings from poisoning subsequent turns Based on PR NousResearch#3479 by teknium1.
Egavasyug
pushed a commit
to Egavasyug/hermes-agent
that referenced
this pull request
Jun 10, 2026
…cement (NousResearch#3551) The TOOL_USE_ENFORCEMENT_GUIDANCE injection (added in NousResearch#3528) was hardcoded to only match gpt/codex model names. This makes it a config option so users can turn it on for any model family. New config key: agent.tool_use_enforcement - "auto" (default): matches gpt/codex (existing behavior) - true: inject for all models - false: never inject - list of strings: custom model-name substrings to match e.g. ["gpt", "codex", "deepseek", "qwen"] No version bump needed — deep merge provides the default automatically for existing installs. 12 new tests covering all config modes.
Egavasyug
pushed a commit
to Egavasyug/hermes-agent
that referenced
this pull request
Jun 10, 2026
… pages (NousResearch#3618) Fixes found by auditing docs against recent PRs/commits: Critical (misleading): - hooks.md: Remove stale 'planned — not yet wired' markers for 4 hooks that are now active (NousResearch#3542). Add correct callback signatures. - security.md: Update tirith verdict behavior — block verdicts now go through approval flow instead of hard-blocking (NousResearch#3428). Add pkill/killall self-termination guard and gateway-run backgrounding patterns (NousResearch#3593). New feature docs: - configuration.md: Add tool_use_enforcement section with value table (auto/true/false/list) from NousResearch#3551/NousResearch#3528. - configuration.md: Expand auxiliary config with per-task timeouts (compression 120s, web_extract 30s, approval 30s) from NousResearch#3597. - api-server.md: Add /v1/health alias, Security Headers section, CORS details (Max-Age, SSE headers, Idempotency-Key) from NousResearch#3572/NousResearch#3573/NousResearch#3576/NousResearch#3580/NousResearch#3530. Stale/incomplete: - configuration.md: Fix Alibaba model name qwen-plus -> qwen3.5-plus (NousResearch#3484). - environment-variables.md: Specify actual DashScope default URL. - cli-commands.md: Add alibaba to --provider list. - fallback-providers.md: Add Alibaba/DashScope to provider table. - email.md: Document noreply/automated sender filtering (NousResearch#3606). - toolsets-reference.md: Add 4 missing platform toolsets — matrix, mattermost, dingtalk, api-server (NousResearch#3583). - skills.md: List default GitHub taps including garrytan/gstack (NousResearch#3605).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Salvage of PR #3479 with refactoring to make tool-use enforcement reusable.
1. Tool-use enforcement prompt (refactored)
Adds
TOOL_USE_ENFORCEMENT_GUIDANCEto the system prompt for models that need explicit steering to use tools instead of describing actions. Refactored from the original GPT-specific implementation:TOOL_USE_ENFORCEMENT_MODELS— tuple of model name substrings that trigger the guidance. Currently("gpt", "codex"). Adding a new model family is a one-line change to this tuple._build_system_prompt()whenany(p in model_lower for p in TOOL_USE_ENFORCEMENT_MODELS)and tools are loaded2. Budget warning history stripping
_strip_budget_warnings_from_history()strips turn-scoped budget pressure warnings from tool-result messages at the start ofrun_conversation(). Previously these persisted in the session transcript and caused models to avoid tool calls in ALL subsequent turns.Handles both formats:
_budget_warningkey from parsed tool result dicts[BUDGET WARNING: Iteration N/M...]patternsFiles changed
agent/prompt_builder.py—TOOL_USE_ENFORCEMENT_GUIDANCEconstant +TOOL_USE_ENFORCEMENT_MODELStuplerun_agent.py— Import + inject guidance, new_strip_budget_warnings_from_history()+ call inrun_conversation()tests/agent/test_prompt_builder.py— 11 new tests (guidance content, model list membership, budget stripping)Test plan
Closes #3479.