fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS #28195
briandevans wants to merge 2 commits into
When `agent.tool_use_enforcement` is `"auto"` (the default), the runtime checks the active model name against `TOOL_USE_ENFORCEMENT_MODELS` in `agent/prompt_builder.py` and only injects `TOOL_USE_ENFORCEMENT_GUIDANCE` if a substring matches. Qwen and DeepSeek hit the same chatty/hallucinatory failure mode as GPT, Codex, Grok, and GLM (describing intended actions instead of calling tools, ignoring memory, silently stopping mid-execution), but neither substring was in the tuple, so the enforcement prompt was never injected for users on those families, even with `auto` left at its default.

Add `"qwen"` and `"deepseek"` to the tuple, matching the established additive pattern (NousResearch#5595 added grok, NousResearch#24715 added glm, NousResearch#27797 widened grok to xai-oauth).

Add four regression-guard tests that fail before the production change and pass after: two unit assertions in `test_prompt_builder.py` mirroring the existing grok/gpt checks, and two integration tests in `test_run_agent.py` confirming that a qwen/deepseek model under `tool_use_enforcement="auto"` now gets the guidance string in its system prompt.

The "robust" alternative from the issue (default-true for all models) is intentionally not taken: it would silently flip behavior for users who currently rely on `auto` leaving Claude / non-listed families unsteered, and the maintainer's prior merged work in this area is uniformly additive.

Fixes NousResearch#28079
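The `auto` check described above can be sketched roughly as follows. This is a minimal illustration, not the code in `agent/prompt_builder.py`: the function name `needs_enforcement` is hypothetical, the tuple contents are taken from the PR description, and the lowercasing step is an assumption about how matching behaves.

```python
# Illustrative sketch only. TOOL_USE_ENFORCEMENT_MODELS mirrors the tuple
# quoted in the PR description with the two new entries appended;
# needs_enforcement is a hypothetical helper, not a function from the repo.
TOOL_USE_ENFORCEMENT_MODELS = (
    "gpt", "codex", "gemini", "gemma", "grok", "glm",
    "qwen", "deepseek",  # added by this PR
)


def needs_enforcement(model_name: str, patterns=TOOL_USE_ENFORCEMENT_MODELS) -> bool:
    """Return True when any pattern occurs as a substring of the model name."""
    lowered = model_name.lower()  # assumption: matching ignores case
    return any(pattern in lowered for pattern in patterns)


if __name__ == "__main__":
    for name in ("qwen/qwen3.6-plus", "deepseek/deepseek-r1"):
        print(name, needs_enforcement(name))
```

Because the match is on substrings, a provider prefix or version suffix in the model identifier does not affect the outcome; only the family name has to appear somewhere in the string.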
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds Qwen and DeepSeek model families to the tool-use enforcement list so that their system prompts include enforcement guidance by default.
Changes:
- Append `"qwen"` and `"deepseek"` to `TOOL_USE_ENFORCEMENT_MODELS`.
- Add unit tests verifying the tuple includes the new entries.
- Add integration tests verifying enforcement guidance is injected for Qwen and DeepSeek models under `auto` mode.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| agent/prompt_builder.py | Adds qwen and deepseek substrings to the enforcement models tuple. |
| tests/agent/test_prompt_builder.py | Adds membership tests for new model substrings. |
| tests/run_agent/test_run_agent.py | Adds auto-injection tests for Qwen and DeepSeek model IDs. |
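The two kinds of test listed in the table can be sketched in a self-contained form. This is only an approximation of their shape: `build_system_prompt`, the base prompt, and the guidance text are hypothetical stand-ins, whereas the real tests import from `agent.prompt_builder` and build an agent through the test class's `_make_agent` helper.

```python
# Stand-ins for the real objects in agent/prompt_builder.py (placeholder values).
TOOL_USE_ENFORCEMENT_MODELS = (
    "gpt", "codex", "gemini", "gemma", "grok", "glm", "qwen", "deepseek",
)
TOOL_USE_ENFORCEMENT_GUIDANCE = "<tool-use enforcement guidance>"  # placeholder text


def build_system_prompt(model: str, tool_use_enforcement: str = "auto") -> str:
    """Hypothetical prompt assembly: append guidance when 'auto' matches the model."""
    parts = ["You are a helpful agent."]  # placeholder base prompt
    if tool_use_enforcement == "auto" and any(
        pattern in model for pattern in TOOL_USE_ENFORCEMENT_MODELS
    ):
        parts.append(TOOL_USE_ENFORCEMENT_GUIDANCE)
    return "\n\n".join(parts)


# Unit-level checks: the tuple contains the new substrings.
def test_enforcement_models_includes_qwen():
    assert "qwen" in TOOL_USE_ENFORCEMENT_MODELS


def test_enforcement_models_includes_deepseek():
    assert "deepseek" in TOOL_USE_ENFORCEMENT_MODELS


# Integration-level checks: the guidance reaches the assembled prompt.
def test_auto_injects_for_qwen():
    assert TOOL_USE_ENFORCEMENT_GUIDANCE in build_system_prompt("qwen/qwen3.6-plus")


def test_auto_injects_for_deepseek():
    assert TOOL_USE_ENFORCEMENT_GUIDANCE in build_system_prompt("deepseek/deepseek-r1")
```

Removing `"qwen"` and `"deepseek"` from the stand-in tuple makes all four functions fail, which is the regression-guard property the PR relies on.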
    def test_auto_injects_for_qwen(self):
        """Qwen models default to chatty/hallucinatory tool use without enforcement."""
        from agent.prompt_builder import TOOL_USE_ENFORCEMENT_GUIDANCE
        agent = self._make_agent(model="qwen/qwen3.6-plus", tool_use_enforcement="auto")
Copilot flagged that `qwen/qwen3.6-plus` is not a real Qwen model identifier (no such version exists). Substring matching only needs "qwen" so the test still proves the path, but using a name that matches the issue body (`qwen-plus`, Alibaba Cloud) is clearer.
@copilot Addressed in commit 9433eab: switched the qwen integration-test model identifier from
Thanks @alt-glitch — flagging the overlap is appreciated. Quick positioning note for the maintainer to choose between:
Happy to defer to #28081 if widening to Claude/Anthropic is the desired direction; happy to widen this PR to
What does this PR do?
When `agent.tool_use_enforcement` is left at its default `"auto"`, `agent/system_prompt.py` only injects `TOOL_USE_ENFORCEMENT_GUIDANCE` if the active model name contains a substring from `TOOL_USE_ENFORCEMENT_MODELS` in `agent/prompt_builder.py:271`. The tuple was `("gpt", "codex", "gemini", "gemma", "grok", "glm")`; both `qwen` and `deepseek` were missing, even though both families exhibit the exact failure mode the enforcement prompt was written for (describing intended actions instead of calling tools, hallucinating execution, ignoring existing context/memory, silently stopping mid-task).
This PR adds `"qwen"` and `"deepseek"` to the tuple, mirroring the established additive pattern in this area (merged #5595 added `grok`, #24715 added `glm`, #27797 widened grok to `xai-oauth`).
The "robust" alternative from the issue (default-true for all models) is intentionally not taken: it would silently flip behavior for users who currently rely on `auto` leaving Claude and other non-listed families unsteered, and every prior merged change in this area has been additive.
Related Issue
Fixes #28079
Type of Change
Changes Made
- `agent/prompt_builder.py`: append `"qwen"` and `"deepseek"` to `TOOL_USE_ENFORCEMENT_MODELS`. 1-line tuple edit.
- `tests/agent/test_prompt_builder.py`: add `test_enforcement_models_includes_qwen` and `test_enforcement_models_includes_deepseek`, mirroring the existing `_includes_gpt` / `_includes_codex` / `_includes_grok` pattern.
- `tests/run_agent/test_run_agent.py`: add `test_auto_injects_for_qwen` (model `qwen/qwen3.6-plus`) and `test_auto_injects_for_deepseek` (model `deepseek/deepseek-r1`) in `TestToolUseEnforcementConfig`, confirming that `tool_use_enforcement="auto"` now causes the guidance string to appear in the system prompt for both families.
How to Test
1. Stash the production change and run the four new tests to prove the regression:

       git stash push -- agent/prompt_builder.py
       uv run --with pytest --with pytest-xdist --with pytest-asyncio python3 -m pytest \
         tests/run_agent/test_run_agent.py::TestToolUseEnforcementConfig::test_auto_injects_for_qwen \
         tests/run_agent/test_run_agent.py::TestToolUseEnforcementConfig::test_auto_injects_for_deepseek \
         tests/agent/test_prompt_builder.py::TestToolUseEnforcementGuidance::test_enforcement_models_includes_qwen \
         tests/agent/test_prompt_builder.py::TestToolUseEnforcementGuidance::test_enforcement_models_includes_deepseek -v
       # 4 failures (regression proved)
       git stash pop

2. With the production change restored, run both test classes:

       uv run --with pytest --with pytest-xdist --with pytest-asyncio python3 -m pytest \
         tests/agent/test_prompt_builder.py::TestToolUseEnforcementGuidance \
         tests/run_agent/test_run_agent.py::TestToolUseEnforcementConfig -v
       # 9 + 18 = 27 passed

3. Set `model.default: qwen/qwen3.6-plus` (or any deepseek model), leave `agent.tool_use_enforcement` at its default `"auto"`, and run `hermes chat -q "list files in this directory"`. Before the fix the agent often replies with a narrated plan and no tool call; after the fix `TOOL_USE_ENFORCEMENT_GUIDANCE` appears in the system prompt and tool use is enforced.
Checklist
Code
- … `fix(agent):`)
Documentation & Housekeeping
- … `# Add new patterns here when a model family needs explicit steering.` comment that already documents the maintenance pattern)
- `cli-config.yaml.example` if I added/changed config keys: N/A (no config keys changed; the existing `agent.tool_use_enforcement` key behaviour is unchanged for explicit values, only the `auto` default is widened)
- `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows: N/A
Sibling code paths that may need the same fix
The issue body explicitly flags `mistral` and `llama` as potentially affected (no concrete repro provided). I left them out of this PR's scope to keep the diff minimal and the additions evidence-driven; happy to widen to `("mistral", "llama")` during cherry-pick if the same chatty-narration failure mode is confirmed on those families.
Screenshots / Logs
N/A — guidance-string injection has no UI surface; the change is observable only via system-prompt assembly, which is exercised by the new tests.
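With no UI surface to capture, one rough way to see the scope of the change is to compare which model names match before and after the tuple edit. The sketch below is illustrative: `BEFORE` is the tuple quoted in the PR description, while `matches`, `newly_steered`, and the `mistral-example`, `llama-example`, and `claude-example` identifiers are hypothetical names used only for this comparison.

```python
# Illustrative comparison only; helper names and the *-example model
# identifiers are hypothetical and do not come from the repository.
BEFORE = ("gpt", "codex", "gemini", "gemma", "grok", "glm")
AFTER = BEFORE + ("qwen", "deepseek")


def matches(model: str, patterns: tuple) -> bool:
    """Substring match of a model name against a tuple of family patterns."""
    return any(pattern in model for pattern in patterns)


def newly_steered(models: list) -> list:
    """Models that match only after the change, in their original order."""
    return [m for m in models if matches(m, AFTER) and not matches(m, BEFORE)]


if __name__ == "__main__":
    candidates = [
        "qwen/qwen3.6-plus",
        "deepseek/deepseek-r1",
        "mistral-example",
        "llama-example",
        "claude-example",
    ]
    print(newly_steered(candidates))
```

Under these assumptions only the Qwen and DeepSeek identifiers change status; the Mistral, Llama, and Claude placeholders stay unmatched, which reflects the deliberately narrow scope described above.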