fix(core): remove greeting-responder example from agent tool prompt #4923
The greeting-responder example in the agent tool description could mislead some models into launching a subagent for simple greetings instead of responding with plain text. Remove the example and its associated agent description entry to eliminate this risk.
Thanks for the PR! Template looks good ✓
On direction: This is a clean, targeted fix.
On approach: Scope is minimal: 9 lines deleted, no code logic touched.
Moving on to code review. 🔍
Qwen Code · qwen3.7-max
Code Review
The diff is clean and matches what I'd do independently.
Real-Scenario Testing
The same test was run on both branches:
Before (main: greeting-responder example present)
After (PR branch: greeting-responder example removed)
Summary
Both runs produce identical user-visible behavior: a plain text response with no agent tool invocation. The PR reduces prompt size by ~56 input tokens. The change is preventive: it removes a misleading few-shot example that could trigger agent misuse on other models, even though qwen3.7-max is not affected.
Qwen Code · qwen3.7-max
This is a clean, well-justified PR. The author identified a real risk (the greeting-responder few-shot example teaches models to delegate trivial input to a subagent) and applied the simplest possible fix: delete it. The remaining test-runner example already covers the agent tool pattern, so nothing is lost. Testing confirms no behavioral regression on qwen3.7-max (the model responds with plain text in both cases), and the prompt shrinks by ~56 tokens. The author's own 40-run cross-model test couldn't reproduce the issue either, but the preventive removal of a misleading example is sound engineering: it eliminates a footgun for models we haven't tested and for future model versions. No concerns. Ship it. ✅
Qwen Code · qwen3.7-max
qwen-code-ci-bot
left a comment
LGTM, looks ready to ship. ✅
Code Coverage Summary
CLI Package - Full Text Report
Core Package - Full Text Report
For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.
What this PR does
Removes the greeting-responder example from the agent tool's model-facing prompt. This includes the fake agent description entry ("greeting-responder": use this agent to respond to user greetings with a friendly joke) and the accompanying few-shot example that shows the model launching a subagent in response to "Hello".
Why it's needed
We encountered reports of some models misusing the agent tool for simple greeting messages: launching a subagent instead of responding with plain text. The greeting-responder example in the agent tool prompt is the likely culprit: it teaches the model that greetings should be delegated to an agent.
We ran a cross-model headless test (8 models × 5 repeats = 40 runs) sending "hi" and checking for agent tool invocations via both JSON output and raw API request logs. None of the tested models (claude-opus-4-6, gpt-5.5, deepseek-v4-pro, deepseek-v4-flash, qwen3.7-max, qwen3.7-plus, glm-5.1, kimi-k2.6) triggered the agent tool in our tests. However, the example still presents an unnecessary risk: it's the only few-shot that teaches a "delegate to agent" pattern for trivial user input, and models we haven't tested (or future model versions) may be more susceptible.
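As a rough illustration of how the 8 × 5 run matrix described above might be tallied, the sketch below counts agent tool invocations per model. The RunResult shape and the tallyAgentInvocations helper are hypothetical names, not part of the PR. The sample data is not taken from real logs: it is rebuilt from the stated outcome that no tested model triggered the agent tool.

```typescript
// Hypothetical record for one headless run: the model used and whether
// the agent tool was invoked. This shape is an assumption, not the PR's format.
interface RunResult {
  model: string;
  invokedAgentTool: boolean;
}

interface ModelTally {
  total: number;
  withAgentTool: number;
}

// Counts, per model, the total runs and the runs that invoked the agent tool.
function tallyAgentInvocations(runs: RunResult[]): Record<string, ModelTally> {
  const tally: Record<string, ModelTally> = {};
  for (const run of runs) {
    const entry = tally[run.model] ?? { total: 0, withAgentTool: 0 };
    entry.total += 1;
    if (run.invokedAgentTool) {
      entry.withAgentTool += 1;
    }
    tally[run.model] = entry;
  }
  return tally;
}

// Model names as listed in the PR description.
const models: string[] = [
  "claude-opus-4-6",
  "gpt-5.5",
  "deepseek-v4-pro",
  "deepseek-v4-flash",
  "qwen3.7-max",
  "qwen3.7-plus",
  "glm-5.1",
  "kimi-k2.6",
];
const repeats = 5;

// Rebuilt from the PR's stated result (no agent tool call in any run),
// not from actual log files.
const runs: RunResult[] = [];
for (const model of models) {
  for (let i = 0; i < repeats; i++) {
    runs.push({ model, invokedAgentTool: false });
  }
}

const tally = tallyAgentInvocations(runs);
const flaggedModels = Object.keys(tally).filter(
  (m) => (tally[m]?.withAgentTool ?? 0) > 0,
);
console.log(runs.length, flaggedModels.length); // 40 0
```

A per-model tally rather than a single global count makes it easier to spot a model that misbehaves only occasionally across repeats.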
Reviewer Test Plan
How to verify
1. npm run build && npm run bundle
2. Run node dist/cli.js "hi" --approval-mode yolo --output-format json 2>/dev/null and confirm the response is plain text with no agent tool call.
3. Add --openai-logging --openai-logging-dir /tmp/api-logs and inspect the API log to confirm response.choices[0].message.tool_calls is empty or absent.
Evidence (Before & After)
N/A — prompt-level change, no TUI impact. The removed text was part of the model-facing tool description, not user-visible UI.
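The log inspection step in the test plan could be automated with a small check like the one below. This is a minimal sketch: the only structure it assumes is the response.choices[0].message.tool_calls path quoted in the plan. The LoggedExchange type, the firstChoiceHasToolCalls helper, and both sample entries are hypothetical and hand-written for illustration, not copied from a real log.

```typescript
// Assumed shape, limited to the path named in the test plan:
// response.choices[0].message.tool_calls. All fields are optional
// because the check must tolerate an absent tool_calls key.
interface LoggedExchange {
  response?: {
    choices?: Array<{
      message?: { content?: string | null; tool_calls?: unknown[] | null };
    }>;
  };
}

// Returns true only when the first choice carries at least one tool call.
// A missing, null, or empty tool_calls value counts as "no tool call".
function firstChoiceHasToolCalls(entry: LoggedExchange): boolean {
  const calls = entry.response?.choices?.[0]?.message?.tool_calls;
  return Array.isArray(calls) && calls.length > 0;
}

// Hypothetical entry: the model answered the greeting with plain text.
const plainText: LoggedExchange = {
  response: { choices: [{ message: { content: "Hello! How can I help?" } }] },
};

// Hypothetical entry: the model delegated to a tool instead of answering.
const delegated: LoggedExchange = {
  response: {
    choices: [{ message: { content: null, tool_calls: [{ type: "function" }] } }],
  },
};

console.log(firstChoiceHasToolCalls(plainText)); // false
console.log(firstChoiceHasToolCalls(delegated)); // true
```

Treating "absent" and "empty" the same way mirrors the wording of the test plan, which accepts either as a pass.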
Tested on
Risk & Scope
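Chinese description (translated)
Removes the greeting-responder example from the agent tool's model prompt. The example consists of a fictitious agent description entry and a few-shot example showing the model launching a subagent on receiving "Hello". Some models misuse the agent tool to handle simple greeting messages (launching a subagent instead of replying directly with text). Although we could not reproduce the issue in 8 models × 5 repeats = 40 test runs, the example still poses an unnecessary risk: it is the only few-shot example that teaches the model to delegate simple user input to an agent.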