diagnose: loud invalid_tool_call WARN + status emit (closes #127)#128
Merged
Conversation
After v0.18.4's tool_call recovery (#124) landed, the next-level bug surfaced in sandbox field-test: model calls a tool name hermes' worker didn't register, the invalid_tool_call retry path fires, but its verbose-only print is invisible in default runs. Combined with model hallucination ("the file has been created..." narration on the NEXT turn), the mismatch becomes invisible — operators see model narration, not the underlying tool-name mismatch. ## Fix Upgrade conversation_loop.py:3219's verbose-only print to: 1. ``logger.warning`` with the invented name + count + first 10 registered names + model + provider for cross-system log correlation 2. ``agent._emit_status`` surfacing the mismatch in the user- facing stream Operator immediately sees: - WHICH name the model invented - HOW MANY tools the worker has registered - WHICH tools (sample) ARE registered - Across which retry of 3 No behavior change — existing invalid_tool_call retry semantics unchanged. Pure observability boost. ## Tests - 3 new source-level tests in tests/agent/test_loud_invalid_tool_call.py: patch-landed, emit_status template includes name + count, warning includes model + provider for correlation. - 20 total green across affected suites — no regression. ## Composition Same observability family as the #95 / #96 doctor probes. Helps operators distinguish "hermes ate the tool_call" from "sandbox toolset doesn't expose what the model is calling".
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #127. After v0.18.4's #124 fix landed, sandbox field-test surfaced the next-level bug: model calls a tool name worker didn't register, retry path fires silently, model hallucinates success in narration on the next turn → mismatch invisible. This PR upgrades the verbose-only print to WARN log + emit_status. Pure observability; no behavior change. 3 source-level tests + 20 total green.