diagnose: loud invalid_tool_call WARN + status emit (closes #127) #128

Merged: PowerCreek merged 1 commit into main from loud-invalid-tool-call-127 on May 27, 2026

@PowerCreek

Closes #127. After v0.18.4's #124 fix landed, a sandbox field test surfaced the next-level bug: the model calls a tool name the worker didn't register, the retry path fires silently, and the model hallucinates success in its narration on the next turn, so the mismatch stays invisible. This PR upgrades the verbose-only print to a WARN log plus emit_status. Pure observability; no behavior change. 3 new source-level tests; 20 total green.

After v0.18.4's tool_call recovery (#124) landed, the next-level
bug surfaced in a sandbox field test: the model calls a tool name
that hermes' worker didn't register, the invalid_tool_call retry
path fires, but its verbose-only print is invisible in default runs.
Combined with model hallucination ("the file has been created..."
narration on the NEXT turn), the mismatch goes unnoticed:
operators see the model's narration, not the underlying tool-name
mismatch.

## Fix

Upgrade conversation_loop.py:3219's verbose-only print to:

1. ``logger.warning`` with the invented name + count + first 10
   registered names + model + provider for cross-system log
   correlation
2. ``agent._emit_status`` surfacing the mismatch in the user-
   facing stream

The operator immediately sees:
- WHICH name the model invented
- HOW MANY tools the worker has registered
- WHICH tools (sample) ARE registered
- WHICH retry (out of 3) the mismatch occurred on

No behavior change — existing invalid_tool_call retry semantics
unchanged. Pure observability boost.
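
A minimal sketch of the reporting shape described above, not the actual patch. The function name `report_invalid_tool_call`, its parameters, and both message templates are assumptions made for illustration; only `logger.warning`, the emit-status callback idea, the 10-name sample, and the 3-retry limit come from this PR description.

```python
import logging

logger = logging.getLogger("conversation_loop")

MAX_SAMPLE = 10  # the PR logs the first 10 registered names


def report_invalid_tool_call(invented_name, registered_names, model, provider,
                             retry, max_retries=3, emit_status=None):
    """Log a WARN and emit a status line for an unregistered tool name.

    Hypothetical helper: the real change lives inline in conversation_loop.py.
    """
    names = list(registered_names)
    sample = names[:MAX_SAMPLE]

    # WARN log: carries model + provider so logs can be correlated across systems.
    logger.warning(
        "invalid_tool_call: model=%s provider=%s called unregistered tool %r "
        "(retry %d/%d); %d tools registered, sample=%s",
        model, provider, invented_name, retry, max_retries, len(names), sample,
    )

    # User-facing status: invented name, registered count, sample, retry number.
    status = (
        f"Model called unknown tool '{invented_name}' "
        f"(retry {retry}/{max_retries}); {len(names)} tools registered, "
        f"e.g. {', '.join(sample)}"
    )
    if emit_status is not None:
        emit_status(status)
    return status
```

The helper only reports; it leaves the decision to retry with the caller, which matches the "no behavior change" constraint.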

## Tests

- 3 new source-level tests in
  tests/agent/test_loud_invalid_tool_call.py: patch-landed,
  emit_status template includes name + count, warning includes
  model + provider for correlation.
- 20 total green across affected suites — no regression.
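
For readers unfamiliar with the term, a source-level test inspects the text of the source file instead of executing the code path. The sketch below shows the general idea under stated assumptions: `load_source`, `missing_markers`, and `SAMPLE_SOURCE` are hypothetical names, and the sample snippet is a stand-in, not the contents of conversation_loop.py or of the real test file.

```python
from pathlib import Path


def load_source(path):
    """Read a source file as text (a real test would pass the module's path)."""
    return Path(path).read_text(encoding="utf-8")


def missing_markers(source, markers):
    """Return the markers that do NOT appear in the source text."""
    return [marker for marker in markers if marker not in source]


# Stand-in for the patched branch; the real templates may differ.
SAMPLE_SOURCE = '''
logger.warning(
    "invalid_tool_call: %r not registered (%d tools) model=%s provider=%s",
    name, len(registered), model, provider,
)
agent._emit_status(f"Unknown tool '{name}'; {len(registered)} tools registered")
'''


def test_patch_landed():
    assert missing_markers(SAMPLE_SOURCE, ["logger.warning", "_emit_status"]) == []


def test_emit_status_template_includes_name_and_count():
    assert missing_markers(SAMPLE_SOURCE, ["{name}", "{len(registered)}"]) == []


def test_warning_includes_model_and_provider():
    assert missing_markers(SAMPLE_SOURCE, ["model=%s", "provider=%s"]) == []
```

Tests of this kind are cheap and need no agent fixture, but they only confirm that the text is present, not that the branch runs.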

## Composition

Same observability family as the #95 / #96 doctor probes. Helps
operators distinguish "hermes ate the tool_call" from "sandbox
toolset doesn't expose what the model is calling".
Development

Successfully merging this pull request may close these issues.

diagnose: model calls write_file but tool not registered — narration covers up the mismatch