diagnose: model calls write_file but tool not registered — narration covers up the mismatch #127

@PowerCreek

Symptom from sandbox v0.18.4 field test

After the #124 fix landed in v0.18.4, mistral's tool_call is now parsed correctly by hermes (no more empty-retry loop), but the actual side effect still does not fire.

Most likely root cause

write_file is most likely not in agent.valid_tool_names for the sandbox session. The validation at conversation_loop.py:3207-3210 then puts the call in invalid_tool_calls, and line 3236 appends an error tool message. On its next turn the model sees that error and still hallucinates success in its narration.

The only trace is the verbose-only print at line 3219 (agent._vprint(...)), which does not appear in default runs. Operators see the model's hallucinated narration and nothing about the underlying mismatch.
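To make the suspected failure mode concrete, here is a minimal sketch of the kind of name validation described above. It is not the code in conversation_loop.py: ToolCall, partition_tool_calls and error_tool_message are hypothetical names, and the registered tool names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    # Hypothetical stand-in for a parsed tool call from the model.
    id: str
    name: str


def partition_tool_calls(tool_calls, valid_tool_names):
    """Split parsed calls into (valid, invalid) by registered name."""
    valid, invalid = [], []
    for call in tool_calls:
        (valid if call.name in valid_tool_names else invalid).append(call)
    return valid, invalid


def error_tool_message(call):
    """Build the error tool message the model sees on its next turn."""
    return {
        "role": "tool",
        "tool_call_id": call.id,
        "content": f"Error: tool '{call.name}' is not available.",
    }


# The sandbox registers some tools, but not write_file (assumed names).
registered = {"read_file", "terminal"}
calls = [ToolCall(id="call_1", name="write_file")]

valid, invalid = partition_tool_calls(calls, registered)
messages = [error_tool_message(call) for call in invalid]
```

In this sketch nothing is dispatched for write_file. The only record of the mismatch is the error message handed back to the model, which is why a hallucinated success on the next turn hides it from the operator.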

Fix

Upgrade the invalid_tool_call diagnostic from verbose-only print to:

  1. logger.warning with invalid name + count + sample of registered names + model + provider — operators see it in agent.log regardless of verbose
  2. agent._emit_status with the mismatch — visible in user-facing status stream

Operator can immediately tell:

  • WHICH name the model invented (or which name our sandbox doesn't register)
  • HOW MANY tools the sandbox actually has registered
  • WHICH tools the sandbox DOES have (sample)

This is observability, not behavior change — the existing invalid_tool_call retry path remains. But the mismatch becomes visible at boot/dispatch time instead of being hidden by model hallucination on the next turn.
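A rough sketch of what the upgraded diagnostic could look like, under stated assumptions: report_invalid_tool_call, its parameters and the message wording are hypothetical, and emit_status stands in for agent._emit_status, whose real signature is not shown in this issue. Only the standard logging module is relied on.

```python
import logging

logger = logging.getLogger("agent")


def report_invalid_tool_call(invalid_name, valid_tool_names, model, provider,
                             emit_status, sample_size=5):
    """Surface a tool-name mismatch in the log and the status stream.

    Observability only: the caller's existing retry path is unchanged.
    """
    registered = sorted(valid_tool_names)
    sample = registered[:sample_size]

    # 1. WARN-level log line, emitted regardless of verbose mode.
    logger.warning(
        "invalid tool call: name=%r registered_count=%d "
        "registered_sample=%s model=%s provider=%s",
        invalid_name, len(registered), sample, model, provider,
    )

    # 2. User-facing status line (emit_status is an assumed callback).
    status = (
        f"Tool '{invalid_name}' is not registered "
        f"({len(registered)} tools available, e.g. {', '.join(sample) or 'none'})"
    )
    emit_status(status)
    return status


# Usage with a list standing in for the status stream.
statuses = []
status = report_invalid_tool_call(
    "write_file", {"read_file", "terminal"},
    model="mistral", provider="example-provider",  # provider name is made up
    emit_status=statuses.append,
)
```

Sorting the registered names before sampling keeps the sample stable across runs, which makes log lines easier to compare between sessions.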

Acceptance

  • Sandbox session that calls write_file (or any unregistered tool) surfaces a clear WARN + status line naming the tool + the registered tool sample.
  • Operator can correlate hermes log entries with devagentic-side dispatch logs via model + provider in the WARN.

Composition

Same observability family as PR #95 / #96 (env-var probes). Catches sandbox-config bugs that look like recovery failures.
