Skip to content

diagnose: tool_calls dispatched but side-effect doesn't fire — need entry/return WARNs in tool_executor #130

@PowerCreek

Description

@PowerCreek

Symptom from sandbox v0.18.5 field test

After v0.18.5's #127 fix landed, the invalid_tool_call WARN does NOT fire — confirming write_file IS in agent.valid_tool_names for the sandbox session. But the actual write_file side-effect STILL doesn't materialize: /tmp/random_test.py doesn't exist on the container despite the model emitting + hermes parsing the tool_call.

Symptom-localized hypothesis

The dispatch reaches the tool branch + validation passes + execute_tool_calls_sequential is invoked, but somewhere between handler entry and filesystem write, the side-effect is lost. Operators currently have no signal between (a) dispatch entry, (b) handler invocation, (c) handler return — so can't tell if:

  • Guardrail / pre-hook blocked execution silently
  • Handler ran but routed write to a sandbox_backend / ssh-remote workspace (path resolved to a different filesystem the user isn't checking)
  • Handler ran but returned an error string the model ignored in its narration
  • Some other early-return between conversation_loop.py:3422 and handler entry

Fix

Pure observability — no behavior change. Add WARNING logs in
agent/tool_executor.py::execute_tool_calls_sequential:

  1. Entry: dispatching N tool_call(s) [names] (task_id=X api_call=N model=M provider=P) — one log per dispatch
  2. Per-tool post-handler: tool_call dispatched: name=X task_id=X blocked=BOOL duration=Ns result_preview='...' — captures the handler's actual return string (truncated) + the _execution_blocked flag so guardrail / pre-hook blocks are visible

After install + a sandbox session that calls write_file, operators see:

  • The dispatch entry (count + names confirms hermes IS routing to executor)
  • The per-tool return preview (confirms handler ran + what it returned)
  • blocked=True if guardrail / pre-hook intercepted

This isolates the next layer: response-handler bug vs. handler-internal bug vs. workspace-routing bug.

Acceptance

  • Every execute_tool_calls_sequential invocation logs an entry WARN with tool count + names
  • Every per-tool dispatch (across the 4 branches: spinner-quiet, memory-provider, plain-quiet, non-quiet) logs a result-preview WARN
  • Operator can correlate model's narrated outcome with the handler's actual return string

Composition

Same observability family as #95 / #96 / #127. After this, the next diagnostic layer is whatever the result preview reveals (could trigger another small ship for the actual root cause).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions