Symptom from sandbox v0.18.5 field test
After v0.18.5's #127 fix landed, the invalid_tool_call WARN does NOT fire — confirming write_file IS in agent.valid_tool_names for the sandbox session. But the actual write_file side-effect STILL doesn't materialize: /tmp/random_test.py doesn't exist on the container despite the model emitting + hermes parsing the tool_call.
Symptom-localized hypothesis
The dispatch reaches the tool branch + validation passes + execute_tool_calls_sequential is invoked, but somewhere between handler entry and filesystem write, the side-effect is lost. Operators currently have no signal between (a) dispatch entry, (b) handler invocation, (c) handler return — so can't tell if:
- Guardrail / pre-hook blocked execution silently
- Handler ran but routed write to a sandbox_backend / ssh-remote workspace (path resolved to a different filesystem the user isn't checking)
- Handler ran but returned an error string the model ignored in its narration
- Some other early-return between conversation_loop.py:3422 and handler entry
Fix
Pure observability, no behavior change. Add WARNING logs in agent/tool_executor.py::execute_tool_calls_sequential:
- Entry (one log per dispatch):
  dispatching N tool_call(s) [names] (task_id=X api_call=N model=M provider=P)
- Per-tool post-handler:
  tool_call dispatched: name=X task_id=X blocked=BOOL duration=Ns result_preview='...'
  Captures the handler's actual return string (truncated) plus the _execution_blocked flag, so guardrail / pre-hook blocks are visible.
After install + a sandbox session that calls write_file, operators see:
- The dispatch entry (count + names confirm hermes IS routing to the executor)
- The per-tool return preview (confirms the handler ran + what it returned)
- blocked=True if a guardrail / pre-hook intercepted
This isolates the next layer: response-handler bug vs. handler-internal bug vs. workspace-routing bug.
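One way an operator could triage from the per-tool line is sketched below. The regex follows the log format given in this description. The triage rules are assumptions for illustration, in particular the idea that a handler error string starts with "error". Both parse_dispatch_line and triage are hypothetical helper names, not part of hermes.

```python
import ast
import re

# Matches the per-tool log line format described in this change.
LINE_RE = re.compile(
    r"tool_call dispatched: name=(?P<name>\S+) task_id=(?P<task_id>\S+) "
    r"blocked=(?P<blocked>True|False) duration=(?P<duration>[\d.]+)s "
    r"result_preview=(?P<preview>.*)$"
)


def parse_dispatch_line(line):
    """Extract the fields of a per-tool log line, or return None if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["blocked"] = fields["blocked"] == "True"
    fields["duration"] = float(fields["duration"])
    try:
        # The preview is assumed to be logged as a quoted Python string literal.
        fields["preview"] = ast.literal_eval(fields["preview"])
    except (ValueError, SyntaxError):
        pass  # keep the raw text if it is not a valid literal
    return fields


def triage(fields):
    """Map a parsed line to the next layer to inspect (assumed heuristics)."""
    if fields["blocked"]:
        return "guardrail or pre-hook blocked execution"
    if str(fields["preview"]).lower().startswith("error"):
        return "handler returned an error string"
    return "handler reported success; check workspace routing"
```

For example, a line with blocked=False and a preview that reports a successful write points at workspace routing: the handler believes it wrote the file, so the path most likely resolved to a different filesystem than the one being checked.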
Acceptance
- Every execute_tool_calls_sequential invocation logs an entry WARN with tool count + names
- Every per-tool dispatch (across the 4 branches: spinner-quiet, memory-provider, plain-quiet, non-quiet) logs a result-preview WARN
- Operator can correlate model's narrated outcome with the handler's actual return string
Composition
Same observability family as #95 / #96 / #127. After this, the next diagnostic layer is whatever the result preview reveals (could trigger another small ship for the actual root cause).