Skip to content

fix(run-agent): normalize tool_call_id before matching results#10000

Open
shm197 wants to merge 1 commit into
NousResearch:mainfrom
shm197:fix/normalize-tool-call-id-whitespace
Open

fix(run-agent): normalize tool_call_id before matching results#10000
shm197 wants to merge 1 commit into
NousResearch:mainfrom
shm197:fix/normalize-tool-call-id-whitespace

Conversation

@shm197

@shm197 shm197 commented Apr 15, 2026

Copy link
Copy Markdown

What does this PR do?

Fixes a run_agent.py bug where Hermes can replace a real tool result with [Result unavailable — see context summary above] if the persisted tool_call_id has surrounding whitespace.

The root cause is inconsistent normalization:

  • assistant-side tool call IDs are already trimmed in _build_assistant_message()
  • _sanitize_api_messages() was still comparing raw tool-result tool_call_id values

That makes these two IDs fail to match even though they are semantically the same:

  • functions.cronjob:24
  • functions.cronjob:24

This PR normalizes tool_call_id values before comparing them during the pre-call sanitizer pass.

Related Issue

Fixes #9999

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • run_agent.py
    • add AIAgent._normalize_tool_call_id()
    • normalize assistant/tool call IDs in _get_tool_call_id_static()
    • normalize tool-result IDs before building result_call_ids
    • normalize tool-result IDs when stripping orphaned tool messages
  • tests/run_agent/test_run_agent.py
    • add regression test that preserves a valid tool result when tool_call_id has leading whitespace
    • add regression test that still strips an orphaned tool result when tool_call_id has surrounding whitespace
    • add regression coverage showing _build_assistant_message() trims tool call IDs

How to Test

  1. Run:
    uv run pytest tests/run_agent/test_run_agent.py -q -o addopts='' -k 'tool_calls_strips_whitespace or preserves_tool_result_when_tool_call_id_has_leading_space or strips_orphaned_tool_result_even_when_id_has_whitespace'
  2. Confirm all 3 targeted tests pass.
  3. Optionally reproduce with a saved session containing:
    • assistant tool call id: functions.cronjob:24
    • tool result id: functions.cronjob:24
      and verify Hermes no longer injects the Result unavailable stub for that pair.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS 15 / Python 3.12 via uv

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Screenshots / Logs

Targeted verification:

3 passed, 250 deselected, 1 warning in 7.94s

Local full-suite run status:

47 failed, 11321 passed, 95 skipped, 163 warnings, 60 errors in 149.37s

The full pytest tests/ -q run does not currently pass in my local environment, so that checklist item is intentionally left unchecked.

@shm197 shm197 marked this pull request as ready for review April 15, 2026 02:08
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

run_agent sanitizer misclassifies tool results when tool_call_id has surrounding whitespace

2 participants