fix(agent): keep image tool results from poisoning text-only sessions #25903

Closed
helix4u wants to merge 1 commit into NousResearch:main from helix4u:fix/text-only-image-tool-results

Conversation

@helix4u helix4u commented May 14, 2026

Contributor

What does this PR do?

Prevents computer_use screenshot results from poisoning text-only model sessions.

When computer_use returns a screenshot result while the active model/provider does not support image input, Hermes now stores a clean tool error instead of appending raw image_url content to the canonical conversation history. That avoids the repeated 400 loop where every later user turn resends the rejected image message before the agent can recover.

This is intentionally narrower than the existing provider-profile image fallback PRs. It does not try to make a text-only model operate the desktop from an auxiliary vision description; it fails cleanly and tells the model/user to switch to a vision-capable model for desktop computer use.
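
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function name `adapt_tool_result`, the `supports_vision` flag, and the error wording are all hypothetical, and the real `_tool_result_content_for_active_model()` in run_agent.py may be structured differently. It assumes tool results use the OpenAI-style content-part shape (`{"type": "image_url", ...}`).

```python
from typing import Any, Dict, List, Union

# OpenAI-style message content: either a plain string or a list of typed parts.
Content = Union[str, List[Dict[str, Any]]]


def has_image_parts(content: Content) -> bool:
    """Return True if a multimodal content list carries any image_url part."""
    if isinstance(content, str):
        return False
    return any(
        isinstance(part, dict) and part.get("type") == "image_url"
        for part in content
    )


def adapt_tool_result(tool_name: str, content: Content, supports_vision: bool) -> Content:
    """Hypothetical stand-in for the PR's helper.

    Vision-capable models get the result unchanged. Text-only models get a
    plain-text tool error, so no image_url part is ever written to history.
    """
    if supports_vision or not has_image_parts(content):
        return content
    return (
        f"Error: {tool_name} returned a screenshot, but the active model does "
        "not accept image input. Switch to a vision-capable model for desktop "
        "computer use."
    )
```

Because the adaptation happens before the result is appended to the conversation history, later turns have nothing to resend that the provider would reject.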

Related Issue

Related: #23733

Related but not duplicate:

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • run_agent.py: added _tool_result_content_for_active_model() so multimodal tool results are adapted before they enter session history.
  • run_agent.py: converts computer_use screenshot results into a clear tool error for text-only active models/providers.
  • run_agent.py: preserves screenshot/image tool results for vision-capable active models.
  • run_agent.py: recognizes DeepSeek's exact text-only image rejection wording in adaptive recovery.
  • tests/tools/test_computer_use.py: covers text-only and vision-capable handling for computer_use multimodal results.

How to Test

  1. Use computer_use with a text-only OpenAI-compatible provider such as direct DeepSeek.
  2. Confirm Hermes records a tool error instead of a raw image_url tool result.
  3. Continue the same session and confirm later turns do not repeat the same provider-side image deserialization 400.
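
For step 2, a small check like the one below can help when inspecting a saved session. It is only a sketch: `find_image_tool_results` is a hypothetical helper, not part of Hermes, and it assumes the history is a list of OpenAI-style message dicts.

```python
def find_image_tool_results(messages):
    """Return indexes of tool messages whose content includes an image_url part."""
    hits = []
    for index, message in enumerate(messages):
        content = message.get("content")
        # Only tool-role messages with list-style (multimodal) content can match.
        if message.get("role") != "tool" or not isinstance(content, list):
            continue
        if any(
            isinstance(part, dict) and part.get("type") == "image_url"
            for part in content
        ):
            hits.append(index)
    return hits
```

An empty list means no raw screenshot payload reached the history, which is the outcome this PR aims for on text-only models.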

Targeted tests run locally:

pytest -q tests/tools/test_computer_use.py::TestRunAgentMultimodalHelpers tests/run_agent/test_vision_aware_preprocessing.py

Result: 19 passed in 2.89s

Full suite run locally:

scripts/run_tests.sh

Result: failed in 0:11:36 with 62 failed, 22918 passed, 61 skipped, 244 warnings, 19 errors.

The full-suite failures/errors are broad existing/global areas outside this PR's two touched files. Examples include tests/cli/test_cli_save_config_value.py, tests/agent/lsp/test_client_e2e.py, tests/gateway/test_approve_deny_commands.py, tests/run_agent/test_compression_feasibility.py, process/terminal timeout cleanup tests, and tests/tools/test_browser_supervisor.py teardown errors from the live-system guard blocking os.kill(...) on spawned Chrome PIDs.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: WSL2 / Linux targeted unit tests

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

For New Skills

N/A

Screenshots / Logs

Support repro showed direct DeepSeek rejecting the post-computer_use request with:

messages[10]: unknown variant image_url, expected text
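
One way to recognize this rejection and clean an already-poisoned history is sketched below. The regular expression is built only from the error text quoted above, and both function names are hypothetical. How the adaptive recovery list in run_agent.py actually matches and repairs is not shown in this PR description.

```python
import re

# Built from the error text quoted above; other providers may word this differently.
_IMAGE_REJECTION = re.compile(
    r"unknown variant\W*image_url\W*expected\W*text", re.IGNORECASE
)

_REMOVED_NOTE = "[image removed: active model does not accept image input]"


def is_text_only_image_rejection(error_message: str) -> bool:
    """Return True if a provider error looks like a text-only image rejection."""
    return bool(_IMAGE_REJECTION.search(error_message))


def strip_image_parts(messages):
    """Return a copy of the history with image_url parts replaced by a text note."""
    cleaned = []
    for message in messages:
        content = message.get("content")
        if isinstance(content, list) and any(
            isinstance(part, dict) and part.get("type") == "image_url"
            for part in content
        ):
            # Keep any text parts, drop the images, and leave a short marker.
            texts = [
                part.get("text", "")
                for part in content
                if isinstance(part, dict) and part.get("type") == "text"
            ]
            kept = [text for text in texts if text] + [_REMOVED_NOTE]
            message = {**message, "content": "\n".join(kept)}
        cleaned.append(message)
    return cleaned
```

A caller could retry the request once with `strip_image_parts(history)` when `is_text_only_image_rejection()` matches the 400 body, instead of resending the same rejected message on every turn.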

@helix4u force-pushed the fix/text-only-image-tool-results branch from ec5ae8b to ae3e796 on May 14, 2026 19:28
@alt-glitch added the labels type/bug (Something isn't working), comp/agent (Core agent loop, run_agent.py, prompt builder), tool/vision (Vision analysis and image generation), and P2 (Medium: degraded but workaround exists) on May 14, 2026
@helix4u helix4u marked this pull request as ready for review May 14, 2026 19:48
@teknium1

Contributor

Merged via #25925 — your commit ae3e79637 was cherry-picked onto current main with authorship preserved in git log. Thanks for the fix! The poisoned-history class is now blocked at the tool-result write site, and DeepSeek's exact 400 wording is in the adaptive recovery list so any already-poisoned session can self-heal.
