Skip to content

fix(computer_use): two bugs blocking cua-driver integration#24232

Closed
iceTruth wants to merge 1 commit into
NousResearch:mainfrom
iceTruth:fix/computer-use-cua-driver-bugs
Closed

fix(computer_use): two bugs blocking cua-driver integration#24232
iceTruth wants to merge 1 commit into
NousResearch:mainfrom
iceTruth:fix/computer-use-cua-driver-bugs

Conversation

@iceTruth

Copy link
Copy Markdown

Summary

Two bugs prevent the computer_use tool from working correctly with cua-driver v0.1.6+.

Bug 1: type_text_chars tool does not exist

cua_backend.py line 535 calls self._action("type_text_chars", ...), but cua-driver exposes type_text (not type_text_chars). Every computer_use(action="type") call fails with:

McpError: Invalid params: Unknown tool: type_text_chars

cua-driver's type_text already handles both AXSetAttribute (fast, background-safe) and CGEvent fallback (for apps like Safari WebKit that reject AX text insertion). The old type_text_chars name appears to have been written against an earlier or hypothetical cua-driver API.

Fix: Change "type_text_chars""type_text".

Bug 2: Provider-profile path skips image→text fallback

When the active model lacks vision (supports_vision=False, e.g. GLM-5.1) but the provider has a registered profile, _prepare_messages_for_non_vision_model() is never called. The raw image_url parts in multimodal tool results reach the model, which rejects them → _vision_supported=False → screenshots stripped entirely.

The image→text fallback (which calls vision_analyze via the auxiliary vision model to produce text descriptions) only ran on the legacy no-profile path. Models behind a provider profile got no fallback.

Fix: Call _prepare_messages_for_non_vision_model(api_messages) in the provider-profile path too, before building kwargs.

Testing

  • Verified cua-driver v0.1.6 tool list: type_text exists, type_text_chars does not
  • Confirmed _model_supports_vision() returns False for GLM-5.1
  • After fix, type_text works: cua-driver call type_text '{...}' succeeds
  • After fix, non-vision models with provider profiles get vision_analyze text descriptions for screenshots instead of raw image_url parts

Files changed

  • tools/computer_use/cua_backend.py — fix tool name
  • run_agent.py — add _prepare_messages_for_non_vision_model call in provider-profile path

1. type_text_chars → type_text: cua-driver v0.1.6+ exposes type_text
   (AXSetAttribute with CGEvent fallback), not type_text_chars.
   The old name caused 'Unknown tool: type_text_chars' errors on
   every type action.

2. Provider-profile path skipped image→text fallback: when the
   active model lacks vision (e.g. GLM-5.1) but the provider has
   a registered profile, _prepare_messages_for_non_vision_model()
   was never called. Raw image_url parts reached the model, which
   rejected them, triggering _vision_supported=False and stripping
   screenshots entirely. Now the profile path also runs the
   fallback, which calls vision_analyze (auxiliary vision model)
   to produce text descriptions.
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder tool/vision Vision analysis and image generation labels May 12, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Bug 1 (type_text_chars → type_text) overlaps with PR #24181 which already fixes the same MCP tool name. Bug 2 (vision fallback for provider-profile path) is a new fix not covered elsewhere. Related: #24170 (parent bug report).

@teknium1

Copy link
Copy Markdown
Contributor

Bug 3 (type_text MCP name) fixed in PR #30032 (cherry-picked from #24181). Thanks for independently catching the same bug — credit noted in the salvage PR body. The unrelated vision-fallback change in run_agent.py was kept out of scope and will be addressed in the vision-routing cluster (#24015 / #24070 / #29407).

@iceTruth

iceTruth commented Jun 3, 2026

Copy link
Copy Markdown
Author

Related Issues

This PR fixes two bugs that have been reported separately:

Bug 1 (type_text_charstype_text):

Bug 2 (vision fallback skipped for provider-profile path):

Both fixes are minimal (5 lines changed total) and have been tested in production for 3+ weeks. Would appreciate a review — happy to adjust if the approach needs changes.

@teknium1

Copy link
Copy Markdown
Contributor

This appears to be implemented on current main. Automated hermes-sweeper review found both fixes from this PR already landed, split across later mainline commits.

Evidence:

  • tools/computer_use/cua_backend.py:653 now dispatches type_text() via self._action("type_text", ...), so the stale type_text_chars MCP tool name is gone. This landed in 18cd1e5c728ddf93a854ac9818f527013a9f6daf.
  • tests/tools/test_computer_use.py:180 covers the dispatch path and explicitly documents the type_text_chars regression.
  • The provider-profile vision fallback now lives in the post-refactor helper: agent/chat_completion_helpers.py:734 calls agent._prepare_messages_for_non_vision_model(api_messages) before building kwargs for registered providers. This landed in 563b4d9e51a46cc421e327b351cb7efe1ccb151b.
  • run_agent.py:4405 confirms the fallback contract: non-vision models have image parts replaced with cached vision_analyze text descriptions, while vision-capable models pass through unchanged.

Thanks for catching and testing both issues independently; the discussion here was useful in verifying the current mainline behavior.

@teknium1 teknium1 closed this Jun 11, 2026
@teknium1 teknium1 added the sweeper:implemented-on-main Sweeper: behavior already present on current main label Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists sweeper:implemented-on-main Sweeper: behavior already present on current main tool/vision Vision analysis and image generation type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants