fix(computer_use): two bugs blocking cua-driver integration by iceTruth · Pull Request #24232 · NousResearch/hermes-agent

iceTruth · 2026-05-12T07:23:43Z

Summary

Two bugs prevent the computer_use tool from working correctly with cua-driver v0.1.6+.

Bug 1: `type_text_chars` tool does not exist

cua_backend.py line 535 calls self._action("type_text_chars", ...), but cua-driver exposes type_text (not type_text_chars). Every computer_use(action="type") call fails with:

McpError: Invalid params: Unknown tool: type_text_chars

cua-driver's type_text already handles both AXSetAttribute (fast, background-safe) and CGEvent fallback (for apps like Safari WebKit that reject AX text insertion). The old type_text_chars name appears to have been written against an earlier or hypothetical cua-driver API.

Fix: Change "type_text_chars" → "type_text".
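A minimal sketch of the failure and the fix, for illustration only. `FakeDriver`, `UnknownToolError`, `Backend`, and the `{"text": ...}` argument shape are hypothetical stand-ins, not the real cua-driver or hermes-agent code; only the tool names `type_text` and `type_text_chars` and the `_action` dispatch come from the description above.

```python
class UnknownToolError(Exception):
    """Hypothetical stand-in for the McpError raised on an unknown tool."""


class FakeDriver:
    """Hypothetical stand-in for a cua-driver MCP session."""

    def __init__(self, tools):
        self.tools = set(tools)  # tool names the driver advertises
        self.calls = []          # record of successful calls

    def call_tool(self, name, arguments):
        # Reject any name that is not in the advertised tool list.
        if name not in self.tools:
            raise UnknownToolError(f"Invalid params: Unknown tool: {name}")
        self.calls.append((name, arguments))
        return {"ok": True}


class Backend:
    """Simplified backend; the real cua_backend.py is more involved."""

    def __init__(self, driver):
        self._driver = driver

    def _action(self, tool_name, arguments):
        return self._driver.call_tool(tool_name, arguments)

    def type_text(self, text):
        # Fixed dispatch: "type_text", not the stale "type_text_chars".
        # The argument shape here is an assumption.
        return self._action("type_text", {"text": text})
```

With a driver that advertises only `type_text`, `Backend.type_text("hi")` succeeds, while calling `_action("type_text_chars", ...)` directly raises the stand-in error with the same message text as the one quoted above.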

Bug 2: Provider-profile path skips image→text fallback

When the active model lacks vision (supports_vision=False, e.g. GLM-5.1) but the provider has a registered profile, _prepare_messages_for_non_vision_model() is never called. The raw image_url parts in multimodal tool results reach the model, which rejects them → _vision_supported=False → screenshots stripped entirely.

The image→text fallback (which calls vision_analyze via the auxiliary vision model to produce text descriptions) only ran on the legacy no-profile path. Models behind a provider profile got no fallback.

Fix: Call _prepare_messages_for_non_vision_model(api_messages) in the provider-profile path too, before building kwargs.
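The control-flow change can be sketched roughly as follows. This is not the actual run_agent.py code: `build_request_kwargs`, the `route` key, and the injected `prepare_for_non_vision` callable are hypothetical names, with the callable standing in for `_prepare_messages_for_non_vision_model`. The point is only that the preparation step runs before kwargs are built on both paths.

```python
from typing import Any, Callable

Messages = list[dict[str, Any]]


def build_request_kwargs(
    api_messages: Messages,
    *,
    has_provider_profile: bool,
    prepare_for_non_vision: Callable[[Messages], Messages],
) -> dict[str, Any]:
    """Build request kwargs, running the image-to-text fallback first.

    Before the fix, only the legacy (no-profile) branch ran the fallback.
    Here it runs once, ahead of the branch, so neither path can skip it.
    """
    prepared = prepare_for_non_vision(api_messages)
    if has_provider_profile:
        # Hypothetical marker for the provider-profile path.
        return {"route": "provider_profile", "messages": prepared}
    # Hypothetical marker for the legacy path.
    return {"route": "legacy", "messages": prepared}
```

Passing the preparation step in as a callable is a choice made for this sketch so that it stays self-contained; the PR itself calls the existing method directly.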

Testing

Verified cua-driver v0.1.6 tool list: type_text exists, type_text_chars does not
Confirmed _model_supports_vision() returns False for GLM-5.1
After fix, type_text works: cua-driver call type_text '{...}' succeeds
After fix, non-vision models with provider profiles get vision_analyze text descriptions for screenshots instead of raw image_url parts

Files changed

tools/computer_use/cua_backend.py — fix tool name
run_agent.py — add _prepare_messages_for_non_vision_model call in provider-profile path

1. type_text_chars → type_text: cua-driver v0.1.6+ exposes type_text (AXSetAttribute with CGEvent fallback), not type_text_chars. The old name caused 'Unknown tool: type_text_chars' errors on every type action.
2. Provider-profile path skipped image→text fallback: when the active model lacks vision (e.g. GLM-5.1) but the provider has a registered profile, _prepare_messages_for_non_vision_model() was never called. Raw image_url parts reached the model, which rejected them, triggering _vision_supported=False and stripping screenshots entirely. Now the profile path also runs the fallback, which calls vision_analyze (auxiliary vision model) to produce text descriptions.

alt-glitch · 2026-05-12T07:38:42Z

Bug 1 (type_text_chars → type_text) overlaps with PR #24181 which already fixes the same MCP tool name. Bug 2 (vision fallback for provider-profile path) is a new fix not covered elsewhere. Related: #24170 (parent bug report).

teknium1 · 2026-05-21T21:08:49Z

Bug 3 of #24170 (the type_text MCP tool name, Bug 1 in this PR) is fixed in PR #30032 (cherry-picked from #24181). Thanks for independently catching the same bug; credit is noted in the salvage PR body. The unrelated vision-fallback change in run_agent.py was kept out of scope and will be addressed in the vision-routing cluster (#24015 / #24070 / #29407).

iceTruth · 2026-06-03T06:19:45Z

Related Issues

This PR fixes two bugs that have been reported separately:

Bug 1 (type_text_chars → type_text):

Reported as Bug 3 in [Bug] computer_use toolset: 5 bugs found during hands-on testing (macOS 26.4.1, cua-driver v0.1.6) #24170 — root cause confirmed by @arspesk (same line cua_backend.py:540), workaround applied locally by multiple users, but fix never merged into mainline.

Bug 2 (vision fallback skipped for provider-profile path):

Reported in #32766 ("computer_use (cua-driver backend) is too fragile and breaks auxiliary vision routing"), which describes the breakage when using text-only models with custom providers
Also related to #25594 ("Vision-capable model detection missing for custom providers"), which identifies the same root cause: _prepare_messages_for_non_vision_model() is not called in the provider-profile code path

Both fixes are minimal (5 lines changed total) and have been tested in production for 3+ weeks. Would appreciate a review — happy to adjust if the approach needs changes.

teknium1 · 2026-06-11T18:22:19Z

This appears to be implemented on current main. Automated hermes-sweeper review found both fixes from this PR already landed, split across later mainline commits.

Evidence:

tools/computer_use/cua_backend.py:653 now dispatches type_text() via self._action("type_text", ...), so the stale type_text_chars MCP tool name is gone. This landed in 18cd1e5c728ddf93a854ac9818f527013a9f6daf.
tests/tools/test_computer_use.py:180 covers the dispatch path and explicitly documents the type_text_chars regression.
The provider-profile vision fallback now lives in the post-refactor helper: agent/chat_completion_helpers.py:734 calls agent._prepare_messages_for_non_vision_model(api_messages) before building kwargs for registered providers. This landed in 563b4d9e51a46cc421e327b351cb7efe1ccb151b.
run_agent.py:4405 confirms the fallback contract: non-vision models have image parts replaced with cached vision_analyze text descriptions, while vision-capable models pass through unchanged.
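The fallback contract described in the last item can be illustrated with a small sketch. It assumes the OpenAI-style part shape `{"type": "image_url", "image_url": {"url": ...}}`; `replace_images_with_text`, the `describe` callable (a stand-in for the vision_analyze call), and the dict cache keyed by a URL hash are all hypothetical and do not reflect the actual helper in run_agent.py.

```python
import hashlib
from typing import Any, Callable


def replace_images_with_text(
    messages: list[dict[str, Any]],
    *,
    supports_vision: bool,
    describe: Callable[[str], str],
    cache: dict[str, str],
) -> list[dict[str, Any]]:
    """Replace image parts with cached text descriptions for non-vision models."""
    if supports_vision:
        # Vision-capable models pass through unchanged.
        return messages

    result = []
    for msg in messages:
        content = msg.get("content")
        if not isinstance(content, list):
            # Plain string content has no image parts to replace.
            result.append(msg)
            continue
        parts = []
        for part in content:
            if isinstance(part, dict) and part.get("type") == "image_url":
                url = part["image_url"]["url"]
                key = hashlib.sha256(url.encode("utf-8")).hexdigest()
                if key not in cache:
                    # Describe each distinct image only once.
                    cache[key] = describe(url)
                parts.append({"type": "text", "text": cache[key]})
            else:
                parts.append(part)
        # Copy the message so the caller's input is not mutated.
        result.append({**msg, "content": parts})
    return result
```

Keying the cache on a hash of the image URL is one possible choice made for this sketch; the source only states that descriptions are cached, not how.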

Thanks for catching and testing both issues independently; the discussion here was useful in verifying the current mainline behavior.

alt-glitch added the type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/agent (Core agent loop, run_agent.py, prompt builder), and tool/vision (Vision analysis and image generation) labels May 12, 2026

briandevans mentioned this pull request May 12, 2026

fix(computer-use): surface app=… filter no-match instead of silently using frontmost (#24170 bug 1) #24324

Closed

19 tasks

teknium1 mentioned this pull request May 21, 2026

fix(computer_use): correct type_text MCP tool name and implement drag action (#24170 bugs 3 & 4) #30032

Merged

This was referenced Jun 3, 2026

computer_use (cua-driver backend) is too fragile and breaks auxiliary vision routing #32766

Open

Vision-capable model detection missing for custom providers — HTTP 400 'text is not set' #25594

Closed

teknium1 closed this Jun 11, 2026

teknium1 added the sweeper:implemented-on-main Sweeper: behavior already present on current main label Jun 11, 2026

iceTruth wants to merge 1 commit into NousResearch:main from iceTruth:fix/computer-use-cua-driver-bugs
