fix(computer_use): two bugs blocking cua-driver integration #24232
iceTruth wants to merge 1 commit into
1. type_text_chars → type_text: cua-driver v0.1.6+ exposes type_text (AXSetAttribute with CGEvent fallback), not type_text_chars. The old name caused 'Unknown tool: type_text_chars' errors on every type action.

2. Provider-profile path skipped image→text fallback: when the active model lacks vision (e.g. GLM-5.1) but the provider has a registered profile, _prepare_messages_for_non_vision_model() was never called. Raw image_url parts reached the model, which rejected them, triggering _vision_supported=False and stripping screenshots entirely. Now the profile path also runs the fallback, which calls vision_analyze (auxiliary vision model) to produce text descriptions.
Bug 3 (type_text MCP name) fixed in PR #30032 (cherry-picked from #24181). Thanks for independently catching the same bug; credit is noted in the salvage PR body. The unrelated vision-fallback change in run_agent.py was kept out of scope and will be addressed in the vision-routing cluster (#24015 / #24070 / #29407).
Related Issues
This PR fixes two bugs that have been reported separately: Bug 1 (
Bug 2 (vision fallback skipped for provider-profile path):
Both fixes are minimal (5 lines changed total) and have been tested in production for 3+ weeks. Would appreciate a review; happy to adjust if the approach needs changes.
This appears to be implemented on current
Evidence:
Thanks for catching and testing both issues independently; the discussion here was useful in verifying the current mainline behavior.
Summary
Two bugs prevent the `computer_use` tool from working correctly with cua-driver v0.1.6+.

Bug 1: `type_text_chars` tool does not exist

`cua_backend.py` line 535 calls `self._action("type_text_chars", ...)`, but cua-driver exposes `type_text` (not `type_text_chars`). Every `computer_use(action="type")` call fails with an `Unknown tool: type_text_chars` error.

cua-driver's `type_text` already handles both AXSetAttribute (fast, background-safe) and the CGEvent fallback (for apps like Safari WebKit that reject AX text insertion). The old `type_text_chars` name appears to have been written against an earlier or hypothetical cua-driver API.

Fix: Change `"type_text_chars"` → `"type_text"`.

Bug 2: Provider-profile path skips image→text fallback

When the active model lacks vision (`supports_vision=False`, e.g. GLM-5.1) but the provider has a registered profile, `_prepare_messages_for_non_vision_model()` is never called. The raw `image_url` parts in multimodal tool results reach the model, which rejects them → `_vision_supported=False` → screenshots stripped entirely.

The image→text fallback (which calls `vision_analyze` via the auxiliary vision model to produce text descriptions) only ran on the legacy no-profile path. Models behind a provider profile got no fallback.

Fix: Call `_prepare_messages_for_non_vision_model(api_messages)` in the provider-profile path too, before building kwargs.

Testing

- `type_text` exists, `type_text_chars` does not
- `_model_supports_vision()` returns `False` for GLM-5.1
- `type_text` works: `cua-driver call type_text '{...}'` succeeds

Files changed

- `tools/computer_use/cua_backend.py`: fix tool name
- `run_agent.py`: add `_prepare_messages_for_non_vision_model` call in provider-profile path
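
For reviewers who want to picture Bug 2 without opening `run_agent.py`, here is a minimal sketch of the intended behavior. It is not the code from this PR: `describe_image`, `prepare_messages_for_non_vision_model`, and `build_request` are hypothetical stand-ins, and the message shape (a `content` list of `text` and `image_url` parts) is assumed to follow the common chat-completions layout. The point it illustrates is only that the image→text fallback runs before the profile/no-profile branch, so both paths receive it.

```python
from typing import Any, Callable

Message = dict[str, Any]


def describe_image(url: str) -> str:
    """Hypothetical stand-in for the auxiliary vision model call
    (the PR refers to this step as vision_analyze)."""
    return f"[image description for {url[:24]}]"


def prepare_messages_for_non_vision_model(
    messages: list[Message],
    describe: Callable[[str], str] = describe_image,
) -> list[Message]:
    """Return a copy of `messages` where each image_url part is
    replaced by a text part describing the image."""
    prepared: list[Message] = []
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            # Plain string content has no image parts to rewrite.
            prepared.append(message)
            continue
        parts = []
        for part in content:
            if part.get("type") == "image_url":
                url = part["image_url"]["url"]
                parts.append({"type": "text", "text": describe(url)})
            else:
                parts.append(part)
        prepared.append({**message, "content": parts})
    return prepared


def build_request(
    messages: list[Message],
    supports_vision: bool,
    has_provider_profile: bool,
) -> dict[str, Any]:
    """Hypothetical request builder: the fallback runs before the
    branch, so the provider-profile path is no longer skipped."""
    if not supports_vision:
        messages = prepare_messages_for_non_vision_model(messages)
    path = "provider-profile" if has_provider_profile else "legacy"
    return {"path": path, "messages": messages}
```

With this shape, a non-vision model behind a provider profile receives text descriptions in place of raw `image_url` parts, while a vision-capable model still receives the screenshots unchanged.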