fix(computer_use): correct type_text MCP tool name and implement drag action#24181

Closed
liuhao1024 wants to merge 1 commit into
NousResearch:mainfrom
liuhao1024:fix/issue-24170-computer-use-type-and-drag

@liuhao1024

Contributor

Summary

Fixes two of the five bugs reported in #24170 for the computer_use toolset wrapper layer:

  1. Bug 3: type action fails with "Unknown tool: type_text_chars". The cua_backend.type_text() method called the MCP tool type_text_chars, which does not exist in the current cua-driver. It now calls type_text, the correct MCP tool name.

  2. Bug 4: drag action returns "not supported" error — The drag() method in CuaDriverBackend returned a hardcoded error even though cua-driver exposes a drag MCP tool. Implemented proper drag dispatching with both coordinate-based and element-based targeting, following the same pattern as click() and scroll().
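
The two backend changes above can be sketched roughly as follows. This is a minimal illustration, not the code in cua_backend.py: `SketchBackend`, `CallTool`, `fake_call_tool`, and every argument key (`text`, `from_x`, `from_element`, and so on) are assumptions made for the example. Only the tool names `type_text` and `drag` come from the PR description.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for the wrapper's MCP client call:
# (tool_name, arguments) -> result dict.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]
Point = Tuple[int, int]


class SketchBackend:
    """Illustrative only; not the actual CuaDriverBackend."""

    def __init__(self, call_tool: CallTool) -> None:
        self._call_tool = call_tool

    def type_text(self, text: str) -> Dict[str, Any]:
        # Bug 3: route to "type_text" rather than the nonexistent "type_text_chars".
        # The "text" argument key is an assumption.
        return self._call_tool("type_text", {"text": text})

    def drag(
        self,
        from_xy: Optional[Point] = None,
        to_xy: Optional[Point] = None,
        from_element: Optional[int] = None,
        to_element: Optional[int] = None,
    ) -> Dict[str, Any]:
        # Bug 4: forward to the "drag" tool instead of returning a hardcoded error.
        # All argument key names below are assumed, not taken from cua-driver.
        if from_element is not None and to_element is not None:
            args: Dict[str, Any] = {
                "from_element": from_element,
                "to_element": to_element,
            }
        elif from_xy is not None and to_xy is not None:
            args = {
                "from_x": from_xy[0],
                "from_y": from_xy[1],
                "to_x": to_xy[0],
                "to_y": to_xy[1],
            }
        else:
            return {
                "ok": False,
                "error": "drag requires from/to coordinates or from/to elements",
            }
        return self._call_tool("drag", args)


# Usage with a recording fake in place of a real MCP connection.
calls: List[Tuple[str, Dict[str, Any]]] = []


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    calls.append((name, arguments))
    return {"ok": True}


backend = SketchBackend(fake_call_tool)
backend.type_text("hello")
backend.drag(from_xy=(10, 20), to_xy=(110, 220))
```

Element targeting is checked first in this sketch; whether the real implementation prefers elements or coordinates when both are supplied is not stated in the PR.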

Additionally, this PR adds dispatch-level validation in tool.py so that drag must receive either coordinates or element indices before any backend is called, which gives a consistent error message across all backend implementations.
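
A minimal sketch of what such a dispatch-level check might look like, assuming hypothetical argument names (`from_x`, `to_element`, and so on) and a hypothetical `dispatch` function; the actual parameter names and structure in tool.py may differ.

```python
from typing import Any, Dict, Optional

# Assumed argument names; the real tool schema may use different keys.
COORD_KEYS = ("from_x", "from_y", "to_x", "to_y")
ELEMENT_KEYS = ("from_element", "to_element")


def drag_validation_error(args: Dict[str, Any]) -> Optional[str]:
    """Return an error message if drag lacks a complete target pair, else None."""
    # "is not None" keeps a coordinate of 0 valid.
    has_coords = all(args.get(k) is not None for k in COORD_KEYS)
    has_elements = all(args.get(k) is not None for k in ELEMENT_KEYS)
    if has_coords or has_elements:
        return None
    return "drag requires from/to coordinates or from/to element indices"


def dispatch(action: str, args: Dict[str, Any], backend: Any) -> Dict[str, Any]:
    """Hypothetical dispatcher: validate before touching any backend."""
    if action == "drag":
        error = drag_validation_error(args)
        if error is not None:
            # Same message regardless of which backend is configured.
            return {"ok": False, "error": error}
        return backend.drag(**args)
    return {"ok": False, "error": f"unknown action: {action}"}
```

Validating before the backend call means a backend that does not implement drag at all still reports the same missing-target error as one that does.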

Changes

| File | Change |
| --- | --- |
| tools/computer_use/cua_backend.py | Fix type_text() MCP tool name; implement drag() |
| tools/computer_use/tool.py | Add dispatch-level validation for drag action |
| tests/tools/test_computer_use.py | Add 4 regression tests for type and drag dispatching |

Testing

  • All 48 tests in tests/tools/test_computer_use.py pass
  • New tests verify: type action routes to type_text, drag routes with coordinates, drag routes with elements, drag without targets returns error

Remaining bugs from #24170

Bugs 1 (app= ignored on initial capture), 2 (capture_after=True loses app context), and 5 (element labels stripped) involve state management in the capture/app-targeting flow and are not addressed in this PR.

Fixes #24170 (bugs 3 and 4)

… action

Bug 3: The cua_backend type_text() method called MCP tool 'type_text_chars'
which does not exist in current cua-driver. Changed to 'type_text' which is
the correct MCP tool name.

Bug 4: The drag() method returned a hardcoded 'not supported' error even
though cua-driver exposes a 'drag' MCP tool. Implemented proper drag
dispatching with coordinate-based and element-based targeting.

Added dispatch-level validation for drag to ensure from/to coordinates
or elements are provided before calling any backend.

Fixes NousResearch#24170 (bugs 3 and 4)
@daimon-nous (bot) added the labels type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), and comp/tools (Tool registry, model_tools, toolsets) on May 12, 2026
briandevans added a commit to briandevans/hermes-agent that referenced this pull request May 19, 2026
…using frontmost (NousResearch#24170 bug 1)

`CuaDriverBackend.capture(app=X)` and `focus_app(app=X)` silently fell back
to the frontmost on-screen window when X matched no app — typically a
menu-bar utility (e.g. "Fuwari" in the bug reporter's case) rather than
the requested app. The agent then received UI elements for the wrong app
and clicked / typed into it.

The root cause is a localized macOS app name mismatch: `list_windows`
returns the localized `app_name` (e.g. "計算機" on a Japanese/Chinese
system) but callers naturally pass the English name ("Calculator"). The
substring filter doesn't match, and the code falls through to picking the
frontmost window with no signal that the filter was effectively dropped.

Fix:

- `capture(app=…)`: when the filter matches nothing, return a
  `CaptureResult` with empty `app`/`elements` and a diagnostic
  `window_title` pointing the caller at `list_apps` and noting the
  localized-name convention. `_active_pid` / `_active_window_id` are left
  untouched so a subsequent action doesn't inadvertently hit the wrong
  process.
- `focus_app(app=…)`: when the filter matches nothing, set `target = None`
  and let the existing `return ActionResult(ok=False, …, "No on-screen
  window found for app …")` path fire instead of falsely reporting success
  on the frontmost window.

This addresses bug 1 only from NousResearch#24170. Bugs 2 & 5 are addressed in
NousResearch#24242, bugs 3 & 4 in NousResearch#24181.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
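
The no-match behavior described in the referenced commit can be sketched as below. This is a simplified illustration under stated assumptions: `SketchCaptureResult`, `SketchCapture`, the window dictionaries, and the diagnostic wording are all hypothetical, and the real `CaptureResult` and `CuaDriverBackend` have more fields and logic. It only shows the idea of returning an empty, diagnostic result and leaving the active target untouched when the app filter matches nothing.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SketchCaptureResult:
    # Hypothetical, reduced stand-in for CaptureResult.
    app: str = ""
    window_title: str = ""
    elements: List[Dict[str, Any]] = field(default_factory=list)


class SketchCapture:
    """Illustrative only; windows are assumed to be ordered frontmost first."""

    def __init__(self, windows: List[Dict[str, Any]]) -> None:
        self._windows = windows
        self.active_pid: Optional[int] = None

    def capture(self, app: Optional[str] = None) -> SketchCaptureResult:
        candidates = self._windows
        if app:
            needle = app.lower()
            candidates = [w for w in self._windows if needle in w["app_name"].lower()]
            if not candidates:
                # Filter matched nothing: do NOT fall back to the frontmost
                # window, and leave active_pid untouched.
                return SketchCaptureResult(
                    window_title=(
                        f"No window matched app {app!r}; "
                        "check list_apps for the localized app name"
                    )
                )
        if not candidates:
            return SketchCaptureResult(window_title="No on-screen windows")
        target = candidates[0]
        self.active_pid = target["pid"]
        return SketchCaptureResult(app=target["app_name"], window_title=target["title"])


windows = [
    {"app_name": "Fuwari", "pid": 1, "title": "menu bar"},
    {"app_name": "計算機", "pid": 2, "title": "calc"},
]
backend = SketchCapture(windows)
missed = backend.capture(app="Calculator")  # English name, localized system
hit = backend.capture(app="計算機")
```

Here `missed` carries an empty `app` and a diagnostic `window_title`, while `hit` selects the localized window and only then updates `active_pid`.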
@teknium1

Contributor

Salvaged via PR #30032 (commit 18cd1e5 on main). Your commit was cherry-picked onto current main with your authorship preserved. Thanks for the fix!

Development

Successfully merging this pull request may close these issues.

[Bug] computer_use toolset: 5 bugs found during hands-on testing (macOS 26.4.1, cua-driver v0.1.6)
