
[Bug] computer_use toolset: 5 bugs found during hands-on testing (macOS 26.4.1, cua-driver v0.1.6) #24170

@AkiGarage


Environment: macOS 26.4.1, cua-driver v0.1.6

Summary

During hands-on testing of the computer_use toolset (the Path A wrapper), I found 5 bugs. The raw MCP path (mcp_cua_driver_*) handles every one of these cases correctly, so the bugs are specific to the Hermes wrapper layer.


Bug 1: app= parameter ignored on initial capture

Repro:

computer_use(action="capture", mode="som", app="Calculator")

Expected: Captures the Calculator app window.

Actual: Captures the frontmost app instead (in my case Fuwari, a menu bar utility). The app parameter is ignored entirely on the first capture call.

Workaround: Call focus_app(app="計算機") first, then capture(mode="som"). Note: the app name must match the macOS localized name (e.g., "計算機", the Japanese name of Calculator, rather than "Calculator").
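A wrapper-side fix could resolve app= to a specific process before capturing and accept either the English or the localized name, so callers would not need to know the system locale. Below is a minimal Python sketch. It assumes the wrapper can list running apps with their pid, bundle id, English name and localized name (how it obtains them is out of scope). RunningApp and resolve_app are hypothetical names, not part of Hermes or cua-driver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunningApp:
    # Hypothetical record of a running app; field names are illustrative only.
    pid: int
    bundle_id: str       # e.g. "com.apple.calculator"
    name: str            # non-localized name, e.g. "Calculator"
    localized_name: str  # name macOS shows in the user's locale, e.g. "計算機"


def resolve_app(query: str, running: list[RunningApp]) -> RunningApp:
    """Return the running app whose bundle id, name, or localized name matches `query`.

    Matching is case-insensitive, so "Calculator", "calculator" and "計算機"
    all resolve to the same process.
    """
    q = query.casefold()
    for app in running:
        candidates = (app.bundle_id.casefold(), app.name.casefold(), app.localized_name.casefold())
        if q in candidates:
            return app
    # Fail loudly instead of silently falling back to the frontmost app.
    raise LookupError(f"no running app matches {query!r}")
```

With the app resolved, the wrapper would focus and capture by its pid instead of using whatever is frontmost. Raising on no match also prevents the silent fallback described above.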


Bug 2: capture_after=True loses app context after actions

Repro:

computer_use(action="capture", mode="som", app="計算機")  # Works after focus_app
computer_use(action="click", element=14, capture_after=True)

Expected: Clicks element 14, then recaptures the same app (計算機, i.e. Calculator) to verify the result.

Actual: The click itself succeeds, but the follow-up capture reverts to the wrong app (Fuwari in my case). The app context is lost between the action and the post-action capture.

Workaround: Don't use capture_after=True. Click first, then separately call focus_app + capture.
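One way to preserve the context: the wrapper remembers the pid/window_id from the last capture and reuses it for the post-action capture, rather than re-resolving the frontmost app. Here is a minimal sketch. Session, Target and the injected capture_fn/click_fn callables are hypothetical stand-ins for the wrapper's internals and the underlying cua-driver calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Target:
    # Hypothetical: the app/window the wrapper is currently acting on.
    pid: int
    window_id: int


class Session:
    """Sketch of a wrapper that keeps the last captured target (hypothetical)."""

    def __init__(
        self,
        capture_fn: Callable[[Target], str],
        click_fn: Callable[[Target, int], None],
    ) -> None:
        self._capture_fn = capture_fn
        self._click_fn = click_fn
        self.target: Optional[Target] = None

    def capture(self, target: Target) -> str:
        self.target = target  # remember pid/window_id for later actions
        return self._capture_fn(target)

    def click(self, element: int, capture_after: bool = False) -> Optional[str]:
        if self.target is None:
            raise RuntimeError("capture an app before clicking by element index")
        self._click_fn(self.target, element)
        if capture_after:
            # Re-capture the same pid/window_id, not whatever is frontmost now.
            return self._capture_fn(self.target)
        return None
```

Injecting the driver calls as callables keeps the sketch testable without a live cua-driver.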


Bug 3: type action broken — "Unknown tool: type_text_chars"

Repro:

computer_use(action="type", text="hello")

Expected: Types "hello" into the focused element.

Actual: Error: cua-driver error: Invalid params: Unknown tool: type_text_chars

The wrapper appears to map action="type" to a type_text_chars tool that doesn't exist in cua-driver. The correct MCP tool name is type_text.

Workaround: Use mcp_cua_driver_type_text(pid=..., text="hello") directly.
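A misnamed tool like type_text_chars could be caught at startup by a small action-to-tool table checked against the tools the driver actually advertises (for example, the result of an MCP tools/list request). Here is a sketch under that assumption. ACTION_TO_TOOL, tool_for and missing_tools are hypothetical names, and the table lists only tool names that appear in this report.

```python
# Hypothetical table mapping Path A action names to raw cua-driver MCP tool names.
# Only tools named in this report are listed; the real wrapper also handles other
# actions such as capture.
ACTION_TO_TOOL = {
    "click": "click",
    "type": "type_text",  # not "type_text_chars", which cua-driver rejects
    "drag": "drag",
}


def tool_for(action: str) -> str:
    """Return the cua-driver tool for a Path A action, or raise for unmapped actions."""
    try:
        return ACTION_TO_TOOL[action]
    except KeyError:
        raise ValueError(f"unsupported action: {action!r}") from None


def missing_tools(available: set[str]) -> set[str]:
    """Return mapped tool names the driver does not advertise (e.g. via MCP tools/list)."""
    return set(ACTION_TO_TOOL.values()) - available
```

If missing_tools returned a non-empty set at startup, the bad mapping would surface immediately instead of at the first type call.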


Bug 4: drag action not supported

Repro:

computer_use(action="drag", from_coordinate=[100,200], to_coordinate=[400,500])

Expected: Performs a drag gesture from (100,200) to (400,500).

Actual: Error: drag is not supported by the cua-driver backend.

However, the raw MCP tool mcp_cua_driver_drag works correctly, so the backend does support drags; the wrapper simply hasn't implemented the mapping for the drag action.

Workaround: Use mcp_cua_driver_drag(pid=..., from_x=100, from_y=200, to_x=400, to_y=500).
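Given the two call shapes above, most of the missing mapping is argument translation from coordinate pairs to separate x/y keywords. Here is a minimal sketch. drag_args is a hypothetical helper, and the pid would come from whichever app the wrapper is currently targeting.

```python
from typing import Sequence


def drag_args(pid: int, from_coordinate: Sequence[int], to_coordinate: Sequence[int]) -> dict:
    """Translate Path A drag arguments into mcp_cua_driver_drag keyword arguments.

    Unpacking raises ValueError if either coordinate is not exactly (x, y).
    """
    (from_x, from_y), (to_x, to_y) = from_coordinate, to_coordinate
    return {"pid": pid, "from_x": from_x, "from_y": from_y, "to_x": to_x, "to_y": to_y}
```

For the repro above, drag_args(pid, [100, 200], [400, 500]) produces the same keywords as the workaround call.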


Bug 5: Element labels stripped in capture results

Repro:

computer_use(action="capture", mode="som", app="計算機")

Actual output (Path A):

#14 AXButton "" @ (0, 0, 0, 0)
#15 AXButton "" @ (0, 0, 0, 0)

Expected output (raw MCP mcp_cua_driver_get_window_state):

[14] AXButton (1) id=One
[15] AXButton (2) id=Two

All element labels are empty strings in the Path A wrapper output, while the raw MCP path preserves them. This makes element identification impossible without trial-and-error clicking.

Workaround: Use raw MCP for discovery, then use element indices with Path A if needed.
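For the label fix, the wrapper could pick the first non-empty human-readable accessibility attribute for each element. Here is a minimal sketch with several assumptions. The attribute precedence (AXTitle, then AXDescription, then AXValue, then AXIdentifier) and the dict-of-strings input are assumed, not how cua-driver actually structures its element data. element_label and format_element are hypothetical names, and the frame is left out.

```python
# Assumed precedence of accessibility attributes used as a display label.
LABEL_KEYS = ("AXTitle", "AXDescription", "AXValue", "AXIdentifier")


def element_label(attrs: dict[str, str]) -> str:
    """Return the first non-empty attribute from LABEL_KEYS, or "" if none is set."""
    for key in LABEL_KEYS:
        value = attrs.get(key, "")
        if value:
            return value
    return ""


def format_element(index: int, role: str, attrs: dict[str, str]) -> str:
    """Format one element line in the Path A style, appending id= when available."""
    line = f'#{index} {role} "{element_label(attrs)}"'
    ident = attrs.get("AXIdentifier")
    if ident:
        line += f" id={ident}"
    return line
```

Falling back through several attributes means an element with an empty AXTitle still gets a usable label, which is what makes index-based clicking practical.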


E2E Verification

Despite these bugs, I verified the underlying mechanism works via the raw MCP path:

  • Calculator test: 12 + 3 = 15 ✅ (using an mcp_cua_driver_get_window_state → mcp_cua_driver_click sequence)
  • All 7 click operations succeeded
  • Result verified via AX tree query

Suggested Fixes

  1. Bug 1: Ensure the app= parameter is passed through to the cua-driver focus/lookup before capture
  2. Bug 2: Preserve the app context (pid/window_id) across action → capture_after calls
  3. Bug 3: Map action="type" to type_text instead of type_text_chars
  4. Bug 4: Implement drag action mapping to mcp_cua_driver_drag
  5. Bug 5: Include element labels (AXTitle/AXDescription) in the capture output

Metadata

Labels: P2 (Medium: degraded but workaround exists), comp/tools (Tool registry, model_tools, toolsets), type/bug (Something isn't working)