Environment
Summary
During thorough testing of the computer_use toolset (Path A wrapper), I found 5 bugs. The raw MCP path (mcp_cua_driver_*) works correctly for all of these — the bugs are specific to the Hermes wrapper layer.
Bug 1: app= parameter ignored on initial capture
Repro:
computer_use(action="capture", mode="som", app="Calculator")
Expected: Captures the Calculator app window.
Actual: Captures the frontmost app (in my case, "Fuwari" — a menu bar utility). The app parameter is completely ignored on the first capture call.
Workaround: Call focus_app(app="計算機") first, then capture(mode="som"). Note: the app name must match the macOS localized name (e.g., "計算機" not "Calculator").
Bug 2: capture_after=True loses app context after actions
Repro:
computer_use(action="capture", mode="som", app="計算機") # Works after focus_app
computer_use(action="click", element=14, capture_after=True)
Expected: Clicks element 14, then recaptures the same app (計算機) for verification.
Actual: The click itself succeeds, but the follow-up capture reverts to the wrong app (Fuwari in my case). The app context is lost between the action and the post-action capture.
Workaround: Don't use capture_after=True. Click first, then separately call focus_app + capture.
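The workaround above can be sketched as a small helper. This is only an illustration: `call` is a hypothetical dispatcher standing in for however the toolset is invoked, and treating `focus_app` and `capture` as action names passed to it is an assumption, not the wrapper's confirmed API.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical dispatcher type: call(action, **params) -> result.
Dispatcher = Callable[..., Any]


def click_then_capture(call: Dispatcher, app: str, element: int) -> Any:
    """Click without capture_after, then refocus and capture explicitly."""
    call("click", element=element)      # deliberately no capture_after=True
    call("focus_app", app=app)          # restore the intended app (localized name)
    return call("capture", mode="som")  # capture after focus has been restored


# Recording stand-in so the call order can be inspected without a real driver.
calls: List[Tuple[str, Dict[str, Any]]] = []


def fake_call(action: str, **params: Any) -> str:
    calls.append((action, params))
    return f"ok:{action}"


result = click_then_capture(fake_call, app="計算機", element=14)
```

Keeping the three steps in one helper makes it harder to forget the refocus step while the bug is open.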
Bug 3: type action broken — "Unknown tool: type_text_chars"
Repro:
computer_use(action="type", text="hello")
Expected: Types "hello" into the focused element.
Actual: Error: cua-driver error: Invalid params: Unknown tool: type_text_chars
The wrapper appears to map action="type" to a type_text_chars tool that doesn't exist in cua-driver. The correct MCP tool name is type_text.
Workaround: Use mcp_cua_driver_type_text(pid=..., text="hello") directly.
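A startup check along these lines might catch this class of bug before it reaches users. It is a sketch under assumptions: the table and function names are hypothetical, and the set of known tools is taken only from the raw MCP tools named in this report (with the `mcp_cua_driver_` prefix dropped), not from the driver's actual tool listing.

```python
from typing import Dict, Set

# Tool names this report shows working on the raw MCP path (assumed prefix-free).
KNOWN_TOOLS: Set[str] = {"click", "drag", "get_window_state", "type_text"}

# Hypothetical action-to-tool table for the wrapper.
ACTION_TO_TOOL: Dict[str, str] = {
    "click": "click",
    "drag": "drag",
    "type": "type_text",  # reported as currently resolving to "type_text_chars"
}


def unknown_tools(mapping: Dict[str, str], known: Set[str] = KNOWN_TOOLS) -> Dict[str, str]:
    """Return the entries whose target tool is not in the known set."""
    return {action: tool for action, tool in mapping.items() if tool not in known}


# The mapping described in the error message would be flagged by this check.
broken = unknown_tools({"type": "type_text_chars"})
healthy = unknown_tools(ACTION_TO_TOOL)
```

In a real wrapper the known set would presumably come from the driver's own tool list rather than a hard-coded constant.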
Bug 4: drag action not supported
Repro:
computer_use(action="drag", from_coordinate=[100,200], to_coordinate=[400,500])
Expected: Performs a drag gesture from (100,200) to (400,500).
Actual: Error: drag is not supported by the cua-driver backend.
However, the raw MCP tool mcp_cua_driver_drag works perfectly. The wrapper simply hasn't implemented the drag action mapping.
Workaround: Use mcp_cua_driver_drag(pid=..., from_x=100, from_y=200, to_x=400, to_y=500).
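The missing mapping looks like a plain parameter translation. The sketch below converts the wrapper-style coordinate pairs into the flat arguments used by the raw tool call above. The function name is hypothetical, and the `pid` value is a placeholder that would have to come from the target app.

```python
from typing import Dict, Sequence


def drag_params(
    pid: int,
    from_coordinate: Sequence[int],
    to_coordinate: Sequence[int],
) -> Dict[str, int]:
    """Translate [x, y] pairs into from_x/from_y/to_x/to_y keyword arguments."""
    if len(from_coordinate) != 2 or len(to_coordinate) != 2:
        raise ValueError("coordinates must be [x, y] pairs")
    from_x, from_y = from_coordinate
    to_x, to_y = to_coordinate
    return {
        "pid": pid,
        "from_x": from_x,
        "from_y": from_y,
        "to_x": to_x,
        "to_y": to_y,
    }


# pid=1234 is a placeholder, not a real process id.
params = drag_params(pid=1234, from_coordinate=[100, 200], to_coordinate=[400, 500])
```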
Bug 5: Element labels stripped in capture results
Repro:
computer_use(action="capture", mode="som", app="計算機")
Actual output (Path A):
#14 AXButton "" @ (0, 0, 0, 0)
#15 AXButton "" @ (0, 0, 0, 0)
Expected output (raw MCP mcp_cua_driver_get_window_state):
[14] AXButton (1) id=One
[15] AXButton (2) id=Two
All element labels are empty strings in the Path A wrapper output, while the raw MCP path preserves them. This makes element identification impossible without trial-and-error clicking.
Workaround: Use raw MCP for discovery, then use element indices with Path A if needed.
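For the discovery step, the raw output can be turned into an index-to-label lookup. This sketch assumes every line follows the shape of the two sample lines above, and that the parenthesized value is the label and `id=` is an identifier. The real output may contain other line shapes, which this parser silently skips.

```python
import re
from typing import Dict, NamedTuple


class Element(NamedTuple):
    role: str
    label: str
    ax_id: str


# Matches lines shaped like: [14] AXButton (1) id=One
LINE_RE = re.compile(r"^\[(\d+)\]\s+(\S+)\s+\((.*)\)\s+id=(\S+)$")


def parse_window_state(text: str) -> Dict[int, Element]:
    """Map element index to (role, label, id); non-matching lines are ignored."""
    elements: Dict[int, Element] = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            index, role, label, ax_id = match.groups()
            elements[int(index)] = Element(role, label, ax_id)
    return elements


sample = "[14] AXButton (1) id=One\n[15] AXButton (2) id=Two"
elements = parse_window_state(sample)
```

With the lookup in hand, an index such as 14 can be chosen by label instead of by trial-and-error clicking.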
E2E Verification
Despite these bugs, I verified the underlying mechanism works via the raw MCP path:
- Calculator test: 12 + 3 = 15 ✅ (using the mcp_cua_driver_get_window_state → mcp_cua_driver_click sequence)
- All 7 click operations succeeded
- Result verified via AX tree query
Suggested Fixes
- Bug 1: Ensure the app= parameter is passed through to the cua-driver focus/lookup before capture
- Bug 2: Preserve the app context (pid/window_id) across action → capture_after calls
- Bug 3: Map action="type" to type_text instead of type_text_chars
- Bug 4: Implement the drag action mapping to mcp_cua_driver_drag
- Bug 5: Include element labels (AXTitle/AXDescription) in the capture output
Related