fix(computer_use): preserve app context for capture_after; fix element label parsing (#24170 bugs 2 & 5) #24242

Closed
Bartok9 wants to merge 1 commit into NousResearch:main from Bartok9:fix/24170-computer-use-app-context-labels

@Bartok9 Bartok9 commented May 12, 2026

Contributor

Summary

Fixes bugs 2 and 5 from #24170 (cua-driver v0.1.6 hands-on testing report). Bugs 3 and 4 were addressed in #24181; bug 1 (localized app name matching) is tracked separately.


Bug 2 — capture_after=True loses app context after actions

Root cause: _maybe_follow_capture() in tool.py always called backend.capture(mode='som') without app=, so cua-driver re-enumerated list_windows and returned the frontmost window. That window could belong to a different app if an action (click, drag, type) caused an OS focus shift.

Fix: Add _last_app: Optional[str] to CuaDriverBackend. capture() and focus_app() both populate it with the resolved app name. _maybe_follow_capture() reads the attribute via getattr(backend, '_last_app', None) and passes it to the follow-up capture, so the same app is re-targeted regardless of which window currently has OS focus.
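
For readers skimming the diff, the idea can be sketched roughly as below. This is not the code from the PR: `FakeBackend` and the standalone `maybe_follow_capture` function are hypothetical stand-ins for `CuaDriverBackend` and `_maybe_follow_capture()`, and the dict return values are assumptions made only so the sketch runs on its own.

```python
from typing import Optional


class FakeBackend:
    """Hypothetical stand-in for CuaDriverBackend; models only app tracking."""

    def __init__(self) -> None:
        # Name of the app most recently targeted by capture() or focus_app().
        self._last_app: Optional[str] = None

    def capture(self, mode: str = "som", app: Optional[str] = None) -> dict:
        # Assumption: the real backend stores the *resolved* app name here.
        if app is not None:
            self._last_app = app
        return {"mode": mode, "app": app}

    def focus_app(self, app: str) -> None:
        self._last_app = app


def maybe_follow_capture(backend, capture_after: bool) -> Optional[dict]:
    """Re-capture after an action, re-targeting the last known app."""
    if not capture_after:
        return None
    # getattr keeps this working for backends that lack the attribute.
    app = getattr(backend, "_last_app", None)
    return backend.capture(mode="som", app=app)
```

Reading the attribute with `getattr` and a `None` default means a backend that never defines `_last_app` falls back to the previous behaviour (capture with no `app=`).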


Bug 5 — Element labels stripped in capture results

Root cause: _ELEMENT_LINE_RE was written for the classic cua-driver format:

  - [14] AXButton "One"

cua-driver v0.1.6 changed the format to:

[14] AXButton (1) id=One

The old regex never matched the new format, so _parse_elements_from_tree() silently returned empty label fields for all elements, making element identification by name impossible.

Fix: Updated _ELEMENT_LINE_RE to capture both:

  • Group 3: quoted label (classic format)
  • Group 4: id= label (new format)

_parse_elements_from_tree() now uses group(3) or group(4) or ''. Both formats in the same tree are handled correctly.
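
A minimal sketch of a pattern with this group layout follows. The actual `_ELEMENT_LINE_RE` is not quoted in this PR description, so the regex here is an assumption reconstructed from the two sample lines above. `ELEMENT_LINE_RE` and `parse_elements` are hypothetical names, groups 1 and 2 (index and role) are guessed, and `id=` values are assumed to contain no whitespace.

```python
import re

# Hypothetical pattern; only the meaning of groups 3 and 4 comes from the PR text.
ELEMENT_LINE_RE = re.compile(
    r'^\s*(?:-\s*)?\[(\d+)\]\s+(\w+)'  # group 1: index, group 2: role
    r'(?:\s+"([^"]*)")?'               # group 3: quoted label (classic format)
    r'(?:\s+\(\d+\))?'                 # optional "(order)" marker, not captured
    r'(?:\s+id=(\S+))?'                # group 4: id= label (v0.1.6 format)
)


def parse_elements(tree: str) -> list:
    """Return one dict per element line, with the label taken from either format."""
    elements = []
    for line in tree.splitlines():
        m = ELEMENT_LINE_RE.match(line)
        if m is None:
            continue  # not an element line
        elements.append({
            "index": int(m.group(1)),
            "role": m.group(2),
            # Prefer the quoted label, fall back to id=, else empty string.
            "label": m.group(3) or m.group(4) or "",
        })
    return elements
```

Because both label groups are optional, a line in either format matches the same pattern, and a tree that mixes the two formats needs no special handling.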


Changes

| File | Change |
| --- | --- |
| tools/computer_use/cua_backend.py | Add _last_app field; update _ELEMENT_LINE_RE + _parse_elements_from_tree; set _last_app in capture() and focus_app() |
| tools/computer_use/tool.py | _maybe_follow_capture reads _last_app from backend for app-aware follow-up |
| tests/tools/test_computer_use.py | 5 regression tests: 3 for label parsing (classic/new/mixed), 2 for capture_after context |

Testing

pytest tests/tools/test_computer_use.py -v  # 49 passed

Fixes #24170 (bugs 2 and 5)

…t label parsing (NousResearch#24170 bugs 2 & 5)

Bug 2 (capture_after=True loses app context):
_maybe_follow_capture called backend.capture(mode='som') with no app=,
causing cua-driver to capture the frontmost window instead of the app
targeted by the preceding capture/focus_app. Fix: track _last_app on
CuaDriverBackend and thread it through the follow-up capture call so
the same app is re-captured regardless of which window has OS focus.

Bug 5 (element labels stripped in capture results):
_ELEMENT_LINE_RE matched the classic '  - [N] AXRole "label"' format
but not the '[N] AXRole (order) id=Label' format introduced in
cua-driver v0.1.6. All element labels were silently dropped as empty
strings, making element identification impossible.

Fix: extend regex to capture both group(3) (quoted label) and group(4)
(id= label), and update _parse_elements_from_tree to use group(4) as
fallback. Both old and new cua-driver output now produce populated
UIElement.label values.

focus_app() now also sets _last_app so that capture_after= on any
subsequent action re-targets the focused app.

5 new regression tests added.

Part of NousResearch#24170 (bugs 1 and 3/4 addressed separately).
@alt-glitch added the type/bug (Something isn't working), comp/tools (Tool registry, model_tools, toolsets), and P2 (Medium: degraded but workaround exists) labels on May 12, 2026
briandevans added a commit to briandevans/hermes-agent that referenced this pull request May 19, 2026
…using frontmost (NousResearch#24170 bug 1)

`CuaDriverBackend.capture(app=X)` and `focus_app(app=X)` silently fell back
to the frontmost on-screen window when X matched no app — typically a
menu-bar utility (e.g. "Fuwari" in the bug reporter's case) rather than
the requested app. The agent then received UI elements for the wrong app
and clicked / typed into it.

The root cause is a localized macOS app name mismatch: `list_windows`
returns the localized `app_name` (e.g. "計算機" on a Japanese/Chinese
system) but callers naturally pass the English name ("Calculator"). The
substring filter doesn't match, and the code falls through to picking the
frontmost window with no signal that the filter was effectively dropped.

Fix:

- `capture(app=…)`: when the filter matches nothing, return a
  `CaptureResult` with empty `app`/`elements` and a diagnostic
  `window_title` pointing the caller at `list_apps` and noting the
  localized-name convention. `_active_pid` / `_active_window_id` are left
  untouched so a subsequent action doesn't inadvertently hit the wrong
  process.
- `focus_app(app=…)`: when the filter matches nothing, set `target = None`
  and let the existing `return ActionResult(ok=False, …, "No on-screen
  window found for app …")` path fire instead of falsely reporting success
  on the frontmost window.
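
The no-fallback behaviour described above might look roughly like this sketch. It is illustrative only: `select_window` is a hypothetical helper, the windows are assumed to be dicts with an `app_name` key ordered frontmost-first, and the case-insensitive comparison is an assumption rather than something stated in the commit.

```python
from typing import List, Optional


def select_window(windows: List[dict], app: Optional[str]) -> Optional[dict]:
    """Pick a window for `app`, or None when an explicit filter matches nothing."""
    if app is None:
        # No filter requested: the frontmost window is the intended target.
        return windows[0] if windows else None
    needle = app.lower()
    matches = [w for w in windows if needle in w["app_name"].lower()]
    # Key point: an unmatched filter yields None instead of windows[0],
    # so the caller can report the miss rather than act on the wrong app.
    return matches[0] if matches else None
```

With a window list of "Fuwari" and "計算機", asking for "Calculator" returns `None` here, while passing the localized name selects the calculator window.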

This addresses bug 1 only from NousResearch#24170. Bugs 2 & 5 are addressed in
NousResearch#24242, bugs 3 & 4 in NousResearch#24181.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@teknium1

Contributor

Salvaged via PR #30046 (commit 4cc1887 on main). Your commit was cherry-picked onto current main with your authorship preserved. Thanks for the fix! Closes #24170 bugs 2 (capture_after app context) and 5 (element label regex).


Development

Successfully merging this pull request may close these issues.

[Bug] computer_use toolset: 5 bugs found during hands-on testing (macOS 26.4.1, cua-driver v0.1.6)
