fix(computer_use): preserve app context for capture_after; fix element label parsing (#24170 bugs 2 & 5) #24242
Closed
Bartok9 wants to merge 1 commit into
…t label parsing (NousResearch#24170 bugs 2 & 5)

Bug 2 (capture_after=True loses app context): _maybe_follow_capture called backend.capture(mode='som') with no app=, causing cua-driver to capture the frontmost window instead of the app targeted by the preceding capture/focus_app.

Fix: track _last_app on CuaDriverBackend and thread it through the follow-up capture call so the same app is re-captured regardless of which window has OS focus.

Bug 5 (element labels stripped in capture results): _ELEMENT_LINE_RE matched the classic ' - [N] AXRole "label"' format but not the '[N] AXRole (order) id=Label' format introduced in cua-driver v0.1.6. All element labels were silently dropped as empty strings, making element identification impossible.

Fix: extend regex to capture both group(3) (quoted label) and group(4) (id= label), and update _parse_elements_from_tree to use group(4) as fallback. Both old and new cua-driver output now produce populated UIElement.label values.

focus_app() now also sets _last_app so that capture_after= on any subsequent action re-targets the focused app.

5 new regression tests added.

Part of NousResearch#24170 (bugs 1 and 3/4 addressed separately).
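The Bug 2 fix in this commit message can be illustrated with a minimal sketch. It is not the actual `tool.py` or `cua_backend.py` code: `SketchBackend`, the trimmed-down `CaptureResult`, the `frontmost` attribute, and the `maybe_follow_capture` signature are hypothetical stand-ins, and only the `_last_app` bookkeeping and the `getattr(backend, '_last_app', None)` lookup come from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CaptureResult:
    # Hypothetical, trimmed-down result type used only for this sketch.
    app: str
    mode: str


@dataclass
class SketchBackend:
    """Stand-in for CuaDriverBackend; models only the _last_app bookkeeping."""
    frontmost: str = "Finder"        # assumed: the app that currently has OS focus
    _last_app: Optional[str] = None  # app resolved by the last capture/focus_app

    def capture(self, mode: str = "som", app: Optional[str] = None) -> CaptureResult:
        # With no app= filter, the capture falls back to the frontmost window.
        resolved = app if app is not None else self.frontmost
        self._last_app = resolved
        return CaptureResult(app=resolved, mode=mode)

    def focus_app(self, app: str) -> None:
        self.frontmost = app
        self._last_app = app


def maybe_follow_capture(backend, capture_after: bool) -> Optional[CaptureResult]:
    """Re-capture the previously targeted app instead of whatever is frontmost."""
    if not capture_after:
        return None
    # getattr keeps this safe for backends that do not define _last_app.
    last_app = getattr(backend, "_last_app", None)
    return backend.capture(mode="som", app=last_app)


backend = SketchBackend()
backend.capture(app="Calculator")   # the agent targets Calculator
backend.frontmost = "Fuwari"        # an action shifts OS focus elsewhere
follow_up = maybe_follow_capture(backend, capture_after=True)
```

In this sketch the follow-up capture still targets "Calculator" even though another app has taken OS focus, which is the behaviour the fix aims for.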
19 tasks
briandevans added a commit to briandevans/hermes-agent that referenced this pull request on May 19, 2026
…using frontmost (NousResearch#24170 bug 1)

`CuaDriverBackend.capture(app=X)` and `focus_app(app=X)` silently fell back to the frontmost on-screen window when X matched no app, typically a menu-bar utility (e.g. "Fuwari" in the bug reporter's case) rather than the requested app. The agent then received UI elements for the wrong app and clicked / typed into it.

The root cause is a localized macOS app name mismatch: `list_windows` returns the localized `app_name` (e.g. "計算機" on a Japanese/Chinese system) but callers naturally pass the English name ("Calculator"). The substring filter doesn't match, and the code falls through to picking the frontmost window with no signal that the filter was effectively dropped.

Fix:
- `capture(app=…)`: when the filter matches nothing, return a `CaptureResult` with empty `app`/`elements` and a diagnostic `window_title` pointing the caller at `list_apps` and noting the localized-name convention. `_active_pid` / `_active_window_id` are left untouched so a subsequent action doesn't inadvertently hit the wrong process.
- `focus_app(app=…)`: when the filter matches nothing, set `target = None` and let the existing `return ActionResult(ok=False, …, "No on-screen window found for app …")` path fire instead of falsely reporting success on the frontmost window.

This addresses bug 1 only from NousResearch#24170. Bugs 2 & 5 are addressed in NousResearch#24242, bugs 3 & 4 in NousResearch#24181.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
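The no-match handling described in this commit can be sketched roughly as below. This is an illustration under assumptions, not the referenced commit's code: the `Window` type, the `pick_window` helper, the frontmost-first ordering of the window list, the case-insensitive comparison, and the exact diagnostic wording are all hypothetical. Only the idea of returning an empty `CaptureResult` with a diagnostic `window_title` comes from the message.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Window:
    # Hypothetical shape of one list_windows entry.
    app_name: str
    title: str
    elements: List[str] = field(default_factory=list)


@dataclass
class CaptureResult:
    app: str = ""
    window_title: str = ""
    elements: List[str] = field(default_factory=list)


def pick_window(windows: List[Window], app: Optional[str]) -> Optional[Window]:
    """Pick the window to capture; windows are assumed ordered frontmost first."""
    if app is None:
        return windows[0] if windows else None
    needle = app.lower()  # assumption: the substring filter ignores case
    for window in windows:
        if needle in window.app_name.lower():
            return window
    # The filter matched nothing: report that instead of using the frontmost window.
    return None


def capture(windows: List[Window], app: Optional[str] = None) -> CaptureResult:
    target = pick_window(windows, app)
    if target is None:
        # Empty app/elements plus a diagnostic title, as the fix describes.
        hint = (f"No window matched app {app!r}. Call list_apps to see the "
                "available names; they may be localized.")
        return CaptureResult(window_title=hint)
    return CaptureResult(app=target.app_name, window_title=target.title,
                         elements=list(target.elements))


windows = [
    Window("Fuwari", "menu bar"),
    Window("計算機", "計算機", ["[1] AXButton"]),
]
missed = capture(windows, app="Calculator")  # English name, localized system
matched = capture(windows, app="計算機")
```

Here `missed` carries no elements and no app name, so a caller cannot mistake the menu-bar utility for the requested app, while `matched` resolves normally.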
Summary
Fixes bugs 2 and 5 from #24170 (cua-driver v0.1.6 hands-on testing report). Bugs 3 and 4 were addressed in #24181; bug 1 (localized app name matching) is tracked separately.
Bug 2: `capture_after=True` loses app context after actions

Root cause: `_maybe_follow_capture()` in `tool.py` always called `backend.capture(mode='som')` with no `app=`, so cua-driver enumerated `list_windows` fresh and returned the frontmost window, which could be a different app if an action (click, drag, type) caused an OS focus shift.

Fix: Add `_last_app: Optional[str]` to `CuaDriverBackend`. `capture()` and `focus_app()` both populate it with the resolved app name. `_maybe_follow_capture()` reads the attribute via `getattr(backend, '_last_app', None)` and passes it to the follow-up capture, so the same app is re-targeted regardless of which window currently has OS focus.

Bug 5: Element labels stripped in capture results

Root cause: `_ELEMENT_LINE_RE` was written for the classic cua-driver format, `- [N] AXRole "label"`. cua-driver v0.1.6 changed the format to `[N] AXRole (order) id=Label`. The old regex never matched the new format, so `_parse_elements_from_tree()` silently returned empty `label` fields for all elements, making element identification by name impossible.

Fix: Updated `_ELEMENT_LINE_RE` to capture both:
- `group(3)`: quoted label (classic format)
- `group(4)`: `id=` label (new format)

`_parse_elements_from_tree()` now uses `group(3) or group(4) or ''`. Both formats in the same tree are handled correctly.

Changes
- `tools/computer_use/cua_backend.py`: `_last_app` field; update `_ELEMENT_LINE_RE` + `_parse_elements_from_tree`; set `_last_app` in `capture()` and `focus_app()`
- `tools/computer_use/tool.py`: `_maybe_follow_capture` reads `_last_app` from backend for app-aware follow-up
- `tests/tools/test_computer_use.py`

Testing
Fixes #24170 (bugs 2 and 5)
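For reviewers, the dual-format label parsing described under Bug 5 can be sketched as follows. The pattern is an illustrative reconstruction, not the regex from this PR: it assumes `group(1)` is the element index and `group(2)` the role, that the `(order)` part is a single parenthesised token, and that an `id=` label contains no whitespace. `ELEMENT_LINE_RE`, `UIElement`, and `parse_elements` are hypothetical names for this sketch.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed layout: group(1)=index, group(2)=role,
# group(3)=quoted label (classic), group(4)=id= label (v0.1.6).
ELEMENT_LINE_RE = re.compile(
    r'^\s*(?:-\s*)?\[(\d+)\]\s+(\w+)'   # optional "- " prefix, "[N]", role
    r'(?:\s+\([^)]*\))?'                # optional "(order)" token
    r'(?:\s+"([^"]*)"|\s+id=(\S+))?'    # quoted label or id= label
)


@dataclass
class UIElement:
    # Hypothetical, minimal element type for this sketch.
    index: int
    role: str
    label: str


def parse_elements(tree: str) -> List[UIElement]:
    """Parse element lines in either format; non-matching lines are skipped."""
    elements = []
    for line in tree.splitlines():
        match = ELEMENT_LINE_RE.match(line)
        if match is None:
            continue
        # Prefer the quoted label, fall back to the id= label, then to ''.
        label = match.group(3) or match.group(4) or ''
        elements.append(UIElement(int(match.group(1)), match.group(2), label))
    return elements


tree = '\n'.join([
    ' - [3] AXButton "Submit"',          # classic format
    '[7] AXTextField (2) id=Search',     # v0.1.6 format
    '[9] AXGroup',                       # no label at all
    'Window: Calculator',                # not an element line
])
parsed = parse_elements(tree)
```

With this input, the first two elements get the labels "Submit" and "Search", the unlabeled group gets an empty string, and the window header line is ignored.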