cua-driver: overlay z-ordering, background focus fixes, hermes form-fill tests#1375
Conversation
…d hermes form-fill tests
## Overlay z-ordering (NSWindowLevel.normal)
Changed the agent-cursor overlay from NSWindowLevel.floating (above all
normal windows) to NSWindowLevel.normal so it is sandwiched just above
the target window: [target, overlay, fg-windows]. Foreground windows
above the target now correctly occlude the cursor, making it visually
clear which window the agent is interacting with.
## Continuous repin loop (30 fps)
Replaced the sparse one-shot defensive-repin schedule [60, 180, 360,
600, 900, 1200ms] with a continuous ~30 fps loop. macOS sometimes
raises a background window's z-level asynchronously after an AX action;
the old schedule left a gap of up to 300ms where the overlay sat below
the target. At 33ms the overlay snaps back within one frame.
## Background Safari form-fill fixes
- ClickTool: 800ms post-AXPress delay for AXTextField/AXTextArea so
WebKit establishes DOM focus before type_text_chars fires (email/text
field race condition).
- ClickTool: detect AXPopUpButton clicks, list available options and
hint to use set_value instead of click (native popup closes on bg).
- SetValueTool: two-strategy popup selection — AX child AXPress (native
AppKit) with JavaScript injection fallback for Safari/WebKit selects
that expose no AX children when the popup is closed.
- AXInput: added screenBoundingRect(), children(), stringAttribute()
helpers used by ClickTool and SetValueTool.
## Agent cursor focus rect
AgentCursorRenderer/View/Cursor: draw a cyan glowing rounded rect around
the targeted AX element's screen bbox after each click so the user sees
which element the agent targeted.
## FocusMonitorApp enhancements
Added TrackingTextField (NSTextField subclass) that holds keyboard focus
throughout the test run and tracks field-level resignFirstResponder
events. Three-level focus-loss counters (app, window key, text field)
written to /tmp/focus_monitor_{losses,key_losses,field_losses}.txt.
## Structured content for list_windows / launch_app
Both tools now return structuredContent alongside their human-readable
text so integration tests can access window/app data without parsing.
## New integration tests
- test_hermes_form_fill.py: 9 tests driving individual HTML input types
(text, password, email, number, textarea, checkbox, select, radio,
submit) in a backgrounded Safari window via the hermes CLI, asserting
no focus is stolen from FocusMonitorApp.
- test_overlay_z_order.py: asserts overlay appears at layer 0 (not
floating/layer 3) and is sandwiched between the target and any
foreground windows above it.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThe PR modernizes the cursor overlay system by switching from defensive repin tasks to continuous ~30 fps repinning and changing the overlay window level from Changes
Sequence Diagram(s)sequenceDiagram
participant ClickTool as ClickTool
participant AXInput as AXInput
participant AgentCursor as AgentCursor
participant Renderer as AgentCursorRenderer
participant View as AgentCursorView
participant Screen as On-Screen Display
ClickTool->>AXInput: screenBoundingRect(of: element)
AXInput-->>ClickTool: CGRect (screen space)
ClickTool->>AgentCursor: showFocusRect(rect)
AgentCursor->>Renderer: focusRect = rect
Renderer-->>View: focusRect property updated
View->>View: drawFocusHighlight()
Note over View: Rounded rect + cyan border + glow
View->>Screen: Render focus highlight overlay
Screen-->>View: ✓ Visible on screen
sequenceDiagram
participant SetValueTool as SetValueTool
participant AXInput as AXInput
participant App as Target App
participant Safari as Safari (fallback)
SetValueTool->>SetValueTool: Detect AXPopUpButton role
alt AX Children Available
SetValueTool->>AXInput: children(of: popupElement)
AXInput-->>SetValueTool: [AXUIElement]
loop Each Child
SetValueTool->>SetValueTool: Compare AXTitle/AXValue to target value
end
SetValueTool->>App: AXPress on matched option
App-->>SetValueTool: ✓ Option selected
else No AX Children & Safari
SetValueTool->>Safari: osascript: inject & execute JavaScript
Note over Safari: DOM search + dispatch change event
Safari-->>SetValueTool: ✓ select option changed
else Error
SetValueTool-->>SetValueTool: Return error with available titles
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
libs/cua-driver/Sources/CuaDriverCore/Cursor/AgentCursorOverlayWindow.swift (1)
3-11:⚠️ Potential issue | 🟡 MinorClass-level doc comment is stale after the level change.
The summary still states the overlay is "installed at
.statusBar + 1", but the window level is now.normal(line 45). Please update this paragraph so the top-of-file contract matches the new sandwiched-in-regular-stack behavior described at lines 39-45.✏️ Suggested wording
/// Transparent, click-through, borderless window used to host the agent /// cursor overlay. Shares the classic "floating HUD" recipe: no window /// chrome, no shadow, clear background, ignores mouse events so the user -/// keeps using the machine normally, installed at `.statusBar + 1` so -/// it stays above ordinary windows but below system menus and modals. +/// keeps using the machine normally. Installed at `.normal` level and +/// continuously re-pinned one slot above the target window by +/// `AgentCursor.pinAbove(pid:)`, so foreground windows above the target +/// correctly occlude the cursor (ordering: [target, overlay, fg-windows]).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/cua-driver/Sources/CuaDriverCore/Cursor/AgentCursorOverlayWindow.swift` around lines 3 - 11, Update the class-level doc for AgentCursorOverlayWindow to remove the outdated claim that the overlay is "installed at `.statusBar + 1`" and instead describe the current behavior: the window uses `.normal` level and is intentionally sandwiched in the regular window stack (above ordinary windows but below system menus/modals) while remaining borderless, transparent, click-through, and non-key/non-main; edit the opening summary sentence to reflect this new window level and stacking contract so the top-of-file comment matches the implementation in AgentCursorOverlayWindow.
🧹 Nitpick comments (3)
libs/cua-driver/Sources/CuaDriverServer/Tools/ClickTool.swift (1)
272-289: Consider narrowing the 800 ms text-field delay to web views.The rationale documents a WebKit DOM-focus race, but the guard only checks
role == "AXTextField" || role == "AXTextArea", so every element-indexed click on a native Cocoa text field (TextEdit, Notes, Mail compose, Finder rename, etc.) now adds ~0.8 s of mandatory latency per click even though those flows don't need it. Options:
- Gate on the owning app's bundle id (e.g. only delay for Safari / WebKit-hosted content), or
- Expose a
post_press_delay_msargument / config knob so interactive native-app clicks don't pay the WebKit tax, or- Poll
AXFocusedUIElement == elementwith a shorter timeout instead of sleeping a fixed duration.Not blocking — the current constant is empirically justified — but it's worth a follow-up once there's a signal path to tell web-content fields apart from native ones.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/cua-driver/Sources/CuaDriverServer/Tools/ClickTool.swift` around lines 272 - 289, The unconditional 800 ms sleep after AXPress for any element whose AXInput.describe(element) role is "AXTextField" or "AXTextArea" is adding latency for native Cocoa text fields; narrow the condition in ClickTool.swift by either (1) gating the delay on the owning app (check the element's owning app bundle id returned by AXInput.describe/owner info and only apply the sleep for WebKit/Safari bundle ids), (2) adding a configurable post_press_delay_ms parameter (expose a param on the ClickTool/press API and use that value instead of the hardcoded 800 ms), or (3) replace the fixed Task.sleep(for: .milliseconds(800)) with a short polling loop that repeatedly reads AXFocusedUIElement and returns as soon as AXFocusedUIElement == element (with a small max timeout), and update the condition around axAction == "AXPress" and target.role to incorporate the chosen gate.libs/cua-driver/Tests/integration/test_overlay_z_order.py (1)
142-144: Stale comment and over-long sleep after switching to continuous repin.The comment still references the old
[60, 180, 360, 600, 900, 1200]msdefensive-repin schedule that this PR replaces with the ~33ms continuous loop (seestartContinuousRepininAgentCursor.swift). Under the new loop the overlay is back in place within one frame (~33ms), so thetime.sleep(1.5)is both inaccurately justified and ~40× longer than needed — inflates the test suite wall-clock for no coverage benefit.🧹 Suggested cleanup
- # Let the defensive-repin ticks fully settle (last tick fires at ~1200ms - # after the second pinAbove inside click, which returns after animations). - time.sleep(1.5) + # Continuous repin loop fires every ~33 ms; one frame plus slack + # is plenty for the overlay to land in its final z-position. + time.sleep(0.3)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/cua-driver/Tests/integration/test_overlay_z_order.py` around lines 142 - 144, The test contains a stale comment referencing the old defensive-repin schedule and uses an excessively long sleep (time.sleep(1.5)); update the comment to reflect the new continuous repin loop (startContinuousRepin in AgentCursor.swift) which restores overlay within one frame (~33ms) and reduce the sleep to a small margin (e.g., ~0.05s) sufficient for one frame; locate the sleep in test_overlay_z_order.py and replace the comment text and the 1.5s sleep value accordingly.libs/cua-driver/Tests/integration/test_hermes_form_fill.py (1)
70-72: Dead helper:_build_focus_appis never called.
setUpClasshas its own inline build-if-stale check at lines 157-160 using a different policy (rebuilds when the.swiftis newer than the binary), so_build_focus_appis never invoked and its "build only when missing" logic doesn't match what the suite actually uses. Either delete it, or refactorsetUpClassto call it (and move the mtime check into the helper) so there's a single source of truth shared withtest_overlay_z_order.py's_ensure_focus_app_built.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/cua-driver/Tests/integration/test_hermes_form_fill.py` around lines 70 - 72, The helper function _build_focus_app is unused and its "build only when missing" behavior conflicts with the inline mtime logic in setUpClass; refactor by moving the mtime check/build logic into _build_focus_app (so it rebuilds when the .swift source is newer OR builds when missing), then replace the inline build-if-stale block in setUpClass with a call to _build_focus_app, and update test_overlay_z_order.py to call the same _ensure_focus_app_built/_build_focus_app helper so there is a single source of truth for building the focus app.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@libs/cua-driver/Sources/CuaDriverCore/Cursor/AgentCursor.swift`:
- Around line 442-449: The hide() method fails to clear the lingering focus
rectangle; call the private clearFocusRect() from hide() so
AgentCursorRenderer.shared.focusRect is cleared when the cursor auto-hides (or
remove clearFocusRect() if you intentionally don't need it). Update the hide()
implementation (method name: hide) to invoke clearFocusRect() before returning,
ensuring the singleton AgentCursorRenderer.shared.focusRect is reset and
preventing the old rect from flashing on the next show().
In `@libs/cua-driver/Sources/CuaDriverServer/Tools/SetValueTool.swift`:
- Around line 239-254: proc.waitUntilExit() can block indefinitely if osascript
stalls; modify the block in SetValueTool (where proc, proc.executableURL,
proc.arguments, outPipe, and errorResult are used) to enforce a bounded timeout:
start the process as you already do, then run a timeout mechanism (e.g., Task
race with Task.sleep or a DispatchWorkItem on a background queue) that waits up
to a fixed duration (e.g., 10s) and, on timeout, calls proc.terminate() (and
proc.kill() if needed) and returns an errorResult indicating timeout; if the
process exits normally before the deadline, proceed to read from outPipe and
return the existing result.
- Around line 217-238: The current value escaping in SetValueTool (variables
vLow, js, and appleScript) is incomplete and allows double quotes and
backslashes to break the AppleScript/JS string; update the sanitizer to first
escape backslashes then single quotes for the JS single-quoted literal (vLow),
and also escape backslashes and double quotes when embedding into the outer
AppleScript double-quoted string (or switch to passing the value via osascript
stdin/argument to avoid double-escaping altogether); additionally, when
launching osascript where proc.waitUntilExit() is used, add a timeout guard and
handle timeout-by-killing the process to avoid indefinite hangs.
In `@libs/cua-driver/Tests/FocusMonitorApp/FocusMonitorApp.swift`:
- Around line 3-15: The file header currently documents two focus-loss kinds but
the implementation also tracks a third: the text-field first-responder loss via
TrackingTextField which increments fieldLossCount and writes to
/tmp/focus_monitor_field_losses.txt; update the top docstring to list this third
signal (mention TrackingTextField, fieldLossCount, and the field losses file) so
readers/tests understand all three tracked events.
---
Outside diff comments:
In `@libs/cua-driver/Sources/CuaDriverCore/Cursor/AgentCursorOverlayWindow.swift`:
- Around line 3-11: Update the class-level doc for AgentCursorOverlayWindow to
remove the outdated claim that the overlay is "installed at `.statusBar + 1`"
and instead describe the current behavior: the window uses `.normal` level and
is intentionally sandwiched in the regular window stack (above ordinary windows
but below system menus/modals) while remaining borderless, transparent,
click-through, and non-key/non-main; edit the opening summary sentence to
reflect this new window level and stacking contract so the top-of-file comment
matches the implementation in AgentCursorOverlayWindow.
---
Nitpick comments:
In `@libs/cua-driver/Sources/CuaDriverServer/Tools/ClickTool.swift`:
- Around line 272-289: The unconditional 800 ms sleep after AXPress for any
element whose AXInput.describe(element) role is "AXTextField" or "AXTextArea" is
adding latency for native Cocoa text fields; narrow the condition in
ClickTool.swift by either (1) gating the delay on the owning app (check the
element's owning app bundle id returned by AXInput.describe/owner info and only
apply the sleep for WebKit/Safari bundle ids), (2) adding a configurable
post_press_delay_ms parameter (expose a param on the ClickTool/press API and use
that value instead of the hardcoded 800 ms), or (3) replace the fixed
Task.sleep(for: .milliseconds(800)) with a short polling loop that repeatedly
reads AXFocusedUIElement and returns as soon as AXFocusedUIElement == element
(with a small max timeout), and update the condition around axAction ==
"AXPress" and target.role to incorporate the chosen gate.
In `@libs/cua-driver/Tests/integration/test_hermes_form_fill.py`:
- Around line 70-72: The helper function _build_focus_app is unused and its
"build only when missing" behavior conflicts with the inline mtime logic in
setUpClass; refactor by moving the mtime check/build logic into _build_focus_app
(so it rebuilds when the .swift source is newer OR builds when missing), then
replace the inline build-if-stale block in setUpClass with a call to
_build_focus_app, and update test_overlay_z_order.py to call the same
_ensure_focus_app_built/_build_focus_app helper so there is a single source of
truth for building the focus app.
In `@libs/cua-driver/Tests/integration/test_overlay_z_order.py`:
- Around line 142-144: The test contains a stale comment referencing the old
defensive-repin schedule and uses an excessively long sleep (time.sleep(1.5));
update the comment to reflect the new continuous repin loop
(startContinuousRepin in AgentCursor.swift) which restores overlay within one
frame (~33ms) and reduce the sleep to a small margin (e.g., ~0.05s) sufficient
for one frame; locate the sleep in test_overlay_z_order.py and replace the
comment text and the 1.5s sleep value accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: eaaf5ea9-04cb-4f95-ab08-7b680fdc53a9
📒 Files selected for processing (13)
libs/cua-driver/Sources/CuaDriverCore/Cursor/AgentCursor.swiftlibs/cua-driver/Sources/CuaDriverCore/Cursor/AgentCursorOverlayWindow.swiftlibs/cua-driver/Sources/CuaDriverCore/Cursor/AgentCursorRenderer.swiftlibs/cua-driver/Sources/CuaDriverCore/Cursor/AgentCursorView.swiftlibs/cua-driver/Sources/CuaDriverCore/Input/AXInput.swiftlibs/cua-driver/Sources/CuaDriverServer/Tools/ClickTool.swiftlibs/cua-driver/Sources/CuaDriverServer/Tools/LaunchAppTool.swiftlibs/cua-driver/Sources/CuaDriverServer/Tools/ListWindowsTool.swiftlibs/cua-driver/Sources/CuaDriverServer/Tools/SetValueTool.swiftlibs/cua-driver/Tests/FocusMonitorApp/FocusMonitorApp.swiftlibs/cua-driver/Tests/integration/fixtures/form_all_inputs.htmllibs/cua-driver/Tests/integration/test_hermes_form_fill.pylibs/cua-driver/Tests/integration/test_overlay_z_order.py
- AgentCursor: call clearFocusRect() in hide() so the focus rect does not linger after the cursor auto-hides (the helper had a docstring explicitly saying it should be called there but was never wired up) - SetValueTool: fix value escaping in JS/AppleScript injection path. Previously only single-quotes were escaped, leaving double-quotes and backslashes able to break the outer AppleScript string or inject arbitrary commands. Switch to percent-encoding with unreserved chars only and decode via decodeURIComponent() in the JS, eliminating the double-escape problem entirely. Also add a 10-second polling timeout around proc.waitUntilExit() so a stuck Safari/osascript does not block the MCP tool handler indefinitely. - FocusMonitorApp: update doc-comment from "two kinds" to "three kinds" of focus loss to match the implementation (app-level, window key status, and text-field first-responder, the third being the newest addition tracked via TrackingTextField). - test_hermes_form_fill: change module-level os.environ["ANTHROPIC_API_KEY"] to os.environ.get(..., "") so test discovery / lint / collect-only passes do not raise KeyError. setUpClass now skips the suite cleanly when the key is absent. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
NSWindowLevel.floatingto.normalso it is sandwiched just above the target window ([target, overlay, fg-windows]). Foreground windows above the target correctly occlude the cursor — background interaction is visually clear.[60, 180, 360, 600, 900, 1200ms]with a continuous~30 fpsloop. macOS raises background window z-levels asynchronously after AX actions; the old schedule allowed up to 300ms where the overlay sat below the target. At 33ms it snaps back within one frame.AXTextField/AXTextAreaso WebKit DOM focus is established beforetype_text_charsfires;AXPopUpButtonhandling (list options, hint to useset_value);SetValueToolwith AX-child-press + JS-injection fallback for WebKit<select>elements.TrackingTextFieldthat holds keyboard focus throughout tests; three-level focus-loss counters (app, window key, text field) written to/tmp/.list_windowsandlaunch_appnow returnstructuredContentalongside human-readable text.New files
Tests/integration/fixtures/form_all_inputs.html— HTML form with all input types for hermes testingTests/integration/test_hermes_form_fill.py— 9 tests driving individual HTML inputs in backgrounded Safari via hermes CLI; asserts no focus stolen from FocusMonitorAppTests/integration/test_overlay_z_order.py— asserts overlay appears atlayer=0(not floating/layer 3) and is sandwiched between target and foreground windowsTest plan
CUA_DRIVER_BINARY=.build/CuaDriver.app/Contents/MacOS/cua-driver python3 -m unittest test_overlay_z_order -v— new z-order test passesANTHROPIC_API_KEY=<key> CUA_DRIVER_BINARY=.build/CuaDriver.app/Contents/MacOS/cua-driver python3 -m unittest test_hermes_form_fill -v— all 9 form-fill tests passANTHROPIC_API_KEYenv var at test runtime)🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Tests