feat: stage macOS computer-use progress by glasses666 · Pull Request #13308 · NousResearch/hermes-agent

glasses666 · 2026-04-21T04:23:05Z

Summary

add a real macOS hermes-computer-use adapter with app sessions, window metadata, approval-gated state, and pointer plumbing
add gateway-mediated app-access approvals for hermes-computer-use (Allow Once, Session, Always, Deny)
add Telegram inline approval buttons for generic app-access requests
add refreshed grounded macOS proof screenshots taken on the real desktop
replace the old purple pixel-art overlay with a more macOS-native white pointer treatment
verify real click success against TextEdit through both the local backend and the live MCP/chat path
verify real local scroll + drag receipts in the branch-local adapter/backend TextEdit smoke test
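The four approval choices (Allow Once, Session, Always, Deny) imply different grant lifetimes. Below is a minimal Python sketch of how such grants might be stored and checked. It assumes app identity is a bundle-id string and that approvals are tracked per gateway session. `Decision`, `ApprovalStore`, and the method names are hypothetical and are not this branch's API. In the sketch, Deny records no grant, so the next request prompts again.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    # The four choices offered in the approval prompt.
    ALLOW_ONCE = "once"
    ALLOW_SESSION = "session"
    ALLOW_ALWAYS = "always"
    DENY = "deny"


@dataclass
class ApprovalStore:
    """Hypothetical in-memory grant store; not the gateway's real data model."""
    always: set[str] = field(default_factory=set)                # app ids approved permanently
    session: dict[str, set[str]] = field(default_factory=dict)   # session id -> app ids
    once: set[tuple[str, str]] = field(default_factory=set)      # single-use (session, app) grants

    def record(self, session_id: str, app_id: str, decision: Decision) -> None:
        if decision is Decision.ALLOW_ALWAYS:
            self.always.add(app_id)
        elif decision is Decision.ALLOW_SESSION:
            self.session.setdefault(session_id, set()).add(app_id)
        elif decision is Decision.ALLOW_ONCE:
            self.once.add((session_id, app_id))
        # DENY stores nothing in this sketch, so access stays refused.

    def check(self, session_id: str, app_id: str) -> bool:
        """Return True if access is allowed; an Allow Once grant is consumed."""
        if app_id in self.always:
            return True
        if app_id in self.session.get(session_id, set()):
            return True
        if (session_id, app_id) in self.once:
            self.once.discard((session_id, app_id))
            return True
        return False
```

With this layout an Allow Once grant passes exactly one `check`, a Session grant passes for that session only, and an Always grant passes for every session.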

Why this matters

Hermes has had strong browser and terminal control for a while, but local desktop control on macOS was still the missing proof point.

This branch moves that boundary:

desktop sessions are approval-gated instead of silently trusted
app state is window-scoped and inspectable
lockscreen handling is grounded in real-desktop captures
real pointer actions are no longer just mocked previews

Grounded receipts

docs/media/computer-use/lockscreen-password-ui.png
docs/media/computer-use/unlocked-terminal-after-input.png
docs/media/computer-use/textedit-window-state.png
docs/media/computer-use/textedit-click-overlay.png
docs/media/computer-use/textedit-scroll-state.png
docs/media/computer-use/textedit-drag-selection.png
focused tests: 153 passed
py_compile: passed on the touched Python files

Current boundary

This PR is the clean macOS computer-use slice against current main.

What is verified already:

Telegram approval flow is wired for app access
get_app_state(app_name=...) returns window-scoped state
lockscreen -> desktop recovery is proven with explicit user consent
live MCP click(...) reaches the real local pointer backend and closes the target TextEdit window
branch-local scroll(...) produces a visibly scrolled TextEdit viewport
branch-local drag(...) produces a visible multi-line TextEdit selection

What is still honestly ahead:

detached / non-disruptive cursor UX
broader session polish beyond the current verified slice
careful boundary-setting around permission dialogs and other macOS-sensitive surfaces
hot-reloading and re-verifying the newest scroll/drag slice through every live runtime path

- capture frontmost macOS window metadata via Swift helper
- target screencapture at a specific window id with full-screen fallback
- expose window metadata through the computer-use adapter and harden approval matching
- cover the new behavior with focused tool and adapter tests
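Below is a sketch of the window-targeted capture with full-screen fallback, built on the macOS `screencapture` CLI. In that CLI, `-x` silences the capture sound and `-l<windowid>` captures a single window. The helper names are hypothetical. Treating a non-zero exit code as a failed window capture is an assumption about how the fallback is triggered. `run` is injectable so the logic can be exercised off macOS.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Optional, Sequence


def build_capture_cmd(path: str, window_id: Optional[int]) -> list[str]:
    """Build a `screencapture` argv; -x silences the shutter, -l<id> targets one window."""
    cmd = ["screencapture", "-x"]
    if window_id is not None:
        cmd.append(f"-l{window_id}")
    cmd.append(path)
    return cmd


def _run(cmd: Sequence[str]) -> int:
    # Return the process exit status without raising on failure.
    return subprocess.run(list(cmd), check=False).returncode


def capture(path: str, window_id: Optional[int],
            run: Callable[[Sequence[str]], int] = _run) -> list[str]:
    """Try a window-targeted capture first, then fall back to the full screen.

    Returns the argv that succeeded; raises if the full-screen capture fails too.
    """
    if window_id is not None:
        cmd = build_capture_cmd(path, window_id)
        if run(cmd) == 0:
            return cmd
    cmd = build_capture_cmd(path, None)
    if run(cmd) != 0:
        raise RuntimeError("full-screen screencapture failed")
    return cmd
```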

- require approved active sessions for type_text and press_key
- reject ambiguous multi-session keyboard targeting without app_session_id
- add regression tests for unapproved, ambiguous, and inactive keyboard targets
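The keyboard-targeting rules above can be sketched as a single resolver. The `Session` fields and `resolve_keyboard_target` are illustrative assumptions, not the adapter's actual types. In this sketch, an explicit `app_session_id` must name an approved, active session. Without one, exactly one approved active session must exist.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    app_session_id: str
    app_id: str
    approved: bool
    active: bool


class KeyboardTargetError(ValueError):
    """Raised when type_text / press_key has no safe, unambiguous target."""


def resolve_keyboard_target(sessions: list[Session],
                            app_session_id: Optional[str] = None) -> Session:
    """Pick the single session that keyboard input may be routed to."""
    if app_session_id is not None:
        match = next((s for s in sessions if s.app_session_id == app_session_id), None)
        if match is None:
            raise KeyboardTargetError(f"unknown app_session_id {app_session_id!r}")
        target = match
    else:
        candidates = [s for s in sessions if s.active and s.approved]
        if len(candidates) > 1:
            raise KeyboardTargetError("multiple active sessions; pass app_session_id")
        if not candidates:
            raise KeyboardTargetError("no approved active session")
        target = candidates[0]
    # An explicitly named session must still be approved and active.
    if not target.approved:
        raise KeyboardTargetError("session is not approved")
    if not target.active:
        raise KeyboardTargetError("session is not active")
    return target
```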

- persist the canonical approved app identity for localized frontmost sessions
- keep revoked localized sessions bound to the same app_session_id and block keyboard input
- add regression tests for localized app_session_id refresh and revocation
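One way to read "persist the canonical approved app identity" is to key sessions on a stable identifier instead of the localized display name. This sketch assumes that identifier is a bundle id. A refresh that reports a different display name then still resolves to the same `app_session_id`, and revocation blocks input without unbinding the id. `SessionRegistry` and its methods are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SessionRegistry:
    """Hypothetical registry keyed by a canonical app id (assumed: bundle id)."""
    by_app: dict[str, str] = field(default_factory=dict)  # canonical id -> app_session_id
    revoked: set[str] = field(default_factory=set)         # revoked app_session_ids
    _counter: int = 0

    def session_for(self, canonical_id: str, display_name: str) -> str:
        # display_name may be localized; it is deliberately ignored for identity
        # so that a refresh under a different name maps to the same session id.
        sid = self.by_app.get(canonical_id)
        if sid is None:
            self._counter += 1
            sid = f"app-session-{self._counter}"
            self.by_app[canonical_id] = sid
        return sid

    def revoke(self, canonical_id: str) -> None:
        sid = self.by_app.get(canonical_id)
        if sid is not None:
            self.revoked.add(sid)  # binding is kept; only input is blocked

    def keyboard_allowed(self, sid: str) -> bool:
        return sid in self.by_app.values() and sid not in self.revoked
```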

- have the Swift helper emit all visible layer-0 windows for the frontmost app
- choose frontmost metadata in Python while skipping only clearly fragmentary untitled candidates
- cover tiny fragments, background titled windows, and compact frontmost dialogs with regression tests
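Below is a sketch of the Python-side selection. It assumes the Swift helper emits dictionaries that use the `CGWindowListCopyWindowInfo` key names (`kCGWindowName`, `kCGWindowLayer`, `kCGWindowBounds`) in front-to-back order. `MIN_UNTITLED_AREA` is a hypothetical cutoff, since the branch's actual heuristic for "clearly fragmentary" is not stated. Only untitled windows are ever skipped, so a small titled dialog in front still wins over a large titled window behind it.

```python
from __future__ import annotations

from typing import Any, Optional

# Hypothetical cutoff for "clearly fragmentary"; the real threshold is not stated.
MIN_UNTITLED_AREA = 100 * 100


def is_fragment(win: dict[str, Any]) -> bool:
    """Treat an untitled window with a tiny area as a fragment."""
    bounds = win.get("kCGWindowBounds", {})
    area = bounds.get("Width", 0) * bounds.get("Height", 0)
    return not win.get("kCGWindowName") and area < MIN_UNTITLED_AREA


def choose_frontmost(windows: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Pick frontmost metadata from windows listed front-to-back (assumed order)."""
    layer0 = [w for w in windows if w.get("kCGWindowLayer", 0) == 0]
    for win in layer0:
        if not is_fragment(win):
            return win
    # Everything looked fragmentary: fall back to the frontmost layer-0 window.
    return layer0[0] if layer0 else None
```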

…Windows)

Three OS-specific tools (`computer_use_macos`, `computer_use_linux`, `computer_use_windows`) share one JSON schema and one set of action semantics, but with native backends per platform. Complementary to the containerised proposal in NousResearch#15876 (which targets the "Hermes-runs-in-Docker" deployment shape) and the macOS-Anthropic-protocol work in NousResearch#4562 / NousResearch#13308. This PR owns the "Hermes runs natively on the host desktop, control any of the three majors with consistent abstraction" shape.

Architecture
============

* All three tools register at module top via `registry.register()` so the AST tool-discovery picks them up. `check_fn` returns False off the matching platform, when `HERMES_COMPUTER_USE_ENABLED` is unset, or when required deps are missing, so on a given host the model only sees the one tool it can actually use.
* `computer_use_common.py`: schema, `ActionRequest`, `ActionResult`, parameter validation, screen-bounds enforcement.
* `computer_use_safety.py`: env gate, kill-switch flag, JSONL action log under `$HERMES_HOME/logs/computer_use.jsonl`, screenshot redaction (PIL).
* `computer_use_grammar.py`: one parser, four targets. `Cmd+Shift+T` produces Quartz CGEvent flags+keycode on macOS, an `xdotool key` string on X11, ydotool input event codes on Wayland, and Win32 VK codes on Windows.
* `computer_use_macos.py`: Quartz `CGEvent` for input, `screencapture` CLI for capture, `CGWindowListCopyWindowInfo` for active window. pyobjc-framework-Quartz is the only new dep.
* `computer_use_linux.py`: runtime detection of X11 vs Wayland. X11 → `xdotool` + `scrot`/`import`. Wayland → `ydotool` + `grim` (wlroots) / `gnome-screenshot` / `spectacle`. Active-window queries via Sway IPC / hyprctl / xdotool depending on path.
* `computer_use_windows.py`: `ctypes` over `user32.SendInput` (modern path; avoids legacy `keybd_event`). DPI-aware on import. Screenshot via `mss` if installed, falls back to ctypes BitBlt + PIL otherwise.
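This is a minimal sketch of two pieces of the architecture described above. The first is a macOS `check_fn` that hides the tool when running off-platform, when the env flag is missing, or when Quartz is not installed. The second is a tiny combo parser that renders `Cmd+Shift+T` as an `xdotool key` string. Several details are assumptions rather than the PR's actual grammar: accepting only the literal value `true`, the modifier aliases, and mapping Cmd to X11 `super`. The `pyobjc-framework-Quartz` package does install an importable `Quartz` module.

```python
from __future__ import annotations

import importlib.util
import os
import sys


def macos_check_fn() -> bool:
    """Expose the macOS tool only when platform, env flag, and deps all line up."""
    if sys.platform != "darwin":
        return False
    # Assumption: only the literal value "true" (case-insensitive) enables the tool.
    if os.environ.get("HERMES_COMPUTER_USE_ENABLED", "").lower() != "true":
        return False
    # pyobjc-framework-Quartz provides the importable `Quartz` package.
    return importlib.util.find_spec("Quartz") is not None


# Hypothetical modifier aliases, each normalized to one canonical name.
MODIFIERS = {"cmd": "cmd", "command": "cmd", "ctrl": "ctrl", "control": "ctrl",
             "shift": "shift", "alt": "alt", "option": "alt"}
# Assumed X11 rendering; Cmd is mapped to the Super key.
XDOTOOL_NAMES = {"ctrl": "ctrl", "alt": "alt", "shift": "shift", "cmd": "super"}
XDOTOOL_ORDER = ("ctrl", "alt", "shift", "cmd")


def parse_combo(combo: str) -> tuple[frozenset[str], str]:
    """Split 'Cmd+Shift+T' into canonical modifiers and a key name."""
    parts = [p.strip() for p in combo.split("+") if p.strip()]
    if not parts:
        raise ValueError("empty key combo")
    *mods, key = parts
    unknown = [m for m in mods if m.lower() not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifier(s): {unknown}")
    return frozenset(MODIFIERS[m.lower()] for m in mods), key


def to_xdotool(combo: str) -> str:
    """Render a combo as an `xdotool key` argument; the key name is kept as written."""
    mods, key = parse_combo(combo)
    return "+".join([XDOTOOL_NAMES[m] for m in XDOTOOL_ORDER if m in mods] + [key])
```

For example, `to_xdotool("Cmd+Shift+T")` returns `"shift+super+T"` under these assumed mappings. The other three targets would reuse `parse_combo` and swap only the rendering step.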
Skills
======

A per-OS skill teaches the model what's actually different on each host: Cmd-vs-Ctrl, Spotlight vs Win+S, X11 vs Wayland detection, UAC/UIPI, accessibility / screen-recording permission setup, etc. The common skill covers when to reach for `computer_use_*` at all (vs `browser_tool` / `terminal`) and the screenshot-first discipline.

Validation
==========

* 56/56 unit tests passing (mocked Quartz / subprocess / user32 across all three backends + grammar + safety).
* macOS backend integration-tested live on the author's MacBook: screen_size, cursor_position, get_active_window, screenshot (full), screenshot (region crop), screenshot (with redact), wait, off-screen click validation, type-without-text validation, unknown-action validation, env-off refusal; all 11/11 cases pass.
* Linux + Windows are unit-test-only at the moment; the author has no Linux or Windows host immediately available for end-to-end validation. Honest framing in the eventual PR body.

Safety posture
==============

* `HERMES_COMPUTER_USE_ENABLED=true` required. Default: refused.
* Action allowlist + per-action validation (no off-screen targets, no type strings over 10K, no waits over 30s, no unknown actions).
* Process-global kill-switch flag (`set_kill_switch()`) checked before every action: once engaged, all subsequent actions are refused until it is cleared.
* JSONL audit log of every attempt (action, params minus image bytes, success bit, error if any).
* `screenshot` action accepts `redact_regions` to blank rectangles (password manager, MFA codes) before the image reaches the model.
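The validation limits and audit log listed under Safety posture can be sketched with the standard library alone. The 10K and 30s limits come from the list above. The action allowlist, the `validate`/`audit` names, and the record field names are assumptions. `set_kill_switch` here only mirrors the name mentioned above and does not reproduce its real signature.

```python
from __future__ import annotations

import json
import time
from pathlib import Path
from typing import Any

# Limits from the safety notes above; the allowlist contents are assumed.
ALLOWED_ACTIONS = {"screenshot", "click", "type", "key", "scroll", "wait",
                   "screen_size", "cursor_position", "get_active_window"}
MAX_TYPE_CHARS = 10_000
MAX_WAIT_SECONDS = 30.0

_kill_switch = False  # process-global flag checked before every action


def set_kill_switch(engaged: bool = True) -> None:
    global _kill_switch
    _kill_switch = engaged


def validate(action: str, params: dict[str, Any], screen: tuple[int, int]) -> None:
    """Raise before any side effect if the request breaks a safety rule."""
    if _kill_switch:
        raise PermissionError("kill switch engaged")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action {action!r}")
    if action == "click":
        x, y = params.get("x"), params.get("y")
        if x is None or y is None or not (0 <= x < screen[0] and 0 <= y < screen[1]):
            raise ValueError("click target missing or off-screen")
    if action == "type":
        text = params.get("text")
        if not text:
            raise ValueError("type requires text")
        if len(text) > MAX_TYPE_CHARS:
            raise ValueError("text too long")
    if action == "wait" and not (0 <= params.get("seconds", 0) <= MAX_WAIT_SECONDS):
        raise ValueError("wait out of range")


def audit(log_path: Path, action: str, params: dict[str, Any],
          ok: bool, error: str | None = None) -> None:
    """Append one JSONL record per attempt, dropping raw image bytes from params."""
    clean = {k: v for k, v in params.items() if not isinstance(v, (bytes, bytearray))}
    record = {"ts": time.time(), "action": action, "params": clean,
              "success": ok, "error": error}
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
```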

glasses666 · 2026-06-02T04:54:31Z

Closing this stale draft branch; current work is being handled through the narrower gateway/compression PRs.

glasses666 added 24 commits April 21, 2026 12:43

feat: add baseline computer-use adapter on macOS

4fd0efd

feat: add session and approval state to computer-use adapter

1231c38

feat: add active session controls to computer-use adapter

dbbef40

feat: gate keyboard actions on active computer-use sessions

c2af55c

feat: add virtual cursor state to computer-use sessions

9559ec5

feat: add session-targeted interaction routing

42abc3b

feat: let get-app-state resume active sessions by id

bdc7bc5

feat: track window metadata in active sessions

df2c32f

feat: render virtual cursor overlay previews

2dc94bc

feat: render pixelated cursor previews

4b7c3bb

feat: harden overlay preview lifecycle

6a6d178

feat: export session state manifests

8bfd7db

feat: bridge pending pointer actions

c7694b1

[verified] feat: report helper pointer action results

bbcd581

[verified] fix: block pending pointer action overwrites

145e980

[verified] feat: add pending pointer claim leases

072d990

[verified] fix: serialize pending pointer session transitions

b560ad7

[verified] fix: harden keyboard session routing

a6d0575


feat: stage macOS computer-use progress

7bb498c

docs: refresh computer-use milestone status

445c8ee

test: stub tirith in blocking approval e2e

e30a74d

glasses666 force-pushed the feat/computer-use-session-adapter branch from 243346f to e30a74d April 21, 2026 05:02

[verified] feat: add macOS mouse receipts and polish overlay

cd4ed5f
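Several commits above describe a handoff between the adapter and a pointer helper: bridge pending pointer actions, block pending pointer action overwrites, add pending pointer claim leases, and serialize pending pointer session transitions. The following is a hypothetical single-slot version of that idea, not the branch's code. One lock serializes transitions, and `submit` refuses to overwrite a pending action. `claim` grants a time-limited lease that another worker may take over once it expires. The lease length and all names are assumptions.

```python
from __future__ import annotations

import threading
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PendingAction:
    kind: str                      # e.g. "click", "scroll", "drag"
    payload: dict
    claimed_by: Optional[str] = None
    lease_expires: float = 0.0


class PendingPointerSlot:
    """Hypothetical single-slot handoff between the adapter and a pointer helper."""

    def __init__(self, lease_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lock = threading.Lock()  # serializes every state transition
        self._pending: Optional[PendingAction] = None
        self._lease = lease_seconds
        self._clock = clock

    def submit(self, action: PendingAction) -> bool:
        """Queue an action; refuse to overwrite one that is still pending."""
        with self._lock:
            if self._pending is not None:
                return False
            self._pending = action
            return True

    def claim(self, worker: str) -> Optional[PendingAction]:
        """Lease the pending action to one worker; expired leases can be re-claimed."""
        with self._lock:
            act = self._pending
            if act is None:
                return None
            now = self._clock()
            if act.claimed_by is not None and now < act.lease_expires:
                return None
            act.claimed_by, act.lease_expires = worker, now + self._lease
            return act

    def complete(self, worker: str) -> bool:
        """Clear the slot only if this worker still holds an unexpired lease."""
        with self._lock:
            act = self._pending
            if act is None or act.claimed_by != worker or self._clock() >= act.lease_expires:
                return False
            self._pending = None
            return True
```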

alt-glitch added the type/feature (New feature or request), P3 (Low: cosmetic, nice to have), comp/agent (Core agent loop, run_agent.py, prompt builder), and comp/gateway (Gateway runner, session dispatch, delivery) labels Apr 22, 2026

alt-glitch added the platform/telegram (Telegram bot adapter) label Apr 22, 2026

glasses666 closed this Apr 22, 2026

glasses666 reopened this Apr 22, 2026

alt-glitch mentioned this pull request Apr 26, 2026

Proposal: Optional desktop computer-use module (noVNC + screenshot + mouse/keyboard control) #15876

Closed

0xMrBlueOps mentioned this pull request Apr 29, 2026

feat(desktop): add persistent computer-use VM #17258

Open

Abd0r mentioned this pull request May 6, 2026

feat(tools/computer_use): native per-OS desktop control (macOS / Linux / Windows) #20660

Open


glasses666 closed this Jun 2, 2026


glasses666 wants to merge 25 commits into NousResearch:main from glasses666:feat/computer-use-session-adapter
