If it's on the screen, it's an API.
Control any desktop app through JSON. REST API + MCP server. Single binary. Zero dependencies.
Quick Start • How It Works • API • SDKs • MCP Setup • Dashboard • Examples • API Spec • Changelog • Contributing
OculOS is a lightweight daemon that reads the OS accessibility tree and exposes every button, text field, checkbox, and menu item as a JSON endpoint. It works as a REST API for scripts, testing, and CI/CD — and as an MCP server for AI agents like Claude, Cursor, and Windsurf.
No screenshots. No pixel coordinates. No browser extensions. No code injection. No AI required. Just structured JSON.
Claude Code uses OculOS MCP tools to open Calculator, find buttons, click 5 × 5 =, and read the result — fully autonomous.
Claude Code uses OculOS MCP tools to find Spotify, focus it, search for a song, and play it — fully autonomous.
Built-in dashboard with window list, interactive element tree, inspector, recorder, and live WebSocket events.
git clone https://github.com/huseyinstif/oculos.git
cd oculos
cargo build --releaseOculOS reads the OS accessibility tree, so macOS requires you to grant permission:
- Open System Settings → Privacy & Security → Accessibility
- Click the lock icon and enter your password
- Click + and add your terminal app (Terminal, iTerm2, Windsurf, etc.) or the
oculosbinary itself - Make sure the toggle is enabled
Without this permission, OculOS can list windows but cannot read UI elements or interact with them.
./target/release/oculos
# API → http://127.0.0.1:7878
# Dashboard → http://127.0.0.1:7878./target/release/oculos --mcpOculOS reads the OS accessibility tree and assigns each UI element a session-scoped UUID (oculos_id). You use that ID to interact.
# 1. List open windows
curl http://localhost:7878/windows
# 2. Get the UI tree for a window
curl http://localhost:7878/windows/{pid}/tree
# 3. Find a specific element
curl "http://localhost:7878/windows/{pid}/find?q=Submit&type=Button"
# 4. Click it
curl -X POST http://localhost:7878/interact/{id}/click
# 5. Type into a text field
curl -X POST http://localhost:7878/interact/{id}/set-text \
-H "Content-Type: application/json" \
-d '{"text":"hello world"}'Every element includes an actions array — the API tells you exactly what you can do:
{
"oculos_id": "a3f8c2d1-...",
"type": "Button",
"label": "Submit",
"enabled": true,
"actions": ["click", "focus"],
"rect": { "x": 120, "y": 340, "width": 80, "height": 32 }
}| Endpoint | Description |
|---|---|
GET /windows |
List all visible windows |
GET /windows/{pid}/tree |
Full UI element tree |
GET /windows/{pid}/find?q=&type=&interactive= |
Search elements |
GET /hwnd/{hwnd}/tree |
Tree by window handle |
GET /hwnd/{hwnd}/find |
Search by window handle |
| Endpoint | Description |
|---|---|
POST /windows/{pid}/focus |
Bring to foreground |
POST /windows/{pid}/close |
Close gracefully |
GET /windows/{pid}/wait?q=&type=&timeout= |
Wait for element to appear (polls, default 5s) |
GET /windows/{pid}/screenshot |
Capture window as PNG |
| Endpoint | Body | Description |
|---|---|---|
POST /interact/{id}/click |
— | Click |
POST /interact/{id}/set-text |
{"text":"…"} |
Replace text content |
POST /interact/{id}/send-keys |
{"keys":"…"} |
Keyboard input |
POST /interact/{id}/focus |
— | Move focus |
POST /interact/{id}/toggle |
— | Toggle checkbox |
POST /interact/{id}/expand |
— | Expand dropdown / tree |
POST /interact/{id}/collapse |
— | Collapse |
POST /interact/{id}/select |
— | Select list item |
POST /interact/{id}/set-range |
{"value":N} |
Set slider value |
POST /interact/{id}/scroll |
{"direction":"…"} |
Scroll container |
POST /interact/{id}/scroll-into-view |
— | Scroll into viewport |
POST /interact/{id}/highlight |
{"duration_ms":N} |
Highlight on screen |
POST /interact/batch |
{"actions":[...]} |
Execute multiple interactions |
| Endpoint | Description |
|---|---|
GET /health |
Status, version, uptime |
GET /ws |
WebSocket (live action events) |
Works with any MCP-compatible client. Add to your config:
{
"mcpServers": {
"oculos": {
"command": "/path/to/oculos",
"args": ["--mcp"]
}
}
}Tested with: Claude Code, Claude Desktop, Cursor, Windsurf
For non-MCP agents (OpenAI, Gemini, custom), paste AGENTS.md into the system prompt and give the agent HTTP access.
Built-in web UI at http://127.0.0.1:7878:
- Window list — all open windows with focus/close buttons
- Element tree — full interactive UI tree with search and filter
- Inspector — element details, properties, and all available actions
- Recorder — record a sequence of interactions, export as Python, JavaScript, or curl
- JSON viewer — raw element data with copy
- WebSocket — live event indicator, real-time action feed
- Shortcuts —
Rrefresh ·/search ·Eexpand ·Ccollapse ·Hhighlight ·JJSON
| Platform | Backend | Status |
|---|---|---|
| Windows | UI Automation (windows-rs) |
✅ Full — Win32, WPF, Electron, Qt |
| Linux | AT-SPI2 (atspi + zbus) |
✅ Working — GTK, Qt, Electron |
| macOS | Accessibility API (AXUIElement + CoreGraphics) |
✅ Working — Cocoa, Electron, Qt |
| App type | Coverage | Notes |
|---|---|---|
| Win32 / WPF / WinForms | Excellent | Full deep tree, all interactions |
| GTK / Qt | Excellent | Full tree on all platforms |
| Electron (Spotify, VS Code, Slack, Chrome) | Good | Key interactive elements exposed; tree is shallower than native |
| Cocoa (macOS native) | Good | Standard controls fully exposed |
| Custom-drawn / OpenGL / DirectX | Poor | Minimal or no accessibility tree — games, CAD, etc. |
Tip: Run
curl "localhost:7878/windows/{pid}/find?interactive=true"to see what's available for any app.
Official wrappers for the REST API. Install from source (PyPI/npm packages coming soon):
cd sdk/python
pip install .from oculos import OculOS
client = OculOS()
windows = client.list_windows()
client.click(element_id)
client.set_text(element_id, "hello world")See sdk/python for full docs.
cd sdk/typescript
npm install
npm run buildimport { OculOS } from "./sdk/typescript/src/index";
const client = new OculOS();
const windows = await client.listWindows();
await client.click(elementId);
await client.setText(elementId, "hello world");See sdk/typescript for full docs.
oculos [OPTIONS]
-b, --bind <ADDR> Bind address [default: 127.0.0.1:7878]
--static-dir <DIR> Static files directory [default: static]
--log <LEVEL> Log level: trace/debug/info/warn/error [default: info]
--mcp Run as MCP server over stdin/stdout
-h, --help Print help
| OculOS | Vision agents | Screen coordinate tools | Browser-only tools | |
|---|---|---|---|---|
| Approach | OS accessibility tree | Screenshots + LLM | Pixel positions | DOM / a11y tree |
| Scope | Any desktop app | Any (with latency) | Any (fragile) | Browser only |
| Speed | Instant | Seconds | Instant | Instant |
| Deterministic | ✅ | ❌ | ✅ | ✅ |
| GPU required | ❌ | ✅ | ❌ | ❌ |
| Cloud required | ❌ | Usually | ❌ | ❌ |
| Semantic | ✅ Labels + types | Varies | ❌ Coordinates | ✅ |
- Windows UIA backend (full — Win32, WPF, Electron, Qt)
- Linux AT-SPI2 backend
- macOS Accessibility backend (
AXUIElement, CoreGraphics window enumeration, CGEvent keyboard simulation) - REST API server (Axum)
- MCP server (JSON-RPC 2.0 over stdio)
- Session-scoped element registry with UUIDs
- Full keyboard simulation engine
- Window list with focus/close
- Interactive element tree with search/filter
- Element inspector with all actions
- API request log
- JSON viewer with copy
- Keyboard shortcuts
- Element highlighting (native GDI overlay)
- Automation recorder (record + export Python/JS/curl)
- WebSocket live events
- Health endpoint (uptime, version, platform)
- macOS element highlighting (native overlay)
- Python & TypeScript client SDKs
- Batch operations (multiple interactions per request)
- Conditional waits (
/waitendpoint with timeout) - Screenshot capture (
/screenshotendpoint) - GitHub Actions CI (Windows, Linux, macOS)
- Docker image for CI/CD
- OpenAPI spec
-
--versionCLI flag - Element caching & diffing (change detection)
- PyPI / npm SDK publishing
We welcome contributions! See CONTRIBUTING.md for details.
Top areas:
- Tests — cross-app integration tests
- macOS highlight — native overlay for element highlighting
- Element caching — change detection & diffing
- Documentation — guides, examples, tutorials


