
OpenAdapt Capture

License: MIT | Python 3.10+

OpenAdapt Capture is the data collection component of the OpenAdapt GUI automation ecosystem.

Capture platform-agnostic GUI interaction streams with time-aligned screenshots and audio for training ML models or replaying workflows.

Status: Pre-alpha. See docs/DESIGN.md for a discussion of the architecture.


The OpenAdapt Ecosystem

                          OpenAdapt GUI Automation Pipeline
                          =================================

    +-----------------+          +------------------+          +------------------+
    |                 |          |                  |          |                  |
    | openadapt-      |  ------> | openadapt-ml     |  ------> |    Deploy        |
    | capture         |  Convert | (Train & Eval)   |  Export  |    (Inference)   |
    |                 |          |                  |          |                  |
    +-----------------+          +------------------+          +------------------+
          |                             |                             |
          v                             v                             v
    - Record GUI                  - Fine-tune VLMs              - Run trained
      interactions                - Evaluate on                   agent on new
    - Mouse, keyboard,              benchmarks (WAA)              tasks
      screen, audio               - Compare models              - Real-time
    - Privacy scrubbing           - Cloud GPU training            automation

Component           Purpose                                    Repository
openadapt-capture   Record human demonstrations                GitHub
openadapt-ml        Train and evaluate GUI automation models   GitHub
openadapt-privacy   PII scrubbing for recordings               Coming soon

Installation

uv add openadapt-capture

This includes everything needed to capture and replay GUI interactions (mouse, keyboard, screen recording).

For audio capture with Whisper transcription (large download):

uv add "openadapt-capture[audio]"

Quick Start

Capture

from openadapt_capture import Recorder

# Record GUI interactions
with Recorder("./my_capture", task_description="Demo task") as recorder:
    # Captures mouse, keyboard, and screen until context exits
    input("Press Enter to stop recording...")

print(f"Captured {recorder.event_count} events")

Replay / Analysis

from openadapt_capture import Capture

# Load and iterate over time-aligned events
capture = Capture.load("./my_capture")

for action in capture.actions():
    # Each action has an associated screenshot
    print(f"{action.timestamp}: {action.type} at ({action.x}, {action.y})")
    screenshot = action.screenshot  # PIL Image at time of action

Low-Level API

from openadapt_capture import (
    create_capture, process_events,
    MouseDownEvent, MouseButton,
)

# Create storage (platform and screen size auto-detected)
capture, storage = create_capture("./my_capture")

# Write raw events
storage.write_event(MouseDownEvent(timestamp=1.0, x=100, y=200, button=MouseButton.LEFT))

# Query and process
raw_events = storage.get_events()
actions = process_events(raw_events)  # Merges clicks, drags, typed text

Event Types

Raw events (captured):

  • mouse.move, mouse.down, mouse.up, mouse.scroll
  • key.down, key.up
  • screen.frame, audio.chunk

Actions (processed; see the sketch after this list):

  • mouse.singleclick, mouse.doubleclick, mouse.drag
  • key.type (merged keystrokes → text)
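
For a quick sense of what processing produces, you can tally the action types in a capture. This is a minimal sketch built on the Capture API shown above; the Counter tally is our illustration, not a library feature:

from collections import Counter

from openadapt_capture import Capture

capture = Capture.load("./my_capture")

# Each processed action exposes a type string like "mouse.singleclick".
counts = Counter(action.type for action in capture.actions())
print(counts)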

Architecture

capture_directory/
├── capture.db      # SQLite: events, metadata
├── video.mp4       # Screen recording
└── audio.flac      # Audio (optional)
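
A minimal sanity check of this layout, assuming only the file names shown above (audio.flac appears only when audio capture is enabled):

from pathlib import Path

capture_dir = Path("./my_capture")
assert (capture_dir / "capture.db").exists()  # SQLite: events, metadata
assert (capture_dir / "video.mp4").exists()   # screen recording
print("audio recorded:", (capture_dir / "audio.flac").exists())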

Performance Statistics

Track event write latency and analyze capture performance:

from openadapt_capture import Recorder

with Recorder("./my_capture") as recorder:
    input("Press Enter to stop...")

# Access performance statistics
summary = recorder.stats.summary()
print(f"Mean latency: {summary['mean_latency_ms']:.1f}ms")

# Generate performance plot
recorder.stats.plot(output_path="performance.png")


Frame Extraction Verification

Compare extracted video frames against original images to verify lossless capture:

from openadapt_capture import compare_video_to_images, plot_comparison

# Compare frames
report = compare_video_to_images(
    "capture/video.mp4",
    list(captured_frames),  # iterable of (timestamp, PIL.Image) pairs
)

print(f"Mean diff: {report.mean_diff_overall:.2f}")
print(f"Lossless: {report.is_lossless}")

# Visualize comparison
plot_comparison(report, output_path="comparison.png")


Visualization

Generate animated demos and interactive viewers from recordings:

Animated GIF Demo

from openadapt_capture import Capture, create_demo

capture = Capture.load("./my_capture")
create_demo(capture, output="demo.gif", fps=10, max_duration=15)

Interactive HTML Viewer

from openadapt_capture import Capture, create_html

capture = Capture.load("./my_capture")
create_html(capture, output="viewer.html", include_audio=True)

The HTML viewer includes:

  • Timeline scrubber with event markers
  • Frame-by-frame navigation
  • Synchronized audio playback
  • Event list with details panel
  • Keyboard shortcuts (Space, arrows, Home/End)

Generate Demo from Command Line

uv run python scripts/generate_readme_demo.py --duration 10

Optional Extras

Extra     Features
audio     Audio capture + Whisper transcription
privacy   PII scrubbing (openadapt-privacy)
all       Everything
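
The extras use the same install syntax as the audio example above; for instance, to pull in everything:

uv add "openadapt-capture[all]"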

Training with OpenAdapt-ML

Captured recordings can be used to train vision-language models with openadapt-ml.

End-to-End Workflow

# 1. Capture a workflow demonstration
uv run python -c "
from openadapt_capture import Recorder

with Recorder('./my_capture', task_description='Turn off Night Shift') as recorder:
    input('Perform the task, then press Enter to stop...')
"

# 2. Train a model on the capture (requires openadapt-ml)
uv pip install openadapt-ml
uv run python -m openadapt_ml.cloud.local train \
  --capture ./my_capture \
  --open  # Opens training dashboard

# 3. Compare human vs model predictions
uv run python -m openadapt_ml.scripts.compare \
  --capture ./my_capture \
  --checkpoint checkpoints/model \
  --open

Cloud GPU Training

For faster training with cloud GPUs:

# Train on Lambda Labs A10 (~$0.75/hr)
uv run python -m openadapt_ml.cloud.lambda_labs train \
  --capture ./my_capture \
  --goal "Turn off Night Shift"

See the openadapt-ml documentation for cloud setup.

Data Format

OpenAdapt-ML converts captures to its Episode format automatically:

from openadapt_ml.ingest.capture import capture_to_episode

episode = capture_to_episode("./my_capture")
print(f"Loaded {len(episode.steps)} steps")
print(f"Instruction: {episode.instruction}")

The conversion maps capture event types to ML action types (see the sketch after this list):

  • mouse.singleclick / mouse.click -> CLICK
  • mouse.doubleclick -> DOUBLE_CLICK
  • mouse.drag -> DRAG
  • mouse.scroll -> SCROLL
  • key.type -> TYPE
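
As a hedged sketch of what the conversion yields, the snippet below tallies the ML action types in a converted episode. The step.action.type field name is an assumption for illustration, not the documented Episode schema; consult the openadapt-ml docs for the actual fields:

from collections import Counter

from openadapt_ml.ingest.capture import capture_to_episode

episode = capture_to_episode("./my_capture")
# NOTE: `step.action.type` is an assumed accessor, shown for illustration only.
print(Counter(step.action.type for step in episode.steps))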

Development

uv sync --dev
uv run pytest

Related Projects

  • openadapt-ml: train and evaluate GUI automation models
  • openadapt-privacy: PII scrubbing for recordings (coming soon)

License

MIT
