Inspiration

Navigating unfamiliar software can be frustrating. We wanted to create an AI-powered guide that could overlay instructions directly on any application, making tech tutorials seamless and interactive.

What We Learned

  • Building transparent, click-through windows with Electron
  • Implementing system-wide mouse tracking using native hooks
  • Handling macOS accessibility permissions and security constraints
  • Balancing interactivity with pass-through behavior for overlay UIs

How We Built It

Beacon combines several technologies into an AI-powered, on-screen tutorial system:

Core Architecture

Built with Electron using a dual-window architecture:

  • Overlay Window: Transparent fullscreen canvas that draws highlights and instructions
  • Spotlight Window: Separate effect layer for visual guidance

The core challenge was creating a window that's visible and interactive for our own UI while letting all other clicks pass through to the applications below. We achieved this using:

  • setIgnoreMouseEvents() with selective toggling for smart click-through behavior
  • uiohook-napi for system-wide mouse event tracking across all applications
  • Canvas-based rendering with requestAnimationFrame for smooth 60fps animations
  • IPC messaging architecture for coordinating between main and renderer processes
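
The selective toggling boils down to a hit test against the overlay's own interactive regions; everywhere else, mouse events are ignored and forwarded. A minimal sketch of that decision (region coordinates and function names are illustrative, not our exact code):

```javascript
// Sketch of the selective click-through decision (names are illustrative).
// In Electron, the main process calls win.setIgnoreMouseEvents(true,
// { forward: true }) so clicks pass through to the app below but mouse
// moves still reach the renderer, then flips it off when the cursor
// enters an interactive region of the overlay.

// Rectangles the overlay actually wants to receive clicks for.
const interactiveRegions = [
  { x: 20, y: 20, width: 200, height: 48 },   // e.g. a "Next step" button
  { x: 20, y: 600, width: 320, height: 120 }, // e.g. the instruction card
];

// Returns true when the cursor is over an interactive overlay element.
function isOverInteractiveRegion(cursor, regions) {
  return regions.some(
    (r) =>
      cursor.x >= r.x &&
      cursor.x < r.x + r.width &&
      cursor.y >= r.y &&
      cursor.y < r.y + r.height
  );
}

// Called on every global mouse-move event (e.g. from uiohook-napi).
// `win` is assumed to be an Electron BrowserWindow.
function updateClickThrough(win, cursor) {
  const interactive = isOverInteractiveRegion(cursor, interactiveRegions);
  // Ignore mouse events (click-through) everywhere except our own UI.
  win.setIgnoreMouseEvents(!interactive, { forward: true });
  return interactive;
}
```

Passing `{ forward: true }` is what keeps hover information flowing to the renderer even while clicks fall through.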

Computer Vision & AI Integration

OCR Text Recognition: We integrated Optical Character Recognition to analyze screen content in real time. The system captures screenshots and extracts text from UI elements, allowing Beacon to understand what's displayed on screen and provide context-aware instructions.

Icon Recognition: Custom icon detection algorithms identify common UI patterns, buttons, and interactive elements. This enables Beacon to automatically locate and highlight specific controls even as applications update their interfaces.
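
As a concrete example of how the OCR output can drive highlighting, the sketch below picks a highlight point from recognized word bounding boxes (the data shape loosely mirrors parsed Tesseract TSV output; the values are made up):

```javascript
// Sketch: picking a highlight target from OCR output (data shape assumed).
// Each OCR "word" carries its text and a bounding box in screen pixels.

// Find the bounding box of a labelled UI element, case-insensitively.
function findElementByLabel(words, label) {
  const needle = label.toLowerCase();
  return words.find((w) => w.text.toLowerCase() === needle) || null;
}

// Centre of a bounding box -- where the overlay would draw its highlight.
function highlightPoint(box) {
  return { x: box.x + box.width / 2, y: box.y + box.height / 2 };
}

// Illustrative OCR result, not real Tesseract output:
const words = [
  { text: "File", x: 10, y: 5, width: 40, height: 20 },
  { text: "Save", x: 60, y: 5, width: 44, height: 20 },
];

const target = findElementByLabel(words, "save");
const point = target ? highlightPoint(target) : null;
// point is { x: 82, y: 15 }
```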

AI Agent Orchestration: The tutorial system leverages AI agents that:

  • Analyze application screenshots to understand current UI state
  • Generate step-by-step tutorial instructions dynamically
  • Adapt guidance based on user progress and context
  • Make intelligent decisions about what to highlight and when
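
Stripped to its skeleton, the orchestration is a capture-decide-render loop. The sketch below stubs out the model call entirely (the real system sends the screenshot and step context to the Gemini API):

```javascript
// Minimal agent-loop sketch: screenshot -> model -> next instruction.
// The model is stubbed with a fixed plan; a real call would send
// state.screenshot to a vision model and parse its reply.

function stubModel(state) {
  const plan = ["Open the File menu", "Click Save As", "Done"];
  return { instruction: plan[state.step], done: state.step === plan.length - 1 };
}

function runTutorial(captureScreenshot, model, maxSteps = 10) {
  const transcript = [];
  const state = { step: 0, screenshot: null };
  for (let i = 0; i < maxSteps; i++) {
    state.screenshot = captureScreenshot();     // current UI state
    const { instruction, done } = model(state); // decide next guidance
    transcript.push(instruction);               // would be drawn on the overlay
    if (done) break;
    state.step += 1;                            // advance after the user acts
  }
  return transcript;
}

const transcript = runTutorial(() => "fake-screenshot", stubModel);
// transcript is ["Open the File menu", "Click Save As", "Done"]
```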

Intelligent Location System

Grid-Based Location Prediction: Rather than hardcoding coordinates, Beacon uses a grid-based prediction algorithm that:

  • Divides the screen into a responsive grid system
  • Uses ML models to predict likely positions of UI elements
  • Adapts to different screen resolutions and window sizes
  • Combines OCR results with visual patterns for accurate element localization
  • Maintains accuracy even when applications move or resize
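
The grid idea itself is simple: because cells are expressed as fractions of the screen, the same prediction maps cleanly to any resolution. A sketch with illustrative cell counts and scoring (the real scoring model is more involved):

```javascript
// Sketch of the grid mapping (cell counts and scoring are illustrative).
// A model scores each cell for how likely it is to contain the target
// element, and an OCR hit in a cell boosts that score.

// Convert a grid cell to the pixel centre of that cell on this screen.
function cellToPixels(cell, grid, screen) {
  const cellW = screen.width / grid.cols;
  const cellH = screen.height / grid.rows;
  return { x: (cell.col + 0.5) * cellW, y: (cell.row + 0.5) * cellH };
}

// Pick the highest-scoring cell, combining model scores with OCR hits.
function predictCell(modelScores, ocrHits, grid) {
  let best = null;
  for (let row = 0; row < grid.rows; row++) {
    for (let col = 0; col < grid.cols; col++) {
      const bonus = ocrHits.some((h) => h.row === row && h.col === col) ? 0.5 : 0;
      const score = modelScores[row][col] + bonus;
      if (!best || score > best.score) best = { row, col, score };
    }
  }
  return best;
}

const grid = { rows: 2, cols: 2 };
const scores = [
  [0.1, 0.4],
  [0.3, 0.2],
];
const best = predictCell(scores, [{ row: 1, col: 0 }], grid);
const point = cellToPixels(best, grid, { width: 1920, height: 1080 });
// best is cell { row: 1, col: 0 }; point is { x: 480, y: 810 }
```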

Python Integration

Python scripts handle the heavy computational tasks:

  • Image processing pipelines for screenshot analysis
  • OCR preprocessing and text extraction using Tesseract
  • Machine learning inference for element prediction
  • Data processing for tutorial generation
  • Communication with the Electron frontend via IPC/subprocess calls
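
One practical detail on the Electron side of that bridge: a spawned process's stdout arrives in arbitrary chunks, so messages have to be reassembled before parsing. A sketch assuming a line-delimited JSON protocol (the exact wire format here is an assumption, not our documented protocol):

```javascript
// Sketch of line-delimited JSON framing between Electron and a Python
// child process (newline-per-message protocol is assumed).

// Returns a chunk handler that buffers partial lines and invokes
// onMessage once per complete JSON object.
function createMessageReader(onMessage) {
  let buffer = "";
  return (chunk) => {
    buffer += chunk;
    let newline;
    while ((newline = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, newline);
      buffer = buffer.slice(newline + 1);
      if (line.trim()) onMessage(JSON.parse(line)); // one JSON object per line
    }
  };
}

// With a real child process this would look like:
//   const proc = spawn("python3", ["vision.py"]);
//   proc.stdout.on("data", createMessageReader(handleResult));
// Here we feed chunks by hand to show the buffering:
const received = [];
const read = createMessageReader((msg) => received.push(msg));
read('{"type":"ocr","te');              // partial message: stays buffered
read('xt":"Save"}\n{"type":"done"}\n'); // completes two messages
// received is [{ type: "ocr", text: "Save" }, { type: "done" }]
```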

Technical Stack Summary

  • Frontend: Electron, HTML5 Canvas, JavaScript
  • System Hooks: uiohook-napi for global event tracking
  • Computer Vision: Python + Tesseract
  • AI/ML: Gemini API integration, custom ML models for location prediction
  • IPC Architecture: Bidirectional communication between Electron, Python, and AI services

Challenges

  • Click-through complexity: Making the overlay interactive for our UI while transparent for everything else required careful event handling
  • Global mouse tracking: Required Accessibility permissions on macOS and robust fallback mechanisms
  • Performance: Maintaining smooth animations with requestAnimationFrame while tracking global events
  • Window management: Keeping overlay windows always-on-top across all virtual desktops without interfering with normal workflow
