Inspiration

Navigating unfamiliar software can be frustrating. We wanted to create an AI-powered guide that could overlay instructions directly on any application, making tech tutorials seamless and interactive.

What We Learned

  • Building transparent, click-through windows with Electron
  • Implementing system-wide mouse tracking using native hooks
  • Handling macOS accessibility permissions and security constraints
  • Balancing interactivity with pass-through behavior for overlay UIs

How We Built It

Beacon combines several technologies into an AI-powered, on-screen tutorial system:

Core Architecture

Built with Electron using a dual-window architecture:

  • Overlay Window: Transparent fullscreen canvas that draws highlights and instructions
  • Spotlight Window: Separate effect layer for visual guidance

The core challenge was creating a window that's visible and interactive for our own UI while letting all other clicks pass through to the applications below. We achieved this using:

  • setIgnoreMouseEvents() with selective toggling for smart click-through behavior
  • uiohook-napi for system-wide mouse event tracking across all applications
  • Canvas-based rendering with requestAnimationFrame for smooth 60fps animations
  • IPC messaging architecture for coordinating between main and renderer processes
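
The selective toggling boils down to a hit test against the overlay's own interactive regions; everywhere else, mouse events are ignored and forwarded. A minimal sketch of that decision (region coordinates and function names are illustrative, not our exact code):

```javascript
// Sketch of the selective click-through decision (names are illustrative).
// In Electron, the main process calls win.setIgnoreMouseEvents(true,
// { forward: true }) so clicks pass through to the app below but mouse
// moves still reach the renderer, then flips it off when the cursor
// enters an interactive region of the overlay.

// Rectangles the overlay actually wants to receive clicks for.
const interactiveRegions = [
  { x: 20, y: 20, width: 200, height: 48 },   // e.g. a "Next step" button
  { x: 20, y: 600, width: 320, height: 120 }, // e.g. the instruction card
];

// Returns true when the cursor is over an interactive overlay element.
function isOverInteractiveRegion(cursor, regions) {
  return regions.some(
    (r) =>
      cursor.x >= r.x &&
      cursor.x < r.x + r.width &&
      cursor.y >= r.y &&
      cursor.y < r.y + r.height
  );
}

// Called on every global mouse-move event (e.g. from uiohook-napi).
// `win` is assumed to be an Electron BrowserWindow.
function updateClickThrough(win, cursor) {
  const interactive = isOverInteractiveRegion(cursor, interactiveRegions);
  // Ignore mouse events (click-through) everywhere except our own UI.
  win.setIgnoreMouseEvents(!interactive, { forward: true });
  return interactive;
}
```

Passing `{ forward: true }` is what keeps hover information flowing to the renderer even while clicks fall through.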

Computer Vision & AI Integration

OCR Text Recognition: We integrated Optical Character Recognition to analyze screen content in real time. The system captures screenshots and extracts text from UI elements, allowing Beacon to understand what's displayed on screen and provide context-aware instructions.

Icon Recognition: Custom icon detection algorithms identify common UI patterns, buttons, and interactive elements. This enables Beacon to automatically locate and highlight specific controls even as applications update their interfaces.
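
As a concrete example of how the OCR output can drive highlighting, the sketch below picks a highlight point from recognized word bounding boxes (the data shape loosely mirrors parsed Tesseract TSV output; the values are made up):

```javascript
// Sketch: picking a highlight target from OCR output (data shape assumed).
// Each OCR "word" carries its text and a bounding box in screen pixels.

// Find the bounding box of a labelled UI element, case-insensitively.
function findElementByLabel(words, label) {
  const needle = label.toLowerCase();
  return words.find((w) => w.text.toLowerCase() === needle) || null;
}

// Centre of a bounding box -- where the overlay would draw its highlight.
function highlightPoint(box) {
  return { x: box.x + box.width / 2, y: box.y + box.height / 2 };
}

// Illustrative OCR result, not real Tesseract output:
const words = [
  { text: "File", x: 10, y: 5, width: 40, height: 20 },
  { text: "Save", x: 60, y: 5, width: 44, height: 20 },
];

const target = findElementByLabel(words, "save");
const point = target ? highlightPoint(target) : null;
// point is { x: 82, y: 15 }
```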

AI Agent Orchestration: The tutorial system leverages AI agents that:

  • Analyze application screenshots to understand current UI state
  • Generate step-by-step tutorial instructions dynamically
  • Adapt guidance based on user progress and context
  • Make intelligent decisions about what to highlight and when
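
Stripped to its skeleton, the orchestration is a capture-decide-render loop. The sketch below stubs out the model call entirely (the real system sends the screenshot and step context to the Gemini API):

```javascript
// Minimal agent-loop sketch: screenshot -> model -> next instruction.
// The model is stubbed with a fixed plan; a real call would send
// state.screenshot to a vision model and parse its reply.

function stubModel(state) {
  const plan = ["Open the File menu", "Click Save As", "Done"];
  return { instruction: plan[state.step], done: state.step === plan.length - 1 };
}

function runTutorial(captureScreenshot, model, maxSteps = 10) {
  const transcript = [];
  const state = { step: 0, screenshot: null };
  for (let i = 0; i < maxSteps; i++) {
    state.screenshot = captureScreenshot();     // current UI state
    const { instruction, done } = model(state); // decide next guidance
    transcript.push(instruction);               // would be drawn on the overlay
    if (done) break;
    state.step += 1;                            // advance after the user acts
  }
  return transcript;
}

const transcript = runTutorial(() => "fake-screenshot", stubModel);
// transcript is ["Open the File menu", "Click Save As", "Done"]
```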

Intelligent Location System

Grid-Based Location Prediction: Rather than hardcoding coordinates, Beacon uses a grid-based prediction algorithm that:

  • Divides the screen into a responsive grid system
  • Uses ML models to predict likely positions of UI elements
  • Adapts to different screen resolutions and window sizes
  • Combines OCR results with visual patterns for accurate element localization
  • Maintains accuracy even when applications move or resize
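
The grid idea itself is simple: because cells are expressed as fractions of the screen, the same prediction maps cleanly to any resolution. A sketch with illustrative cell counts and scoring (the real scoring model is more involved):

```javascript
// Sketch of the grid mapping (cell counts and scoring are illustrative).
// A model scores each cell for how likely it is to contain the target
// element, and an OCR hit in a cell boosts that score.

// Convert a grid cell to the pixel centre of that cell on this screen.
function cellToPixels(cell, grid, screen) {
  const cellW = screen.width / grid.cols;
  const cellH = screen.height / grid.rows;
  return { x: (cell.col + 0.5) * cellW, y: (cell.row + 0.5) * cellH };
}

// Pick the highest-scoring cell, combining model scores with OCR hits.
function predictCell(modelScores, ocrHits, grid) {
  let best = null;
  for (let row = 0; row < grid.rows; row++) {
    for (let col = 0; col < grid.cols; col++) {
      const bonus = ocrHits.some((h) => h.row === row && h.col === col) ? 0.5 : 0;
      const score = modelScores[row][col] + bonus;
      if (!best || score > best.score) best = { row, col, score };
    }
  }
  return best;
}

const grid = { rows: 2, cols: 2 };
const scores = [
  [0.1, 0.4],
  [0.3, 0.2],
];
const best = predictCell(scores, [{ row: 1, col: 0 }], grid);
const point = cellToPixels(best, grid, { width: 1920, height: 1080 });
// best is cell { row: 1, col: 0 }; point is { x: 480, y: 810 }
```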

Python Integration

Python scripts handle the heavy computational tasks:

  • Image processing pipelines for screenshot analysis
  • OCR preprocessing and text extraction using Tesseract
  • Machine learning inference for element prediction
  • Data processing for tutorial generation
  • Communication with the Electron frontend via IPC/subprocess calls
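
One practical detail on the Electron side of that bridge: a spawned process's stdout arrives in arbitrary chunks, so messages have to be reassembled before parsing. A sketch assuming a line-delimited JSON protocol (the exact wire format here is an assumption, not our documented protocol):

```javascript
// Sketch of line-delimited JSON framing between Electron and a Python
// child process (newline-per-message protocol is assumed).

// Returns a chunk handler that buffers partial lines and invokes
// onMessage once per complete JSON object.
function createMessageReader(onMessage) {
  let buffer = "";
  return (chunk) => {
    buffer += chunk;
    let newline;
    while ((newline = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, newline);
      buffer = buffer.slice(newline + 1);
      if (line.trim()) onMessage(JSON.parse(line)); // one JSON object per line
    }
  };
}

// With a real child process this would look like:
//   const proc = spawn("python3", ["vision.py"]);
//   proc.stdout.on("data", createMessageReader(handleResult));
// Here we feed chunks by hand to show the buffering:
const received = [];
const read = createMessageReader((msg) => received.push(msg));
read('{"type":"ocr","te');              // partial message: stays buffered
read('xt":"Save"}\n{"type":"done"}\n'); // completes two messages
// received is [{ type: "ocr", text: "Save" }, { type: "done" }]
```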

Technical Stack Summary

  • Frontend: Electron, HTML5 Canvas, JavaScript
  • System Hooks: uiohook-napi for global event tracking
  • Computer Vision: Python + Tesseract
  • AI/ML: Gemini API integration, custom ML models for location prediction
  • IPC Architecture: Bidirectional communication between Electron, Python, and AI services

Challenges

  • Click-through complexity: Making the overlay interactive for our UI while transparent for everything else required careful event handling
  • Global mouse tracking: Required Accessibility permissions on macOS and robust fallback mechanisms
  • Performance: Maintaining smooth animations with requestAnimationFrame while tracking global events
  • Window management: Keeping overlay windows always-on-top across all virtual desktops without interfering with normal workflow
