AI-Powered Screen Tutorial Guide
Beacon is an intelligent desktop assistant that provides real-time, step-by-step guidance for any task on your screen. Simply describe what you want to do, and Beacon will analyze your screen, create a plan, and guide you through each step with visual overlays.
- Powered by Google Gemini AI to understand natural language tasks
- Automatically generates step-by-step plans based on your screen context
- Adapts to your current application and UI state
- OCR-based text detection - Finds buttons, labels, and UI text
- AI-powered region detection - Custom algorithm using Gemini for intelligent UI element identification
- Hybrid locator - Combines multiple detection methods for accuracy
- Region-aware - Understands screen regions (menu bar, dock, main content)
- Non-intrusive transparent overlay
- Highlights UI elements you need to interact with
- Shows step progress and instructions
- Click-through design that doesn't interfere with your workflow
- Takes screenshots to understand your current context
- Provides contextual help based on what's on screen
- Guides you through complex multi-step workflows
- Learn new software - Get guided tours of unfamiliar applications
- Workflow assistance - Step-by-step help for complex tasks
- UI automation preparation - Plan and visualize automation sequences
- Accessibility - Visual guidance for navigation and interaction
- Documentation - Record and share step-by-step procedures
- Describe your task - Type what you want to do in natural language
- AI analyzes your screen - Beacon captures your screen and uses AI to understand the context
- Plan generation - Creates a step-by-step plan to accomplish your task
- Visual guidance - Highlights elements and shows instructions in real-time
- Progress tracking - Follow along as Beacon guides you through each step
- Frontend: Electron - Cross-platform desktop app with transparent overlay
- Backend: Python - AI processing, vision, and OCR
- AI: Google Gemini - Natural language understanding and task planning
- Vision:
- Tesseract OCR - Text detection
- Custom region detection algorithm with Gemini - AI-powered UI element identification
- PIL/Pillow - Image processing
- Platform: macOS (with Quartz framework for window management)
- Clone the repository
- Follow the build instructions
- Set up your Google API key
- Run
npm startin the client directory - Start getting guided!
For detailed setup instructions, see BUILD.md.
┌─────────────────────────────────────┐
│ Electron Desktop App (Client) │
│ - Transparent overlay │
│ - User input capture │
│ - Visual highlighting │
└─────────────┬───────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ Python Engine (Backend) │
│ ┌───────────────────────────────┐ │
│ │ Planner (Gemini AI) │ │
│ │ - Task understanding │ │
│ │ - Plan generation │ │
│ └───────────────────────────────┘ │
│ ┌───────────────────────────────┐ │
│ │ Locators │ │
│ │ - OCR (Tesseract) │ │
│ │ - Region detection (Gemini) │ │
│ │ - Hybrid locator │ │
│ └───────────────────────────────┘ │
│ ┌───────────────────────────────┐ │
│ │ Region Manager │ │
│ │ - Screen region detection │ │
│ │ - Context awareness │ │
│ └───────────────────────────────┘ │
└─────────────────────────────────────┘
Beacon is currently in active development (v0.1.0). Core features are functional, but the project is evolving rapidly.
- ✅ Task planning with AI
- ✅ Screen capture and analysis
- ✅ Element detection (OCR + icons)
- ✅ Visual overlay system
- ✅ macOS support
- 🔄 Enhanced element detection accuracy
- 🔄 Interactive tutorial mode
- 🔄 Multi-monitor support
- 📋 Windows/Linux support
- 📋 Tutorial recording and playback
- 📋 Cloud sync for saved tutorials
Beacon is an open-source project. Contributions are welcome! See BUILD.md for development setup.
MIT License - See LICENSE file for details
Built with:
Note: Beacon requires accessibility permissions on macOS to capture screenshots and detect UI elements. You'll be prompted to grant these permissions when you first run the app.