Beacon

AI-Powered Screen Tutorial Guide

Beacon is an intelligent desktop assistant that provides real-time, step-by-step guidance for any task on your screen. Simply describe what you want to do, and Beacon will analyze your screen, create a plan, and guide you through each step with visual overlays.

Features

🤖 AI-Powered Task Planning

Uses Google Gemini to interpret tasks described in natural language
Automatically generates step-by-step plans based on your screen context
Adapts to your current application and UI state
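To illustrate how a generated plan might reach the engine, here is a minimal Python sketch that turns a model reply into typed steps. The JSON shape (`steps`, `instruction`, `target`) and the `Step`/`parse_plan` names are hypothetical and do not describe Beacon's actual schema. Because chat models often wrap JSON in a Markdown code fence, the sketch also strips one if present.

```python
import json
from dataclasses import dataclass

FENCE = "`" * 3  # a Markdown code fence marker


@dataclass
class Step:
    """One guided step: what to tell the user and which element to highlight."""
    instruction: str
    target: str  # visible label of the element to find, e.g. "File" (may be empty)


def parse_plan(raw: str) -> list[Step]:
    """Parse a model reply in the assumed (hypothetical) shape
    {"steps": [{"instruction": "...", "target": "..."}, ...]} into Steps.
    """
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag)
        # and everything from the closing fence onward.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)
    steps = []
    for i, item in enumerate(data.get("steps", []), start=1):
        instruction = str(item.get("instruction", "")).strip()
        if not instruction:
            raise ValueError(f"step {i} has no instruction")
        steps.append(Step(instruction, str(item.get("target", "")).strip()))
    return steps
```

Validating each step up front means a malformed plan fails loudly before any overlay is drawn, rather than halfway through a tutorial.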

👁️ Smart Element Detection

OCR-based text detection - Reads the text of buttons, labels, and other UI elements
AI-powered region detection - Custom algorithm that uses Gemini to identify UI elements
Hybrid locator - Combines OCR and AI region detection to locate elements more reliably
Region-aware - Understands screen regions (menu bar, dock, main content)
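The hybrid locator idea can be sketched as a two-pass search: try OCR text matches first, then fall back to AI-detected regions. This is a minimal sketch that assumes both detectors return labeled boxes with a confidence score. The `Candidate` type, the `locate` function, fuzzy matching with Python's `difflib.SequenceMatcher`, and the 0.8 similarity threshold are all illustrative assumptions, not Beacon's implementation.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Candidate:
    label: str                      # OCR text, or the AI detector's label for the region
    box: tuple[int, int, int, int]  # (x, y, width, height) in screenshot pixels
    confidence: float               # detector confidence in [0, 1]
    source: str                     # "ocr" or "ai"


def similarity(a: str, b: str) -> float:
    """Case-insensitive fuzzy similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def locate(target: str, ocr: list[Candidate], ai: list[Candidate],
           min_similarity: float = 0.8) -> Optional[Candidate]:
    """Find the element labeled `target`, trying OCR results before AI regions.

    Within a pool, candidates must reach `min_similarity`; the survivors are
    ranked by similarity weighted by detector confidence.
    """
    for pool in (ocr, ai):
        matches = [c for c in pool if similarity(target, c.label) >= min_similarity]
        if matches:
            return max(matches, key=lambda c: similarity(target, c.label) * c.confidence)
    return None  # caller can show a text-only instruction instead
```

Trying OCR first keeps exact on-screen text matches cheap and precise, while the AI pool covers elements such as icons that have no readable text.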

🎯 Visual Overlay Interface

Non-intrusive transparent overlay
Highlights UI elements you need to interact with
Shows step progress and instructions
Click-through design that doesn't interfere with your workflow
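One detail an overlay of this kind has to handle: on Retina displays a screenshot is typically captured in physical pixels, while an Electron window is positioned in device-independent points, so a box found in the screenshot must be divided by the display's scale factor before it is drawn. The sketch below assumes a single display with a top-left origin; in Electron the factor is available from `screen.getPrimaryDisplay().scaleFactor`. The `Rect` type, the `to_overlay_coords` name, and the 4-point padding are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    w: float
    h: float


def to_overlay_coords(box: Rect, scale_factor: float, padding: float = 4.0) -> Rect:
    """Map a box from screenshot pixels to overlay points and pad it.

    Dividing by the display scale factor (2.0 on a typical Retina screen)
    converts physical pixels to points; the padding keeps the highlight from
    hugging the element's edges.
    """
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    return Rect(
        x=box.x / scale_factor - padding,
        y=box.y / scale_factor - padding,
        w=box.w / scale_factor + 2 * padding,
        h=box.h / scale_factor + 2 * padding,
    )
```

For example, `to_overlay_coords(Rect(200, 100, 80, 40), 2.0)` returns `Rect(x=96.0, y=46.0, w=48.0, h=28.0)`.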

🚀 Real-time Guidance

Takes screenshots to understand your current context
Provides contextual help based on what's on screen
Guides you through complex multi-step workflows
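To keep guidance responsive without sending every frame to Gemini, an engine could re-analyze only when the screen has visibly changed. The sketch below uses an average hash (split the image into an 8×8 grid and set one bit per cell brighter than the mean) and compares hashes by Hamming distance. This is a swapped-in illustrative technique, not necessarily what Beacon does. It assumes the screenshot has already been converted to a grid of 0-255 grayscale integers (for example via Pillow's `Image.convert("L")`), and the threshold of 5 differing bits is an arbitrary starting point.

```python
def average_hash(gray: list[list[int]], size: int = 8) -> int:
    """Average hash of a grayscale image given as rows of 0-255 ints.

    The image is split into a size x size grid; each cell contributes one bit
    that is 1 when the cell's mean brightness exceeds the mean of all cells.
    """
    h, w = len(gray), len(gray[0])
    if h < size or w < size:
        raise ValueError("image is smaller than the hash grid")
    cells = []
    for r in range(size):
        r0, r1 = r * h // size, (r + 1) * h // size
        for c in range(size):
            c0, c1 = c * w // size, (c + 1) * w // size
            block = [gray[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def screen_changed(prev: list[list[int]], curr: list[list[int]], threshold: int = 5) -> bool:
    """True when two frames differ by more than `threshold` hash bits."""
    return hamming(average_hash(prev), average_hash(curr)) > threshold
```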

Use Cases

Learn new software - Get guided tours of unfamiliar applications
Workflow assistance - Step-by-step help for complex tasks
UI automation preparation - Plan and visualize automation sequences
Accessibility - Visual guidance for navigation and interaction
Documentation - Record and share step-by-step procedures (tutorial recording is on the roadmap)

How It Works

Describe your task - Type what you want to do in natural language
AI analyzes your screen - Beacon captures your screen and uses AI to understand the context
Plan generation - Creates a step-by-step plan to accomplish your task
Visual guidance - Highlights elements and shows instructions in real-time
Progress tracking - Follow along as Beacon guides you through each step
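The five steps above can be sketched as a single guidance loop. In this minimal Python sketch, the capture, planning, locating, highlighting, and waiting functions are passed in as callables; their names and signatures are hypothetical stand-ins for Beacon's real components, and injecting them is also what makes the loop easy to test with fakes.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Box = tuple[int, int, int, int]  # (x, y, width, height)


@dataclass
class Session:
    task: str
    steps: list[str] = field(default_factory=list)
    current: int = 0  # index of the step being shown

    @property
    def done(self) -> bool:
        return self.current >= len(self.steps)


def run_guidance(
    task: str,
    capture: Callable[[], bytes],                          # grab a screenshot
    plan: Callable[[str, bytes], list[str]],               # task + screenshot -> step texts
    locate: Callable[[str, bytes], Optional[Box]],         # step + screenshot -> element box
    highlight: Callable[[int, str, Optional[Box]], None],  # draw the overlay for a step
    wait_for_user: Callable[[], None],                     # block until the step is finished
) -> Session:
    session = Session(task=task)
    # Describe the task, capture the screen, and generate a plan.
    session.steps = plan(task, capture())
    while not session.done:
        step = session.steps[session.current]
        # Re-capture before every step because the previous action changed the UI.
        box = locate(step, capture())
        highlight(session.current, step, box)  # box may be None: show text only
        wait_for_user()
        session.current += 1  # progress tracking
    return session
```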

Tech Stack

Frontend: Electron - Cross-platform desktop app with transparent overlay
Backend: Python - AI processing, vision, and OCR
AI: Google Gemini - Natural language understanding and task planning
Vision:
- Tesseract OCR - Text detection
- Custom region detection algorithm with Gemini - AI-powered UI element identification
- PIL/Pillow - Image processing
Platform: macOS (with Quartz framework for window management)
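For the Tesseract part of the vision stack, the `pytesseract` wrapper's `image_to_data(..., output_type=Output.DICT)` returns parallel lists (`text`, `conf`, `left`, `top`, `width`, `height`, among others) with one entry per detected item. The sketch below is an assumption about how that output could be consumed, not Beacon's actual code: it keeps only non-empty words above a confidence cutoff. The `words_from_tesseract` and `ocr_words` names and the cutoff of 60 are hypothetical; `pytesseract` is imported inside `ocr_words` so the parser can be used without Tesseract installed.

```python
def words_from_tesseract(data: dict, min_conf: float = 60.0) -> list[dict]:
    """Turn pytesseract's image_to_data DICT output into word boxes.

    Rows that are not words (page/block/line entries) typically have empty
    text and a confidence of -1, so the text and confidence filters drop them.
    """
    words = []
    for i, raw_text in enumerate(data["text"]):
        text = raw_text.strip()
        # conf may arrive as a string or a number depending on the pytesseract version.
        conf = float(data["conf"][i])
        if not text or conf < min_conf:
            continue
        box = (data["left"][i], data["top"][i], data["width"][i], data["height"][i])
        words.append({"text": text, "box": box, "conf": conf})
    return words


def ocr_words(image, min_conf: float = 60.0) -> list[dict]:
    """Run Tesseract on a PIL image; requires pytesseract and the tesseract binary."""
    import pytesseract  # imported here so words_from_tesseract works without Tesseract

    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    return words_from_tesseract(data, min_conf)
```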

Quick Start

Clone the repository
Follow the build instructions
Set up your Google API key
Run npm start in the client directory
Start getting guided!

For detailed setup instructions, see BUILD.md.

Architecture

┌─────────────────────────────────────┐
│    Electron Desktop App (Client)    │
│  - Transparent overlay              │
│  - User input capture               │
│  - Visual highlighting              │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│    Python Engine (Backend)          │
│  ┌───────────────────────────────┐  │
│  │  Planner (Gemini AI)          │  │
│  │  - Task understanding         │  │
│  │  - Plan generation            │  │
│  └───────────────────────────────┘  │
│  ┌───────────────────────────────┐  │
│  │  Locators                     │  │
│  │  - OCR (Tesseract)            │  │
│  │  - Region detection (Gemini)  │  │
│  │  - Hybrid locator             │  │
│  └───────────────────────────────┘  │
│  ┌───────────────────────────────┐  │
│  │  Region Manager               │  │
│  │  - Screen region detection    │  │
│  │  - Context awareness          │  │
│  └───────────────────────────────┘  │
└─────────────────────────────────────┘
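The Region Manager's job of telling the menu bar, dock, and main content apart can be approximated with simple geometry. This sketch assumes a single display, coordinates in points with a top-left origin, a Dock pinned to the bottom edge, and fixed menu bar and Dock heights. All of these are assumptions (the Dock can be resized, moved to a side, or hidden, and the menu bar height varies by macOS version and display), and the `ScreenLayout`, `classify_region`, and `filter_by_region` names are hypothetical.

```python
from dataclasses import dataclass

Box = tuple[int, int, int, int]  # (x, y, width, height) in points, origin top-left


@dataclass(frozen=True)
class ScreenLayout:
    width: int
    height: int
    menu_bar_height: int = 25  # assumption: varies by macOS version and display
    dock_height: int = 70      # assumption: Dock is at the bottom with this height


def classify_region(box: Box, layout: ScreenLayout) -> str:
    """Classify a box by where its center falls on the screen."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    if not (0 <= cx < layout.width and 0 <= cy < layout.height):
        return "offscreen"
    if cy < layout.menu_bar_height:
        return "menu_bar"
    if cy >= layout.height - layout.dock_height:
        return "dock"
    return "main_content"


def filter_by_region(boxes: list[Box], region: str, layout: ScreenLayout) -> list[Box]:
    """Keep only the boxes whose center lies in the expected region."""
    return [b for b in boxes if classify_region(b, layout) == region]
```

Restricting the search to the expected region (for example, only `menu_bar` candidates for a step that says "open the File menu") reduces false matches when the same label appears elsewhere on screen.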

Project Status

Beacon is currently in active development (v0.1.0). Core features are functional, but the project is evolving rapidly.

Current Capabilities

✅ Task planning with AI
✅ Screen capture and analysis
✅ Element detection (OCR + icons)
✅ Visual overlay system
✅ macOS support

Roadmap

🔄 Enhanced element detection accuracy
🔄 Interactive tutorial mode
🔄 Multi-monitor support
📋 Windows/Linux support
📋 Tutorial recording and playback
📋 Cloud sync for saved tutorials

Contributing

Beacon is an open-source project. Contributions are welcome! See BUILD.md for development setup.

License

MIT License - See LICENSE file for details

Credits

Built with:

Note: On macOS, Beacon needs privacy permissions to capture screenshots and detect UI elements: Screen Recording (which macOS requires for any screen capture) and Accessibility. You'll be prompted to grant these permissions when you first run the app.
