FrameDocs

Inspiration

As developers, we've all been there—staring at walls of technical documentation, trying to understand complex APIs, frameworks, or libraries. Reading is great, but sometimes you need visual explanations, code walkthroughs, and step-by-step diagrams to truly grasp a concept. That's when we realized: what if documentation could teach itself?

With Chrome's Built-in AI Challenge 2025, we saw an opportunity to solve this problem using Gemini Nano and Chrome's suite of AI APIs. The idea was simple: turn any documentation page into an engaging educational video, completely client-side, with no servers, no costs, and total privacy.

What it does

FrameDocs is a Chrome Extension that transforms technical documentation into structured video tutorials using Chrome's Built-in AI:

🔍 Detects documentation pages automatically (API docs, guides, tutorials)
🎬 Shows a floating button with the FrameDocs logo - click it to generate a video about the page
🤖 Summarizer API analyzes content and suggests relevant video topics
📝 Prompt API (Gemini Nano) generates comprehensive video scripts with multiple scene types:
- Overview scenes explaining core concepts
- Code examples with syntax highlighting
- Process diagrams visualizing workflows
- Bullet point summaries
- Mathematical equations (LaTeX support)
- Comparison tables
- Infographics with metrics
🎥 Video generator (backend API) renders the final educational video
▶️ Watch and learn directly in your browser

All AI inference (summarization and script generation with Gemini Nano) happens locally on your device, so the page content is analyzed in the browser. The generated JSON script is then sent to the backend for rendering. The extension uses a simple RAG (Retrieval-Augmented Generation) system to ground the script in the actual page content.

How we built it

Frontend (Chrome Extension)

Built as a Manifest V3 Chrome Extension
Content Script (content.js) detects documentation pages using keywords (docs, API, guide, tutorial) and injects a floating button with custom branding
Generator Page (generator.html + generator.js) handles the AI workflow with a 5-stage progress indicator
Prompt API integration with Gemini Nano for structured JSON video script generation
Summarizer API for analyzing page content and extracting key topics
Simple RAG System stores documentation paragraphs and retrieves relevant chunks based on user queries
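
The keyword-based page detection described above could look roughly like the sketch below. The keyword list comes from the description; the function name looksLikeDocumentation and the choice to inspect only the URL and the page title are assumptions for illustration, not the actual contents of content.js.

```javascript
// Keywords taken from the description above.
const DOC_KEYWORDS = ['docs', 'api', 'guide', 'tutorial'];

// Hypothetical helper: returns true when the URL or title contains one of the
// keywords as a whole word, so "capital" does not count as a match for "api".
function looksLikeDocumentation({ url = '', title = '' }) {
  const haystack = `${url} ${title}`.toLowerCase();
  return DOC_KEYWORDS.some(keyword => new RegExp(`\\b${keyword}\\b`).test(haystack));
}
```

In a content script this would typically be called with location.href and document.title before injecting the floating button.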

AI Script Generation Pipeline

```javascript
// 1. Initialize Gemini Nano session
const session = await window.ai.languageModel.create({
  topK: 1,
  temperature: 0,
  systemPrompt: 'Generate educational video scripts in JSON format...'
});

// 2. Build prompt with page context
// findRelevant returns objects, so join their text before interpolating
const context = ragSystem.findRelevant(userTopic, 10).map(d => d.text).join('\n');
const prompt = `
  Topic: ${userTopic}
  Documentation Context: ${context}
  Generate a structured video with these scene types:
  - overview, code, process_diagram, simple_bullets, etc.
`;

// 3. Stream response
let fullResponse = '';
const stream = session.promptStreaming(prompt);
for await (const chunk of stream) {
  fullResponse += chunk;
}

// 4. Clean and validate JSON
const cleanedJSON = cleanJSON(fullResponse);
const scriptData = JSON.parse(cleanedJSON);
```

JSON Cleaning System

Built a character-by-character parser to fix common AI output issues:

Converts single quotes to double quotes
Escapes control characters (newlines, tabs) inside strings
Fixes unquoted property names
Removes trailing commas
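
As an illustration of the last two fixes in the list above, here is a minimal sketch that quotes bare property names and drops trailing commas while leaving string contents untouched. It assumes quotes have already been normalized to double quotes; the function name fixCommasAndKeys is hypothetical and this is not the project's actual cleanJSON implementation.

```javascript
// Hypothetical helper: quotes bare property names and removes trailing commas,
// copying everything inside double-quoted strings verbatim.
function fixCommasAndKeys(input) {
  let out = '';
  let i = 0;
  while (i < input.length) {
    const ch = input[i];
    if (ch === '"') {
      // Copy the whole string literal, skipping over backslash escapes.
      let j = i + 1;
      while (j < input.length && input[j] !== '"') {
        j += input[j] === '\\' ? 2 : 1;
      }
      out += input.slice(i, j + 1);
      i = j + 1;
      continue;
    }
    if (ch === ',' && /^\s*[}\]]/.test(input.slice(i + 1))) {
      // Trailing comma: the next non-whitespace character closes a container.
      i += 1;
      continue;
    }
    if (/[A-Za-z_$]/.test(ch)) {
      const rest = input.slice(i);
      // A bare identifier followed by ':' is an unquoted property name.
      const key = rest.match(/^([A-Za-z_$][\w$]*)(\s*:)/);
      if (key) {
        out += `"${key[1]}"${key[2]}`;
        i += key[0].length;
        continue;
      }
      // Otherwise copy the word (true, false, null) unchanged.
      const word = rest.match(/^[A-Za-z_$][\w$]*/)[0];
      out += word;
      i += word.length;
      continue;
    }
    out += ch;
    i += 1;
  }
  return out;
}
```

Tracking string boundaries is what keeps a value such as "a, }" intact, which a plain regular-expression replace would damage.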

Backend (Video Generation API)

Python + Flask REST API at yeeplatform.top
Parallel scene processing for efficient video generation
Manim for rendering mathematical animations and equations
OpenCV for video composition and scene stitching
Wikipedia API integration for fetching contextual images
Pango Markup for rich text formatting with colors, sizes, and styles
Nginx reverse proxy with extended timeouts (600s) for long video generation

Deployment Infrastructure

Nginx configuration with SSL/HTTPS (Let's Encrypt)
Extended proxy timeouts to handle video generation that takes 30-120 seconds
Cloud VM hosting with GPU acceleration support
Separate /generate_video_json endpoint for the extension
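
On the extension side, a call to this endpoint might be wrapped in a client-side timeout that mirrors the 600s proxy timeout. The sketch below is an assumption-heavy illustration: the full URL is composed from the host and path named above, and the POST method, JSON body, JSON response, and the requestVideo name are all assumed rather than taken from the project.

```javascript
// Assumed endpoint URL, built from the host and path mentioned in this write-up.
const ENDPOINT = 'https://yeeplatform.top/generate_video_json';

// Hypothetical helper: posts the generated script and aborts after timeoutMs.
// fetchImpl is injectable so the function can be exercised without a network.
async function requestVideo(scriptData, options = {}) {
  const fetchImpl = options.fetchImpl || globalThis.fetch;
  const timeoutMs = options.timeoutMs || 600000; // matches the 600s proxy timeout
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetchImpl(ENDPOINT, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(scriptData),
      signal: controller.signal,
    });
    if (!response.ok) {
      throw new Error(`Video backend returned HTTP ${response.status}`);
    }
    return await response.json(); // response shape is an assumption
  } finally {
    clearTimeout(timer); // always release the timer, on success or failure
  }
}
```

Surfacing the HTTP status in the error makes a 504 from the proxy distinguishable from a client-side abort.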

Challenges we ran into

1. Invalid JSON from AI

The Prompt API would generate syntactically invalid JSON:

Single quotes: 'property': 'value' instead of "property": "value"
Literal newlines inside code strings (JSON does not allow raw newline characters in a string; they must be written as \n)
Unescaped quotes inside strings
Unquoted property names

Solution: Built a robust character-by-character JSON cleaner that:

Tracks whether we're inside or outside a string
Converts all quotes to double quotes while preserving string content
Escapes control characters only within string values
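Validates output before parsing

A simplified excerpt of the core loop (clean is the raw model output, result is the cleaned string):

```javascript
let result = '';
let inString = false;
let stringQuote = null;
let i = 0;

while (i < clean.length) {
  const char = clean[i];

  if (!inString && (char === '"' || char === "'")) {
    inString = true;
    stringQuote = char;  // Remember which quote opened the string
    result += '"';       // Always use double quotes
  } else if (inString && char === stringQuote) {
    inString = false;
    result += '"';
  } else if (inString && char === '\n') {
    result += '\\n';     // Escape newlines inside strings
  } else {
    result += char;      // Copy everything else unchanged
  }
  i++;
}
```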

2. 504 Gateway Timeout Errors

Video generation takes 30-120 seconds, but Nginx's default proxy timeout is only 60s, so longer requests failed with a 504.

Solution: Extended Nginx proxy timeouts in the config:

```nginx
location /generate_video_json {
  proxy_read_timeout 600s;    # Up to 10 minutes between reads from the backend
  proxy_send_timeout 600s;    # Up to 10 minutes between writes to the backend
  proxy_connect_timeout 60s;  # 1 minute to connect
  proxy_buffering off;        # Stream responses
}
```

3. Generic vs. Context-Aware Videos

Initial videos were too generic and didn't reflect the specific documentation page content.

Solution: Implemented a lightweight RAG system:

```javascript
class SimpleRAGSystem {
  constructor() {
    this.documents = [];
  }

  // Store each paragraph as a document
  addDocuments(docs) {
    this.documents = docs.map(d => ({ text: d }));
  }

  // Return up to topK documents that contain the query (case-insensitive)
  findRelevant(query, topK = 3) {
    const matches = this.documents.filter(d =>
      d.text.toLowerCase().includes(query.toLowerCase())
    );
    return matches.slice(0, topK);
  }
}

// Use page content as context
const ragSystem = new SimpleRAGSystem();
const paragraphs = pageContent.split(/\n{1,2}/).filter(p => p.trim());
ragSystem.addDocuments(paragraphs);
const relevantChunks = ragSystem.findRelevant(userTopic, 10);
```
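
Because findRelevant looks for the whole query as one substring, a multi-word topic only matches paragraphs that contain the exact phrase. One possible variant, shown here as a sketch and not as what FrameDocs ships, ranks paragraphs by how many distinct query words they contain. The function name rankByKeywordOverlap and the rule of ignoring words of two characters or fewer are assumptions.

```javascript
// Hypothetical variant: score each paragraph by the number of distinct query
// words it contains, then return the topK highest-scoring paragraphs.
function rankByKeywordOverlap(documents, query, topK = 3) {
  const words = [...new Set(
    query.toLowerCase().split(/\W+/).filter(word => word.length > 2)
  )];
  return documents
    .map(text => {
      const lower = text.toLowerCase();
      const score = words.filter(word => lower.includes(word)).length;
      return { text, score };
    })
    .filter(doc => doc.score > 0)          // drop paragraphs with no overlap
    .sort((a, b) => b.score - a.score)     // highest overlap first
    .slice(0, topK);
}
```

This keeps the approach dependency-free while tolerating topics phrased differently from the page text.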

4. Poor UX During Long Operations

Users had no idea what was happening during the 30-120 second video generation.

Solution: Built a 5-stage progress indicator with visual feedback:

✅ Initializing AI - Connecting to Gemini Nano
✅ Generating Script - Creating educational content
✅ Processing Scenes - Structuring video scenes
✅ Validating Output - Checking script quality
✅ Sending to Backend - Creating final video

Each stage shows:

Active spinner animation
Checkmark when complete
Error icon if it fails
Real-time status updates
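
The stage handling can be sketched as a small state tracker that the UI subscribes to. The stage names come from the list above; createProgressTracker, runStages, and the status values pending, active, done, and error are hypothetical names chosen for this illustration.

```javascript
// Stage names taken from the list above.
const STAGES = [
  'Initializing AI',
  'Generating Script',
  'Processing Scenes',
  'Validating Output',
  'Sending to Backend',
];

// Hypothetical tracker: each stage is 'pending', 'active', 'done', or 'error'.
// onChange receives a copy of all stages whenever one of them changes.
function createProgressTracker(stageNames, onChange = () => {}) {
  const stages = stageNames.map(name => ({ name, status: 'pending', detail: '' }));
  const update = (index, status, detail = '') => {
    stages[index] = { ...stages[index], status, detail };
    onChange(stages.map(stage => ({ ...stage })));
  };
  return {
    start: (index, detail) => update(index, 'active', detail),
    complete: index => update(index, 'done'),
    fail: (index, detail) => update(index, 'error', detail),
    snapshot: () => stages.map(stage => ({ ...stage })),
  };
}

// Runs async steps in order, marking each stage and stopping at the first failure.
async function runStages(tracker, steps) {
  for (let i = 0; i < steps.length; i++) {
    tracker.start(i);
    try {
      await steps[i]();
      tracker.complete(i);
    } catch (err) {
      tracker.fail(i, err.message);
      throw err;
    }
  }
}
```

The onChange callback is where the spinner, checkmark, or error icon for each stage would be rendered.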

5. Chrome Extension Permissions & API Access

Chrome's Built-in AI APIs require specific flags and early preview program access.

Solution:

Clear documentation in README for enabling flags
Graceful fallback messages if APIs aren't available
Detection of API availability before attempting to use them:

```javascript
const availability = await window.ai.languageModel.availability();
// availability() reports 'unavailable' when the model cannot be used on this device
if (availability === 'unavailable') {
  throw new Error('LanguageModel API not available');
}
```

Accomplishments that we're proud of

✅ Successfully integrated multiple Chrome Built-in AI APIs (Prompt API, Summarizer API) in a real-world application
✅ Built a robust JSON parser that handles AI-generated output inconsistencies
✅ Kept all AI inference client-side - page content is analyzed on-device, and the backend only receives the generated script
✅ Created a beautiful, intuitive UX with real-time progress stages and smooth animations
✅ Implemented RAG from scratch using simple JavaScript for context-aware generation
✅ Solved the timeout problem with proper Nginx configuration for long-running operations
✅ Generated diverse video content with 10+ scene types (code, diagrams, equations, tables, etc.)
✅ Made it work end-to-end - from Chrome extension → AI generation → video rendering → playback
✅ Zero API costs for users - all AI inference runs locally on Gemini Nano

What we learned

Technical Learnings

Client-side AI is incredibly powerful: Gemini Nano can generate complex, structured JSON outputs entirely on-device without any cloud calls
Prompt engineering is critical: Getting consistent, valid JSON output required very explicit instructions with examples of correct vs. incorrect formats
JSON validation is harder than expected: AI models don't naturally output perfectly valid JSON, so robust cleaning is essential
Timeouts matter in production: Default server timeouts (60s) are too short for AI/video operations - always configure for your use case
Progressive UX is essential for AI: Users need to see what's happening during long operations, not just a loading spinner
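
Even after cleaning, a script that parses as JSON can still have the wrong shape, so a structural check before sending it to the backend is a natural companion to the cleaner. The sketch below is illustrative only: the scenes and type field names are assumptions about the script format, and the set of scene types lists only the four named in the prompt earlier in this write-up.

```javascript
// Scene types named in the prompt shown earlier; the real list is longer.
const KNOWN_SCENE_TYPES = new Set(['overview', 'code', 'process_diagram', 'simple_bullets']);

// Hypothetical validator: returns a list of problems, empty when the script looks usable.
// The "scenes" and "type" field names are assumptions about the script format.
function validateScript(script) {
  if (script === null || typeof script !== 'object' || Array.isArray(script)) {
    return ['script must be a JSON object'];
  }
  if (!Array.isArray(script.scenes) || script.scenes.length === 0) {
    return ['"scenes" must be a non-empty array'];
  }
  const errors = [];
  script.scenes.forEach((scene, index) => {
    if (scene === null || typeof scene !== 'object') {
      errors.push(`scene ${index} is not an object`);
    } else if (!KNOWN_SCENE_TYPES.has(scene.type)) {
      errors.push(`scene ${index} has unknown type: ${String(scene.type)}`);
    }
  });
  return errors;
}
```

Returning a list of messages instead of throwing lets the validation stage report every problem at once.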

Chrome Built-in AI Insights

The Prompt API can handle complex structured outputs with proper system prompts
The Summarizer API works great for extracting key points from long text
Streaming responses provide better UX than waiting for the full output
Token limits require chunking or RAG for long documentation pages
Privacy-first AI resonates strongly with developers concerned about data security
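
For the token-limit point above, one simple way to chunk a long page is to pack paragraphs into groups under a size budget. This is a sketch under stated assumptions: it uses character count as a rough stand-in for tokens, and the chunkParagraphs name and the 2000-character default are illustrative rather than values from the project.

```javascript
// Hypothetical helper: packs paragraphs into chunks of at most maxChars characters.
// Character count is only a rough proxy for the model's token limit.
// A single paragraph longer than maxChars is kept whole as its own chunk.
function chunkParagraphs(paragraphs, maxChars = 2000) {
  const chunks = [];
  let current = '';
  for (const paragraph of paragraphs) {
    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
    if (!current || candidate.length <= maxChars) {
      current = candidate;      // still fits, or this is the first paragraph of a chunk
    } else {
      chunks.push(current);     // budget exceeded: close the chunk and start a new one
      current = paragraph;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk could then be summarized or searched separately before the most relevant ones are passed to the Prompt API.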

Architecture Learnings

Hybrid approaches work best: Client-side AI for scripting + server-side for video rendering combines the benefits of both
RAG doesn't need to be complex: A simple keyword-matching system works well for focused use cases
Error handling is 80% of the work: Most development time went into handling edge cases and AI failures
Real-time feedback transforms UX: The 5-stage progress indicator made a huge difference in user perception

Development Process

Test with real data early: Documentation pages have unexpected formats that broke our initial assumptions
Version control for prompts: We iterated through 10+ versions of the system prompt to get reliable JSON output
User testing reveals edge cases: Real users found issues we never anticipated (special characters, very long pages, etc.)

What's next for FrameDocs

Short-term (Next Month)

🎙️ Text-to-Speech integration: Add voiceovers using the Web Speech API or Chrome's upcoming Speech API
📱 Mobile support: Implement hybrid AI with Firebase AI Logic or Gemini Developer API for mobile Chrome users
🎨 Custom video themes: Let users choose color schemes, fonts, and animation styles
💾 Save & bookmark: Allow users to save favorite videos for offline viewing

Medium-term (Next Quarter)

🔄 Script editing: Let users modify the AI-generated script before video generation
📊 Analytics dashboard: Show which topics get the most video generations to understand popular docs
🌍 Multi-language support: Use the Translator API to generate videos in different languages
🎯 Smart topic detection: Automatically suggest the most interesting parts of a page to turn into videos
⚡ Instant previews: Show a quick preview of the video structure before full generation

Long-term Vision

🤝 Community video library: Share generated videos with other developers (opt-in)
🎓 Learning paths: Generate series of videos that build on each other for complete learning journeys
🔌 Integration with dev tools: Generate videos from GitHub READMEs, Stack Overflow answers, etc.
🧠 Personalized learning: Adapt video complexity based on user's skill level and past interactions
🏢 Enterprise version: Help companies create internal training videos from their documentation
🎮 Interactive videos: Add quizzes, code sandboxes, and exercises within the videos

Technical Improvements

Optimize video generation speed (currently 30-120s → target 10-30s)
Add caching for frequently-requested documentation pages
Implement video compression for smaller file sizes
Support more scene types (mind maps, flowcharts, timeline animations)
Add A/B testing to improve AI prompt effectiveness

FrameDocs represents the future of developer education - turning static documentation into dynamic, engaging, privacy-first learning experiences. With Chrome's Built-in AI, we're just getting started! 🚀

Built With

chrome-built-in-ai-apis
chrome-extension-(manifest-v3)
css
custom-json-parser
flask
gemini-nano
html
javascript
manim
nginx
opencv
prompt-api
python
summarizer-api
wikipedia-api

Updates

Nkugwa Mark William started this project — Oct 30, 2025 05:17 PM EDT
