🪶 GemChat — Your AI-Powered Browser Assistant
Conversational AI that lives in your browser: summarize webpages, listen to them aloud, or talk to Gemini directly.
🌟 Inspiration
We wanted to make AI truly conversational and context-aware — not just a chatbot in a separate tab, but a voice-driven assistant that lives right inside your browser.
Every time we read an article, research paper, or Wikipedia entry, we wished we could simply ask “Summarize this” or “Read this aloud” without leaving the page.
That’s how GemChat was born: a lightweight browser extension powered by Google Gemini and FastAPI that enables live text summarization, text-to-speech playback, and real-time voice conversation directly from any webpage.
🧩 What It Does
🧠 Smart Summarizer
- Click the Summary icon to instantly get a TL;DR of any webpage (or selected text).
- Powered by Gemini 1.5-Pro, with adaptive summaries in multiple styles:
  - Short
  - Medium
  - Bullet Points
- Choose your tone: neutral, student, or executive.
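As a rough illustration of how the style and tone options above could be mapped onto a prompt for Gemini, here is a minimal sketch. The function name `build_summary_prompt`, the option keys, and the wording of each instruction are assumptions made for this example, not GemChat's actual prompt.

```python
# Hypothetical prompt builder: the instruction wording below is an assumption.
STYLE_INSTRUCTIONS = {
    "short": "Summarize the text in 2-3 sentences.",
    "medium": "Summarize the text in one or two short paragraphs.",
    "bullets": "Summarize the text as 5-7 concise bullet points.",
}

TONE_INSTRUCTIONS = {
    "neutral": "Use a neutral, factual tone.",
    "student": "Explain it simply, as if to a student.",
    "executive": "Focus on key takeaways for a busy executive.",
}


def build_summary_prompt(text: str, style: str = "short", tone: str = "neutral") -> str:
    """Combine a style instruction, a tone instruction, and the page text."""
    if style not in STYLE_INSTRUCTIONS:
        raise ValueError(f"unknown style: {style!r}")
    if tone not in TONE_INSTRUCTIONS:
        raise ValueError(f"unknown tone: {tone!r}")
    return "\n".join(
        [STYLE_INSTRUCTIONS[style], TONE_INSTRUCTIONS[tone], "", "Text:", text.strip()]
    )
```

Keeping the options in plain dictionaries means a new style or tone is a one-line change, and unknown values fail loudly before any API call is made.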
🔊 Text-to-Speech (TTS)
- Highlight text and click Listen — GemChat reads it aloud through your system speakers.
- Supports adjustable speed (default 1.2×) and tone (calm, enthusiastic, neutral).
- Ideal for accessibility, multitasking, and auditory learning.
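The options above might be validated before synthesis along the following lines. This is only a sketch: `TTSOptions`, `normalize_tts_options`, `synthesize_mp3`, the 0.5 to 2.0 clamp range, and the default language are all assumptions. gTTS itself exposes only a `slow` flag, so the sketch assumes speed and tone are applied elsewhere (for example at playback time) rather than inside gTTS.

```python
import io
from dataclasses import dataclass

ALLOWED_TONES = ("calm", "enthusiastic", "neutral")


@dataclass(frozen=True)
class TTSOptions:
    speed: float = 1.2   # playback-rate multiplier; 1.2 is the stated default
    tone: str = "neutral"
    lang: str = "en"     # assumed default language


def normalize_tts_options(speed: float = 1.2, tone: str = "neutral", lang: str = "en") -> TTSOptions:
    """Clamp speed and fall back to a neutral tone for unknown values."""
    # The 0.5-2.0 range is an assumption, not a documented GemChat limit.
    speed = min(max(float(speed), 0.5), 2.0)
    if tone not in ALLOWED_TONES:
        tone = "neutral"
    return TTSOptions(speed=speed, tone=tone, lang=lang)


def synthesize_mp3(text: str, options: TTSOptions) -> bytes:
    """Return MP3 bytes for the text using gTTS (requires network access)."""
    # Imported lazily so the option helpers work without gTTS installed.
    from gtts import gTTS

    buffer = io.BytesIO()
    gTTS(text=text, lang=options.lang).write_to_fp(buffer)
    return buffer.getvalue()
```

Returning the audio as bytes keeps the helper independent of how the result is delivered, whether as a file on disk or as an HTTP response body.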
💬 Live Gemini Chat (Coming Soon)
- Engage in real-time, microphone-based conversation with Gemini using WebSocket audio streaming.
- Audio input captured through the extension, processed by the FastAPI backend, and streamed for response synthesis.
🏗️ Architecture Overview
Browser Extension (Vite + React + TypeScript)
↓
Content Script → FastAPI Backend (Python)
↓
Gemini API + gTTS + Readability-lxml
⚙️ Tech Stack
| Layer | Technologies Used |
|---|---|
| Frontend | Vite, React, TypeScript, Chrome MV3 APIs |
| Backend | FastAPI, Python 3.11, Uvicorn |
| AI/ML | Google Gemini (gemini-1.5-pro, gemini-2.5-flash), gTTS |
| Scraping | Readability-lxml, BeautifulSoup4 |
| Infra | Localhost / Docker-ready microservice |
🧠 How We Built It
🖥️ Backend (FastAPI)
- /summary → Summarizes full or partial webpage text using Gemini
- /tts → Converts text to speech using gTTS (MP3 output)
- /scrape_summary → Scrapes HTML, cleans content, and runs summarization
- Structured modularly in /app/api, /app/services, and /app/schemas
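To make the scrape, clean, and summarize flow behind /scrape_summary more concrete, here is a dependency-free sketch. It swaps in the standard library's `html.parser` as a stand-in for Readability-lxml and BeautifulSoup4, and takes the summarizer as a plain callable instead of calling Gemini. The names `extract_text` and `scrape_summary` and the response shape are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping script/style/noscript content."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    """Crude stand-in for a readability pass: keep only visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def scrape_summary(html: str, summarize) -> dict:
    """Clean the HTML, then hand the text to a summarizer callable."""
    text = extract_text(html)
    return {"summary": summarize(text), "chars": len(text)}
```

Passing the summarizer in as an argument keeps the cleaning step testable without network access or an API key.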
💡 Extension (Frontend)
- Framework: Vite + React + TypeScript
- Components:
  - background/: handles message relay, mic permissions, and context
  - content/: injects UI buttons (Summarize / Speak / Chat) into the webpage
  - popup/: a minimal dashboard interface
- Communicates with the FastAPI backend via REST (soon WebSocket for live audio)
🚀 What We Learned
- Integrating Gemini APIs into a custom FastAPI backend for streaming AI tasks
- Bridging TypeScript content scripts with a Python microservice
- Handling CORS and Manifest V3 service worker constraints
- Generating high-quality speech output dynamically using gTTS
- Managing cross-context communication between background, popup, and content scripts
⚡ Challenges Faced
- Chrome MV3 restrictions — background service workers cannot persist audio streams easily.
- CORS issues between browser scripts and the FastAPI backend.
- Text length limits on Gemini summarization — required implementing adaptive chunking.
- Dynamic build paths between Vite and Chrome manifest outputs.
- Latency tuning in the TTS pipeline for smoother playback.
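The chunking mentioned in the list above is not described in detail, so the following is only one plausible shape for it: split on paragraph breaks, pack paragraphs into chunks under a character budget, summarize each chunk, then summarize the combined partial summaries. The function names, the 4000-character default, and the two-pass strategy are assumptions.

```python
def chunk_text(text: str, max_chars: int = 4000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # Hard-split any single paragraph that exceeds the budget on its own.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def summarize_long(text: str, summarize, max_chars: int = 4000) -> str:
    """Summarize each chunk, then summarize the joined partial summaries."""
    chunks = chunk_text(text, max_chars)
    if len(chunks) <= 1:
        return summarize(text)
    partials = [summarize(chunk) for chunk in chunks]
    # Note: the joined partials are not re-chunked in this sketch.
    return summarize("\n\n".join(partials))
```

A character budget is a rough proxy for the model's token limit; a real implementation would more likely count tokens.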
🧩 Key Features Summary
| Feature | Description |
|---|---|
| 🧠 Summarizer | Context-aware TL;DRs with tone + length control |
| 🔊 Text-to-Speech | Adjustable tone, speed, and language |
| 💬 Live Chat (Upcoming) | Voice-driven interaction with Gemini |
| 🕸️ Smart Scraping | Clean, readable extraction of webpage content |
| 🧩 Modular Backend | Each service isolated under /app/services |
📈 What’s Next
- [ ] Enable bi-directional live audio chat with Gemini’s Live API
- [ ] Add multi-language summarization and TTS support
- [ ] Build UI indicators for “Listening…” and “Speaking…” states
- [ ] Package for Chrome Web Store and Edge Add-ons
🧰 Project Setup
Backend
cd server
pip install -r requirements.txt
uvicorn app.main:app --reload
Extension
cd extension
npm install
npm run build
Then load the built extension via chrome://extensions → Load unpacked → extension/dist/.
🏁 License
This project is licensed under the MIT License — free for educational and research use.
⭐ If you found GemChat inspiring, please consider giving this repo a star on GitHub!