Inspiration

With the rise of harmful media and increased technology use, young people have become more likely to encounter harmful content online. We were especially concerned about the number of young boys finding and consuming misogynistic "red pill" content, often without their parents noticing until it is too late. We therefore wanted to build a product that lets parents supervise the content their child engages with online while preserving as much of the child's privacy as possible. To strike this delicate balance, we created KoalaKite - a privacy-first parental supervision browser extension!

What it does

KoalaKite is a browser extension that monitors the user's Chrome usage and alerts the parent when it detects harmful content. It works as follows:

  • Periodic Screenshot Capture: The Chrome extension captures screenshots of the active tab at configurable intervals (default: every 10 seconds)
  • AI-Powered Classification: Screenshots are sent to a Fastify backend server that uses Google's Gemini 2.0 Flash Vision AI to classify content into safety categories (SAFE, SUSPECT, HARMFUL) with confidence scores
  • Smart Content Detection: The AI identifies 6 categories of harmful content: violence, hate speech, nudity, self-harm, drugs, and grooming
  • Privacy-First Architecture: SAFE images are immediately discarded and never stored. Only HARMFUL content above the confidence threshold triggers notifications
  • Intelligent Deduplication: Uses perceptual hashing (difference hash algorithm) to prevent duplicate alerts for similar content
  • Multi-Modal Alert System (Split by Audience)
    • What the parent receives (Email):
      • Email notification via SMTP with the screenshot attached.
      • Includes the detected category and confidence score.
    • What the child experiences (On-screen):
      • Friendly koala voice message using ElevenLabs text-to-speech, delivering age-appropriate and supportive guidance tailored to the detected category.
      • An animated “koala kite” overlay that appears in the child’s browser with natural, wind-like motion while the voice message plays.
  • Cooldown System: Per-category cooldown periods (default: 10 minutes) prevent alert fatigue while still allowing notifications for different types of harmful content
  • Dashboard: A React-based parent dashboard for managing devices, viewing alerts, and configuring settings
  • Device Management: Parents can register multiple devices and monitor them from a centralized dashboard with MongoDB-backed storage
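As a rough illustration of how the confidence threshold and the per-category cooldown described above could combine into a single "should we alert?" decision, here is a minimal sketch. The class name `AlertGate`, the category identifiers, and the threshold value are hypothetical and not taken from the KoalaKite codebase; only the 10-minute default cooldown comes from the list above.

```typescript
type Severity = "SAFE" | "SUSPECT" | "HARMFUL";
// Hypothetical identifiers for the six categories listed above.
type Category = "violence" | "hate_speech" | "nudity" | "self_harm" | "drugs" | "grooming";

interface Classification {
  severity: Severity;
  category: Category;
  confidence: number; // assumed to be in the range 0..1
}

const DEFAULT_COOLDOWN_MS = 10 * 60 * 1000; // 10 minutes, the stated default

class AlertGate {
  // Timestamp (ms) of the last alert sent for each category.
  private lastAlertAt = new Map<Category, number>();
  private confidenceThreshold: number;
  private cooldownMs: number;

  constructor(confidenceThreshold: number, cooldownMs: number = DEFAULT_COOLDOWN_MS) {
    this.confidenceThreshold = confidenceThreshold;
    this.cooldownMs = cooldownMs;
  }

  // Returns true only for HARMFUL results at or above the threshold whose
  // category is not currently cooling down, and records the alert time.
  shouldNotify(result: Classification, nowMs: number): boolean {
    if (result.severity !== "HARMFUL") return false;
    if (result.confidence < this.confidenceThreshold) return false;
    const last = this.lastAlertAt.get(result.category);
    if (last !== undefined && nowMs - last < this.cooldownMs) return false;
    this.lastAlertAt.set(result.category, nowMs);
    return true;
  }
}
```

Because the cooldown is tracked per category, a second "violence" detection five minutes after the first is suppressed, while a "drugs" detection at the same moment still goes through.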

How we built it

KoalaKite is built as a modern monorepo with four interconnected packages:

Extension (Chrome MV3)

  • Built with TypeScript and Vite using @crxjs/vite-plugin for hot-reload during development
  • Background Service Worker: Implements periodic screenshot capture using Chrome Alarms API, handles image preprocessing and perceptual hashing for deduplication, and manages communication with the backend server
  • Content Script: Injects the animated koala kite onto web pages with natural physics-based animations including entrance (1.5s), hovering (6-10s with figure-8 motion), and exit (2.5s) phases
  • Popup UI: Provides device registration, parental authentication with PIN protection, and configuration settings
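To make the animation phases more concrete, the sketch below shows one common way to produce a figure-8 path (a 1:2 Lissajous curve) and to pick the current phase from elapsed time. It is an assumption about how such motion could be computed, not the extension's actual animation code; the function names, amplitudes, and period are hypothetical, while the 1.5s entrance and 2.5s exit durations come from the description above.

```typescript
type Phase = "entrance" | "hover" | "exit" | "done";

const ENTRANCE_MS = 1500; // entrance duration stated above
const EXIT_MS = 2500; // exit duration stated above

// Picks the animation phase for a given elapsed time.
// hoverMs is expected to be between 6000 and 10000 per the description.
function phaseAt(elapsedMs: number, hoverMs: number): Phase {
  if (elapsedMs < ENTRANCE_MS) return "entrance";
  if (elapsedMs < ENTRANCE_MS + hoverMs) return "hover";
  if (elapsedMs < ENTRANCE_MS + hoverMs + EXIT_MS) return "exit";
  return "done";
}

// Figure-8 offset: x completes one cycle per period while y completes two,
// so the path crosses itself at the centre.
function figureEightOffset(
  tMs: number,
  periodMs: number,
  ampX: number,
  ampY: number,
): { x: number; y: number } {
  const theta = (2 * Math.PI * tMs) / periodMs;
  return { x: ampX * Math.sin(theta), y: ampY * Math.sin(2 * theta) };
}
```

A content script could call `figureEightOffset` on each animation frame during the hover phase and apply the result as a CSS transform on the overlay element.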

Server (Fastify + TypeScript)

  • AI Classification: Integrates Google Gemini Vision API with strict JSON schema validation for consistent classification results
  • Email System: Uses Nodemailer with SMTP for sending alerts with screenshot attachments
  • Voice Alerts: Implements ElevenLabs text-to-speech integration with category-specific friendly koala messages and cross-platform audio playback (macOS, Linux, Windows)
  • Database: MongoDB with Mongoose for persisting users, devices, and alerts
  • Security: JWT-based authentication, rate limiting via @fastify/rate-limit, request size limits, and device token validation
  • Smart Features:
    • Per-category cooldown system using in-memory Map storage
    • Gemini rate limit detection and automatic backoff with retry-after parsing
    • Error handling with fallback to SUSPECT classification when AI fails
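The rate-limit handling mentioned above can be sketched as two small pure functions: one that interprets a retry hint and one that chooses the next delay. This assumes the hint arrives in the shape of an HTTP `Retry-After` value (either a number of seconds or an HTTP date); how the Gemini API actually reports it may differ, and the function names, base delay, and cap are hypothetical.

```typescript
// Parses a Retry-After style value into milliseconds.
// Accepts either delta-seconds ("30") or an HTTP date.
// Returns undefined when the value is missing or unparseable.
function parseRetryAfterMs(value: string | undefined, nowMs: number): number | undefined {
  if (value === undefined) return undefined;
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;
  return Math.max(0, dateMs - nowMs);
}

// Chooses how long to wait before the next attempt (attempt starts at 0).
// A server-provided hint wins; otherwise the delay doubles per attempt.
// Both paths are capped at maxMs.
function backoffDelayMs(
  attempt: number,
  retryAfterMs: number | undefined,
  baseMs: number = 1000,
  maxMs: number = 60000,
): number {
  if (retryAfterMs !== undefined) return Math.min(retryAfterMs, maxMs);
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}
```

Keeping these functions free of timers and network calls makes the retry policy easy to unit test separately from the code that actually calls the classifier.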

Dashboard (React + TypeScript + Vite)

  • Built with React 19 and React Router for navigation
  • Provides parent interface for device management and alert viewing
  • Uses Lucide React for consistent iconography
  • Implements the "Nostalgic Doodle" design aesthetic with custom typography (Alan Sans, Andika, Comic Relief) and a warm, hand-drawn visual style

Shared (TypeScript + Zod)

  • Centralized type definitions and validation schemas
  • Ensures type safety across frontend and backend using Zod
  • Defines enums for Severity (SAFE/SUSPECT/HARMFUL) and Category types
  • Provides strongly-typed interfaces for API requests and responses
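The sketch below illustrates the kind of check these shared schemas perform on a classification response, including the fallback to SUSPECT when the AI output cannot be trusted. To keep it dependency-free it uses hand-written type guards as a stand-in for Zod, so it is not the project's actual schema code; the field names and category identifiers are assumptions.

```typescript
const SEVERITIES = ["SAFE", "SUSPECT", "HARMFUL"] as const;
type Severity = (typeof SEVERITIES)[number];

// Hypothetical identifiers for the six harmful-content categories.
const CATEGORIES = ["violence", "hate_speech", "nudity", "self_harm", "drugs", "grooming"] as const;
type Category = (typeof CATEGORIES)[number];

interface ClassificationResult {
  severity: Severity;
  category: Category | null; // null when nothing harmful was detected
  confidence: number; // 0..1
}

function isSeverity(v: unknown): v is Severity {
  return SEVERITIES.some((s) => s === v);
}

function isCategory(v: unknown): v is Category {
  return CATEGORIES.some((c) => c === v);
}

// Parses raw model output. Anything malformed degrades to SUSPECT
// rather than being treated as SAFE or HARMFUL.
function parseClassification(raw: string): ClassificationResult {
  const fallback: ClassificationResult = { severity: "SUSPECT", category: null, confidence: 0 };
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return fallback;
  }
  if (typeof data !== "object" || data === null) return fallback;
  const rec = data as Record<string, unknown>;
  const { severity, category, confidence } = rec;
  if (!isSeverity(severity)) return fallback;
  if (typeof confidence !== "number" || !(confidence >= 0 && confidence <= 1)) return fallback;
  if (category !== null && category !== undefined && !isCategory(category)) return fallback;
  return { severity, category: isCategory(category) ? category : null, confidence };
}
```

With Zod, the same shape would typically be declared once as a schema and the TypeScript type inferred from it, which is what lets the extension, server, and dashboard share one definition.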

Tech Stack Summary

  • Languages: TypeScript, HTML, CSS
  • Frontend: React 19, Vite, Chrome Extension APIs
  • Backend: Fastify, Node.js
  • Database: MongoDB with Mongoose ODM
  • AI/ML: Google Gemini 2.0 Flash Vision, ElevenLabs text-to-speech
  • Validation: Zod for runtime type checking
  • Build Tools: Vite, @crxjs/vite-plugin, tsc (TypeScript compiler)
  • Code Quality: ESLint, Prettier, TypeScript strict mode

Challenges we ran into

  1. Balancing Privacy and Safety: We had to carefully design a system that monitors for harmful content while respecting the child's privacy. Our solution was to immediately discard all SAFE images and only retain HARMFUL content long enough to send an alert - this required careful memory management to ensure buffers were zeroed out after use.

  2. AI Classification Reliability: Getting consistent, accurate classifications from Gemini Vision required extensive prompt engineering. We implemented a strict JSON schema with clear severity definitions and had to handle edge cases like malformed JSON responses, model compatibility issues, and rate limiting with exponential backoff.

  3. Perceptual Hashing for Deduplication: Implementing an efficient difference hash algorithm in the browser using OffscreenCanvas and ImageBitmap APIs was challenging. We had to convert images to grayscale, resize to 64×64 pixels, and compute a 4096-bit hash by comparing adjacent pixels.
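For readers unfamiliar with difference hashing, here is a minimal sketch of the core idea, operating on raw pixel arrays so it runs without any browser APIs. It is not our exact implementation: resizing (for example via OffscreenCanvas) is assumed to have happened already, the function names are hypothetical, and the grayscale weights are the common Rec. 601 luma coefficients. Note that comparing each pixel with its right-hand neighbour yields (width - 1) × height bits, so a 65×64 input gives 4096 bits while a 64×64 input gives 4032.

```typescript
// Converts RGBA pixel data (4 bytes per pixel) to one grayscale byte per pixel.
function toGrayscale(rgba: Uint8Array | Uint8ClampedArray): Uint8Array {
  const gray = new Uint8Array(rgba.length / 4);
  for (let i = 0; i < gray.length; i++) {
    const r = rgba[i * 4];
    const g = rgba[i * 4 + 1];
    const b = rgba[i * 4 + 2];
    gray[i] = Math.round(0.299 * r + 0.587 * g + 0.114 * b);
  }
  return gray;
}

// Difference hash: one bit per horizontally adjacent pixel pair,
// set to 1 when the left pixel is brighter than the right one.
function differenceHash(gray: Uint8Array, width: number, height: number): Uint8Array {
  if (gray.length !== width * height) {
    throw new Error("gray length must equal width * height");
  }
  const bits = new Uint8Array((width - 1) * height);
  let i = 0;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width - 1; x++) {
      bits[i++] = gray[y * width + x] > gray[y * width + x + 1] ? 1 : 0;
    }
  }
  return bits;
}

// Number of differing bits; small distances indicate visually similar images.
function hammingDistance(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) throw new Error("hashes must have equal length");
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}
```

Deduplication then reduces to comparing the Hamming distance between a new screenshot's hash and recently seen hashes against a chosen cutoff; the right cutoff depends on hash length and has to be tuned empirically.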

Accomplishments that we're proud of

  • Privacy-First Architecture: We successfully built a monitoring system that respects user privacy by design - SAFE content is never stored, and only genuinely harmful content triggers notifications
  • Multi-Modal Alerting: The combination of email, voice, and visual animation creates a comprehensive alerting system that's both effective and child-friendly
  • Sophisticated AI Integration: Our Gemini Vision integration with confidence thresholds and per-category classification provides accurate, nuanced content detection
  • Monorepo Architecture: Building a cohesive system with shared types across extension, server, and dashboard shows excellent software engineering practices
  • Fun UX: The animated koala kite with natural physics-based movement and friendly voice messages creates a supportive, non-threatening experience for children
  • Production-Ready Features: Rate limiting, cooldown systems, error handling with graceful degradation, and cross-platform compatibility show attention to real-world deployment concerns

What we learned

  • AI Prompt Engineering: We learned how critical precise prompt engineering is for reliable AI classification - clear definitions, examples, and strict schema enforcement made a huge difference in output consistency
  • Browser Extension Architecture: Working with Chrome MV3 taught us about service worker lifecycles, background alarm scheduling, and the security model for extension permissions
  • Image Processing in Browser: We gained deep knowledge of Canvas APIs, ImageBitmap, OffscreenCanvas, and implementing computer vision algorithms (perceptual hashing) purely in client-side JavaScript
  • Async Communication Patterns: Coordinating between extension background scripts, content scripts, and remote servers required careful handling of async/await, message passing, and error propagation
  • Text-to-Speech Integration: Working with ElevenLabs API taught us about voice synthesis parameters (stability, similarity boost, style) and how to create natural, expressive AI voices
  • Animation Physics: Implementing natural motion required understanding easing functions, wave combinations for realistic movement, and timing synchronization
  • Full-Stack Type Safety: Using Zod for runtime validation alongside TypeScript gave us end-to-end type safety from database to UI
  • Security Best Practices: Implementing JWT authentication, rate limiting, token-based device authorization, and secure secret management taught us production security patterns

What's next for KoalaKite

  • Broader Platform Support: Extend monitoring to other browsers as well as iOS and Android devices
  • Mobile App: Create companion iOS and Android apps for parents to receive push notifications and review alerts on the go
  • Advanced Filtering: Allow parents to customize sensitivity thresholds per category and set up allowed/blocked websites
  • Context-Aware Alerts: Use NLP to analyze not just images but also page text, URLs, and video content for more comprehensive monitoring
  • Behavioral Analytics: Implement trend detection to identify concerning patterns in browsing behavior over time using statistical analysis
  • Whitelist/Greylist: Add support for known-safe domains that skip scanning to reduce API costs and processing time
  • Multi-Language Support: Internationalize the voice messages and UI to support families worldwide
