KoalaKite

Inspiration

With the rise in problematic media and increased technology usage, younger individuals have become more likely to engage with harmful content online. We were especially concerned about the number of young boys finding and consuming misogynistic "red pill" content, often without their parents noticing until it is too late. We therefore wanted to build a product that lets parents supervise the content their child engages with online while preserving as much of the child's privacy as possible. To strike this delicate balance, we created KoalaKite, a privacy-first parental supervision browser extension!

What it does

KoalaKite is a browser extension that monitors the user's Chrome usage and alerts the parent if it detects harmful content. It works as follows:

Periodic Screenshot Capture: The Chrome extension captures screenshots of the active tab at configurable intervals (default: every 10 seconds)
AI-Powered Classification: Screenshots are sent to a Fastify backend server that uses Google's Gemini 2.0 Flash Vision AI to classify content into safety categories (SAFE, SUSPECT, HARMFUL) with confidence scores
Smart Content Detection: The AI identifies 6 categories of harmful content: violence, hate speech, nudity, self-harm, drugs, and grooming
Privacy-First Architecture: SAFE images are immediately discarded and never stored. Only HARMFUL content above the confidence threshold triggers notifications
Intelligent Deduplication: Uses perceptual hashing (difference hash algorithm) to prevent duplicate alerts for similar content
Multi-Modal Alert System (Split by Audience):
- What the parent receives (Email):
  - Email notification via SMTP with the screenshot attached.
  - Includes the detected category and confidence score.
- What the child experiences (On-screen):
  - Friendly koala voice message using ElevenLabs text-to-speech, delivering age-appropriate and supportive guidance tailored to the detected category.
  - An animated “koala kite” overlay that appears in the child’s browser with natural, wind-like motion while the voice message plays.
Cooldown System: Per-category cooldown periods (default: 10 minutes) prevent alert fatigue while still allowing notifications for different types of harmful content
Dashboard: A React-based parent dashboard for managing devices, viewing alerts, and configuring settings
Device Management: Parents can register multiple devices and monitor them from a centralized dashboard with MongoDB-backed storage
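
The deduplication step listed above can be illustrated with a minimal difference-hash sketch. It assumes the screenshot has already been downscaled to a small grayscale grid (in the extension that step would use canvas APIs, omitted here). The function names, the one-bit-per-byte hash layout, and the `maxDistance` threshold are illustrative assumptions, not KoalaKite's actual code.

```typescript
// Difference hash (dHash) over a row-major grayscale grid.
// Each row of `width` pixels yields `width - 1` bits: 1 if a pixel is
// brighter than its right-hand neighbour, 0 otherwise.
function differenceHash(gray: Uint8Array, width: number, height: number): Uint8Array {
  if (width < 2 || gray.length !== width * height) {
    throw new Error("grid must be at least 2 pixels wide and match width * height");
  }
  const bits = new Uint8Array((width - 1) * height); // one bit per byte, for clarity
  let i = 0;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width - 1; x++) {
      const left = gray[y * width + x];
      const right = gray[y * width + x + 1];
      bits[i++] = left > right ? 1 : 0;
    }
  }
  return bits;
}

// Number of differing bits between two equal-length hashes.
function hammingDistance(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) {
    throw new Error("hash lengths differ");
  }
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}

// Hypothetical rule: treat two frames as duplicates when few bits differ.
function isDuplicate(a: Uint8Array, b: Uint8Array, maxDistance: number = 2): boolean {
  return hammingDistance(a, b) <= maxDistance;
}
```

Because the hash only records whether each pixel is brighter than its neighbour, a uniform brightness shift between two screenshots leaves the hash unchanged, which is what makes it useful for suppressing repeat alerts on near-identical frames.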

How we built it

KoalaKite is built as a modern monorepo with four interconnected packages:

Extension (Chrome MV3)

Built with TypeScript and Vite using @crxjs/vite-plugin for hot-reload during development
Background Service Worker: Implements periodic screenshot capture using Chrome Alarms API, handles image preprocessing and perceptual hashing for deduplication, and manages communication with the backend server
Content Script: Injects the animated koala kite onto web pages with natural physics-based animations including entrance (1.5s), hovering (6-10s with figure-8 motion), and exit (2.5s) phases
Popup UI: Provides device registration, parental authentication with PIN protection, and configuration settings
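
The kite's animation phases can be sketched as pure timing functions. The entrance and exit durations come from the description above; the Lissajous formula for the figure-8, the easing choice, and all function and parameter names are assumptions made for illustration rather than the extension's real implementation.

```typescript
type KitePhase = "entrance" | "hover" | "exit" | "done";

const ENTRANCE_MS = 1500; // entrance duration from the description above
const EXIT_MS = 2500; // exit duration from the description above

// Map elapsed time to an animation phase. `hoverMs` is chosen per alert
// (the description gives a 6-10 second range).
function kitePhase(elapsedMs: number, hoverMs: number): KitePhase {
  if (elapsedMs < ENTRANCE_MS) return "entrance";
  if (elapsedMs < ENTRANCE_MS + hoverMs) return "hover";
  if (elapsedMs < ENTRANCE_MS + hoverMs + EXIT_MS) return "exit";
  return "done";
}

// Ease-out cubic: fast start, gentle stop. Input is clamped to [0, 1].
function easeOutCubic(progress: number): number {
  const p = Math.min(1, Math.max(0, progress));
  return 1 - Math.pow(1 - p, 3);
}

// Figure-8 (Lissajous) offset in pixels: x completes one cycle per period
// while y completes two, which traces a sideways "8".
function figureEightOffset(
  tMs: number,
  periodMs: number,
  ampX: number,
  ampY: number
): { x: number; y: number } {
  const theta = (2 * Math.PI * tMs) / periodMs;
  return { x: ampX * Math.sin(theta), y: ampY * Math.sin(2 * theta) };
}
```

In a content script, functions like these would typically be called once per frame (for example from `requestAnimationFrame`) and the result applied as a CSS transform on the overlay element.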

Server (Fastify + TypeScript)

AI Classification: Integrates Google Gemini Vision API with strict JSON schema validation for consistent classification results
Email System: Uses Nodemailer with SMTP for sending alerts with screenshot attachments
Voice Alerts: Implements ElevenLabs text-to-speech integration with category-specific friendly koala messages and cross-platform audio playback (macOS, Linux, Windows)
Database: MongoDB with Mongoose for persisting users, devices, and alerts
Security: JWT-based authentication, rate limiting via @fastify/rate-limit, request size limits, and device token validation
Smart Features:
- Per-category cooldown system using in-memory Map storage
- Gemini rate limit detection and automatic backoff with retry-after parsing
- Error handling with fallback to SUSPECT classification when AI fails
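
A per-category cooldown backed by an in-memory `Map` might look like the following minimal sketch. The class name, method name, and category identifiers are hypothetical; only the 10-minute default and the use of a `Map` come from the description above.

```typescript
// Assumed spellings for the six categories named in the writeup.
type Category = "violence" | "hate_speech" | "nudity" | "self_harm" | "drugs" | "grooming";

class CategoryCooldown {
  private readonly cooldownMs: number;
  // Timestamp (ms) of the last alert that was actually sent, per category.
  private readonly lastAlertAt = new Map<Category, number>();

  constructor(cooldownMs: number = 10 * 60 * 1000) {
    this.cooldownMs = cooldownMs;
  }

  // Returns true and records the alert if the category is not cooling down;
  // returns false (and records nothing) if an alert was sent too recently.
  tryAlert(category: Category, nowMs: number = Date.now()): boolean {
    const last = this.lastAlertAt.get(category);
    if (last !== undefined && nowMs - last < this.cooldownMs) {
      return false;
    }
    this.lastAlertAt.set(category, nowMs);
    return true;
  }
}
```

Keeping one timestamp per category means a recent violence alert does not suppress a later grooming alert. Since the state lives in memory, it resets whenever the server restarts.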

Dashboard (React + TypeScript + Vite)

Built with React 19 and React Router for navigation
Provides the parent interface for device management and alert viewing
Uses Lucide React for consistent iconography
Implements the "Nostalgic Doodle" design aesthetic with custom typography (Alan Sans, Andika, Comic Relief) and a warm, hand-drawn visual style

Shared (TypeScript + Zod)

Centralized type definitions and validation schemas
Ensures type safety across frontend and backend using Zod
Defines enums for Severity (SAFE/SUSPECT/HARMFUL) and Category types
Provides strongly-typed interfaces for API requests and responses
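
To show the kind of runtime check these shared schemas perform, here is a dependency-free sketch that validates a classification result by hand instead of using Zod. The field names (`severity`, `category`, `confidence`), the 0 to 1 confidence range, and the helper names are assumptions; the fallback to SUSPECT on invalid model output mirrors the server behaviour described earlier.

```typescript
const SEVERITIES = ["SAFE", "SUSPECT", "HARMFUL"] as const;
type Severity = (typeof SEVERITIES)[number];

interface Classification {
  severity: Severity;
  category: string | null; // hypothetical field: detected category, if any
  confidence: number; // hypothetical field: score assumed to lie in [0, 1]
}

function isSeverity(value: unknown): value is Severity {
  return typeof value === "string" && (SEVERITIES as readonly string[]).indexOf(value) !== -1;
}

// Used whenever the model output cannot be trusted.
function fallback(): Classification {
  return { severity: "SUSPECT", category: null, confidence: 0 };
}

// Parse raw model output; any malformed or out-of-range value yields SUSPECT.
function parseClassification(raw: string): Classification {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return fallback();
  }
  if (typeof data !== "object" || data === null) return fallback();
  const record = data as Record<string, unknown>;
  const severity = record["severity"];
  const confidence = record["confidence"];
  const category = record["category"];
  if (!isSeverity(severity)) return fallback();
  if (typeof confidence !== "number" || !(confidence >= 0 && confidence <= 1)) return fallback();
  return { severity, category: typeof category === "string" ? category : null, confidence };
}

// Only HARMFUL results at or above the confidence threshold trigger an alert.
function shouldNotify(result: Classification, threshold: number): boolean {
  return result.severity === "HARMFUL" && result.confidence >= threshold;
}
```

With Zod, the hand-written checks would be replaced by a schema shared between the server and its clients, so the TypeScript types and the runtime validation cannot drift apart.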

Tech Stack Summary

Languages: TypeScript, HTML, CSS
Frontend: React 19, Vite, Chrome Extension APIs
Backend: Fastify, Node.js
Database: MongoDB with Mongoose ODM
AI/ML: Google Gemini 2.0 Flash Vision, ElevenLabs text-to-speech
Validation: Zod for runtime type checking
Build Tools: Vite, @crxjs/vite-plugin, tsc (TypeScript compiler)
Code Quality: ESLint, Prettier, TypeScript strict mode

Challenges we ran into

Balancing Privacy and Safety: We had to carefully design a system that monitors for harmful content while respecting the child's privacy. Our solution was to immediately discard all SAFE images and only retain HARMFUL content long enough to send an alert - this required careful memory management to ensure buffers were zeroed out after use.
AI Classification Reliability: Getting consistent, accurate classifications from Gemini Vision required extensive prompt engineering. We implemented a strict JSON schema with clear severity definitions and had to handle edge cases like malformed JSON responses, model compatibility issues, and rate limiting with exponential backoff.
Perceptual Hashing for Deduplication: Implementing an efficient difference hash algorithm in the browser using OffscreenCanvas and ImageBitmap APIs was challenging. We had to convert images to grayscale, resize to 64×64 pixels, and compute a 4096-bit hash by comparing adjacent pixels.
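
The rate-limit handling mentioned above can be sketched as two small helpers. This assumes the retry hint arrives as a standard HTTP `Retry-After` value (either delay-seconds or an HTTP date); how the Gemini API actually reports its retry delay, and the base and maximum delays used here, are assumptions for illustration.

```typescript
// Parse a Retry-After value into milliseconds, or undefined if unusable.
// Accepts delay-seconds ("30") or an HTTP date ("Wed, 21 Oct 2015 07:28:00 GMT").
function parseRetryAfterMs(header: string | undefined, nowMs: number = Date.now()): number | undefined {
  if (header === undefined) return undefined;
  const trimmed = header.trim();
  if (/^\d+$/.test(trimmed)) {
    return Number(trimmed) * 1000;
  }
  const dateMs = Date.parse(trimmed);
  if (isNaN(dateMs)) return undefined;
  return Math.max(0, dateMs - nowMs); // a date in the past means "retry now"
}

// Delay before retry number `attempt` (0-based). A server-provided hint wins;
// otherwise the delay doubles each attempt. Both are capped at `maxMs`.
function backoffDelayMs(
  attempt: number,
  retryAfterMs?: number,
  baseMs: number = 1000,
  maxMs: number = 60000
): number {
  if (retryAfterMs !== undefined) {
    return Math.min(retryAfterMs, maxMs);
  }
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}
```

Capping the delay keeps a single rate-limited request from stalling the classification pipeline indefinitely; a production version would usually also add random jitter so that several devices do not retry in lockstep.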

Accomplishments that we're proud of

Privacy-First Architecture: We successfully built a monitoring system that respects user privacy by design - SAFE content is never stored, and only genuinely harmful content triggers notifications
Multi-Modal Alerting: The combination of email, voice, and visual animation creates a comprehensive alerting system that's both effective and child-friendly
Sophisticated AI Integration: Our Gemini Vision integration with confidence thresholds and per-category classification provides accurate, nuanced content detection
Monorepo Architecture: Building a cohesive system with shared types across the extension, server, and dashboard kept the packages consistent with one another
Fun UX: The animated koala kite with natural physics-based movement and friendly voice messages creates a supportive, non-threatening experience for children
Production-Ready Features: Rate limiting, cooldown systems, error handling with graceful degradation, and cross-platform compatibility show attention to real-world deployment concerns

What we learned

AI Prompt Engineering: We learned how critical precise prompt engineering is for reliable AI classification - clear definitions, examples, and strict schema enforcement made a huge difference in output consistency
Browser Extension Architecture: Working with Chrome MV3 taught us about service worker lifecycles, background alarm scheduling, and the security model for extension permissions
Image Processing in Browser: We gained deep knowledge of Canvas APIs, ImageBitmap, OffscreenCanvas, and implementing computer vision algorithms (perceptual hashing) purely in client-side JavaScript
Async Communication Patterns: Coordinating between extension background scripts, content scripts, and remote servers required careful handling of async/await, message passing, and error propagation
Text-to-Speech Integration: Working with the ElevenLabs API taught us about voice synthesis parameters (stability, similarity boost, style) and how to create natural, expressive AI voices
Animation Physics: Implementing natural motion required understanding easing functions, wave combinations for realistic movement, and timing synchronization
Full-Stack Type Safety: Using Zod for runtime validation alongside TypeScript gave us end-to-end type safety from database to UI
Security Best Practices: Implementing JWT authentication, rate limiting, token-based device authorization, and secure secret management taught us production security patterns

What's next for KoalaKite

Increased Support: Support other browsers as well as iOS and Android devices
Mobile App: Create companion iOS and Android apps for parents to receive push notifications and review alerts on the go
Advanced Filtering: Allow parents to customize sensitivity thresholds per category and set up allowed/blocked websites
Context-Aware Alerts: Use NLP to analyze not just images but also page text, URLs, and video content for more comprehensive monitoring
Behavioral Analytics: Implement trend detection to identify concerning patterns in browsing behavior over time using statistical analysis
Whitelist/Greylist: Add support for known-safe domains that skip scanning to reduce API costs and processing time
Multi-Language Support: Internationalize the voice messages and UI to support families worldwide