GhostScan

Inspiration

We were inspired by a simple but alarming reality: 14 billion+ records have been breached worldwide, and most people have no idea where their data has leaked or how to take it back. Data brokers like Spokeo, Whitepages, and BeenVerified buy, sell, and trade our personal information. Laws like GDPR and CCPA give us the right to request deletion, but drafting those requests is intimidating, and tracking dozens of brokers manually is nearly impossible.

We wanted to build something that would:

Demystify breach notifications and give people a clear picture of their digital exposure
Empower users with legally sound, one-click deletion requests
Verify identity so only the real person accesses their sensitive report

What it does

GhostScan is a digital exposure and data removal platform that runs in under 2 minutes:

Verified identity — Users enter their email, verify via OTP, and optionally complete a liveness check for a "Verified" badge on their report.
Breach intelligence — We cross-reference 14B+ leaked credentials via Have I Been Pwned (HIBP), check Gravatar for public profile exposure, and run a deterministic risk engine.
4-dimensional risk score — We compute:
- Account Takeover — password exposure, 2FA, reuse habits
- Identity Theft — name, phone, address, DOB in breaches
- Phishing Risk — Gravatar, breach recency, volume
- Public Exposure — predictable email, disposable domain, breach presence
Interactive dashboard: Risk overview, radar chart, breach timeline, attack surface map, and a Mitigation Simulator that updates your score in real time when you enable 2FA, adopt a password manager, or stop reusing passwords.
Data Removal Center — One place to send deletion requests to 47+ data brokers and breach sources.

Each email gets a unique reference ID (e.g., GS-XXXXXXXX-XXXX), and users can track status: Pending → Queued → Sent → Awaiting Response → Deleted
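A minimal sketch of how reference IDs in the GS-XXXXXXXX-XXXX shape and the linear status pipeline could be modeled. The alphabet, the function names, and the rule that a request only moves one step forward are assumptions for illustration, not details taken from the project.

```typescript
// Hypothetical sketch: the alphabet and the transition rule are assumptions.
const STATUSES = ["Pending", "Queued", "Sent", "Awaiting Response", "Deleted"] as const;
type Status = (typeof STATUSES)[number];

// 32 characters, skipping lookalikes (I, O, 0, 1) so IDs are easy to read aloud.
const ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";

function randomChunk(length: number, rand: () => number = Math.random): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += ALPHABET.charAt(Math.floor(rand() * ALPHABET.length));
  }
  return out;
}

// Math.random is not cryptographically secure; fine for a tracking label,
// not for anything that must be unguessable.
function newReferenceId(rand: () => number = Math.random): string {
  return `GS-${randomChunk(8, rand)}-${randomChunk(4, rand)}`;
}

// Returns the next status in the pipeline, or null once the request is Deleted.
function nextStatus(current: Status): Status | null {
  return STATUSES[STATUSES.indexOf(current) + 1] ?? null;
}

// Only single forward steps are allowed in this sketch.
function canTransition(from: Status, to: Status): boolean {
  return nextStatus(from) === to;
}
```

Keeping the statuses in one ordered array means the tracker UI and the transition check share a single source of truth.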

Screenshot Scam Checker — Upload suspicious emails or pop-ups; we use OCR + AI/heuristics to flag scam and breach-related language.
Password Breach Checker — Check if a password has appeared in HIBP’s Pwned Passwords database.
Gemini AI Assistant — In-app chat to ask questions about your exposure, trends, and next steps.
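For the Password Breach Checker, HIBP documents a k-anonymity range endpoint for Pwned Passwords: the client sends only the first five characters of the password's SHA-1 hash and compares the returned suffixes locally. The write-up does not say whether GhostScan uses this endpoint, so the following is a sketch of that approach with hypothetical function names.

```typescript
import { createHash } from "node:crypto";

// Split the uppercase SHA-1 hex digest into the 5-character prefix sent to the
// API and the 35-character suffix that stays on the client.
function splitSha1(password: string): { prefix: string; suffix: string } {
  const digest = createHash("sha1").update(password, "utf8").digest("hex").toUpperCase();
  return { prefix: digest.slice(0, 5), suffix: digest.slice(5) };
}

// The range endpoint responds with lines of the form "SUFFIX:COUNT".
function countInRange(rangeBody: string, suffix: string): number {
  for (const line of rangeBody.split(/\r?\n/)) {
    const [candidate, count] = line.trim().split(":");
    if (candidate === suffix) return Number.parseInt(count ?? "0", 10);
  }
  return 0; // suffix not listed: no known exposure for this password
}

// Network wrapper; the full password and full hash are never transmitted.
async function pwnedCount(password: string): Promise<number> {
  const { prefix, suffix } = splitSha1(password);
  const res = await fetch(`https://api.pwnedpasswords.com/range/${prefix}`);
  if (!res.ok) throw new Error(`Range lookup failed: ${res.status}`);
  return countInRange(await res.text(), suffix);
}
```

Splitting hashing and parsing from the network call keeps both pure helpers easy to unit-test without hitting the API.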

How we built it

Frontend: Next.js 14 (App Router), React 18, Tailwind CSS, Framer Motion for animations, Recharts for risk and timeline charts. Dark mode support with CSS variables.

Backend: Next.js API Routes for OTP (Supabase Auth), scan preview, screenshot analysis, legal email generation, deletion request tracking, and Gemini chat. The risk engine is pure TypeScript with no external dependencies and is fully unit-tested with Jest.
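To show what "pure and deterministic" can look like in practice, here is a sketch of one risk dimension (Account Takeover) together with a mitigation simulator. Every weight, field name, and function name below is an illustrative assumption, not GhostScan's actual formula.

```typescript
// All weights are illustrative assumptions, not the project's real scoring.
interface AtoSignals {
  passwordExposedInBreach: boolean;
  twoFactorEnabled: boolean;
  usesPasswordManager: boolean;
  reusesPasswords: boolean;
}

const clamp = (n: number): number => Math.max(0, Math.min(100, n));

// Pure function: the same signals always produce the same 0-100 score.
function accountTakeoverScore(s: AtoSignals): number {
  let score = 20; // assumed baseline for any account
  if (s.passwordExposedInBreach) score += 40;
  if (s.reusesPasswords) score += 25;
  if (!s.twoFactorEnabled) score += 15;
  if (s.usesPasswordManager) score -= 10;
  return clamp(score);
}

// Mitigation simulator: re-score a copy of the signals with toggles applied.
function simulate(
  s: AtoSignals,
  changes: Partial<AtoSignals>,
): { before: number; after: number } {
  return {
    before: accountTakeoverScore(s),
    after: accountTakeoverScore({ ...s, ...changes }),
  };
}
```

Because `simulate` never mutates its input, a UI toggle can call it on every change and render the before/after pair directly.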

Database: Supabase (Postgres) with tables for users, verification_sessions, scans, breaches, deletion_requests, consent_events, and legal_exports. Row Level Security (RLS) ensures users only access their own data.

External APIs: Have I Been Pwned (HIBP) for breach lookup, Gravatar for public profile check. Optional: OpenAI for screenshot analysis (with heuristic fallback), Google Gemini for the chat widget. Without API keys, the app runs in demo/mock mode.
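The demo/mock behavior can be captured by one small wrapper that every integration shares. This is a sketch under assumptions: `withFallback`, the `Sourced` type, and the `source` tag are hypothetical names, not taken from the codebase.

```typescript
// Hypothetical helper: tags each result so the UI can label demo data.
type Sourced<T> = { data: T; source: "live" | "mock" };

async function withFallback<T>(
  apiKey: string | undefined,
  live: (key: string) => Promise<T>,
  mock: () => T,
): Promise<Sourced<T>> {
  // No key configured: go straight to mock data (demo mode).
  if (!apiKey) return { data: mock(), source: "mock" };
  try {
    return { data: await live(apiKey), source: "live" };
  } catch {
    // Network error, expired key, or rate limit: degrade instead of failing.
    return { data: mock(), source: "mock" };
  }
}
```

Returning the `source` alongside the data makes it harder to accidentally present mock breaches as real ones.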

Infrastructure: Vercel for hosting.

Legal templates: Static TypeScript modules with variable injection, covering GDPR Art. 17, CCPA, US state deletion laws, and breach erasure. The templates are state-aware: we detect residency (CA, CO, CT, VA, TX, OR, etc.) and recommend the right regime.
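A sketch of the two pieces described here: mapping a US state to a privacy regime, and injecting variables into a static template. The `{{name}}` placeholder syntax, the function names, and the `generic-us-state` fallback key are assumptions. Only the three statutes named elsewhere in this write-up are mapped.

```typescript
// Hypothetical per-state lookup; only statutes named in the write-up are listed.
const STATE_LAWS: Record<string, string> = {
  CA: "CCPA",
  VA: "VCDPA",
  TX: "TDPSA",
};

// Falls back to an assumed generic template key for unmapped states.
function regimeForState(state: string): string {
  return STATE_LAWS[state.trim().toUpperCase()] ?? "generic-us-state";
}

// Replace {{name}} placeholders; fail loudly so no request is sent with a gap.
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match, key: string) => {
    const value = vars[key];
    if (value === undefined) throw new Error(`Missing template variable: ${key}`);
    return value;
  });
}

const example = renderTemplate(
  "Ref {{refId}}: please delete all personal data held for {{email}} under the {{law}}.",
  { refId: "GS-EXAMPLE", email: "user@example.com", law: regimeForState("va") },
);
```

Throwing on a missing variable is a deliberate choice in this sketch: a legal email with an unfilled placeholder is worse than no email.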

Challenges we ran into

Liveness verification without storing face data — We needed to prove the user was present (anti-bot) without persisting any biometrics. We used MediaPipe Face Detection and a head-movement challenge (center → left → right) with frames processed in-memory only. If the browser doesn’t support it, we fall back to a motion-based check and still allow a "Limited" report.
HIBP rate limits and cost: HIBP's breached-account API requires a paid API key and is rate limited. We designed a graceful fallback: if no API key is configured or a call fails, we return mock breach data so the full flow still works in demo mode. Real scans use the live API.
Legal template accuracy — We’re not lawyers. We based templates on public GDPR/CCPA language and added clear disclaimers. We iterated on structure (ref IDs, data classes, breach dates) so users could track requests.
State-by-state privacy laws — CCPA, VCDPA, TDPSA, etc. each have slightly different wording. We built a us-privacy-laws module with per-state profiles and map users to the right template. We kept templates as close as possible to official language while staying maintainable.
Large base64 images in the scam checker — Sending huge screenshots to the API caused timeouts. We added a 4MB cap and fallback to heuristic analysis when the image is too big or the AI call fails.
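The size cap in the last item can be enforced before any upload by estimating the decoded size from the base64 length. This is a sketch under assumptions: the write-up does not say whether the 4MB limit applies to decoded bytes or to the encoded string, and the names below are hypothetical.

```typescript
// Assumption: the cap applies to decoded bytes and "4MB" means 4 * 1024 * 1024.
const MAX_IMAGE_BYTES = 4 * 1024 * 1024;

// Estimate the decoded size from the base64 length without decoding it.
function decodedSize(base64: string): number {
  const payload = base64.replace(/^data:[^,]*,/, ""); // drop a data-URL prefix if present
  const padding = payload.endsWith("==") ? 2 : payload.endsWith("=") ? 1 : 0;
  return Math.floor((payload.length * 3) / 4) - padding;
}

// Route oversized screenshots to the local heuristic path instead of the AI call.
function chooseAnalyzer(base64: string): "ai" | "heuristic" {
  return decodedSize(base64) <= MAX_IMAGE_BYTES ? "ai" : "heuristic";
}
```

Checking the size up front avoids spending a slow request on an image that would time out anyway.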

Accomplishments that we're proud of

End-to-end flow in under 2 minutes — From landing to deletion request, with real breach data (or demo fallback).
Rocket Money–style Deletion Center — 47+ brokers, one-click legal emails, status tracking, 30-day escalation. Users can download .eml for Outlook/Mail or .txt for any client.
Deterministic, testable risk engine — Pure functions, no side effects. Full Jest coverage for scoring, velocity, trend, momentum, and breach forecast.
Privacy-first — No face images stored, hashed emails in logs, 30-day retention, "Delete All Data" button. RLS on every table.
Graceful degradation — Works without HIBP key (mock), without OpenAI (heuristics), without Gemini (chat hidden). Demo mode (OTP 12345678) for hackathon demos.
Modern UI — Framer Motion animations, dark mode, responsive layout, print-friendly PDF report.
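As an illustration of how a liveness check can avoid storing face data, the head-movement challenge (center → left → right) can be modeled as a small state machine that keeps only a step counter. This is a sketch under assumptions: `x` is a hypothetical normalized horizontal face position that a detector such as MediaPipe would supply per frame, the thresholds are invented, and mirroring of the camera preview is ignored.

```typescript
type Step = "center" | "left" | "right";
const SEQUENCE: Step[] = ["center", "left", "right"];

// Classify a normalized face-center x (0 = left edge of frame, 1 = right edge).
// Thresholds are assumptions; the gaps between zones avoid flickering matches.
function classify(x: number): Step | null {
  if (x < 0.35) return "left";
  if (x > 0.65) return "right";
  if (x >= 0.45 && x <= 0.55) return "center";
  return null;
}

class HeadMovementChallenge {
  // The only state kept is how many steps have been completed;
  // no frame, image, or landmark data is retained.
  private index = 0;

  get passed(): boolean {
    return this.index >= SEQUENCE.length;
  }

  // Feed one observation per processed frame; returns true once all steps are done.
  observe(x: number): boolean {
    if (!this.passed && classify(x) === SEQUENCE[this.index]) this.index++;
    return this.passed;
  }
}
```

Each frame contributes a single number that is discarded right after the comparison, which is what keeps the check free of persisted biometrics in this sketch.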

What we learned

Start with the user flow — We sketched the journey (email → verify → scan → report → delete) before writing code. That kept the scope tight and the demo punchy.
Fallbacks are essential — External APIs fail, keys expire, rate limits hit. Every integration has a fallback path so the product still works.
Legal templates need structure — Reference IDs, explicit data classes, and date fields make it possible to track and debug deletion requests. Generic "please delete my data" emails get lost.
Deterministic scoring is debuggable — A pure risk engine with clear formulas is easier to explain, test, and tweak than a black-box model.

What's next for GhostScan

Expand broker catalog — 47 → 100+ data brokers with verified opt-out contacts and portal links.
Automated follow-up — Reminder emails and escalation workflows when brokers don’t respond within 30 days.
Mobile experience — PWA or React Native app so users can check exposure and send requests on the go.
Dark web monitoring — Integrate dark web scan APIs for a fuller exposure picture.
Family/team plans — Scan multiple emails (e.g., family members) from one account.
Password manager integration — Sync with 1Password, Bitwarden for breach alerts and hygiene suggestions.
State-specific legal templates — Add TX, OR, DE, etc. with exact statutory language where available.
Breach letter aggregation — One dashboard where users can upload and track breach notifications they receive in the mail.