Slides Link : https://docs.google.com/presentation/d/1PqcyItnEVI0xH3srShviRGQtB896n8Ghy3IdhvwPuN4/edit?usp=sharing
Inspiration
A few days ago, one of my friends accidentally pasted her API key into GPT while debugging a crucial project. It wasn't malicious, just rushed and human. But that single mistake could have caused unauthorized usage, unexpected billing, or exposure of critical services. That moment made the problem obvious: GenAI prompts are a new leak point, and most existing security protects the login, not the Send button.
What it does
Prompt Firewall is next-gen CAPTCHA + MFA for AI prompts. It intercepts prompts right when users hit Send on tools like ChatGPT/Gemini and applies the lightest safe action:
ALLOW normal prompts with zero friction
AUTO-REDACT sensitive data (PII/secrets) and send a safe version
STEP-UP / BLOCK high-risk prompts (tokens, private keys, repeated automation behavior) using rotating verification challenges (OTP / slider / hold)
It also records every decision in a Trust Ledger (metadata only; no raw prompts stored) to support auditing and policy enforcement.
How we built it
Built as a Chrome Extension (Manifest V3) with:
Content script to detect AI chat inputs and intercept send events (Enter/click)
Background service worker to run classification, redaction, step-up selection, and ledger logging
Local detection engine (regex + heuristics) for secrets/PII (API keys, JWTs, private keys, emails/phones, etc.)
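The detection pass can be sketched as a table of regex rules scanned locally against the prompt. The rule set and function names below are illustrative, not the extension's actual code; real patterns would cover more providers and formats:

```javascript
// Illustrative local detection pass: each rule maps a regex to a finding
// type. Runs entirely in the extension, so no prompt text leaves the page.
const RULES = [
  { type: "openai_key",  re: /\bsk-[A-Za-z0-9]{20,}\b/g },
  { type: "jwt",         re: /\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\b/g },
  { type: "private_key", re: /-----BEGIN (?:RSA |EC )?PRIVATE KEY-----/g },
  { type: "email",       re: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g },
];

function detect(prompt) {
  const findings = [];
  for (const { type, re } of RULES) {
    for (const m of prompt.matchAll(re)) {
      // Record position and length so redaction can splice in a placeholder.
      findings.push({ type, index: m.index, length: m[0].length });
    }
  }
  return findings;
}
```

Keeping index and length per finding is what makes auto-redaction cheap: the safe version of the prompt is just the original with those spans replaced.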
Implemented adaptive enforcement:
risk scoring → decision engine (allow/redact/step-up/block)
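The scoring-to-action mapping can be sketched as a weighted sum over findings with thresholds for each tier. The weights and cutoffs below are made-up example values, not the shipped policy:

```javascript
// Illustrative decision engine: findings from the detector are weighted,
// summed into a risk score, and mapped to the lightest safe action.
const WEIGHTS = { private_key: 100, openai_key: 80, jwt: 70, email: 20, phone: 20 };

function decide(findings) {
  const score = findings.reduce((s, f) => s + (WEIGHTS[f.type] || 10), 0);
  if (score >= 70) return { action: "STEP_UP", score }; // secrets/tokens → verify
  if (score >= 20) return { action: "REDACT", score };  // PII → auto-redact
  return { action: "ALLOW", score };                    // normal → zero friction
}
```

Ordering the thresholds this way means a prompt only ever escalates to the minimum necessary friction, which is the "adaptive enforcement" idea above.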
Added Invisible CAPTCHA (L1):
automation burst detection (rapid sends + paste-to-send too fast) triggers a lightweight verification
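The burst heuristic can be sketched as a sliding window over send timestamps plus a paste-to-send gap check. The window size and thresholds here are illustrative assumptions:

```javascript
// Illustrative invisible-CAPTCHA trigger: flags a send as automation-like
// when too many sends land in a short window, or a paste is followed by a
// send faster than a human would plausibly review the text.
function makeBurstDetector({ windowMs = 10000, maxSends = 4, minPasteGapMs = 300 } = {}) {
  let sends = [];
  let lastPasteAt = -Infinity;
  return {
    onPaste(now) { lastPasteAt = now; },
    onSend(now) {
      sends = sends.filter((t) => now - t < windowMs); // drop stale timestamps
      sends.push(now);
      const rapidBurst = sends.length > maxSends;
      const pasteToSendTooFast = now - lastPasteAt < minPasteGapMs;
      return rapidBurst || pasteToSendTooFast; // true → show the L1 challenge
    },
  };
}
```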
Added Rotating L2 step-up (MFA):
for secrets/tokens, challenges rotate (OTP/slider/hold) so it’s harder to automate and more robust
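The rotation can be sketched as picking a random step away from the previously served challenge, so the same challenge never appears twice in a row and a bot cannot hard-code a single solver. Names mirror the write-up; the selection logic is an assumed example:

```javascript
// Illustrative L2 challenge rotation across the three challenge types.
const CHALLENGES = ["otp", "slider", "hold"];

function nextChallenge(prevIndex) {
  // Random step of 1..(n-1) from the previous index guarantees a change.
  const step = 1 + Math.floor(Math.random() * (CHALLENGES.length - 1));
  const index = (prevIndex + step) % CHALLENGES.length;
  return { index, challenge: CHALLENGES[index] };
}
```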
Built a Trust Ledger:
logs domain, risk, reasons, action, challenge type (metadata only)
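A ledger entry can be sketched as a plain record built only from decision metadata. The field names are illustrative; the point is that no prompt text field exists at all, so raw content cannot leak into storage:

```javascript
// Illustrative metadata-only ledger entry: domain, risk, reasons, action,
// and challenge type are logged; the prompt itself is never included.
function ledgerEntry({ domain, score, reasons, action, challenge = null }) {
  return {
    ts: Date.now(),  // when the decision was made
    domain,          // e.g. the chat site's hostname
    risk: score,     // numeric risk score from the decision engine
    reasons,         // rule names that fired, e.g. ["openai_key"]
    action,          // ALLOW | REDACT | STEP_UP | BLOCK
    challenge,       // "otp" | "slider" | "hold" | null
  };
}
```

In the extension this record would be appended via chrome.storage from the background service worker; the sketch keeps it pure so the shape is easy to audit.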
Challenges we ran into
Reliably intercepting “Send” across different chat UIs without breaking normal behavior
Avoiding false positives (not annoying users) while still being strict with real secrets
Manifest V3 service worker constraints (state persistence + consistency across restarts)
Balancing security and usability: step-up must feel fast, not punitive
Ensuring the ledger never stores raw prompt content while still being useful for auditing
Accomplishments that we're proud of
Built a working “secure the Send button” system that’s demoable in under a minute
Implemented privacy-first protection (local detection + metadata-only logging)
Added Invisible CAPTCHA for automation-like behavior without image CAPTCHAs
Delivered rotating next-gen MFA challenges for high-risk prompts
Made decisions explainable with clear reasons instead of black-box blocking
What we learned
The biggest GenAI risk isn’t advanced hacking—it’s accidental leakage
Security products win when they’re adaptive: allow most, redact some, step-up only when needed
Explainability builds trust—users and judges accept controls when they understand “why”
MV3 extensions require careful handling of state and storage to stay reliable
What's next for Prompt Firewall
Enterprise managed policies: enforced defaults that orgs can deploy, with user add-on rules
Policy packs for schools/universities vs enterprise departments (Engineering/HR/Legal)
Safer substitution (format-preserving redaction) to keep prompts maximally useful for AI
Optional admin dashboard for aggregated, privacy-preserving metrics (no prompt content)
Built With
- ai
- api
- chrome
- chrome-storage-(local/managed)
- extension
- gemini
- javascript
- manifest
- mfa
- mv3
- nextgen-captcha
- trustworthy