Slides Link : https://docs.google.com/presentation/d/1PqcyItnEVI0xH3srShviRGQtB896n8Ghy3IdhvwPuN4/edit?usp=sharing

Inspiration

A few days ago, one of my friends accidentally pasted her API key into GPT while debugging a crucial project. It wasn't malicious, just rushed and human. But that single mistake could have caused unauthorized usage, unexpected billing, or exposure of critical services. That moment made the problem obvious: GenAI prompts are a new leak point, and most existing security protects logins, not the Send button.

What it does

Prompt Firewall is a next-gen CAPTCHA + MFA layer for AI prompts. It intercepts prompts right when users hit Send on tools like ChatGPT/Gemini and applies the lightest safe action:

ALLOW normal prompts with zero friction

AUTO-REDACT sensitive data (PII/secrets) and send a safe version

STEP-UP / BLOCK high-risk prompts (tokens, private keys, repeated automation behavior) using rotating verification challenges (OTP / slider / hold)

It also records decisions in a Trust Ledger (metadata only; no raw prompts stored) to support auditing and policy enforcement.

How we built it

Built as a Chrome Extension (Manifest V3) with:

Content script to detect AI chat inputs and intercept send events (Enter/click)

Background service worker to run classification, redaction, step-up selection, and ledger logging

Local detection engine (regex + heuristics) for secrets/PII (API keys, JWTs, private keys, emails/phones, etc.)
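The local detection engine can be sketched as a table of named regex patterns; the pattern names and exact expressions below are illustrative assumptions, not the project's actual rule set:

```javascript
// Minimal sketch of a local secret/PII detector. Everything runs in the
// extension; no prompt text leaves the browser. Patterns are illustrative.
const PATTERNS = [
  { name: "openai_key",  re: /\bsk-[A-Za-z0-9]{20,}\b/ },
  { name: "aws_key",     re: /\bAKIA[0-9A-Z]{16}\b/ },
  { name: "jwt",         re: /\beyJ[\w-]+\.[\w-]+\.[\w-]+\b/ },
  { name: "private_key", re: /-----BEGIN (?:RSA |EC )?PRIVATE KEY-----/ },
  { name: "email",       re: /\b[\w.+-]+@[\w-]+\.[A-Za-z]{2,}\b/ },
];

// Returns the names of every pattern that matches the prompt text.
function detectSensitive(text) {
  return PATTERNS.filter((p) => p.re.test(text)).map((p) => p.name);
}
```

The match names (not the matched text) become the "reasons" fed to the risk scorer and later logged in the ledger.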

Implemented adaptive enforcement:

risk scoring → decision engine (allow/redact/step-up/block)
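The scoring-to-decision step can be sketched as weighted reasons summed into a score, then mapped to the lightest safe action; the weights and thresholds here are assumptions for illustration:

```javascript
// Hedged sketch of the decision engine: each detection reason carries a
// weight; the summed score picks allow / redact / step-up / block.
const WEIGHTS = { private_key: 90, api_key: 80, jwt: 70, email: 30, phone: 30 };

function decide(reasons) {
  const score = reasons.reduce((s, r) => s + (WEIGHTS[r] || 0), 0);
  if (score >= 120) return { action: "block", score };   // multiple secrets
  if (score >= 70)  return { action: "step-up", score }; // secrets: verify first
  if (score >= 30)  return { action: "redact", score };  // PII: auto-redact
  return { action: "allow", score };                     // normal prompt
}
```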

Added Invisible CAPTCHA (L1):

automation burst detection (rapid sends + paste-to-send too fast) triggers a lightweight verification
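The burst trigger can be sketched as two simple timing checks; the window sizes below are assumed values, not the shipped thresholds:

```javascript
// Sketch of the invisible-CAPTCHA trigger: flag automation-like behavior
// when sends arrive faster than a human plausibly types, or when a paste
// is followed by a send almost instantly. Windows are illustrative.
const BURST_WINDOW_MS = 3000;  // multiple sends inside 3s looks scripted
const PASTE_TO_SEND_MS = 300;  // paste followed by send within 300ms

function isAutomationBurst(sendTimes, lastPasteTime, now) {
  const recent = sendTimes.filter((t) => now - t < BURST_WINDOW_MS);
  const pasteTooFast =
    lastPasteTime !== null && now - lastPasteTime < PASTE_TO_SEND_MS;
  return recent.length >= 2 || pasteTooFast;
}
```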

Added Rotating L2 step-up (MFA):

for secrets/tokens, challenges rotate (OTP/slider/hold), so a bot scripted against one challenge fails the next and the control stays robust
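Rotation can be sketched as a simple cycle through the challenge types; a real build would likely add randomness, but round-robin keeps this sketch deterministic:

```javascript
// Sketch of rotating L2 step-up selection: the challenge type cycles
// with each high-risk event so no single bypass keeps working.
const CHALLENGES = ["otp", "slider", "hold"];

function nextChallenge(previousCount) {
  return CHALLENGES[previousCount % CHALLENGES.length];
}
```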

Built a Trust Ledger:

logs domain, risk, reasons, action, challenge type (metadata only)
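A ledger entry can be sketched as a plain metadata record; in the extension this would be persisted via chrome.storage, so a plain array stands in for storage here:

```javascript
// Sketch of a Trust Ledger entry: only decision metadata is recorded,
// never the prompt text itself.
const ledger = [];

function logDecision({ domain, risk, reasons, action, challenge }) {
  const entry = {
    ts: Date.now(),
    domain,                        // e.g. "chat.openai.com"
    risk,                          // numeric risk score
    reasons,                       // pattern names only, e.g. ["api_key"]
    action,                        // allow | redact | step-up | block
    challenge: challenge || null,  // otp | slider | hold, if one was shown
  };
  ledger.push(entry);
  return entry;
}
```

Because entries never contain prompt text, the ledger stays useful for auditing without becoming a second leak point.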

Challenges we ran into

Reliably intercepting “Send” across different chat UIs without breaking normal behavior

Avoiding false positives (not annoying users) while still being strict with real secrets

Manifest V3 service worker constraints (state persistence + consistency across restarts)

Balancing security and usability: step-up must feel fast, not punitive

Ensuring the ledger never stores raw prompt content while still being useful for auditing

Accomplishments that we're proud of

Built a working “secure the Send button” system that’s demoable in under a minute

Implemented privacy-first protection (local detection + metadata-only logging)

Added Invisible CAPTCHA for automation-like behavior without image CAPTCHAs

Delivered rotating next-gen MFA challenges for high-risk prompts

Made decisions explainable with clear reasons instead of black-box blocking

What we learned

The biggest GenAI risk isn’t advanced hacking—it’s accidental leakage

Security products win when they’re adaptive: allow most, redact some, step-up only when needed

Explainability builds trust—users and judges accept controls when they understand “why”

MV3 extensions require careful handling of state and storage to stay reliable

What's next for Prompt Firewall

Enterprise managed policies: enforced defaults that orgs can deploy, with user add-on rules

Policy packs for schools/universities vs enterprise departments (Engineering/HR/Legal)

Safer substitution (format-preserving redaction) to keep prompts maximally useful for AI

Optional admin dashboard for aggregated, privacy-preserving metrics (no prompt content)
