Sentinel: The Web Extension for Real-Time Misinformative Content Detection
Inspiration
Ever since the 2024 elections, deceptive digital content has surged across major social media platforms. While deepfakes and AI-generated media get most of the attention, they are only part of a much larger issue. Users are now exposed to manipulated videos and images, impersonation scams, fake giveaways, phishing links, fabricated statistics, coordinated bot activity, and misleading financial or political claims.
Although platforms have introduced labeling and disclosure tools, these systems often rely on manual reporting and inconsistent moderation. Many posts go unlabeled or lack clear explanations, leaving users confused and unsure what to trust. Verifying a claim usually requires leaving the app, searching for outside sources, and interrupting the normal browsing experience on platforms like X, TikTok, or Instagram.
The real problem is not just AI-generated content. It is the lack of a seamless, real-time trust layer that helps users quickly understand whether something is manipulated, misleading, or part of a scam without forcing them to do deep research on their own.
We wanted to help everyday users "scroll smarter, not paranoid" by restoring trust to social media without draining their battery or demanding their data. Our mantra became: "Fight AI slop with AI smarts."
What it does
Sentinel is a Web (Chrome) Browser Extension that acts as a real-time trust and safety layer for social media. By parsing social media DOMs (starting with X/Twitter) as you scroll, Sentinel automatically analyzes images and videos for AI-generation artifacts and patterns.
Sentinel injects instant, glanceable side-note badges right onto the posts in your feed—hovering alongside each tweet with extracted metadata including user info, media thumbnails, engagement metrics, and video URLs. As the extension spots technical signs of synthesis, such as unnatural diffusion patterns or GAN-generated face artifacts, each badge visually changes to reflect how likely the post is to be AI-generated.
If a user wants to investigate further, they can manually trigger a "Deep Scan," which opens the Sentinel Copilot: an interactive Sphinx agent that explains why an image was flagged, using natural-language reasoning and comparative forensic metrics.
How we built it
To maximize frontend performance and efficiency while minimizing API costs, we built a unique dual-layer routing architecture:
The Frontend
Built with vanilla JavaScript and Manifest V3 for modern extension standards. We use:
- DOM interception via `MutationObserver` to passively watch for new tweets as users scroll
- Dual-world injection: a MAIN-world script (`interceptor.js`) patches `fetch()` and `XMLHttpRequest` to intercept Twitter's GraphQL API responses and extract direct mp4 video URLs
- An ISOLATED-world content script (`Content.js`) that receives video URLs via `CustomEvent`, then extracts all tweet metadata (text, profile images, usernames, engagement metrics, media) and renders floating side-note badges
- Advanced selectors that leverage Twitter's stable `data-testid` attributes for reliable extraction across UI updates
- Intelligent positioning that dynamically places badges to the right of tweets using `getBoundingClientRect()` and auto-updates on scroll/resize
The Backend
We chose Python/FastAPI to natively support the Sphinx SDK and run local ML models. Our stack includes:
Layer 1 (The Live Feed):
A passive `MutationObserver` watches the DOM. When a new image loads, we generate an OpenAI CLIP embedding using the `sentence-transformers/clip-ViT-B-32` model and query our Actian VectorAI DB for blazing-fast KNN similarity search (<100 ms) against known threats. On a cache miss, we fall back to two local Hugging Face Vision Transformers:
- `Organika/sdxl-detector` – best for detecting Stable Diffusion, Flux, and Midjourney output (the dominant AI content on Twitter)
- `dima806/deepfake_vs_real_image_detection` – a specialized GAN face detector for StyleGAN and deepfake faces
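The cache-then-fallback routing in Layer 1 can be sketched roughly as follows. This is a simplified illustration, not our production code: the CLIP embedding call and the Hugging Face detectors are stubbed out behind a `run_detectors` callable, the similarity cutoff is an assumed value, and the brute-force cosine KNN stands in for the Actian VectorAI query.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.92  # assumed cutoff for treating a neighbor as a "cache hit"

def cosine_knn(query, index, k=3):
    """Brute-force cosine KNN over an (n, d) matrix of cached embeddings."""
    q = query / np.linalg.norm(query)
    m = index / np.linalg.norm(index, axis=1, keepdims=True)
    sims = m @ q
    top = np.argsort(sims)[::-1][:k]
    return top, sims[top]

def classify(image_embedding, threat_index, threat_labels, run_detectors):
    """Return a label from the vector cache, else fall back to local detector models."""
    idx, sims = cosine_knn(image_embedding, threat_index)
    if sims[0] >= SIMILARITY_THRESHOLD:
        return threat_labels[idx[0]], float(sims[0])  # cache hit: known threat pattern
    return run_detectors()                            # cache miss: run the local ViTs
```

In the real pipeline, `threat_index` would hold CLIP embeddings of previously flagged media, and `run_detectors` would invoke the two Hugging Face models listed above.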
Layer 2 (The Deep Scan):
For heavy reasoning compute, we integrated the Sphinx Reasoning SDK via sphinx-cli bridge. Sphinx generates Jupyter notebooks containing structured JSON analysis, which we parse to extract trust scores, risk levels, and user-friendly reasoning summaries. The agent interprets signals from:
- CLIP embedding similarity scores
- Actian vector search neighbor clusters
- Local detector model confidence scores
- Textual pattern analysis from tweet content
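Parsing the Sphinx-generated notebooks boils down to scanning cell outputs for the first JSON blob that matches our schema. A minimal sketch, assuming a simplified notebook layout; the `trust_score`/`risk_level`/`summary` keys are our own convention, not a Sphinx-defined format:

```python
import json

def extract_analysis(notebook_json: str) -> dict:
    """Pull the first structured-JSON analysis blob out of a notebook's cell
    outputs. Non-JSON stream output (logs, prints) is skipped."""
    nb = json.loads(notebook_json)
    for cell in nb.get("cells", []):
        for out in cell.get("outputs", []):
            text = "".join(out.get("text", []))
            try:
                payload = json.loads(text)
            except (ValueError, TypeError):
                continue  # not JSON: ordinary notebook output
            if isinstance(payload, dict) and "trust_score" in payload:
                return payload
    raise ValueError("no structured analysis found in notebook")
```

Constraining the agent to emit this fixed schema (rather than free prose) is what lets us keep explanations grounded in the technical signals listed above.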
We also configured environment hooks for the SafetyKit API and Hive Moderation API for future integration of external content-moderation layers.
Infrastructure
- Docker Compose orchestration with a `python:3.12-slim` backend and the `williamimoh/actian-vectorai-db:1.0b` container (not required, but additionally included)
- Actian VectorAI running on port 50051 as a persistent vector database with cosine distance metrics
- FastAPI backend on port 8000 with async `httpx` for image fetching
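Wired together, the setup above corresponds to a compose file along these lines. The service names, build context, and `VECTOR_DB_HOST` variable are illustrative; the image tag and ports come from the configuration described:

```yaml
services:
  backend:
    build: .                        # python:3.12-slim base, FastAPI + Uvicorn
    ports:
      - "8000:8000"
    environment:
      - VECTOR_DB_HOST=vectordb     # hypothetical variable name
    depends_on:
      - vectordb
  vectordb:
    image: williamimoh/actian-vectorai-db:1.0b
    ports:
      - "50051:50051"               # gRPC endpoint for vector search
```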
Challenges we ran into
- Initially, testing revealed that waiting for AI analysis meant users could scroll past an image before our badge appeared.
- Injecting our side notes into the DOM repeatedly re-triggered our own MutationObserver, sometimes causing infinite loops.
- Unlike other social media platforms, Twitter's video architecture uses HLS streaming with nested m3u8 playlists. We had to intercept both GraphQL API responses (containing video metadata) and `chrome.webRequest` events (containing service-worker video requests) to reliably extract direct mp4 URLs.
- Chrome's Manifest V3 enforces strict content-script isolation. We needed MAIN-world access to patch `fetch()` (since the ISOLATED world can't intercept native APIs), but also needed the ISOLATED world to safely inject DOM elements.
- The Sphinx SDK generated Jupyter notebooks asynchronously, which required careful file I/O coordination and revision.
- Loading multiple large Vision Transformers (CLIP, SDXL detector, GAN detector) often overwhelmed memory and slowed the extension runtime.
Accomplishments that we're proud of...
- Implementing a UX where users don't have to click, upload, or leave their feed (everything happens passively as they scroll).
- Incorporating Actian VectorAI to make real-time pattern and memory comparisons feel instant.
- Designing a dual-model architecture that combines SDXL and GAN-face detectors, giving us coverage across modern and legacy deepfake architectures.
- Deploying complex cross-world communication, API interception, and dynamic DOM injection within strict security constraints.
- Engineering a functional backend (FastAPI, Sphinx, Actian) that runs in Docker with working vector storage.
What we learned
- Manifest V3's security model forced us to dig deep into Chrome's content-script execution contexts and develop creative IPC solutions.
- Vector databases such as Actian VectorAI can be remarkably efficient: similarity-search runtimes became fast enough to make real-time pattern recognition viable in a browser extension.
- We learned to constrain the context window and parse only structured JSON outputs to avoid hallucinations and keep explanations grounded in technical signals.
- Using `data-testid` attributes was more reliable than CSS classes, but aggressive dynamic content loading required careful handling of race conditions and retry logic.
- We quickly learned that no single model is perfect: combining CLIP embeddings, SDXL detection, GAN detection, and vector search gives a more confident signal than relying on any single score.
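That last lesson, fusing several imperfect signals into one badge-ready score, can be sketched as a simple weighted ensemble. The weights and tier cutoffs below are illustrative placeholders, not tuned values from our system:

```python
def ensemble_risk(signals: dict[str, float], weights: dict[str, float]) -> tuple[float, str]:
    """Fuse per-detector fake probabilities (0..1) into a single score and a
    badge tier. Weights are normalized over whichever signals are present."""
    total_w = sum(weights[k] for k in signals)
    score = sum(signals[k] * weights[k] for k in signals) / total_w
    if score >= 0.75:
        tier = "high"
    elif score >= 0.4:
        tier = "medium"
    else:
        tier = "low"
    return score, tier
```

Normalizing over the signals that are present means the badge still renders sensibly when, say, the vector-search neighbor score is unavailable on a cache miss.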
What's next for Sentinel
In the future, we would...
- Expand beyond X/Twitter (our sole target due to time constraints) to other platforms like Instagram, TikTok, LinkedIn, and Reddit, with platform-specific DOM parsers and improved models.
- Try to develop a version that implements OpenCV frame extraction with temporal consistency scoring to detect video deepfakes with timeline heatmaps (further research on legal CV policy required).
- Add a popup UI showing the user's browsing patterns, most-encountered fake accounts, and cluster visualizations.
- Allow users to opt in to sharing anonymized threat patterns, building a stronger decentralized vector database of known fakes.
- Port lightweight detectors to WASM for client-side inference, reducing backend API calls for common cases.
- Expand Actian's clustering capabilities to identify coordinated inauthentic behavior across multiple accounts.
- Alert users when high-risk content from suspicious accounts appears in their feed.
- Extend Sphinx agent capabilities to analyze non-English content and detect regional and culturally specific manipulation tactics.
Built With
- actianvectorai
- chromeextension
- fastapi
- huggingface
- javascript
- json
- jupyter
- numpy
- pillow
- pydantic
- python
- pytorch
- sphinxai
- uvicorn
- yaml


