The Story Behind Lens

A Chrome extension that highlights potentially biased language and helps readers see other angles.

What Inspired Us

We were inspired by two things: media literacy and the way we read online.

First, FAIR’s “How to Detect Bias in News Media” (fair.org) gave a clear framework: loaded language, unchallenged assumptions, false balance, headline-vs-reality, and emotional appeals. We wanted to bring that kind of thinking into the browser—not to tell people what to believe, but to nudge them to look further and think for themselves.

Second, how easy it is to stay in one lane. We often read one article, one headline, one take. We wanted a tool that would:

  • Surface where language might be pushing a view or skipping nuance
  • Suggest other perspectives via simple search links (e.g. “critics of [topic]”, “alternative view on [topic]”)
  • Flag unverified factual claims so readers can choose to fact-check

So the name Lens fit: viewing content through a clearer lens—spotting bias and seeing other angles without replacing your own judgment.


What We Learned

1. Chrome Extension Architecture (Manifest v3)

  • Service worker (background.js) as the only long-lived script: no background.html, so all API calls, caching, and throttling live in the worker. We had to get used to the worker lifecycle (e.g. it can be terminated, so in-memory state is limited).
  • Message passing: popup → content script (get page text, apply highlights), popup → background (run analysis), background → (indirectly) content via popup. Keeping request/response and error handling clear was important.
  • Content script injection: On tabs opened before the extension was installed, the content script might not be there. The popup now uses chrome.scripting.executeScript to inject content.js and CSS if the first message fails, then retries—so “Analyze” works even on those tabs after a click.
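The inject-then-retry pattern above can be sketched as a small helper. This is illustrative, not the extension's actual code: the Chrome calls are passed in as functions (in the real popup they would wrap chrome.tabs.sendMessage and chrome.scripting.executeScript / insertCSS), which also keeps the retry logic testable outside the browser.

```javascript
// Sketch: send a message to the content script; if no receiver exists
// (tab opened before install), inject the script and retry once.
// sendMessage / injectScripts are caller-supplied wrappers around the
// chrome.* APIs — the names here are illustrative.
async function sendWithInjectFallback(sendMessage, injectScripts) {
  try {
    return await sendMessage(); // content script already present
  } catch (err) {
    await injectScripts();      // e.g. chrome.scripting.executeScript + insertCSS
    return await sendMessage(); // retry once after injection
  }
}
```

In the real popup, `injectScripts` would call something like `chrome.scripting.executeScript({ target: { tabId }, files: ['content.js'] })` before the retry.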

2. LLMs and Structured Output

  • Getting reliable JSON from Groq, Gemini, and OpenAI meant:
    • A strict system prompt that says “Output only valid JSON, no markdown or extra text.”
    • Post-processing: stripping optional markdown code fences (e.g. ```json ... ```) and a truncation repair in parseAnalysisJson() that closes unterminated strings and brackets when the model hits max_tokens—so a cut-off response still parses.
  • Bias detection is subjective. The prompt is grounded in FAIR’s categories, but the model’s choices are opinion-based. We learned to treat the extension as a prompt to think critically, not ground truth, and to say that clearly in the UI (“Lens doesn’t decide what’s true…”).

3. In-Page DOM and Highlights

  • Finding the main article: Heuristics (e.g. article, [role="main"], main, .article-body) plus a minimum length (~200 characters) to avoid nav/footer. Fallback is document.body.
  • Wrapping phrases without breaking layout: Using TreeWalker over text nodes, building a single concatenated string to find the first occurrence of the exact phrase, then mapping back to (startNode, startOffset) and (endNode, endOffset), and using Range.surroundContents(span). Wrapping only the first occurrence keeps the sidebar and in-page highlights in sync (each sidebar item scrolls to one span).
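The offset-mapping step can be isolated as a pure function over the text-node strings. This sketch mirrors the approach but uses array indices instead of DOM nodes so the logic is visible on its own (locatePhrase is an illustrative name, not the extension's):

```javascript
// Sketch: find the first occurrence of `phrase` in the concatenation of
// text-node strings, then map the match back to per-node offsets.
function locatePhrase(nodeTexts, phrase) {
  const joined = nodeTexts.join('');
  const start = joined.indexOf(phrase); // first occurrence only
  if (start === -1) return null;
  const end = start + phrase.length;
  let pos = 0;
  const result = {};
  for (let i = 0; i < nodeTexts.length; i++) {
    const next = pos + nodeTexts[i].length;
    if (result.startNode === undefined && start < next) {
      result.startNode = i;
      result.startOffset = start - pos;
    }
    if (end <= next) {
      result.endNode = i;
      result.endOffset = end - pos;
      return result;
    }
    pos = next;
  }
  return null;
}
```

With real DOM text nodes, the returned indices feed `Range.setStart` / `Range.setEnd`, after which `surroundContents(span)` wraps the match.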

4. Caching and Throttling

  • Cache key: URL + a short hash of the first 200 characters of the body text, so small edits change the key. TTL is 30 minutes (CACHE_TTL_MS).
  • Throttling: A minimum delay (e.g. 2 seconds) between analyses per tab to avoid rate limits and accidental double-clicks. Both caching and throttling reduced API cost and made the extension feel more predictable.
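The cache-key scheme can be sketched as follows. The djb2-style hash here is an illustrative stand-in (the extension's actual hash function may differ); what matters is that editing the first 200 characters of the body changes the key:

```javascript
// Sketch: cache key = URL + short hash of the first 200 body characters.
// shortHash is a stand-in (djb2-xor rendered as hex), not the real one.
function shortHash(text) {
  let h = 5381;
  for (let i = 0; i < text.length; i++) {
    h = ((h * 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return h.toString(16);
}

const CACHE_TTL_MS = 30 * 60 * 1000; // 30 minutes

function cacheKey(url, bodyText) {
  return url + '::' + shortHash(bodyText.slice(0, 200));
}

function cacheEntry(data) {
  // Shape matches the storage layout: key → { data, expiresAt }
  return { data, expiresAt: Date.now() + CACHE_TTL_MS };
}
```

A lookup then simply compares `expiresAt` against `Date.now()` and falls through to the API on a miss or an expired entry.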

5. Credibility Score (and a Bit of Math)

We wanted a simple, transparent “credibility” signal that combined:

  • Counts of highlighted bias, unverified, and emotional appeal phrases
  • Whether sources are cited and whether they’re credible (from the LLM’s sources assessment)

A simple linear penalty model works well. Start at 100, subtract a penalty per flagged phrase, then clamp to [0, 100]:

  S_raw = max(0, min(100, 100 − 6·n_bias − 5·n_unverified − 5·n_emotional))

Then adjust for sources (e.g. +5 if “cited” and “credible” are both “yes”, −8 if either is “no”):

  S = max(0, min(100, S_raw + Δ_sources))

So the sidebar shows a score that reflects both flagged language and source quality, with the formula described in the UI so it’s not a black box.
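The score formula above translates directly into code. The weights are the ones stated in the text; the source-adjustment values (+5 / −8) are the examples given, so treat them as representative rather than fixed:

```javascript
// Direct translation of the credibility-score formula (illustrative).
const clamp = (x) => Math.max(0, Math.min(100, x));

function credibilityScore(nBias, nUnverified, nEmotional, sources) {
  const raw = clamp(100 - 6 * nBias - 5 * nUnverified - 5 * nEmotional);
  let delta = 0;
  if (sources.cited === 'yes' && sources.credible === 'yes') delta = 5;
  else if (sources.cited === 'no' || sources.credible === 'no') delta = -8;
  return clamp(raw + delta);
}
```

For example, an article with 2 bias phrases, 1 unverified claim, 1 emotional appeal, and credible cited sources scores 100 − 12 − 5 − 5 + 5 = 83.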


How We Built the Project

High-Level Flow

  1. User opens an article → clicks Lens → “Analyze this page”.
  2. Popup asks the content script for title and main bodyText (with injection fallback if needed).
  3. Popup sends ANALYZE_PAGE to the background with { url, title, bodyText, tabId }.
  4. Background checks API config (env or chrome.storage), throttle, and cache. On cache miss, it calls Groq / Gemini / OpenAI with a single system prompt (FAIR-aligned) and parses JSON.
  5. Optional: for each fact_check highlight, Google Fact Check Tools API is called if a key is set, and results are attached to the highlight.
  6. Background responds with { ok, data }. Popup sends APPLY_ANALYSIS to the content script with highlights, topic, search queries, AI/source metadata.
  7. Content script wraps phrases in spans (bias / fact_check / emotional_appeal), shows tooltips on hover, and updates the sidebar (topic, summary, credibility score, highlighted sections, recommended searches, optional AI-written/AI-image and source notes).

Tech Stack

  • Manifest v3: one service worker, content script, popup, options page.
  • APIs: Groq (default, free tier), Google Gemini, OpenAI (gpt-4o-mini); optional Google Fact Check Tools.
  • Storage: chrome.storage.local for API keys and cache entries (key → { data, expiresAt }).
  • Env: Optional .env + node scripts/inject-env.js to generate env-config.js so the extension can use keys without pasting in the UI.

Design Choices

  • One shared system prompt for all three LLM providers so behavior is consistent; only the API calls and response shapes (e.g. Gemini’s systemInstruction vs OpenAI messages) differ.
  • Pink/teal for bias, blue-teal for unverified, teal for emotional appeal so the three types are distinct but harmonious (dark navy sidebar, same palette in content.css).
  • Sidebar is collapsible (◀ / ▶) so it doesn’t permanently occupy space; “Clear highlights” removes all spans and closes the sidebar.

Challenges We Faced

1. Truncated JSON from the LLM

With max_tokens: 2000, long articles sometimes produced cut-off JSON and JSON.parse threw. Fix: in parseAnalysisJson(), on SyntaxError with “position N”, take the substring up to N, detect if we’re inside an open string (track quotes and escapes), close the string and then close any open {/[ with }/] in reverse order, and re-parse. That recovered most truncated responses without changing the prompt.
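The repair idea can be sketched as a single pass over the text that tracks string state and open brackets, then closes whatever is left open. This is a simplified version of the approach (the real parseAnalysisJson handles more cases, e.g. trailing commas and the "position N" substring step):

```javascript
// Sketch: repair JSON cut off by max_tokens. Track whether we ended
// inside a string (respecting \" escapes) and which {/[ are still open,
// then close the string and the brackets in reverse order.
function repairTruncatedJson(text) {
  let inString = false;
  let escaped = false;
  const stack = [];
  for (const ch of text) {
    if (escaped) { escaped = false; continue; }
    if (inString) {
      if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '{' || ch === '[') stack.push(ch);
    else if (ch === '}' || ch === ']') stack.pop();
  }
  let fixed = text;
  if (inString) fixed += '"';            // terminate the open string
  while (stack.length) {                 // close open brackets, innermost first
    fixed += stack.pop() === '{' ? '}' : ']';
  }
  return fixed;
}

function parseWithRepair(text) {
  try { return JSON.parse(text); }
  catch { return JSON.parse(repairTruncatedJson(text)); }
}
```

So `{"items": ["a", "b` becomes `{"items": ["a", "b"]}`, and the cut-off highlight list still renders instead of failing the whole analysis.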

2. Content Script Not Loaded on Some Tabs

If the user installed the extension after opening a tab, the content script wasn’t there, so LENS_GET_PAGE_CONTENT failed. Fix: in the popup, if the first sendMessage fails, call chrome.scripting.executeScript / insertCSS to inject content.js and content.css, then send the message again. Users can analyze without refreshing.

3. Phrase Matching in the DOM

The model returns exact substrings from the article, but the DOM can have different whitespace or hidden elements. We stuck to “exact phrase, first occurrence only” and used the same body text sent to the API, so the content script and model see the same text. When the phrase isn’t found, that highlight is skipped for in-page wrapping but can still appear in the sidebar with a “Search to verify” link.

4. Rate Limits (e.g. Groq)

Free tiers often have TPM (tokens per minute) limits. We added throttling per tab and retry with backoff for Groq on 429: parse “try again in X seconds” from the error body, wait that long plus a second, then retry once. The popup also shows a user-friendly “Groq rate limit…” message so users know to wait and try again.
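The delay parsing plus single retry looks roughly like this. The “try again in Xs” message shape is what we observed in Groq's 429 error bodies, so treat the regex as an assumption rather than a stable API contract:

```javascript
// Sketch: extract the suggested wait from a 429 error body, with a
// fallback and a one-second safety margin. The message format is an
// observed convention, not a documented one.
function parseRetryDelayMs(errorBody, fallbackMs = 5000) {
  const match = /try again in ([\d.]+)\s*s/i.exec(errorBody);
  if (!match) return fallbackMs;
  return Math.ceil(parseFloat(match[1]) * 1000) + 1000;
}

// doFetch is any function returning a fetch-style Response; retry once.
async function fetchWithRetry(doFetch) {
  const res = await doFetch();
  if (res.status !== 429) return res;
  const delay = parseRetryDelayMs(await res.text());
  await new Promise((resolve) => setTimeout(resolve, delay));
  return doFetch();
}
```

One retry is deliberate: if the second attempt also hits the limit, the popup surfaces the friendly rate-limit message instead of looping.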

5. Keeping “Bias” Actionable and Fair

Bias detection can feel accusatory. We tried to keep the tone analytical: tooltips and sidebar use “This may be…” and “Worth verifying…”, and the sidebar explicitly says “Lens doesn’t decide what’s true. It nudges you to look further and think for yourself.” The goal is to support critical reading, not to label articles as “good” or “bad.”

6. Credibility Score Without False Precision

A single number can imply more precision than we have. We used a simple formula (linear penalties + source adjustment), documented it in the sidebar (“Based on bias, unverified claims, emotional appeal, and source quality”), and kept the score as a rough cue rather than an authoritative grade.


Summary

Lens was inspired by media literacy (FAIR) and the desire to help readers see bias and other perspectives without telling them what to think. Building it taught us Manifest v3 architecture, robust LLM JSON handling, DOM-safe highlighting, and the value of caching and throttling. The main challenges were truncated API responses, content script injection on existing tabs, and designing a credibility score and UX that stay honest about the limits of automated bias detection. The result is a Chrome extension that highlights potentially biased or unverified language and suggests searches so users can look further—through a clearer lens.
