Inspiration
Identify the gap
Large language models (LLMs) are now primary sources of answers for many users.
Brands have SEO metrics for search engines but no easy way to measure visibility inside AI-generated answers.
Spark
We wanted a tool that treats LLMs like “search engines” and measures how often they cite or recommend a brand/website.
Real-world value
Marketing/PR teams, product owners, and SEO managers gain actionable insights about AI-driven discovery and reputation.
What it does (step-wise)
User submits a prompt (e.g., “best AI SEO tools”) or a list of queries.
Multi-LLM query: the system sends the prompt to several AI models (ChatGPT, Claude, Gemini, Perplexity — extendable).
Collect responses from each model and store raw output.
Parse responses to extract domain names, explicit brand mentions, and links.
Normalize and count occurrences (per model and overall).
Run a quick verification step with a web search API to validate any factual claims and confirm whether cited links actually exist.
Compute metrics:
Mentions per model
Overall mentions
GEO Score — a visibility percentage for a target domain (e.g., how often the domain appears across total model responses)
Display results on an interactive dashboard: charts, tables, and a fact-checked summary answer.
Allow export of results (CSV/JSON) and saving queries for historical tracking.
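The counting and scoring steps above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: `extract_domains` and `geo_score` are hypothetical names, and the GEO Score is computed here as the share of model responses that mention the target domain at least once (one reading of "how often the domain appears across total model responses").

```python
import re

# Capture the bare domain from any http(s) URL in a response.
DOMAIN_RE = re.compile(r"https?://(?:www\.)?([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})")

def extract_domains(text: str) -> list[str]:
    """Pull normalized (lowercased) domains out of a model response."""
    return [d.lower() for d in DOMAIN_RE.findall(text)]

def geo_score(responses: dict[str, str], target: str) -> dict:
    """Per-model mention counts plus an overall visibility percentage."""
    per_model = {
        model: extract_domains(text).count(target.lower())
        for model, text in responses.items()
    }
    hits = sum(1 for count in per_model.values() if count > 0)
    score = 100.0 * hits / len(responses) if responses else 0.0
    return {
        "per_model": per_model,
        "overall": sum(per_model.values()),
        "geo_score": score,
    }
```

With three model responses where two mention `example.com`, the GEO Score comes out as roughly 66.7%.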
How we built it (step-wise, tech + implementation)
Project scaffold
Language: Python
Repo layout: /app (Streamlit UI) | /agents (LangChain logic) | /db (SQLite) | /utils (parsers/verifier)
Agents pipeline (LangChain / crew-style agents)
Prompt Agent — receives and normalizes user queries.
Multi-LLM Agent — concurrently calls configured LLM APIs and returns raw responses.
Analyzer Agent — extracts links, domains, and brand names via regex + lightweight NLP (spaCy / simple named-entity checks).
Verifier Agent — quick web lookups (e.g., using Bing or Google Custom Search API) to confirm sources/links and flag unsupported claims.
Visualization Agent — prepares dataframes and Plotly figure objects for the dashboard.
Final Answer Agent — synthesizes a concise, fact-checked summary combining model consensus and verification results.
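The Analyzer Agent's extraction step can be approximated with regex plus token-based brand matching, as described above. This sketch uses hypothetical names (`analyze`, `known_brands`) and deliberately omits the optional spaCy NER pass; token matching avoids substring false positives (so "Apple" does not match "pineapple").

```python
import re

URL_RE = re.compile(r"https?://\S+")
DOMAIN_RE = re.compile(r"https?://(?:www\.)?([\w.-]+\.\w{2,})")

def analyze(text: str, known_brands: list[str]) -> dict:
    """Extract links, domains, and case-insensitive brand mentions."""
    links = URL_RE.findall(text)
    domains = [d.lower() for d in DOMAIN_RE.findall(text)]
    # Token set with trailing punctuation stripped, for whole-word matching.
    tokens = {t.lower().strip(".") for t in re.findall(r"[\w.-]+", text)}
    brands = [b for b in known_brands if b.lower() in tokens]
    return {"links": links, "domains": domains, "brands": brands}
```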
UI & dashboard
Streamlit for rapid interactive UI.
Plotly for charts: bar charts (mentions per model), pie (share), time series (if tracking over time).
Interactive controls: target domain input, date range, model selection, number of prompts to run.
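The Visualization Agent's dataframe preparation can be as simple as the sketch below; the Streamlit wiring is shown only in comments since it needs a running app. `mentions_frame` is a hypothetical helper name.

```python
import pandas as pd

def mentions_frame(per_model: dict[str, int]) -> pd.DataFrame:
    """Tidy dataframe the dashboard's bar chart consumes,
    sorted so the most-mentioning model appears first."""
    df = pd.DataFrame(
        {"model": list(per_model), "mentions": list(per_model.values())}
    )
    return df.sort_values("mentions", ascending=False).reset_index(drop=True)

# Inside the Streamlit app (sketch):
#   import streamlit as st
#   import plotly.express as px
#   target = st.text_input("Target domain", "example.com")
#   df = mentions_frame(counts)
#   st.plotly_chart(px.bar(df, x="model", y="mentions"))
```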
Storage & persistence
SQLite (lightweight) stores queries, raw responses, parsed mentions, verification results, timestamps, and GEO scores.
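A minimal schema along these lines would cover the persisted fields listed above; the table and column names here are illustrative, not the project's actual schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS queries (
    id INTEGER PRIMARY KEY,
    prompt TEXT NOT NULL,
    created_at TEXT DEFAULT (datetime('now'))
);
CREATE TABLE IF NOT EXISTS mentions (
    id INTEGER PRIMARY KEY,
    query_id INTEGER REFERENCES queries(id),
    model TEXT NOT NULL,
    domain TEXT NOT NULL,
    verified INTEGER DEFAULT 0,   -- set by the Verifier Agent
    geo_score REAL
);
"""

def init_db(path: str = "geo.db") -> sqlite3.Connection:
    """Open (or create) the SQLite database and apply the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```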
Concurrency & rate limits
Use async calls or threadpool for parallel LLM requests.
Rate-limit management and exponential backoff for API failures.
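One way to combine both points, sketched with hypothetical helpers (`call_with_backoff`, `query_all`): a semaphore caps in-flight requests while failed calls retry with exponential backoff plus jitter.

```python
import asyncio
import random

async def call_with_backoff(fn, *args, retries: int = 3, base: float = 0.5):
    """Retry a coroutine with exponential backoff plus a little jitter."""
    for attempt in range(retries):
        try:
            return await fn(*args)
        except Exception:
            if attempt == retries - 1:
                raise
            await asyncio.sleep(base * 2 ** attempt + random.random() * 0.1)

async def query_all(models: dict, prompt: str, max_parallel: int = 4) -> dict:
    """Fan a prompt out to every configured model,
    with at most max_parallel requests in flight."""
    sem = asyncio.Semaphore(max_parallel)

    async def one(name, fn):
        async with sem:
            return name, await call_with_backoff(fn, prompt)

    results = await asyncio.gather(*(one(n, f) for n, f in models.items()))
    return dict(results)
```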
Security & privacy
Redact sensitive inputs before storage (opt-in).
Store API keys server-side in environment variables; do not expose in frontend.
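The opt-in redaction could be as simple as a pattern pass before anything is written to storage; the patterns below (emails and `sk-`-prefixed keys, an OpenAI-style convention) are illustrative assumptions.

```python
import re

REDACT_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"), "[API_KEY]"),  # OpenAI-style keys
]

def redact(text: str) -> str:
    """Mask obvious PII/secrets before a prompt is persisted."""
    for pattern, token in REDACT_PATTERNS:
        text = pattern.sub(token, text)
    return text
```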
Demo wiring
One-click demo mode: runs a few curated prompts and shows live charts for immediate presentation.
Challenges we ran into (and how we addressed them)
Inconsistent output formats across LLMs
Solution: a normalization layer (post-processing) that extracts domains/mentions with robust regex + token-based matching.
Noisy or hallucinated links in model answers
Solution: Verifier Agent that performs quick URL checks and flags broken or fabricated links; give models lower trust weight when unverified.
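A quick URL check of this kind might look like the following sketch, using only the standard library. The injectable `opener` parameter is an assumption added so the check can be stubbed in tests and demo mode without hitting the network.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen

def check_url(url: str, timeout: float = 5.0, opener=urlopen) -> bool:
    """True if the URL parses and a HEAD request gets a non-error status."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return False  # malformed or fabricated-looking link
    try:
        req = Request(url, method="HEAD", headers={"User-Agent": "geo-verifier"})
        with opener(req, timeout=timeout) as resp:
            return resp.status < 400
    except Exception:
        return False  # unreachable -> flag as unverified
```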
Rate limits & latency when querying multiple LLMs
Solution: parallelize calls, implement timeouts, and show partial results in the UI while slower models finish.
Ambiguous brand mentions (e.g., "apple" = fruit vs Apple Inc.)
Solution: context-aware entity disambiguation using simple heuristics + optional manual confirmation in the UI.
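The "simple heuristics" can be as basic as requiring a context cue near the ambiguous token. This is a hedged sketch; `is_brand_mention` and the cue list are hypothetical.

```python
def is_brand_mention(text: str, brand: str, cues: list[str]) -> bool:
    """Treat an ambiguous token as a brand mention only when a
    context cue (product, industry, or domain word) also appears."""
    low = text.lower()
    if brand.lower() not in low:
        return False
    return any(cue in low for cue in cues)

# Example cue list for the "apple" ambiguity mentioned above.
APPLE_CUES = ["iphone", "mac", "ios", "apple.com", "tim cook"]
```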
Balancing speed and verification depth
Solution: two verification modes — fast (domain existence + title match) and deep (page content checks) selectable by user.
Dashboard UX clarity
Solution: iterative feedback from teammates; emphasize simple charts, tooltips, and downloadable CSVs for judges.
Accomplishments that we're proud of
End-to-end prototype: from prompt input → multi-LLM responses → verification → dashboard visualization, all working in a single demo flow.
Accurate parsing: reliable domain/brand extraction across multiple LLM outputs, with roughly 90% precision on our test set.
Meaningful GEO metric: we designed a simple, interpretable GEO Score that helps stakeholders quickly understand AI visibility.
Fast demo-ready UI: Streamlit dashboard that loads results and interactive charts within seconds for common prompts.
Verification integration: implemented a lightweight verification step to reduce reliance on hallucinated model citations.
Extensibility: modular agent architecture makes it easy to add more LLMs or richer verification steps later.
What we learned
LLMs behave differently — each model has distinct writing style and citation habits; aggregating them yields richer insights than any single model.
Verification is essential — models can invent plausible-sounding sources; a verification pass prevents misleading conclusions.
Simplicity scales — a small set of robust parsing rules + a basic verifier gave much better demo value than over-engineered NLP.
UX matters for judges — clear visualizations and a concise final summary are what judges remember, not internal complexity.
Rate-limit planning — always design demos with fallback content or cached responses to avoid live API failures during presentation.
What’s next for Intelligent GEO (step-wise roadmap)
Add more LLMs & data sources
Integrate additional models (Mistral, LLaMA endpoints) and specialty Q&A engines.
Historical tracking & alerts
Save time series of GEO scores and send alerts when visibility increases or decreases for a domain.
Sentiment & context analysis
Add sentiment scoring to measure whether mentions are neutral, positive, or negative.
Ranking & attribution
Provide deeper attribution: did the mention come as a recommendation, part of a list, or a passing reference?
Team & enterprise features
Multi-user accounts, API for programmatic monitoring, and scheduled automated scans.
Improved verification
Fact extraction and structured citation mapping (match claim → citation → verification confidence).
Monetization & integrations
Export to Google Data Studio, Slack alerts, and integrations for PR/SEO tools.
Production hardening
Move DB to cloud (Postgres), add authentication, CI/CD pipeline, and cost monitoring for LLM calls.
Built With
- Frontend: React 18
- Backend: FastAPI
- Routing: React Router DOM
- HTTP client: Axios
- Icons: Lucide React
- Framer Motion
- Tailwind CSS
- Vite