Discover startup opportunities hidden in social conversations.
Built for AI Accelerate: Unlocking New Frontiers - helping entrepreneurs validate ideas and discover problems before building.
StartupRadar analyzes thousands of real discussions from Reddit, Hacker News, YouTube, and Product Hunt to help you:
- Discover trending problems people are struggling with
- Validate startup ideas with real market demand data
- Find early adopters before you build
- Analyze market opportunities with AI-powered insights
Enter any startup idea and get instant validation backed by real social media discussions:
- Market Demand Score (0-100)
- Problem Severity Analysis
- Competition Level Assessment
- Monetization Potential
- Target User Identification
- BUILD IT / MAYBE / DON'T BUILD verdict
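The verdict is derived from the scores above. As an illustrative sketch only (the actual thresholds live in `lib/ai/idea-validator.ts` and may differ), the mapping might look like:

```typescript
// Hypothetical sketch: map a 0-100 market demand score to a verdict.
// Thresholds (75 / 45) are assumptions, not the repo's actual values.
type Verdict = "BUILD IT" | "MAYBE" | "DON'T BUILD";

function verdictFromScore(marketDemand: number): Verdict {
  if (marketDemand >= 75) return "BUILD IT";
  if (marketDemand >= 45) return "MAYBE";
  return "DON'T BUILD";
}
```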
Advanced aggregations showing:
- Trend Over Time - Discussion volume by hour/day
- Platform Breakdown - Where conversations happen most
- Sentiment Distribution - Positive/neutral/negative analysis
- Peak Activity Hours - When to post content (with timezone)
- Top Posts by Engagement - Most valuable discussions
Using Elastic's Open Inference API + Vertex AI:
- Stage 1: Fast BM25 + vector search → Top 100 results
- Stage 2: Vertex AI semantic reranking → Top 20 most relevant
- Result: 95% relevance vs 70% with standard search
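The two-stage flow maps onto a single search request via Elasticsearch's retrievers API. A sketch of the request body (index/field names and the inference id are illustrative, not taken from the repo):

```typescript
// Stage 1 fetches 100 candidates with BM25; stage 2 reranks them with
// the Vertex AI inference endpoint and keeps the top 20.
const rerankedSearch = {
  retriever: {
    text_similarity_reranker: {
      // Stage 1: standard retriever gathers candidates.
      retriever: {
        standard: {
          query: { match: { content: "meeting notes pain points" } },
        },
      },
      // Stage 2: semantic reranking over the candidate window.
      field: "content",
      inference_id: "vertex-ai-reranker", // assumed endpoint id
      inference_text: "meeting notes pain points",
      rank_window_size: 100,
    },
  },
  size: 20, // top 20 most relevant after reranking
};
```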
Ask questions about search results with:
- Answers grounded in real discussions
- Live citations showing source posts
- Conversational follow-ups
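Grounding amounts to packing the retrieved posts into the prompt so the model can cite them. A rough sketch (the `Hit` shape and prompt wording are illustrative, not the repo's exact code in `lib/ai/grounding.ts`):

```typescript
// Build a citation-friendly prompt from search hits; answers can then
// reference sources as [1], [2], ... which the UI links back to posts.
interface Hit {
  title: string;
  snippet: string;
  url: string;
}

function buildGroundedPrompt(question: string, hits: Hit[]): string {
  const context = hits
    .map((h, i) => `[${i + 1}] ${h.title}\n${h.snippet}\n(${h.url})`)
    .join("\n\n");
  return `Answer using ONLY the sources below and cite them as [n].\n\n${context}\n\nQuestion: ${question}`;
}
```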
Combines BM25 keyword matching with dense vector similarity using Vertex AI embeddings (text-embedding-004).
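A sketch of what such a hybrid request body can look like: a BM25 `match` clause alongside a kNN clause over the 768-dimensional embedding field. Field names are illustrative; in the real flow the query vector comes from text-embedding-004, not a placeholder:

```typescript
// Hybrid query sketch: lexical (BM25) and semantic (kNN) sides are
// scored together by Elasticsearch.
const hybridQuery = {
  query: { match: { content: "meeting notes pain points" } }, // BM25 side
  knn: {
    field: "embedding",
    query_vector: new Array(768).fill(0.01), // placeholder for a real embedding
    k: 50,
    num_candidates: 200,
  },
  size: 100, // candidate pool for the reranking stage
};
```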
Showcases advanced features:
- date_histogram for time-series trends
- terms aggregations for categorical data
- avg metrics for statistics
- Painless scripting for custom hour extraction
- top_hits for document sampling
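Those features can be combined in one aggregations-only request. A hedged sketch (field names like `created_at`, `platform`, and `score` are assumptions about the index schema):

```typescript
// size: 0 skips hits entirely; only aggregation buckets come back.
const aggsBody = {
  size: 0,
  aggs: {
    // Discussion volume over time
    trend_over_time: {
      date_histogram: { field: "created_at", calendar_interval: "day" },
    },
    // Where conversations happen most
    platform_breakdown: { terms: { field: "platform" } },
    // Average engagement metric
    avg_engagement: { avg: { field: "score" } },
    // Painless script extracts the hour in a given timezone
    peak_hours: {
      terms: {
        script: {
          lang: "painless",
          source: "doc['created_at'].value.withZoneSameInstant(ZoneId.of(params.tz)).getHour()",
          params: { tz: "America/New_York" },
        },
        size: 24,
      },
    },
    // Sample the top posts by engagement
    top_posts: { top_hits: { size: 5, sort: [{ score: { order: "desc" } }] } },
  },
};
```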
- Creates inference endpoint connecting Elasticsearch → Vertex AI
- Uses semantic-ranker-512@latest for reranking
- Implements retrievers API with text_similarity_reranker
- Two-stage retrieval for optimal relevance/speed balance
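Creating the endpoint presumably comes down to a `PUT _inference/rerank/<id>` call through the Open Inference API. A sketch of the request (the endpoint id is illustrative, and the exact `service_settings` keys should be checked against the Elastic docs for the `googlevertexai` service):

```typescript
// Shape of the inference-endpoint creation request sent by setup-reranking.
const inferenceEndpoint = {
  method: "PUT",
  path: "_inference/rerank/vertex-ai-reranker", // assumed endpoint id
  body: {
    service: "googlevertexai",
    service_settings: {
      service_account_json: "<service-account-json>", // from env, not inline
      project_id: "your-project-id",
      model_id: "semantic-ranker-512@latest",
    },
  },
};
```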
- Gemini 2.5 Flash - Fast idea validation and chat (high-volume queries)
- Gemini 2.5 Pro - Detailed opportunity analysis (deep analysis only)
- text-embedding-004 - 768-dimensional embeddings for semantic search
- semantic-ranker-512 - AI-powered result reranking via Elasticsearch
- Cost-optimized: Flash handles volume, Pro handles depth
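The Flash/Pro split reduces to a simple routing rule. A hypothetical sketch (task names and model ids are illustrative stand-ins for the repo's actual code):

```typescript
// Route the one deep-analysis task to Pro; everything high-volume
// (validation, chat) stays on the cheaper, faster Flash model.
type Task = "validate-idea" | "chat" | "opportunity-analysis";

function pickModel(task: Task): string {
  return task === "opportunity-analysis" ? "gemini-2.5-pro" : "gemini-2.5-flash";
}
```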
| Category | Technologies |
|---|---|
| Search & Database | Elasticsearch 8.14+ (single database - handles search, vectors, analytics) |
| AI & ML | Vertex AI (Gemini 2.5 Flash/Pro, text-embedding-004, semantic-ranker-512) |
| Backend | Next.js 14 API Routes (serverless, stateless, no authentication) |
| Frontend | Next.js 14, React, TailwindCSS, TypeScript |
| Data Sources | Reddit, Hacker News, YouTube, Product Hunt |
| Deployment | Vercel (serverless functions), Elasticsearch Cloud |
- Node.js 18.17+
- Google Cloud account with Vertex AI enabled
- Elasticsearch Cloud account (free trial available)
- Google Cloud CLI installed
```bash
git clone <your-repo-url>
cd StartupRadar
npm install
```

Create `.env`:
```env
# Google Cloud / Vertex AI
GOOGLE_CLOUD_PROJECT_ID="your-project-id"
GOOGLE_CLOUD_LOCATION="us-central1"
GOOGLE_APPLICATION_CREDENTIALS="<service-account-json or base64>"

# Elasticsearch
ELASTIC_CLOUD_ID="your-cloud-id"
ELASTIC_API_KEY="your-api-key"

# Reddit, Product Hunt, Hacker News and YouTube API keys
REDDIT_CLIENT_ID=""
REDDIT_CLIENT_SECRET=""
REDDIT_USER_AGENT=""
PRODUCTHUNT_CLIENT_ID=""
PRODUCTHUNT_CLIENT_SECRET=""
PRODUCTHUNT_API_TOKEN=""
HACKERNEWS_API_URL=""
YOUTUBE_API_KEY=""
```
```bash
gcloud auth application-default login
gcloud config set project your-project-id
```

```bash
npm run setup-es
```

This creates the `social_signals` index with mappings for:
- Dense vectors (768 dimensions)
- Sentiment analysis fields
- Quality metrics
- Platform metadata
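A rough sketch of what the resulting mapping presumably looks like (field names are assumptions based on the features above; the authoritative version is in `scripts/setup-elasticsearch.ts`):

```typescript
// Sketch of the social_signals index mapping: text content, a 768-dim
// dense vector for kNN search, sentiment/quality fields, and metadata.
const socialSignalsMapping = {
  mappings: {
    properties: {
      content: { type: "text" },
      embedding: { type: "dense_vector", dims: 768, index: true, similarity: "cosine" },
      sentiment: { type: "keyword" },
      sentiment_score: { type: "float" },
      quality_score: { type: "float" },
      platform: { type: "keyword" },
      created_at: { type: "date" },
    },
  },
};
```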
Prerequisites:
- Enable Discovery Engine API in Google Cloud Console
- Grant Discovery Engine Viewer role to service account
- Wait 2-3 minutes for API propagation
```bash
npm run setup-reranking
```

This creates the Vertex AI reranking inference endpoint using Elastic's Open Inference API.
```bash
npm run collect-data
```

Fetches ~100 posts from Reddit, Hacker News, YouTube, and Product Hunt, and indexes them with embeddings. Takes about 5-10 minutes.
```bash
npm run dev
```

Search: "struggling with meeting notes"
→ Finds 60 discussions about meeting note pain points
→ Shows trend: Rising 40% in last 30 days
→ Identifies: Remote teams, managers, consultants
Idea: "AI-powered meeting notes with action item extraction"
→ Market Demand: 85/100 (Strong)
→ Willingness to Pay: 72/100 (High)
→ Verdict: BUILD IT
→ Recommendation: Focus on async teams, $20/mo SaaS
Opportunity Analysis → Early Adopters Section
→ Shows top 10 users discussing the problem
→ Platforms: r/Entrepreneur, r/startups, Hacker News
→ Engagement levels: 50+ upvotes, 20+ comments
StartupRadar/
├── app/
│ ├── api/
│ │ ├── analytics/route.ts # Analytics aggregations
│ │ ├── analyze-opportunity/ # AI opportunity analysis
│ │ ├── validate-idea/ # AI idea validation
│ │ ├── chat/route.ts # Grounded AI chat
│ │ └── search/route.ts # Hybrid search + reranking
│ ├── components/
│ │ ├── AnalyticsDashboard.tsx # Visual analytics (charts)
│ │ ├── OpportunityReport.tsx # Opportunity analysis UI
│ │ └── SearchResults.tsx # Search results display
│ ├── dashboard/page.tsx # Main search interface
│ ├── validate/page.tsx # Idea validation page
│ └── page.tsx # Landing page
├── lib/
│ ├── ai/
│ │ ├── embeddings.ts # Vertex AI text-embedding-004
│ │ ├── grounding.ts # Vertex AI grounded chat
│ │ ├── idea-validator.ts # AI idea validation logic
│ │ └── opportunity-analyzer.ts # AI opportunity scoring
│ ├── elasticsearch/
│ │ ├── client.ts # ES client setup
│ │ ├── search.ts # Hybrid search + reranking
│ │ ├── analytics.ts # Advanced aggregations
│ │ └── reranking.ts # Open Inference API setup
│ ├── connectors/
│ │ ├── reddit.ts # Reddit API
│ │ ├── hackernews.ts # Hacker News API
│ │ ├── youtube.ts # YouTube API
│ │ └── producthunt.ts # Product Hunt API
│ └── types/index.ts # TypeScript interfaces
├── scripts/
│ ├── setup-elasticsearch.ts # Create ES index
│ ├── setup-reranking.ts # Setup reranking endpoint
│ └── collect-data.ts # Data collection job
└── package.json
- Deduplication turned into a bigger problem than expected. The same discussion often appears multiple times, sometimes with slight title variations or different URLs. I built a normalization system that strips protocols and URL parameters and normalizes titles by removing punctuation and truncating to 100 characters. Even then, some duplicates slip through when titles are significantly reworded.
- API rate limits hit hard once I started scaling data collection. YouTube's quota system is particularly brutal: you burn through your daily limit fast. Reddit blocks you if you make requests too quickly. I added delays between requests (1 second for Reddit, 2 seconds for embedding batches) and implemented `Promise.allSettled` so one platform failure doesn't kill the entire collection job.
- Platform-specific quirks created a normalization nightmare. Product Hunt uses GraphQL with nested topic structures (`topics.edges[].node.name`), Reddit has inconsistent formats where `selftext` might be empty and timestamps are in Unix seconds, YouTube requires two API calls (search, then fetch details) with engagement metrics as strings that need parsing, and Hacker News has two separate APIs (Firebase and Algolia) with stories that can be `deleted` or `dead`. I solved this with a unified `SocialPost` interface that normalizes everything: YouTube's `likeCount` and Reddit's `score` both map to a generic `score` field, all timestamp formats convert to JavaScript `Date` objects, and each connector has a `normalize` function.
- The background collection pipeline needed careful orchestration: fetching from four platforms, running sentiment analysis, calculating quality scores, generating embeddings, and bulk indexing, all while handling partial failures gracefully. I use `Promise.allSettled` everywhere, so one platform timeout doesn't break the entire job.
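The fault-isolated, rate-limited collection loop above can be sketched roughly like this (the connector functions and per-platform delays are stand-ins for the real code):

```typescript
// Stagger platform fetches to respect rate limits, then keep whatever
// succeeded; a single rejected connector never fails the whole job.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function fetchWithDelay<T>(fetcher: () => Promise<T[]>, delayMs: number): Promise<T[]> {
  await sleep(delayMs);
  return fetcher();
}

async function collectAll<T>(fetchers: Array<() => Promise<T[]>>): Promise<T[]> {
  const results = await Promise.allSettled(
    fetchers.map((f, i) => fetchWithDelay(f, i * 1000)) // 1s stagger, illustrative
  );
  // Only fulfilled batches flow on to sentiment scoring and indexing.
  return results.flatMap((r) => (r.status === "fulfilled" ? r.value : []));
}
```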
- Validation API performance - The `/api/validate-idea` endpoint takes nearly a minute to complete. The main bottleneck is sequential Gemini calls and Elasticsearch searches. Solution: multi-layer caching. Cache Elasticsearch results for identical search queries (1-hour TTL), cache generated embeddings for common keywords, and cache entire validation reports for identical ideas (6-hour TTL). This could drop response time from 60 seconds to under 2 seconds for cache hits. Redis with query hash keys would handle this cleanly.
- Authentication & rate limiting - Currently everything is public with no usage tracking. Adding NextAuth or Clerk would enable user accounts, and token bucket rate limiting per user would prevent abuse.
- Better deduplication - The current URL/title normalization still lets duplicates through when titles are significantly reworded. A similarity-based approach using embeddings could catch near-duplicates more effectively.
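For reference, the current URL/title normalization (the approach that still misses reworded titles) might look roughly like this illustrative sketch:

```typescript
// Strip protocol, query params, fragments, and trailing slashes.
function normalizeUrl(url: string): string {
  return url
    .replace(/^https?:\/\//, "")
    .replace(/[?#].*$/, "")
    .replace(/\/+$/, "")
    .toLowerCase();
}

// Lowercase, drop punctuation, collapse whitespace, truncate to 100 chars.
function normalizeTitle(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, "")
    .replace(/\s+/g, " ")
    .trim()
    .slice(0, 100);
}

// Two posts with the same key are treated as duplicates.
const dedupeKey = (url: string, title: string) =>
  `${normalizeUrl(url)}|${normalizeTitle(title)}`;
```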
- Add more platforms (X, GitHub, Stack Overflow, Quora)
- Multi-layer caching (Redis) for validation API
- Authentication system (NextAuth/Clerk)
- Add export functionality (CSV, PDF reports)
- Multi-lingual support
- Browser extension for on-the-fly validation
Built with:
- Elasticsearch - For powerful hybrid search and aggregations
- Google Cloud Vertex AI - For embeddings, reranking, and Gemini models
- Next.js - For the amazing developer experience
Made for AI Accelerate: Unlocking New Frontiers.
Find problems worth solving. Build startups people want.