About the Project
Inspiration
Every hackathon participant faces the same two challenges: generating original, winning ideas and ensuring they're not accidentally recreating existing projects. After attending multiple hackathons and witnessing teams struggle with idea validation, we built Blueprint to solve both problems using advanced AI and semantic analysis.
Traditional plagiarism detectors rely on keyword matching, leading to 70% false positives—flagging "AI chatbot for therapy" as similar to "AI chatbot for customer service" despite solving completely different problems. We knew there had to be a better way.
What it does
Blueprint is a dual-purpose platform that empowers hackathon participants:
Generate Winning Ideas: Enter any hackathon URL and receive 7 tailored, competition-ready project ideas. Our AI analyzes the hackathon rules, studies past winning projects, and generates ideas complete with implementation roadmaps, tech stacks, and judge appeal strategies.
Verify Originality: Already have an idea? Our semantic plagiarism detector searches Devpost and GitHub in real-time, analyzing similarity across four dimensions—problem domain (35%), solution approach (40%), implementation details (15%), and use case (10%)—to give you an evidence-based originality score.
How we built it
Frontend: React + Vite with real-time Server-Sent Events (SSE) for live progress updates as projects are discovered and analyzed.
Backend: FastAPI with async/await architecture for concurrent processing. We implemented intelligent rate limiting with exponential backoff and 24-hour caching to respect API limits.
AI Engine: Claude Sonnet 4 along with Gemini 2.5 flash powers our multi-stage analysis pipeline—first extracting hackathon requirements, then analyzing winning patterns, and finally generating ideas or evaluating semantic similarity.
Web Scraping: Custom BeautifulSoup4 scrapers with smart duplicate detection and URL normalization. We search Devpost (hackathon-specific projects) and GitHub (open-source API implementations) using AI-generated search strategies that capture semantic meaning, not just keywords.
Semantic Analysis: Our breakthrough four-dimensional weighted scoring system evaluates projects on problem-solution decomposition rather than keyword matching. We apply intelligent corrections for project age (ideas older than 2 years score 15% lower), domain saturation (common patterns like chatbots score 10% lower), and different solution approaches to the same problem (capped at 45% similarity to avoid false positives).
Challenges we ran into
False Positives in Similarity Detection: Keyword-based matching flagged unrelated projects as similar. We solved this by implementing multi-dimensional semantic analysis that separately evaluates problem domain and solution approach, reducing false positives by 70%.
API Rate Limiting: Scraping 100+ projects caused 429 errors. We implemented adaptive exponential backoff (1s → 2s → 4s → 8s), request pooling, and intelligent caching to stay within limits.
Real-Time Streaming with Large Datasets: Users waited minutes without feedback. We implemented Server-Sent Events to stream projects as they're discovered, showing live progress bars and AI analysis updates.
Context Window Limitations: Claude's 200K token limit couldn't handle 100 projects × 500 words each. We truncate descriptions to 300 words, analyze only the top 20 most relevant projects, and use batch processing with smart prioritization.
Accomplishments that we're proud of
- 70% reduction in false positives compared to traditional keyword-based plagiarism detection
- 10x faster processing with async/await concurrent architecture
- Real-time streaming updates for responsive user experience during long operations
- Multi-dimensional semantic analysis with four weighted factors for accurate similarity scoring
- Production-ready implementation with comprehensive error handling, rate limiting, and security best practices
What we learned
Semantic NLP beats keyword matching: Understanding context and meaning is crucial for accurate similarity detection. Separately evaluating problem domain and solution approach dramatically improves accuracy. Weight distribution matters: Problem domain (35%) and solution approach (40%) are the most important factors in determining true similarity—implementation details alone don't indicate plagiarism. Temporal context is key: Ideas naturally evolve over time. Projects older than 2 years should be weighted less heavily in originality assessments. Streaming architecture improves UX: For operations taking 5-10 minutes, showing progressive results keeps users engaged and provides transparency into the AI's reasoning process. Multi-source validation: Combining Devpost (hackathon-specific) and GitHub (open-source) provides comprehensive coverage for detecting similar projects.
Built With
- bs4
- claude
- fastapi
- gemini
- javascript
- python
- react
- vite

Log in or sign up for Devpost to join the conversation.