Inspiration

Hackathon listings are scattered across many platforms, so participants and organizers frequently miss deadlines or waste time searching multiple sites. Dev Dump centralizes listings from Devpost, Devfolio, MLH, and similar sources, making events easier to discover and compare.

What it does

Dev Dump scrapes, normalizes, deduplicates, and stores hackathon listings from several platforms. It provides a unified, searchable list of upcoming hackathons with key metadata (dates, deadlines, organizers, location/format). Each listing links back to the original source.

How we built it

  • Framework: Next.js (App Router) for server-rendered pages and fast UX.
  • Storage: MongoDB for normalized hackathon records and metadata.
  • Scraping: a Scrapy-based bot (custom spiders) scrapes Devpost, Devfolio, and MLH. Spiders parse and normalize fields, then a Scrapy pipeline writes cleaned records into MongoDB. Scrapers are scheduled (cron or serverless jobs) to refresh listings.
  • API & UI: Next.js API routes serve the aggregated feed; server components render landing and listing pages. UI uses Tailwind CSS and modular components.
  • Integrations: Vercel analytics for usage insights.
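The normalize-then-upsert step in the pipeline can be sketched as below. This is a minimal, standalone illustration, not the project's actual code: the field names (`title`, `url`, `deadline`, `source`) are assumed, and the MongoDB collection is stubbed behind an `update_one` interface so the logic runs without a database. The key idea is that each record gets a stable dedupe key, so re-scrapes refresh existing documents instead of inserting duplicates.

```python
import hashlib
import re


def dedupe_key(title: str, start_date: str) -> str:
    """Stable key for upserts: normalized title slug + start date."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return hashlib.sha256(f"{slug}|{start_date}".encode()).hexdigest()[:16]


class HackathonPipeline:
    """Scrapy-style item pipeline: normalize each scraped item, attach a
    dedupe key, and upsert so repeated scrapes update rather than duplicate."""

    def __init__(self, collection):
        # In production this would be a pymongo Collection; any object
        # exposing update_one(filter, update, upsert=True) works here.
        self.collection = collection

    def process_item(self, item, spider=None):
        record = {
            "title": item["title"].strip(),
            "url": item["url"],
            "source": item.get("source", "unknown"),
            "deadline": item.get("deadline"),  # assumed ISO-normalized upstream
        }
        record["_key"] = dedupe_key(record["title"], record.get("deadline") or "")
        self.collection.update_one(
            {"_key": record["_key"]}, {"$set": record}, upsert=True
        )
        return record
```

In a real Scrapy project this class would be registered in `ITEM_PIPELINES`; the upsert-by-key pattern is what lets the scheduled cron refreshes stay idempotent.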

Challenges we ran into

  • Source variability: differing page structures and fields per platform required custom parsing logic.
  • Rate limits & anti-scraping: handling throttling and respecting robots.txt while keeping listings fresh.
  • Deduplication: reliably matching the same event across multiple sources required fuzzy matching and heuristics.
  • Timezones & date formats: normalizing deadlines across formats and zones was tricky but essential.
  • Freshness vs. cost: choosing a refresh cadence that balances up-to-date data with resource limits.
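The cross-source matching heuristic can be sketched roughly as follows. This is an illustrative sketch, not the project's exact tuning: the similarity threshold, the date window, and the field names are all assumptions. Titles are normalized (lowercased, punctuation and year suffixes stripped), compared with a string-similarity ratio, and two listings count as the same event only if the titles are similar and the start dates fall within a small window.

```python
from datetime import date
from difflib import SequenceMatcher
import re


def norm_title(title: str) -> str:
    # Lowercase, drop punctuation, and strip year suffixes like "2024".
    t = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    t = re.sub(r"\b20\d{2}\b", "", t)
    return " ".join(t.split())


def same_event(a: dict, b: dict, threshold: float = 0.85, day_window: int = 2) -> bool:
    """Fuzzy match: similar normalized titles AND nearby start dates."""
    score = SequenceMatcher(None, norm_title(a["title"]), norm_title(b["title"])).ratio()
    days_apart = abs((a["start"] - b["start"]).days)
    return score >= threshold and days_apart <= day_window


# Example: the same event listed with slightly different titles on two sources.
devpost = {"title": "Hack the Valley 2024", "start": date(2024, 10, 4)}
mlh = {"title": "Hack The Valley", "start": date(2024, 10, 5)}
other = {"title": "Global AI Hackathon", "start": date(2024, 10, 4)}
```

Requiring both signals (title similarity and date proximity) keeps recurring events with similar names, or unrelated events held the same weekend, from being merged incorrectly.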

Accomplishments we're proud of

  • A unified feed that surfaces hackathons from multiple major sources.
  • A modular scraper architecture that makes adding new sources straightforward.
  • Clean, fast server-rendered UI and useful stats for quick discovery.
  • Persistent data storage enabling future features like alerts and analytics.

What we learned

  • Heterogeneous source data benefits from defensive parsing and a small, consistent canonical schema.
  • Respectful scraping practices (rate limits, backoff, robots.txt) keep the service sustainable.
  • Next.js server components simplify rendering fresh, server-driven content without sacrificing performance.
  • Proper MongoDB indexing and schema design improve feed query performance.
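The defensive-parsing lesson can be sketched with a small canonical schema and a deadline parser that tries several formats in order, attaches the source's stated timezone to naive datetimes, and converts everything to UTC before storage. The date formats, field names, and timezone handling here are illustrative assumptions, not the project's exact implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo

# Formats observed across sources vary; try the most specific first.
DATE_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%b %d, %Y %I:%M %p", "%Y-%m-%d")


@dataclass
class Hackathon:
    """Small canonical record shared by all sources."""
    title: str
    deadline_utc: Optional[datetime]  # None when no parseable deadline
    source: str


def parse_deadline(raw: Optional[str], tz_name: str = "UTC") -> Optional[datetime]:
    """Parse a raw deadline string defensively; return an aware UTC datetime."""
    if not raw:
        return None
    for fmt in DATE_FORMATS:
        try:
            dt = datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
        return dt.replace(tzinfo=ZoneInfo(tz_name)).astimezone(timezone.utc)
    return None  # unknown formats degrade gracefully instead of crashing the scrape
```

Storing only aware UTC datetimes means deadline sorting and "closing soon" queries never have to reason about source timezones again.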

What's next for Dev Dump

  • Add more sources (regional sites, university pages, RSS feeds).
  • Implement user features: saved searches, filters, and deadline alerts (email/push).
  • Improve deduplication via fuzzy matching and canonical IDs.
  • Add caching, pagination, and stricter rate-limiting for scalability.
  • Expose an RSS/JSON feed and a public API so others can consume the aggregated listings.
  • Add tests, monitoring, and CI for scrapers to detect breakages early.
