Inspiration

As developers, we’ve all seen projects where the README lies: code changes fly in, but the docs are left behind. Outdated instructions, missing API references, and broken links create friction for users and contributors. Since this hackathon is sponsored by Mintlify, a company that lives and breathes developer documentation, we wanted to build something that shines a light on this problem and helps teams keep their docs as fresh as their code.

What it does

Our project, DocFresh, automatically scans a GitHub repository and detects stale documentation. It computes a Repo Freshness Score using signals such as:

- Age Gap: the gap between code and docs commits.
- API Coverage: checks whether public classes/functions are mentioned in docs.
- Link Rot: detects broken external references.
- Version Skew: finds mismatches between documented versions and repo config.

Scans run in two ways today:

- Manual: via a “Run Scan” button on the web UI.
- Scheduled Cron: every 5 minutes in the background.
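As a rough illustration of how the four signals above could be folded into one Repo Freshness Score, here is a minimal sketch. The weights, the 180-day cap, and the `Signals` and `freshness_score` names are all assumptions made for this example; the write-up does not state the actual formula.

```python
from dataclasses import dataclass

# Hypothetical weights (they sum to 1.0); the real values are not documented.
WEIGHTS = {"age_gap": 0.35, "api_coverage": 0.35, "link_rot": 0.15, "version_skew": 0.15}
MAX_AGE_GAP_DAYS = 180  # assumed cap: docs this far behind code score zero on age


@dataclass
class Signals:
    age_gap_days: float      # how many days the docs trail the code
    api_coverage: float      # fraction of public symbols mentioned in docs (0..1)
    broken_links: int        # external links that failed to resolve
    total_links: int         # external links checked
    version_mismatches: int  # documented versions that disagree with repo config


def freshness_score(s: Signals) -> float:
    """Combine the four signals into a 0-100 score (higher means fresher)."""
    clamped_gap = min(max(s.age_gap_days, 0.0), MAX_AGE_GAP_DAYS)
    age = 1.0 - clamped_gap / MAX_AGE_GAP_DAYS
    coverage = min(max(s.api_coverage, 0.0), 1.0)
    links = 1.0 if s.total_links == 0 else 1.0 - s.broken_links / s.total_links
    version = 1.0 if s.version_mismatches == 0 else 0.0
    total = (
        WEIGHTS["age_gap"] * age
        + WEIGHTS["api_coverage"] * coverage
        + WEIGHTS["link_rot"] * links
        + WEIGHTS["version_skew"] * version
    )
    return round(100 * total, 1)
```

Keeping each signal as a separate 0..1 term is what makes a score like this explainable: every point lost can be traced back to one signal and one weight.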

Results show up live on the dashboard, with a trend of recent scans. We integrated Gemini AI to generate concise suggestions for fixing stale docs.

How we built it

- Backend: Python + Flask.
  - Exposed REST endpoints (/scan, /report, /history).
  - Used GitPython to clone repos and read commit metadata.
  - APScheduler runs scans on a fixed interval.
- Scanner Engine: Python modules.
  - File discovery for docs vs code.
  - AST parsing for Python, javalang for Java.
  - Heuristic scoring for freshness.
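The Python side of the API Coverage check can be sketched with the standard-library `ast` module. This is an illustrative approximation, not the project's actual scanner: the function names `public_symbols` and `api_coverage` are hypothetical, only top-level definitions are considered, and "mentioned in docs" is assumed to mean a whole-word match in the docs text.

```python
import ast
import re


def public_symbols(source: str) -> set[str]:
    """Return the names of top-level public functions and classes."""
    tree = ast.parse(source)
    names = set()
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Treat a leading underscore as "private", per Python convention.
            if not node.name.startswith("_"):
                names.add(node.name)
    return names


def api_coverage(source: str, docs_text: str) -> tuple[float, set[str]]:
    """Return (fraction of public symbols found in docs, set of missing symbols)."""
    symbols = public_symbols(source)
    if not symbols:
        return 1.0, set()  # nothing to document counts as fully covered
    missing = {
        name
        for name in symbols
        if not re.search(rf"\b{re.escape(name)}\b", docs_text)
    }
    return 1 - len(missing) / len(symbols), missing
```

For a module that defines `scan`, `_helper`, and `Report`, with docs that only mention `scan`, this reports half coverage and flags `Report` as undocumented.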

- Frontend: Simple Bootstrap + Jinja templates.
  - One-page dashboard with Repo Score card + Recent Scans table.
  - Auto-refresh every 15 seconds to show new cron/manual results.
- AI Suggestions: Optional integration with Google Gemini API.
  - Summarizes report.json and outputs 2–3 actionable doc fixes.
- Storage: Lightweight JSON files written to /tmp/docfresh/.
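The JSON-file storage could look roughly like the sketch below, which writes one file per scan and reads back the most recent ones for a Recent Scans table. The file naming scheme, the `save_report` and `load_history` helpers, and the report fields are assumptions for illustration; only the /tmp/docfresh/ location comes from the description above.

```python
import json
import time
from pathlib import Path
from typing import Optional

DEFAULT_DIR = "/tmp/docfresh"


def save_report(report: dict, directory: str = DEFAULT_DIR,
                timestamp_ns: Optional[int] = None) -> Path:
    """Write one scan report to its own JSON file and return the path."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    ts = time.time_ns() if timestamp_ns is None else timestamp_ns
    # Zero-padded timestamp so that filename order equals chronological order.
    path = out_dir / f"scan-{ts:020d}.json"
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return path


def load_history(directory: str = DEFAULT_DIR, limit: int = 10) -> list[dict]:
    """Return up to `limit` reports, newest first."""
    out_dir = Path(directory)
    if not out_dir.is_dir():
        return []
    files = sorted(out_dir.glob("scan-*.json"), reverse=True)[:limit]
    return [json.loads(p.read_text(encoding="utf-8")) for p in files]
```

One file per scan keeps writes from the cron job and the manual trigger from clobbering each other, at the cost of history disappearing whenever /tmp is cleared.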

Challenges we ran into

- Git history depth: shallow clones didn’t always have enough commits to measure doc vs code age. We had to implement relative paths + deeper fetches.
- Parsing APIs reliably: Python was easy with ast, but Java required javalang, and modern features sometimes broke parsing.
- Time pressure: balancing accuracy vs speed in <4 hours meant focusing on heuristics instead of heavy ML.
- LLM integration: prompt tuning Gemini to return short, clean bullets instead of long paragraphs.

Accomplishments that we're proud of

- Built a full-stack app in just a few hours.
- Live scans triggered by cron and displayed in a clean UI.
- A scoring system that’s explainable and tunable.
- Integration with Gemini to give real AI-powered doc suggestions.
- Collaborated as a 4-person team, each contributing backend, frontend, engine, or AI integration.

What we learned

- How to leverage Git commit metadata to quantify documentation freshness.
- That exact symbol coverage is a surprisingly good heuristic for doc staleness.
- How to integrate cron jobs and manual triggers into a single workflow.
- Best practices for prompting LLMs for concise, actionable developer doc suggestions.
- The importance of scope control in hackathons: better to ship 4 strong signals than over-engineer.
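To make the first lesson concrete, here is one way commit metadata might be turned into an Age Gap figure. It is a sketch under stated assumptions: the `age_gap_days` helper is hypothetical, and it takes plain Unix timestamps (for example, the `committed_date` values GitPython exposes on commit objects) so that the logic stays independent of how the history was fetched.

```python
from typing import Iterable, Optional

SECONDS_PER_DAY = 86_400


def age_gap_days(code_commit_times: Iterable[int],
                 doc_commit_times: Iterable[int]) -> Optional[float]:
    """Days by which the newest docs commit trails the newest code commit.

    Returns None when either side has no commits (for example, a shallow
    clone that did not fetch enough history), and 0.0 when the docs are at
    least as recent as the code.
    """
    code_times = list(code_commit_times)
    doc_times = list(doc_commit_times)
    if not code_times or not doc_times:
        return None
    gap_seconds = max(code_times) - max(doc_times)
    return max(gap_seconds, 0) / SECONDS_PER_DAY
```

Returning `None` instead of a number when history is missing lets the caller distinguish "docs are fresh" from "not enough commits to tell", which matters for shallow clones.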

What’s next for Stale Documentation Detector

- Webhook integration: trigger scans automatically on every GitHub push or PR.
- Deeper language support: add TypeScript, Go, Rust via tree-sitter.
- Better scoring: weight signals dynamically or train on real doc health metrics.
- PR integration: comment inline on GitHub Pull Requests with freshness warnings.
- Mintlify synergy: export stale docs directly into Mintlify’s editor for one-click updates.
- Dashboard polish: trends over time, org-wide views, and Slack alerts.
- AI-powered autofixes: go beyond suggestions and auto-generate draft doc patches.
