Inspiration
Modern developers don’t fail because they can’t code — they fail because debugging has become slow, fragmented, and mentally expensive. Stack traces, vague runtime errors, partial fixes from forums, hallucinated AI responses — all stitched together manually.
While experimenting with LLM-based coding assistants, we kept running into one uncomfortable truth: most tools generate answers, not explanations. They might fix code, but they don’t explain why it broke, how to verify the fix, or what the developer should learn from it.
CodeReason was built to attack exactly that gap: not to generate code, but to reason about failures like a senior engineer.
What it does
CodeReason is a production-oriented debugging assistant that analyzes failing code and returns:
- Root cause analysis (not just symptoms)
- Step-by-step diagnostic reasoning
- Minimal, safe code fixes (diff-only)
- Clear learning takeaways to prevent repeat mistakes
Instead of dumping raw code or speculative suggestions, every response follows a strict, validated schema:
- Execution status
- Structured runtime error
- Ordered diagnostic steps
- Unified diff for fixes
- Test impact summary
- Learning takeaway
This makes CodeReason predictable, auditable, and usable in real engineering workflows — not just demos.
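To make that concrete, here is a minimal sketch of such a contract as a Pydantic v2 model. The field names mirror the list above but are illustrative, not our exact production schema:

```python
# Hedged sketch: a strict response contract that rejects unknown fields.
from typing import Literal

from pydantic import BaseModel, ConfigDict

class DebugReport(BaseModel):
    model_config = ConfigDict(extra="forbid")  # unknown keys fail validation

    execution_status: Literal["passed", "failed", "error"]
    runtime_error: str | None        # structured runtime error, if any
    diagnostic_steps: list[str]      # ordered reasoning: cause before fix
    fix_diff: str                    # unified diff only, never full files
    test_impact: str                 # which tests the change affects
    learning_takeaway: str           # what to remember next time
```

Anything the model emits that does not validate against this shape is rejected outright.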
How we built it
Architecture (High Level)
- Backend: FastAPI (Python)
- Schema enforcement: Pydantic (strict, extra="forbid")
- LLM orchestration: Gemini 3 (primary), with fallback logic
- Persistence: PostgreSQL + SQLAlchemy 2.0
- Security:
  - API keys hashed using Argon2id (see the sketch below)
  - No raw code stored: only diffs and structured metadata
- Error handling:
  - 502 for malformed upstream responses
  - 503 for model unavailability
  - Deterministic failure paths
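For the key-handling piece, here is a minimal sketch using the argon2-cffi library, whose PasswordHasher defaults to the Argon2id variant; the function names are illustrative, not our actual module:

```python
# Hedged sketch: hashing API keys with Argon2id via argon2-cffi.
from argon2 import PasswordHasher
from argon2.exceptions import VerifyMismatchError

ph = PasswordHasher()  # defaults to the Argon2id variant

def hash_api_key(raw_key: str) -> str:
    # Only this hash is persisted; the raw key never touches the database.
    return ph.hash(raw_key)

def verify_api_key(stored_hash: str, raw_key: str) -> bool:
    try:
        return ph.verify(stored_hash, raw_key)
    except VerifyMismatchError:
        return False
```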
Key Design Decisions (Intentional, Not Accidental)
- No raw code storage: prevents IP leakage and privacy risks.
- Diff-only fixes: forces minimal, reviewable changes instead of code dumps (see the diff sketch after this list).
- Linear request flow: easier to debug, test, and reason about under failure.
- Schema-first responses: if the model outputs junk, the API rejects it, with no silent corruption.
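To illustrate the diff-only contract, a fix can be rendered as a standard unified diff with Python's difflib; the file name and code snippets below are invented for the example:

```python
# Hedged sketch: emitting a fix as a unified diff rather than full code.
import difflib

# Hypothetical before/after snippets, for illustration only.
buggy = "def average(items):\n    return sum(items) / len(items)\n"
fixed = (
    "def average(items):\n"
    "    if not items:\n"
    "        return 0\n"
    "    return sum(items) / len(items)\n"
)

diff = difflib.unified_diff(
    buggy.splitlines(keepends=True),
    fixed.splitlines(keepends=True),
    fromfile="a/stats.py",
    tofile="b/stats.py",
)
print("".join(diff))
```

A reviewer sees exactly what changed and nothing else.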
Challenges we ran into
- LLM Output Reliability: LLMs frequently violate schemas under edge cases. Solution: tolerant parsing + hard validation + explicit failure responses (see the sketch after this list).
- API Rate Limits During Testing: heavy testing quickly exhausted paid API quotas. Solution: designed the system to support local and open-weight model fallbacks (Ollama / Hugging Face).
- Avoiding “AI Hallucination as Truth”: LLMs confidently produce wrong fixes. Solution: enforce diagnostic steps before fixes and require justification for every change.
- Production vs. Demo Tradeoffs: most hackathon projects ignore real-world constraints. We didn’t. This slowed development but massively increased credibility.
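A minimal sketch of that guardrail, assuming the DebugReport model from earlier; the JSON-extraction heuristic and function name are illustrative:

```python
# Hedged sketch: tolerant parsing followed by hard validation.
import json

from fastapi import HTTPException
from pydantic import ValidationError

def parse_model_output(raw: str) -> DebugReport:
    # Tolerant parsing: models often wrap JSON in prose or markdown fences,
    # so extract the outermost JSON object before decoding.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise HTTPException(status_code=502, detail="Malformed model response")
    try:
        return DebugReport.model_validate(json.loads(raw[start : end + 1]))
    except (json.JSONDecodeError, ValidationError):
        # Explicit failure path: bad upstream output becomes a 502,
        # never a silently corrupted response.
        raise HTTPException(status_code=502, detail="Malformed model response")
```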
Accomplishments that we're proud of
- Built a strictly validated, production-grade API
- Zero raw-code persistence by design (see the sketch after this list)
- Robust error classification (not generic “AI failed” messages)
- Clean separation between analysis, fix, and learning
- Fully testable backend with deterministic behavior
- Designed to scale beyond a hackathon into a real product
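To make the zero-raw-code claim concrete, a persistence model in this spirit stores only the diff and structured metadata. This SQLAlchemy 2.0 sketch uses invented table and column names:

```python
# Hedged sketch: persisting only diffs and metadata with SQLAlchemy 2.0.
from datetime import datetime

from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class AnalysisRecord(Base):
    __tablename__ = "analysis_records"

    id: Mapped[int] = mapped_column(primary_key=True)
    language: Mapped[str]    # e.g. "python"
    error_class: Mapped[str] # structured error category, not a stack trace
    fix_diff: Mapped[str]    # unified diff only; never the submitted source
    created_at: Mapped[datetime] = mapped_column(default=datetime.utcnow)
    # Deliberately no column for the user's raw code.
```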
What we learned
- LLMs are powerful, but unreliable without guardrails
- Schema enforcement is non-negotiable in production AI
- Debugging tools must teach, not just fix
- Fewer features + stronger guarantees beat flashy demos
- Engineering discipline stands out more than UI polish
Most importantly: Good AI systems are constrained systems.
What's next for CodeReason
Short term:
- Local model support via Ollama for unlimited testing
- Confidence scoring per analysis
- Multi-language expansion (beyond JS / Python / Java)

Mid term:
- IDE integrations
- Test-case auto-generation
- CI/CD failure analysis mode

Long term:
- Team-level debugging insights
- Failure pattern analytics
- On-prem deployments for enterprises
CodeReason is not finished — it’s engineered to grow.