Inspiration
TestPilot AI is an intelligent QA failure analysis platform that automatically detects, analyzes, and categorizes software failures using a hybrid AI approach. It helps development teams identify root causes, assess severity, and receive actionable fix suggestions in real time.
What it does
TestPilot AI transforms manual log analysis into automated intelligence. Engineering teams can upload error logs or paste real-time errors, and within seconds, the platform:
- Analyzes the root cause using a hybrid AI engine (Rule-based + Llama3)
- Categorizes failures into 6 types: API/Backend, Payment/Financial, Concurrency/Race Conditions, Infrastructure/DevOps, AI/ML Pipeline, and Other
- Classifies severity as Critical, High, Medium, or Low with color-coded visual indicators
- Suggests fixes with 85-95% confidence scores
- Generates bug reports ready for Jira or Linear
- Displays live metrics including system health score, failure trends, and module frequency
How I built it
Frontend:
- Next.js 14 with App Router for the dashboard interface
- TypeScript for type safety
- Recharts for interactive data visualizations (Area charts, Bar charts, Pie charts)
- CSS-in-JS with custom gradients for modern UI/UX
Backend:
- FastAPI for high-performance API endpoints
- Uvicorn as ASGI server
- Hybrid AI Engine combining rule-based keyword matching and Ollama Llama3 for deep analysis
- Multi-tier storage with automatic fallback (MongoDB → JSON file → In-memory)
Deployment:
- Vercel for frontend hosting (serverless Next.js)
- Render for backend hosting (Python/FastAPI)
- GitHub for version control and CI/CD
AI Integration:
- Ollama Llama3 for complex log analysis
- Rule-based fallback for common error patterns (timeout, 500, database, auth, payment)
- Confidence scoring to show AI certainty levels
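To make the hybrid idea concrete, here is a minimal sketch of a rule-based fallback wrapped around a model call. It is not the project's actual code: the names `RULES`, `Analysis`, `classify_with_rules`, and `analyze` are hypothetical, the keyword-to-category mapping and the confidence numbers are illustrative assumptions, and the model is passed in as a plain callable rather than a real Ollama client. It also assumes the model is tried first and the rules take over when it is unavailable.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Keywords are the ones listed above; the category and confidence attached
# to each keyword are assumptions made for this sketch.
RULES = [
    ("payment", "Payment/Financial", 0.90),
    ("timeout", "API/Backend", 0.85),
    ("500", "API/Backend", 0.85),
    ("database", "Infrastructure/DevOps", 0.85),
    ("auth", "API/Backend", 0.85),
]


@dataclass
class Analysis:
    category: str
    confidence: float
    source: str  # "llm" or "rules"


def classify_with_rules(log_text: str) -> Optional[Analysis]:
    """Return the first rule whose keyword appears in the log, else None."""
    lowered = log_text.lower()
    for keyword, category, confidence in RULES:
        if keyword in lowered:
            return Analysis(category, confidence, "rules")
    return None


def analyze(log_text: str,
            llm: Optional[Callable[[str], Analysis]] = None) -> Analysis:
    """Try the model first (if one is supplied), then fall back to rules."""
    if llm is not None:
        try:
            return llm(log_text)
        except Exception:
            pass  # model unreachable or failed: continue with the rules
    matched = classify_with_rules(log_text)
    if matched is not None:
        return matched
    # Nothing matched: report the catch-all category with zero confidence.
    return Analysis("Other", 0.0, "rules")
```

Calling `analyze("Payment gateway declined card")` without a model returns the `Payment/Financial` rule result, so common errors still get a category when the model is down.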
Challenges I ran into
MongoDB SSL handshake errors blocked database connections on Windows, so I built a multi-tier fallback system (MongoDB → JSON file → memory) that keeps the app running even without a database.
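The fallback chain can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: `FallbackStore` and `primary_insert` are hypothetical names, and the MongoDB write is represented by an injected callable so the example runs without a database driver.

```python
import json
from pathlib import Path
from typing import Callable, Optional


class FallbackStore:
    """Write to the primary store, then a JSON file, then process memory."""

    def __init__(self,
                 primary_insert: Optional[Callable[[dict], None]],
                 json_path: Path) -> None:
        # primary_insert stands in for a MongoDB insert (assumption).
        self.primary_insert = primary_insert
        self.json_path = json_path
        self.memory: list[dict] = []

    def save(self, record: dict) -> str:
        """Store the record and return the name of the tier that took it."""
        if self.primary_insert is not None:
            try:
                self.primary_insert(record)
                return "mongodb"
            except Exception:
                pass  # e.g. an SSL handshake failure: try the next tier
        try:
            if self.json_path.exists():
                existing = json.loads(self.json_path.read_text())
            else:
                existing = []
            existing.append(record)
            self.json_path.write_text(json.dumps(existing))
            return "json"
        except (OSError, ValueError, TypeError):
            # File unwritable, corrupt, or record not serializable.
            self.memory.append(record)
            return "memory"
```

Returning the tier name from `save` is a design choice for the sketch: it lets the caller log or display which storage layer is currently in use.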
Recharts crashed on the Vercel deployment with "width(-1) and height(-1)" errors because the charts rendered during Next.js server-side rendering, before their containers had dimensions. I fixed this by using flex layouts with minHeight: 0.
Railway couldn't handle my monorepo with both Python and Node.js, so I split the deployment: frontend on Vercel (Next.js) and backend on Render (FastAPI). That saved hours of debugging.
Accomplishments that I am proud of
- Fully deployed production app - Live at vercel.app with zero setup required for judges
- Hybrid AI engine - 85-95% confidence scores using Llama3 with intelligent rule-based fallback
- 6 failure categories - Automatically detects API, Payment, Concurrency, Infrastructure, AI/ML, and Other failures
- Real-time dashboard - Auto-refreshes every 10 seconds with live data
- File upload + real-time paste - Two ways to analyze logs
- Exportable bug reports - Ready for Jira/Linear integration
- Professional UI/UX - Dark theme, gradients, animations, responsive design
- Multi-tier storage - Works with or without MongoDB (automatic fallback)
- 90% time reduction - From 2 hours of manual debugging to 2 seconds of AI analysis
What I learned
This hackathon taught me that resilience is more important than perfection. When MongoDB failed, I built a fallback system. When Railway couldn't handle my monorepo, I split services across platforms. The most valuable lesson was that a working demo with smart trade-offs beats a perfect architecture that never deploys. I also learned that Next.js SSR requires careful handling of browser APIs, Recharts needs explicit container dimensions, and FastAPI file uploads in production require the python-multipart library, a small detail that cost me an hour to debug. Most importantly, I learned that AI tools are only valuable if they're fast and actionable; engineers won't wait 10 seconds for analysis, but 2-3 seconds feels like magic.
What's next for TestPilot-AI
Next, I plan to add WebSocket support for truly real-time log streaming, so engineers can watch failures appear instantly without refreshing. I also want to integrate with Slack and Teams to send critical failure alerts directly to engineering channels, and build Jira and Linear integrations to auto-create tickets from bug reports. In the medium term, I'll fine-tune Llama3 on specific codebase patterns to improve accuracy beyond 95%, and add anomaly detection to identify unusual failure patterns before they become critical. Finally, I plan to open-source the core engine and release a VS Code extension so developers can analyze logs directly from their IDE without leaving their workflow.
Built With
- fastapi
- github
- mongodb
- next.js-14
- ollama-llama3
- python-3.11
- recharts
- typescript