Reasoning-first AI debugger for modern infrastructure
- Correlates events across time and services
- Reasons about causality using Gemini 2.0 Flash
- Delivers structured RCA reports with actionable fixes
- Multi-source data ingestion (logs, metrics, traces, configs)
- Gemini 2.0 Flash integration with retry logic
- Root cause analysis with causal chains
- Next.js dashboard with file upload
- Actionable fix suggestions with validation steps
- Demo mode fallback for rate limits
- JSON repair for malformed AI responses
- PostgreSQL config file support
- Python 3.10+
- Node.js 18+
- Gemini API key (free tier available)
- Available Ports - Ensure ports 8000 (API) and 3001 (UI) are not in use
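One way to check the port prerequisite before starting anything is a short Python probe. This is a minimal sketch and not part of the InfraMind codebase; the function name `port_is_free` is hypothetical, and it only checks the loopback interface.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 only when a listener accepts the connection.
        return probe.connect_ex((host, port)) != 0


if __name__ == "__main__":
    # Ports taken from the prerequisites above: 8000 (API) and 3001 (UI).
    for name, port in (("API", 8000), ("UI", 3001)):
        state = "free" if port_is_free(port) else "IN USE"
        print(f"{name} port {port}: {state}")
```

If either port reports as in use, stop the conflicting process before starting the backend or the dashboard.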
# 1. Clone the repository
git clone https://github.com/vaish725/InfraMind.git
cd InfraMind
# 2. Set up Python backend
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
# 3. Configure environment
cp .env.example .env
# Edit .env and add your GEMINI_API_KEY
# 4. Set up Next.js frontend
cd infra-mind-dashboard-ui
npm install
cd ..
Terminal 1 - Start Backend:
source venv/bin/activate
uvicorn backend.api.main:app --host 0.0.0.0 --port 8000 --reload
Terminal 2 - Start Frontend:
cd infra-mind-dashboard-ui
npm run dev
- Frontend Dashboard: http://localhost:3001
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- Navigate to http://localhost:3001
- Click "New Analysis"
- Upload sample files from sample-demo-files-3/:
  - INCIDENT_DESCRIPTION.txt
  - payment-api.log
  - payment-metrics.csv
  - payment-traces.json
  - application-config.json
- Click "Analyze Incident"
- View comprehensive RCA in 30-60 seconds!
┌─────────────────────────────────────┐
│ Next.js Dashboard (Port 3001) │
│ - File Upload Interface │
│ - Real-time Analysis Display │
│ - Causal Chain Visualization │
└──────────────┬──────────────────────┘
│ REST API
▼
┌─────────────────────────────────────┐
│ FastAPI Backend (Port 8000) │
│ ┌─────────────────────────────┐ │
│ │ Ingestion Layer │ │
│ │ - Log Parser │ │
│ │ - Metrics Parser │ │
│ │ - Trace Parser │ │
│ │ - Config Parser │ │
│ │ - Data Unifier │ │
│ └──────────┬──────────────────┘ │
│ │ │
│ ┌──────────▼──────────────────┐ │
│ │ Reasoning Engine │ │
│ │ - Gemini 2.0 Flash Client │ │
│ │ - Prompt Engineering │ │
│ │ - Response Parsing │ │
│ │ - JSON Repair Logic │ │
│ └──────────┬──────────────────┘ │
│ │ │
│ ┌──────────▼──────────────────┐ │
│ │ Output Formatter │ │
│ │ - RCA Model │ │
│ │ - Validation │ │
│ │ - Demo Mode Fallback │ │
│ └─────────────────────────────┘ │
└─────────────────────────────────────┘
│
▼
┌─────────────────┐
│ Gemini 2.0 API │
│ (Google Cloud) │
└─────────────────┘
Backend:
- FastAPI - Modern Python web framework
- Pydantic - Data validation and settings
- Google Generative AI SDK - Gemini integration
- Tenacity - Retry logic with exponential backoff
Frontend:
- Next.js 16.1 - React framework with Turbopack
- TypeScript - Type-safe development
- Tailwind CSS - Utility-first styling
- shadcn/ui - High-quality UI components
AI Model:
- Gemini 2.0 Flash - Fast, cost-effective reasoning
- Temperature: 0.3 - Focused, low-variance analysis
- Max tokens: 4096 - Comprehensive responses
InfraMind/
├── backend/ # Python Backend
│ ├── api/ # FastAPI Application
│ │ ├── main.py # App entry point & CORS config
│ │ └── routes/
│ │ ├── incident.py # Analysis endpoints
│ │ └── health.py # Health check
│ │
│ ├── ingestion/ # Data Parsers
│ │ ├── log_parser.py # JSON/text log parsing
│ │ ├── metrics_parser.py # CSV/JSON metrics
│ │ ├── trace_parser.py # Distributed traces
│ │ ├── config_parser.py # Multi-format configs (JSON/YAML/ENV/INI)
│ │ └── data_unifier.py # Create unified context
│ │
│ ├── reasoning/ # AI Reasoning
│ │ ├── gemini_client.py # Gemini API wrapper with retry
│ │ ├── prompts.py # Prompt templates
│ │ └── reasoning_engine.py # RCA orchestration & JSON repair
│ │
│ ├── models/ # Data Models
│ │ ├── incident.py # Incident data structures
│ │ ├── rca.py # RCA output models
│ │ └── schemas.py # API request/response schemas
│ │
│ └── core/ # Core Utilities
│ ├── config.py # Settings management
│ └── exceptions.py # Custom exceptions
│
├── infra-mind-dashboard-ui/ # Next.js Frontend
│ ├── app/
│ │ ├── page.tsx # Main dashboard page
│ │ └── layout.tsx # Root layout
│ │
│ ├── components/inframind/ # Custom Components
│ │ ├── dashboard.tsx # Main dashboard
│ │ ├── analysis-form.tsx # File upload form
│ │ ├── executive-summary.tsx # Results display
│ │ ├── causal-chain.tsx # Visual causal chain
│ │ ├── recommended-fixes.tsx # Fix suggestions
│ │ └── reasoning-process.tsx # AI reasoning steps
│ │
│ ├── lib/
│ │ ├── api-client.ts # Backend API client
│ │ └── transform.ts # Response transformation
│ │
│ └── package.json # Dependencies
│
├── sample-demo-files-3/ # Demo Incident Files
│ ├── INCIDENT_DESCRIPTION.txt # P0 payment system outage
│ ├── payment-api.log # Application logs
│ ├── payment-metrics.csv # System metrics
│ ├── payment-traces.json # Distributed traces
│ └── application-config.json # Config with bug details
│
├── tests/ # Test Suites
│ ├── test_api/ # API endpoint tests
│ ├── test_ingestion/ # Parser tests
│ └── test_reasoning/ # AI reasoning tests
│
├── .env.example # Environment template
├── requirements.txt # Python dependencies
└── README.md # This file
Supports diverse data formats and automatically detects file types:
| Data Type | Supported Formats | Auto-Detection |
|---|---|---|
| Logs | JSON, plain text, structured logs | Yes |
| Metrics | CSV, JSON time-series | Yes |
| Traces | JSON distributed traces | Yes |
| Configs | JSON, YAML, ENV, INI, PostgreSQL conf | Yes |
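The auto-detection in the table above could work roughly as follows. This is a hedged sketch, not the actual ingestion code: the function name `detect_data_type` and the key names used to recognize traces, metrics, and logs are assumptions chosen for illustration.

```python
import json
from pathlib import Path

CONFIG_SUFFIXES = {".yaml", ".yml", ".env", ".ini", ".cfg", ".conf"}


def detect_data_type(filename: str, text: str) -> str:
    """Guess whether a file holds logs, metrics, traces, or config."""
    suffix = Path(filename).suffix.lower()
    if suffix in CONFIG_SUFFIXES:
        return "config"
    if suffix == ".log":
        return "logs"
    if suffix == ".csv":
        return "metrics"
    if suffix == ".json":
        try:
            doc = json.loads(text)
        except json.JSONDecodeError:
            # JSON-lines logs are not a single valid JSON document.
            return "logs"
        sample = doc[0] if isinstance(doc, list) and doc else doc
        if isinstance(sample, dict):
            keys = {key.lower() for key in sample}
            # Assumed marker keys; real files may use different names.
            if keys & {"trace_id", "traceid", "span_id", "spans"}:
                return "traces"
            if keys & {"metric", "metrics", "value"}:
                return "metrics"
            if keys & {"level", "severity", "message"}:
                return "logs"
        return "config"
    # Unknown extensions are treated as plain-text logs.
    return "logs"
```

The extension is checked first because it is cheap; content is inspected only for JSON, where a single extension covers several data types.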
Intelligent Parsing:
- Handles malformed files gracefully
- Extracts timestamps, severity, service names
- Correlates events across data sources
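As an illustration of extracting timestamps, severity, and service names while tolerating malformed input, the sketch below parses one plain-text log line. The line layout, the regular expression, and the names `LogEvent` and `parse_log_line` are assumptions; the real parser and the sample log files may use a different format.

```python
import re
from datetime import datetime
from typing import Optional, TypedDict


class LogEvent(TypedDict):
    timestamp: Optional[datetime]
    severity: str
    service: Optional[str]
    message: str


# Assumed layout: "2026-02-09T14:30:00Z ERROR [payment-api] message text"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2})\S*\s+"
    r"(?P<sev>DEBUG|INFO|WARN|WARNING|ERROR|FATAL|CRITICAL)\s+"
    r"(?:\[(?P<svc>[^\]]+)\]\s*)?(?P<msg>.*)$"
)


def parse_log_line(line: str) -> LogEvent:
    """Parse one log line; unparseable lines are kept as UNKNOWN events."""
    text = line.strip()
    match = LINE_RE.match(text)
    if match is None:
        # Keep the raw text so no evidence is silently dropped.
        return {"timestamp": None, "severity": "UNKNOWN",
                "service": None, "message": text}
    try:
        timestamp = datetime.fromisoformat(match["ts"].replace(" ", "T"))
    except ValueError:
        # Digits matched the pattern but do not form a real date.
        timestamp = None
    return {"timestamp": timestamp, "severity": match["sev"],
            "service": match["svc"], "message": match["msg"]}
```

Returning an UNKNOWN event instead of raising keeps a single bad line from aborting the whole ingestion pass.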
Uses Gemini 2.0 Flash to:
- Identify Root Cause - Distinguishes symptoms from actual causes
- Build Causal Chains - Shows how failures propagate
- Assess Confidence - Provides confidence scores (LOW/MEDIUM/HIGH)
- Cite Evidence - References specific log lines and metrics
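The confidence levels listed above can be derived from a numeric score. The thresholds in this sketch are assumptions rather than values taken from the project, chosen only so that a score of 0.95 maps to HIGH; `confidence_label` is a hypothetical helper name.

```python
def confidence_label(score: float) -> str:
    """Map a confidence score in [0, 1] to LOW, MEDIUM, or HIGH."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence must be between 0 and 1, got {score}")
    # Assumed cut-offs; tune them to match the model's calibration.
    if score >= 0.8:
        return "HIGH"
    if score >= 0.5:
        return "MEDIUM"
    return "LOW"
```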
Each incident analysis includes:
- Prioritized Fixes - Ordered by impact and urgency
- Time Estimates - Expected effort for each fix
- Validation Steps - How to verify the fix worked
- Business Impact - Expected outcomes (e.g., "Reduce error rate to <1%")
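A fix suggestion with these four properties might be modeled as below. This is an illustrative sketch using a standard-library dataclass; the project itself uses Pydantic models, and the class name, field names, and `prioritize` helper here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FixSuggestion:
    title: str
    priority: int            # 1 = most urgent (assumed convention)
    estimated_minutes: int   # expected effort
    validation_steps: list[str] = field(default_factory=list)
    expected_outcome: str = ""


def prioritize(fixes: list[FixSuggestion]) -> list[FixSuggestion]:
    """Order fixes by priority, breaking ties by shortest effort first."""
    return sorted(fixes, key=lambda f: (f.priority, f.estimated_minutes))


fixes = [
    FixSuggestion(
        title="Add resource cleanup in finally blocks",
        priority=2,
        estimated_minutes=240,  # illustrative placeholder, not a measured value
        validation_steps=["Confirm connection count stays flat under load"],
    ),
    FixSuggestion(
        title="Rollback to v3.4.9",
        priority=1,
        estimated_minutes=3,
        validation_steps=["Check that the payment error rate drops"],
        expected_outcome="Reduce error rate to <1%",
    ),
]

for fix in prioritize(fixes):
    print(f"P{fix.priority} ({fix.estimated_minutes} min): {fix.title}")
```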
View AI's step-by-step analysis:
- Reasoning Steps - How conclusions were reached
- Evidence Links - Which data led to each conclusion
- Confidence Scores - How certain the AI is about each finding
When Gemini API hits rate limits:
- Automatically switches to demo mode
- Returns realistic hardcoded analysis
- Analyzes uploaded files for contextual responses
- Clearly labeled as demo mode
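The fallback behavior described above can be sketched as a retry loop that gives up after a few rate-limited attempts and returns a labeled demo result. `RateLimitError`, `analyze_with_fallback`, and the `demo_mode` flag are hypothetical names for illustration; the real client wraps the Gemini SDK and uses Tenacity for retries.

```python
import time
from typing import Callable


class RateLimitError(Exception):
    """Hypothetical stand-in for the SDK's rate-limit exception."""


def analyze_with_fallback(
    call_model: Callable[[], dict],
    demo_result: dict,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Try the live model with exponential backoff, then fall back to demo data."""
    for attempt in range(max_attempts):
        try:
            return {**call_model(), "demo_mode": False}
        except RateLimitError:
            if attempt < max_attempts - 1:
                # Delays grow as base_delay * 1, 2, 4, ...
                sleep(base_delay * 2 ** attempt)
    # Every attempt was rate limited: return the canned analysis, clearly labeled.
    return {**demo_result, "demo_mode": True}
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.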
InfraMind leverages Gemini 2.0 Flash's capabilities:
| Capability | How We Use It |
|---|---|
| Long Context Window | Analyze entire incident timelines with full logs |
| Fast Inference | Deliver RCA in 30-60 seconds |
| Structured Output | Generate consistent JSON responses |
| Advanced Reasoning | Understand causality across distributed systems |
| Cost Effective | Free tier suitable for demos and testing |
Our prompts are designed to:
- Provide Context - Full incident data in structured format
- Set Role - "Act as senior SRE performing root cause analysis"
- Specify Output - Exact JSON schema with required fields
- Guide Reasoning - Focus on causality, not just correlation
Reliability safeguards:
- Retry Logic - Exponential backoff for transient failures
- JSON Repair - Fixes truncated/malformed Gemini responses
- Rate Limit - Graceful degradation to demo mode
- Validation - Pydantic models ensure output consistency
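To make the JSON repair idea concrete, the sketch below closes whatever strings, arrays, and objects were still open when a response was cut off. It is an assumption about how such a repair could work, not the logic in reasoning_engine.py, and `repair_json` is a hypothetical name. It handles truncation inside a string value or after a complete value; other cut points (for example, right after a key) still raise `json.JSONDecodeError`.

```python
import json


def repair_json(text: str):
    """Parse JSON, closing unterminated strings and brackets if it was truncated."""
    cleaned = text.strip()
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass

    closers = []        # closing brackets still owed, innermost last
    in_string = False
    escaped = False
    for ch in cleaned:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            closers.append("}" if ch == "{" else "]")
        elif ch in "}]" and closers:
            closers.pop()

    candidate = cleaned
    if in_string:
        if escaped:
            # Drop a dangling backslash so the added quote is not escaped.
            candidate = candidate[:-1]
        candidate += '"'
    # Remove a trailing comma left behind by the cut, then close brackets.
    candidate = candidate.rstrip().rstrip(",")
    candidate += "".join(reversed(closers))
    return json.loads(candidate)
```

A repaired payload should still go through schema validation afterwards, since the repair can only restore syntax, not missing fields.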
Included in sample-demo-files-3/:
Incident Details:
- Severity: P0 (Critical)
- Impact: 100% payment failure rate, $2.5M/hour revenue loss
- Duration: 8 minutes before rollback
- Affected Users: 50,000+
Files Provided:
- INCIDENT_DESCRIPTION.txt - Business context and impact
- payment-api.log - OOM errors, connection pool exhaustion
- payment-metrics.csv - Error rate progression (0.2% → 100%)
- payment-traces.json - Distributed traces showing connection leaks
- application-config.json - Recent v3.5.0 deployment details
Expected RCA Output:
- Root Cause: FraudDetectionService v3.5.0 not closing database connections
- Contributing Factors: New driver v2.5.0, missing finally blocks, disabled monitoring
- Immediate Fix: Rollback to v3.4.9 (3 minutes)
- Long-term Fix: Add proper resource cleanup in finally blocks
- Confidence: 95% (HIGH)
# Activate virtual environment
source venv/bin/activate
# Run all tests
pytest
# Run specific test suite
pytest tests/test_ingestion/
pytest tests/test_reasoning/
pytest tests/test_api/
# Run with coverage report
pytest --cov=backend --cov-report=html
open htmlcov/index.html
# Test specific file
pytest tests/test_ingestion/test_config_parser.py -v
Once running, access interactive documentation:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Analyze an incident with uploaded files.
Request:
{
"incident_id": "incident-20260209T143000",
"log_files": [...],
"metric_files": [...],
"trace_files": [...],
"config_files": [...],
"time_window_minutes": 30
}
Response:
{
"incident_id": "incident-20260209T143000",
"status": "COMPLETED",
"rca": {
"root_cause_description": "...",
"overall_confidence": 0.95,
"reasoning_steps": [...],
"causal_chain": [...],
"fix_suggestions": [...]
}
}
Health check endpoint.
Create .env file with:
# Gemini API Configuration
GEMINI_API_KEY=your_api_key_here
GEMINI_MODEL=gemini-2.0-flash
# Application Settings
APP_ENV=development
DEBUG=True
LOG_LEVEL=INFO
# API Configuration
API_HOST=0.0.0.0
API_PORT=8000
API_WORKERS=1
# Processing Limits
MAX_FILE_SIZE_MB=10
MAX_CONTEXT_LENGTH=100000
REQUEST_TIMEOUT_SECONDS=30
The config parser automatically detects and handles:
- JSON - Standard configuration files
- YAML - Kubernetes configs, docker-compose
- ENV - Environment variable files
- INI/CFG - Legacy application configs
- PostgreSQL .conf - Database configuration files
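As one example of the formats above, a PostgreSQL .conf file is a list of `name = value` lines where the equals sign is optional, `#` starts a comment, and string values are single-quoted. The sketch below is a simplified reader under those rules, not the project's config_parser.py; `parse_postgres_conf` is a hypothetical name, and include directives and backslash escapes are not handled.

```python
import re

# name, optional "=", then a single-quoted string or a bare token, then an optional comment
CONF_LINE = re.compile(
    r"^\s*(?P<key>[A-Za-z_][\w.]*)\s*(?:=\s*|\s+)"
    r"(?P<value>'(?:[^']|'')*'|[^\s#]+)\s*(?:#.*)?$"
)


def parse_postgres_conf(text: str) -> dict[str, str]:
    """Return settings from postgresql.conf-style text as a dict of strings."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = CONF_LINE.match(line)
        if match is None:
            continue  # skip malformed lines rather than failing the whole file
        value = match["value"]
        if value.startswith("'"):
            # Strip the quotes and collapse doubled quotes inside the string.
            value = value[1:-1].replace("''", "'")
        # Parameter names are case-insensitive, so normalize to lower case.
        settings[match["key"].lower()] = value
    return settings
```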
- Multi-source data ingestion (logs, metrics, traces, configs)
- Gemini 2.0 Flash integration with retry logic
- Root cause analysis with causal chains
- Next.js dashboard with file upload
- Actionable fix suggestions with validation steps
- Demo mode fallback for rate limits
- JSON repair for malformed AI responses
- PostgreSQL config file support
- Comprehensive test coverage
- Performance optimization for large files
- Enhanced error messages
- Live Log Streaming - Real-time incident detection
- Historical Analysis - Learn from past incidents
- GitHub Integration - Auto-generate fix PRs
- Slack/PagerDuty - Incident management integration
- Graph Visualization - Interactive causal chain explorer
- Custom Models - Fine-tuned industry-specific RCA
- Multi-Language - Support for non-English logs
- Anomaly Detection - Proactive incident prediction
This project was built for the Gemini 3 Global Hackathon, but contributions are welcome!
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Follow PEP 8 for Python code
- Use TypeScript for frontend code
- Add tests for new features
- Update documentation as needed
This project is licensed under the MIT License - see the LICENSE file for details.
Built for the Gemini 3 Global Hackathon (February 2026)
Category: Infrastructure & DevOps Tools
Submission Date: February 9, 2026
Team: Vaishnavi Kamdi
Production incidents cost businesses millions in lost revenue and engineering time. InfraMind accelerates incident resolution by automating the most time-consuming part of debugging: root cause analysis. By leveraging Gemini's advanced reasoning, we're making AI-powered SRE capabilities accessible to teams of all sizes.
Vaishnavi Kamdi
- Email: vaishnaviskamdi@gmail.com
- GitHub: @vaish725
- LinkedIn: Connect with me
- Google Gemini Team - For the powerful Gemini 2.0 Flash API
- FastAPI - For the excellent Python web framework
- Next.js - For the amazing React framework
- shadcn/ui - For beautiful, accessible UI components
Built with ❤️ and powered by Gemini 2.0 Flash