InfraMind

Reasoning-first AI debugger for modern infrastructure


What is InfraMind?

InfraMind is an incident analysis platform that uses Google's Gemini 2.0 Flash model to perform root cause analysis on production incidents. By analyzing logs, metrics, traces, and configuration files together, it works like a senior SRE: it identifies not just what broke, but why it broke and how to fix it.

Key Capabilities

Automated Root Cause Analysis - Identify the true source of failures across distributed systems
Causal Chain Visualization - Understand how issues propagate through your infrastructure
Multi-Source Correlation - Analyze logs, metrics, traces, and configs simultaneously
Actionable Fix Suggestions - Get prioritized remediation steps with validation criteria
AI Reasoning Transparency - See step-by-step how conclusions were reached

The Problem We Solve

Modern infrastructure generates overwhelming amounts of telemetry data. Dashboards and alerts show engineers what broke, but determining why means manually correlating information across multiple systems, which is slow and error-prone during a critical incident.

Our Solution

InfraMind acts as an AI-powered SRE that:

Ingests multi-format incident data (logs, metrics, traces, configs)
Correlates events across time and services
Reasons about causality using Gemini 2.0 Flash
Delivers structured RCA reports with actionable fixes

Quick Start

Prerequisites

Python 3.10+
Node.js 18+
Gemini API Key (free tier available)
Available Ports - Ensure ports 8000 (API) and 3001 (UI) are not in use

Installation

# 1. Clone the repository
git clone https://github.com/vaish725/InfraMind.git
cd InfraMind

# 2. Set up Python backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

# 3. Configure environment
cp .env.example .env
# Edit .env and add your GEMINI_API_KEY

# 4. Set up Next.js frontend
cd infra-mind-dashboard-ui
npm install
cd ..

Running the Application

Terminal 1 - Start Backend:

source venv/bin/activate
uvicorn backend.api.main:app --host 0.0.0.0 --port 8000 --reload

Terminal 2 - Start Frontend:

cd infra-mind-dashboard-ui
npm run dev

Access the Application

Frontend Dashboard: http://localhost:3001
Backend API: http://localhost:8000
API Documentation: http://localhost:8000/docs

Try a Demo Analysis

1. Navigate to http://localhost:3001
2. Click "New Analysis"
3. Upload sample files from sample-demo-files-3/:
   - INCIDENT_DESCRIPTION.txt
   - payment-api.log
   - payment-metrics.csv
   - payment-traces.json
   - application-config.json
4. Click "Analyze Incident"
5. View comprehensive RCA in 30-60 seconds!

Architecture

┌─────────────────────────────────────┐
│     Next.js Dashboard (Port 3001)   │
│  - File Upload Interface            │
│  - Real-time Analysis Display       │
│  - Causal Chain Visualization       │
└──────────────┬──────────────────────┘
               │ REST API
               ▼
┌─────────────────────────────────────┐
│    FastAPI Backend (Port 8000)      │
│  ┌─────────────────────────────┐   │
│  │   Ingestion Layer           │   │
│  │  - Log Parser               │   │
│  │  - Metrics Parser           │   │
│  │  - Trace Parser             │   │
│  │  - Config Parser            │   │
│  │  - Data Unifier             │   │
│  └──────────┬──────────────────┘   │
│             │                       │
│  ┌──────────▼──────────────────┐   │
│  │   Reasoning Engine          │   │
│  │  - Gemini 2.0 Flash Client  │   │
│  │  - Prompt Engineering       │   │
│  │  - Response Parsing         │   │
│  │  - JSON Repair Logic        │   │
│  └──────────┬──────────────────┘   │
│             │                       │
│  ┌──────────▼──────────────────┐   │
│  │   Output Formatter          │   │
│  │  - RCA Model                │   │
│  │  - Validation               │   │
│  │  - Demo Mode Fallback       │   │
│  └─────────────────────────────┘   │
└─────────────────────────────────────┘
               │
               ▼
     ┌─────────────────┐
     │  Gemini 2.0 API │
     │  (Google Cloud) │
     └─────────────────┘

Tech Stack

Backend:

FastAPI - Modern Python web framework
Pydantic - Data validation and settings
Google Generative AI SDK - Gemini integration
Tenacity - Retry logic with exponential backoff
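
To illustrate what "retry logic with exponential backoff" means in practice, here is a minimal standard-library sketch. It is not the project's Tenacity configuration: `TransientError`, `call_with_backoff`, and the default delays are hypothetical names and values chosen for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, HTTP 5xx)."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on TransientError with exponentially growing waits."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait 1x, 2x, 4x ... the base delay, capped at max_delay.
            sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; in production code a jitter term is commonly added to the delay so that many clients do not retry in lockstep.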

Frontend:

Next.js 16.1 - React framework with Turbopack
TypeScript - Type-safe development
Tailwind CSS - Utility-first styling
shadcn/ui - High-quality UI components

AI Model:

Gemini 2.0 Flash - Fast, cost-effective reasoning
Temperature: 0.3 - Focused, more repeatable analysis
Max tokens: 4096 - Comprehensive responses

Project Structure

InfraMind/
├── backend/                          # Python Backend
│   ├── api/                         # FastAPI Application
│   │   ├── main.py                  # App entry point & CORS config
│   │   └── routes/
│   │       ├── incident.py          # Analysis endpoints
│   │       └── health.py            # Health check
│   │
│   ├── ingestion/                   # Data Parsers
│   │   ├── log_parser.py           # JSON/text log parsing
│   │   ├── metrics_parser.py       # CSV/JSON metrics
│   │   ├── trace_parser.py         # Distributed traces
│   │   ├── config_parser.py        # Multi-format configs (JSON/YAML/ENV/INI)
│   │   └── data_unifier.py         # Create unified context
│   │
│   ├── reasoning/                   # AI Reasoning
│   │   ├── gemini_client.py        # Gemini API wrapper with retry
│   │   ├── prompts.py              # Prompt templates
│   │   └── reasoning_engine.py     # RCA orchestration & JSON repair
│   │
│   ├── models/                      # Data Models
│   │   ├── incident.py             # Incident data structures
│   │   ├── rca.py                  # RCA output models
│   │   └── schemas.py              # API request/response schemas
│   │
│   └── core/                        # Core Utilities
│       ├── config.py               # Settings management
│       └── exceptions.py           # Custom exceptions
│
├── infra-mind-dashboard-ui/         # Next.js Frontend
│   ├── app/
│   │   ├── page.tsx                # Main dashboard page
│   │   └── layout.tsx              # Root layout
│   │
│   ├── components/inframind/       # Custom Components
│   │   ├── dashboard.tsx           # Main dashboard
│   │   ├── analysis-form.tsx       # File upload form
│   │   ├── executive-summary.tsx   # Results display
│   │   ├── causal-chain.tsx        # Visual causal chain
│   │   ├── recommended-fixes.tsx   # Fix suggestions
│   │   └── reasoning-process.tsx   # AI reasoning steps
│   │
│   ├── lib/
│   │   ├── api-client.ts           # Backend API client
│   │   └── transform.ts            # Response transformation
│   │
│   └── package.json                # Dependencies
│
├── sample-demo-files-3/             # Demo Incident Files
│   ├── INCIDENT_DESCRIPTION.txt    # P0 payment system outage
│   ├── payment-api.log             # Application logs
│   ├── payment-metrics.csv         # System metrics
│   ├── payment-traces.json         # Distributed traces
│   └── application-config.json     # Config with bug details
│
├── tests/                           # Test Suites
│   ├── test_api/                   # API endpoint tests
│   ├── test_ingestion/             # Parser tests
│   └── test_reasoning/             # AI reasoning tests
│
├── .env.example                     # Environment template
├── requirements.txt                 # Python dependencies
└── README.md                        # This file

Core Features

1. Multi-Source Ingestion

Supports diverse data formats and automatically detects file types:

| Data Type | Supported Formats | Auto-Detection |
| --- | --- | --- |
| Logs | JSON, plain text, structured logs | Yes |
| Metrics | CSV, JSON time-series | Yes |
| Traces | JSON distributed traces | Yes |
| Configs | JSON, YAML, ENV, INI, PostgreSQL conf | Yes |

Intelligent Parsing:

Handles malformed files gracefully
Extracts timestamps, severity, service names
Correlates events across data sources
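
As a rough illustration of file type auto-detection, the sketch below guesses a data type from the file extension and, for JSON, from the content. The function name `detect_data_type` and the field names it looks for (`trace_id`, `span_id`, `timestamp`, `value`) are assumptions made for this example, not the parsers' actual rules.

```python
import json
from pathlib import Path

CONFIG_SUFFIXES = {".yaml", ".yml", ".env", ".ini", ".cfg", ".conf"}


def detect_data_type(filename: str, text: str) -> str:
    """Guess whether a file holds logs, metrics, traces, or config."""
    suffix = Path(filename).suffix.lower()
    if suffix in CONFIG_SUFFIXES:
        return "config"
    if suffix == ".log":
        return "logs"
    if suffix == ".csv":
        return "metrics"
    if suffix == ".json":
        try:
            doc = json.loads(text)
        except json.JSONDecodeError:
            # JSON-lines logs are not a single valid JSON document.
            return "logs"
        blob = json.dumps(doc)
        if "trace_id" in blob or "span_id" in blob:
            return "traces"
        if "timestamp" in blob and "value" in blob:
            return "metrics"
        return "config"
    return "logs"  # plain text falls back to logs
```

Extension checks run first because they are cheap and unambiguous; content sniffing is only needed for JSON, which the table above lists under all four data types.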

2. AI-Powered Root Cause Analysis

Uses Gemini 2.0 Flash to:

Identify Root Cause - Distinguishes symptoms from actual causes
Build Causal Chains - Shows how failures propagate
Assess Confidence - Provides confidence scores (LOW/MEDIUM/HIGH)
Cite Evidence - References specific log lines and metrics
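
The LOW/MEDIUM/HIGH labels can be derived from a numeric score such as the `overall_confidence` field in the API response. The thresholds below (0.5 and 0.8) are assumptions for illustration only; the README does not state where the real cut-offs sit.

```python
def confidence_label(score: float) -> str:
    """Map a confidence score in [0, 1] to LOW, MEDIUM, or HIGH."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence must be between 0 and 1, got {score}")
    if score >= 0.8:  # assumed threshold
        return "HIGH"
    if score >= 0.5:  # assumed threshold
        return "MEDIUM"
    return "LOW"
```

With these cut-offs, the demo scenario's score of 0.95 maps to HIGH.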

3. Actionable Fix Suggestions

Each incident analysis includes:

Prioritized Fixes - Ordered by impact and urgency
Time Estimates - Expected effort for each fix
Validation Steps - How to verify the fix worked
Business Impact - Expected outcomes (e.g., "Reduce error rate to <1%")
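
One way to picture a fix suggestion is as a small record that can be sorted by urgency. This is a sketch only: `FixSuggestion`, its field names, and the "1 = most urgent" convention are hypothetical and may not match the models in backend/models/rca.py.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FixSuggestion:
    title: str
    priority: int                    # 1 = most urgent (assumed convention)
    estimate_minutes: Optional[int]  # None when no estimate is available
    validation_steps: List[str] = field(default_factory=list)
    expected_outcome: str = ""


def prioritize(fixes: List[FixSuggestion]) -> List[FixSuggestion]:
    """Return fixes most-urgent first; ties keep their input order."""
    return sorted(fixes, key=lambda f: f.priority)


# Example built from the demo scenario's two fixes.
fixes = [
    FixSuggestion("Add proper resource cleanup in finally blocks", 2, None),
    FixSuggestion("Rollback to v3.4.9", 1, 3),
]
ordered = prioritize(fixes)  # the rollback comes first
```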

4. Transparent Reasoning

View the AI's step-by-step analysis:

Reasoning Steps - How conclusions were reached
Evidence Links - Which data led to each conclusion
Confidence Scores - How certain the AI is about each finding

5. Demo Mode Fallback

When the Gemini API hits rate limits, InfraMind:

Automatically switches to demo mode
Returns a realistic, pre-written analysis
Scans the uploaded files to tailor that response to the incident
Clearly labeled as demo mode
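
The fallback described above can be sketched as a thin wrapper around the model call. Everything named here is hypothetical: `RateLimitError` stands in for whatever quota error the SDK raises, and the `demo_mode` flag is one possible way to label the result.

```python
from typing import Callable, Dict, List


class RateLimitError(Exception):
    """Hypothetical stand-in for the SDK's quota / HTTP 429 error."""


def canned_rca(filenames: List[str]) -> Dict:
    """Build a pre-written result that still reflects the uploaded files."""
    return {
        "demo_mode": True,  # clearly labels the result as demo output
        "root_cause_description": "Demo analysis (Gemini rate limit reached)",
        "analyzed_files": sorted(filenames),
    }


def analyze_with_fallback(run_model: Callable[[], Dict], filenames: List[str]) -> Dict:
    """Run the real analysis, switching to demo mode on a rate limit."""
    try:
        result = dict(run_model())
    except RateLimitError:
        return canned_rca(filenames)
    result["demo_mode"] = False
    return result
```

Only the rate-limit error triggers the fallback; any other exception still propagates, so genuine bugs are not hidden behind demo output.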

Gemini Integration

InfraMind leverages Gemini 2.0 Flash's capabilities:

Why Gemini 2.0 Flash?

| Capability | How We Use It |
| --- | --- |
| Long Context Window | Analyze entire incident timelines with full logs |
| Fast Inference | Deliver RCA in 30-60 seconds |
| Structured Output | Generate consistent JSON responses |
| Advanced Reasoning | Understand causality across distributed systems |
| Cost Effective | Free tier suitable for demos and testing |

Prompt Engineering

Our prompts are designed to:

Provide Context - Full incident data in structured format
Set Role - "Act as senior SRE performing root cause analysis"
Specify Output - Exact JSON schema with required fields
Guide Reasoning - Focus on causality, not just correlation
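
The four design points above could be combined roughly as follows. This is an illustrative sketch, not the templates in backend/reasoning/prompts.py: `build_prompt` is a hypothetical helper, the role line is quoted from the list above, and the field names come from the response example in the API section of this README.

```python
import json

ROLE = "Act as senior SRE performing root cause analysis."
REQUIRED_FIELDS = [
    "root_cause_description",
    "overall_confidence",
    "reasoning_steps",
    "causal_chain",
    "fix_suggestions",
]


def build_prompt(context: dict) -> str:
    """Assemble role, incident data, output schema, and reasoning guidance."""
    return "\n\n".join([
        ROLE,
        "Incident data (JSON):\n" + json.dumps(context, indent=2, sort_keys=True),
        "Respond with a single JSON object containing exactly these fields: "
        + ", ".join(REQUIRED_FIELDS) + ".",
        "Explain what caused what. Do not treat events as causal merely "
        "because they occurred together.",
    ])
```

Serializing the context with `sort_keys=True` keeps the prompt stable across runs for the same input, which makes responses easier to compare.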

Error Handling

Retry Logic - Exponential backoff for transient failures
JSON Repair - Fixes truncated/malformed Gemini responses
Rate Limit - Graceful degradation to demo mode
Validation - Pydantic models ensure output consistency
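
As an example of what JSON repair for a truncated response can look like, the sketch below closes any string and brackets left open when the output was cut off. It is a simplified stand-in with a hypothetical name (`repair_truncated_json`), not the logic in reasoning_engine.py, and it only handles truncation, not other kinds of malformed JSON.

```python
import json


def repair_truncated_json(text: str):
    """Parse JSON, closing strings/brackets left open by truncation."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    closers = []       # closing brackets still owed, innermost last
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            closers.append("}" if ch == "{" else "]")
        elif ch in "}]" and closers:
            closers.pop()

    repaired = text
    if escaped:
        repaired = repaired[:-1]  # drop a dangling backslash
    if in_string:
        repaired += '"'           # close the cut-off string
    repaired = repaired.rstrip()
    if repaired.endswith(","):
        repaired = repaired[:-1]  # a trailing comma is invalid JSON
    repaired += "".join(reversed(closers))
    return json.loads(repaired)   # still raises if the damage is elsewhere
```

If the text was cut in the middle of an object key, the final `json.loads` still raises `json.JSONDecodeError`, so callers can fall back to another strategy instead of trusting a bad repair.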

Demo Scenario

Sample Incident: Payment System Outage

Included in sample-demo-files-3/:

Incident Details:

Severity: P0 (Critical)
Impact: 100% payment failure rate, $2.5M/hour revenue loss
Duration: 8 minutes before rollback
Affected Users: 50,000+

Files Provided:

INCIDENT_DESCRIPTION.txt - Business context and impact
payment-api.log - OOM errors, connection pool exhaustion
payment-metrics.csv - Error rate progression (0.2% → 100%)
payment-traces.json - Distributed traces showing connection leaks
application-config.json - Recent v3.5.0 deployment details

Expected RCA Output:

Root Cause: FraudDetectionService v3.5.0 not closing database connections
Contributing Factors: New driver v2.5.0, missing finally blocks, disabled monitoring
Immediate Fix: Rollback to v3.4.9 (3 minutes)
Long-term Fix: Add proper resource cleanup in finally blocks
Confidence: 95% (HIGH)

Testing

# Activate virtual environment
source venv/bin/activate

# Run all tests
pytest

# Run specific test suite
pytest tests/test_ingestion/
pytest tests/test_reasoning/
pytest tests/test_api/

# Run with coverage report (requires the pytest-cov plugin)
pytest --cov=backend --cov-report=html
open htmlcov/index.html  # macOS; on Linux use xdg-open

# Test specific file
pytest tests/test_ingestion/test_config_parser.py -v

API Documentation

Backend API

Once running, access interactive documentation:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc

Key Endpoints

`POST /api/v1/incidents/analyze`

Analyze an incident with uploaded files.

Request:

{
  "incident_id": "incident-20260209T143000",
  "log_files": [...],
  "metric_files": [...],
  "trace_files": [...],
  "config_files": [...],
  "time_window_minutes": 30
}

Response:

{
  "incident_id": "incident-20260209T143000",
  "status": "COMPLETED",
  "rca": {
    "root_cause_description": "...",
    "overall_confidence": 0.95,
    "reasoning_steps": [...],
    "causal_chain": [...],
    "fix_suggestions": [...]
  }
}
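
A client can turn this response into typed objects before using it. The sketch below uses plain dataclasses and hypothetical class names (`RCA`, `AnalysisResult`); the project itself lists Pydantic for validation, so treat this as a dependency-free approximation of the same idea.

```python
import json
from dataclasses import dataclass, field


@dataclass
class RCA:
    root_cause_description: str
    overall_confidence: float
    reasoning_steps: list = field(default_factory=list)
    causal_chain: list = field(default_factory=list)
    fix_suggestions: list = field(default_factory=list)


@dataclass
class AnalysisResult:
    incident_id: str
    status: str
    rca: RCA


def parse_analysis(body: str) -> AnalysisResult:
    """Parse the JSON body returned by POST /api/v1/incidents/analyze."""
    data = json.loads(body)
    return AnalysisResult(
        incident_id=data["incident_id"],
        status=data["status"],
        rca=RCA(**data["rca"]),  # raises TypeError on unexpected fields
    )
```

Failing loudly on missing or unexpected fields is deliberate here: a response that does not match the documented shape is easier to debug at the boundary than deep inside the UI.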

`GET /api/v1/health`

Health check endpoint.

Configuration

Environment Variables

Create a .env file with:

# Gemini API Configuration
GEMINI_API_KEY=your_api_key_here
GEMINI_MODEL=gemini-2.0-flash

# Application Settings
APP_ENV=development
DEBUG=True
LOG_LEVEL=INFO

# API Configuration
API_HOST=0.0.0.0
API_PORT=8000
API_WORKERS=1

# Processing Limits
MAX_FILE_SIZE_MB=10
MAX_CONTEXT_LENGTH=100000
REQUEST_TIMEOUT_SECONDS=30
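
For reference, these variables could be read into a typed settings object along the following lines. This is a standard-library sketch with a hypothetical `Settings` class; backend/core/config.py may load them differently (the tech stack lists Pydantic for settings). It assumes the variables are already in the process environment, since the standard library does not read .env files on its own.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    gemini_model: str
    api_host: str
    api_port: int
    max_file_size_mb: int
    request_timeout_seconds: int
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, with README defaults."""
    key = env.get("GEMINI_API_KEY", "")
    if not key or key == "your_api_key_here":
        raise ValueError("GEMINI_API_KEY is not set")
    return Settings(
        gemini_api_key=key,
        gemini_model=env.get("GEMINI_MODEL", "gemini-2.0-flash"),
        api_host=env.get("API_HOST", "0.0.0.0"),
        api_port=int(env.get("API_PORT", "8000")),
        max_file_size_mb=int(env.get("MAX_FILE_SIZE_MB", "10")),
        request_timeout_seconds=int(env.get("REQUEST_TIMEOUT_SECONDS", "30")),
        debug=env.get("DEBUG", "False").lower() in {"1", "true", "yes"},
    )
```

Rejecting the placeholder value from .env.example catches the common mistake of copying the template without adding a real key.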

Supported Config Formats

The config parser automatically detects and handles:

JSON - Standard configuration files
YAML - Kubernetes configs, docker-compose
ENV - Environment variable files
INI/CFG - Legacy application configs
PostgreSQL .conf - Database configuration files
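
ENV files and PostgreSQL .conf files share a simple `key = value` shape, so one small parser can cover the common cases of both. The sketch below is an assumption-laden illustration (`parse_key_value` is a hypothetical name), not the code in config_parser.py, and it skips PostgreSQL's optional form where the equals sign is omitted.

```python
def parse_key_value(text: str) -> dict:
    """Parse KEY=VALUE lines as found in .env and postgresql.conf files."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, full-line comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        if value[:1] in {"'", '"'}:
            # Quoted value: keep everything up to the matching quote.
            end = value.find(value[0], 1)
            value = value[1:end] if end != -1 else value[1:]
        else:
            # Unquoted value: drop any trailing inline comment.
            value = value.split("#", 1)[0].strip()
        result[key.strip()] = value
    return result
```

All values are returned as strings; converting `"100"` to an integer is left to the caller, which keeps the parser format-agnostic.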

Roadmap

✅ Hackathon Deliverables (Completed)

Multi-source data ingestion (logs, metrics, traces, configs)
Gemini 2.0 Flash integration with retry logic
Root cause analysis with causal chains
Next.js dashboard with file upload
Actionable fix suggestions with validation steps
Demo mode fallback for rate limits
JSON repair for malformed AI responses
PostgreSQL config file support

🚧 In Progress

Comprehensive test coverage
Performance optimization for large files
Enhanced error messages

🔮 Future Enhancements

Live Log Streaming - Real-time incident detection
Historical Analysis - Learn from past incidents
GitHub Integration - Auto-generate fix PRs
Slack/PagerDuty - Incident management integration
Graph Visualization - Interactive causal chain explorer
Custom Models - Fine-tuned industry-specific RCA
Multi-Language - Support for non-English logs
Anomaly Detection - Proactive incident prediction

Contributing

This project was built for the Gemini 3 Global Hackathon, but contributions are welcome!

How to Contribute

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Development Guidelines

Follow PEP 8 for Python code
Use TypeScript for frontend code
Add tests for new features
Update documentation as needed

License

This project is licensed under the MIT License - see the LICENSE file for details.

Hackathon Submission

Built for the Gemini 3 Global Hackathon (February 2026)

Category: Infrastructure & DevOps Tools
Submission Date: February 9, 2026
Team: Vaishnavi Kamdi

Why This Matters

Production incidents cost businesses millions in lost revenue and engineering time. InfraMind accelerates incident resolution by automating the most time-consuming part of debugging: root cause analysis. By leveraging Gemini's advanced reasoning, we're making AI-powered SRE capabilities accessible to teams of all sizes.

📞 Contact

Vaishnavi Kamdi

Acknowledgments

Google Gemini Team - For the powerful Gemini 2.0 Flash API
FastAPI - For the excellent Python web framework
Next.js - For the amazing React framework
shadcn/ui - For beautiful, accessible UI components

Built with ❤️ and powered by Gemini 2.0 Flash

⭐ Star this repo | 🐛 Report Bug | 💡 Request Feature
