
# ThermaMind - AI-Powered Data Center Optimization

A real-time GPU cluster monitoring and optimization platform with integrated AI assistant.

## 🚀 The Problem

Modern AI data centers are the backbone of innovation, but they are also major energy consumers:

- GPUs draw massive amounts of power, even when idle
- Cooling systems often overcompensate, wasting additional electricity and water
- Inefficient workload scheduling leads to 40%+ energy waste
- By 2030, data centers could consume as much power as entire countries

## 💡 The Solution

ThermaMind introduces an AI-driven orchestration layer that continuously monitors and optimizes compute and cooling efficiency in real time.

## Features

### 🖥️ Real-Time Telemetry Simulation

- **32 GPU nodes** organized into 4 physical clusters (A, B, C, D)
- Live WebSocket connection updating every 2 seconds
- Heatmap visualization showing GPU load, temperature, and cooling efficiency
- Time-series charts tracking energy usage, cooling efficiency, and workload distribution
- Synthetic data generation for GPU load, temperature, and cooling metrics across all clusters
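As a rough illustration, a per-tick snapshot from the simulator might be generated like this (function and field names here are hypothetical, not the actual `simulator.js` API):

```javascript
// Hypothetical sketch of the telemetry generator: 32 nodes across
// clusters A-D, each with synthetic load, temperature, and cooling readings.
const CLUSTERS = ["A", "B", "C", "D"];

function generateSnapshot() {
  const nodes = [];
  for (const cluster of CLUSTERS) {
    for (let i = 0; i < 8; i++) {           // 8 nodes per cluster -> 32 total
      const load = Math.random() * 100;     // GPU load, %
      nodes.push({
        id: `${cluster}${i + 1}`,
        cluster,
        load: Math.round(load),
        tempC: Math.round(40 + load * 0.4), // temperature loosely tracks load
        coolingPct: Math.round(60 + Math.random() * 40),
      });
    }
  }
  return { timestamp: Date.now(), nodes };
}
```

Each snapshot carries a timestamp plus one record per node, which is what the WebSocket layer would broadcast every 2 seconds.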

### 🧊 AI-Based Optimization Logic

- **AI-optimized cooling:** 70% of nodes use predictive cooling (target = GPU load + 5%)
- **Legacy systems:** 30% of nodes still catching up from over-cooling issues
- **Dynamic status indicators:** Active, Idle, or Optimizing, based on real metrics
- **Predictive node management:** automatically determines when to power down idle nodes and when to adjust cooling dynamically
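The cooling rule above can be sketched in a few lines. This is a minimal illustration of the stated policy, not the project's actual code; the legacy cooling level and the status thresholds are assumptions:

```javascript
// AI-optimized nodes target cooling at (GPU load + 5%); legacy nodes keep a
// fixed over-cooled level (85% here is an assumed placeholder).
function coolingTarget(node) {
  if (node.aiOptimized) {
    return Math.min(100, node.load + 5); // predictive: track load with 5% headroom
  }
  return 85;                             // legacy: conservative over-cooling
}

// Status derived from metrics: Idle when nearly unloaded, Optimizing while
// cooling is still far from its target, Active otherwise.
function nodeStatus(node) {
  if (node.load < 5) return "Idle";
  if (Math.abs(coolingTarget(node) - node.coolingPct) > 10) return "Optimizing";
  return "Active";
}
```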

### 📊 Sustainability Dashboard

- **Real-time metrics visualization:** energy savings, CO₂ offset, power usage
- **PUE (Power Usage Effectiveness) tracking** for efficiency monitoring
- **Live performance indicators** showing the environmental impact of optimization decisions
- **Actionable insights** for reducing the data center's carbon footprint
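PUE is the standard efficiency ratio: total facility power divided by IT equipment power, so a value near 1.0 means almost all power reaches compute rather than cooling and overhead. A minimal sketch (field names are illustrative):

```javascript
// PUE = total facility power / IT equipment power.
// Example: 100 kW of IT load, 40 kW cooling, 10 kW overhead -> PUE 1.5.
function computePUE({ itPowerKw, coolingPowerKw, overheadKw }) {
  const totalKw = itPowerKw + coolingPowerKw + overheadKw;
  return totalKw / itPowerKw;
}
```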

### 📁 Data Logging & Analytics

- Continuous telemetry streams stored in JSONL format (`data_logs/telemetry_log.jsonl`)
- Ready for future analysis or ML model fine-tuning
- **Compact data summaries:** aggregates raw telemetry into hourly insights
- **Context-efficient logs** optimized for LLM prompts and downstream processing

### 🤖 AI Assistant (Gemini + ElevenLabs)

- **Real-time analysis:** ask the AI to analyze current cluster status
- **Question answering:** get insights about power consumption, efficiency issues, and workload recommendations
- **Text-to-speech:** AI responses are spoken using ElevenLabs voice synthesis
- **Context-aware:** the AI has access to live telemetry data, not canned responses
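One way such context-awareness is typically achieved is by summarizing the live snapshot into the prompt before it reaches the model. The sketch below is an assumption about the approach, not the actual `aiAssistant.js` implementation:

```javascript
// Hypothetical prompt builder: condense the live snapshot into a short
// per-cluster summary and prepend it to the user's question.
function buildPrompt(snapshot, question) {
  const byCluster = {};
  for (const n of snapshot.nodes) {
    (byCluster[n.cluster] ||= []).push(n);
  }
  const summary = Object.entries(byCluster)
    .map(([c, nodes]) => {
      const avgLoad = nodes.reduce((s, n) => s + n.load, 0) / nodes.length;
      return `Cluster ${c}: ${nodes.length} nodes, avg load ${avgLoad.toFixed(1)}%`;
    })
    .join("\n");
  return `Current telemetry:\n${summary}\n\nQuestion: ${question}`;
}
```

Keeping the summary compact matters here, since the README notes the logs are deliberately context-efficient for LLM prompts.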

## Setup

### Prerequisites

- Node.js (a recent LTS release) and npm
- API keys for Google Gemini and ElevenLabs (required for the AI assistant)

### Installation

1. Clone the repository:

   ```bash
   git clone <repo-url>
   cd codeRedAstra
   ```

2. Install backend dependencies:

   ```bash
   cd backend
   npm install
   ```

3. Install frontend dependencies:

   ```bash
   cd ../frontend
   npm install
   ```

4. Configure environment variables:

   ```bash
   # Copy the example file
   cd ../backend
   cp .env.example .env

   # Edit .env and add your API keys:
   # GEMINI_API_KEY=your_key_here
   # ELEVENLABS_API_KEY=your_key_here
   ```

## Running the Application

1. Start the backend server:

   ```bash
   cd backend
   node src/server.js
   ```

   The backend will run on http://localhost:8080

2. In a new terminal, start the frontend:

   ```bash
   cd frontend
   npm run dev
   ```

   The frontend will run on http://localhost:5173

3. Open your browser to http://localhost:5173

## AI Assistant Usage

### Analyzing Current Status

  1. Click the 🤖 floating button in the bottom-right corner
  2. Click "📊 Analyze Current Status"
  3. Wait for the AI to analyze real-time telemetry
  4. Read or listen to the AI's insights

### Asking Questions

Example questions you can ask:

- "Which cluster should I run my next job on?"
- "Why is my power consumption high?"
- "Are there any efficiency issues?"
- "What's happening with Cluster C?"
- "Should I be worried about the cooling on Cluster A?"

## API Costs

- **Gemini API:** ~$0.001-0.005 per request (flash model)
- **ElevenLabs TTS:** ~$0.01 per 1,000 characters (roughly $0.005 per typical response)
- **Total:** ~$0.006-0.015 per AI interaction with voice

Note: You can disable TTS by removing `withAudio: true` from the WebSocket messages in `AIAssistant.tsx` to reduce costs.

## Architecture

### Backend (`/backend`)

- **Express server:** REST API and static file serving
- **WebSocket server:** real-time telemetry broadcast (2-second intervals)
- **Simulator:** generates realistic GPU workload and cooling data
- **AI Assistant:** Gemini integration for telemetry analysis
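The broadcast half of this architecture reduces to a small loop: serialize the latest snapshot and send it to every connected client. A sketch of that pattern, with `clients` standing in for the set of live `ws` sockets (the real server's wiring lives in `webSocket.js`):

```javascript
// Send one telemetry snapshot to every connected client.
// The message envelope ({ type, data }) is an assumed shape, not the
// project's actual wire format.
function broadcast(clients, snapshot) {
  const message = JSON.stringify({ type: "telemetry", data: snapshot });
  for (const client of clients) {
    client.send(message);
  }
  return message;
}

// In server.js this would run on the 2-second interval described above:
// setInterval(() => broadcast(clients, generateSnapshot()), 2000);
```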

### Frontend (`/frontend`)

- **React + TypeScript:** modern component-based UI
- **Chart.js:** time-series visualization
- **TailwindCSS:** responsive styling
- **WebSocket hook:** real-time data connection

## Data Flow

```
Simulator → WebSocket Server → Frontend Components
              ↓
         AI Assistant (on demand)
              ↓
         Gemini API → ElevenLabs TTS
```

## Project Structure

```
backend/
  src/
    server.js          # Express + WebSocket server
    simulator.js       # GPU telemetry generator
    webSocket.js       # WebSocket handlers
    aiAssistant.js     # Gemini + ElevenLabs integration

frontend/
  src/
    App.tsx            # Main dashboard
    hooks/
      useTelemetry.ts  # WebSocket connection hook
    components/
      AIAssistant.tsx  # AI chat interface
```

## License

MIT
