GridVeda

Inspiration

Grid operators work in high-stakes environments where failures cascade fast. Today, 46% of U.S. distribution infrastructure is at or beyond its useful life, contributing to an annual economic loss of $150 billion. The DOE warns that without intervention, the risk of major outages could increase 30-fold by 2030. When a single transformer fails under these conditions, often due to the overloading seen in 34% of recent asset failures, it can knock out substations and leave communities dark for days.

Power interruptions are becoming more frequent and more severe. Since 2000, the number of major weather-related outages has increased dramatically, with extreme weather now responsible for over 80% of large-scale blackouts in the U.S. Recent events, from the 2021 Texas grid crisis and the 2023 North Carolina substation attacks to extreme weather-driven outages, reveal a common reality: we are still reacting to failures instead of predicting them.

Power outages trend

Growing up across California, Oregon, and Maryland, our team has witnessed firsthand how fragile infrastructure can amplify disaster impacts, from wildfire-driven outages in the West to storm-related grid disruptions on the East Coast. These experiences reinforced the need for on-device intelligence that keeps operating even when connectivity is unreliable, especially during storms, heat waves, or grid stress events.

Our goal with GridVeda: AI-Powered Grid Intelligence is to empower operators with real-time, AI-driven decision support at the edge - detecting transformer degradation early, classifying fault types, estimating time-to-failure, and guiding mitigation, all without requiring cloud dependency.

What it does

GridVeda is an AI-powered early warning system for electrical transformers that runs on-site at substations, predicting failures before blackouts occur.

Real-Time Transformer Monitoring: Monitors 20 transformers simultaneously via two AI pipelines

Physics-Informed ETT Anomaly Analysis

ETT detector processes sensor readings every 15 minutes (oil temp + 6 load channels)
Uses feature engineering to compute 36 physics features: thermal stress, Joule heating, insulation aging
Scores risk with a four-model tree-based ensemble: LightGBM, CatBoost, Random Forest, and XGBoost
Alerts operators at >50% risk to schedule gas testing, with Mild and High severity tiers
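
As a rough illustration of the feature engineering described above, the sketch below derives three physics-style features from a single 15-minute reading. The function name, the thermal stress proxy, the 110 °C reference temperature, and the 15000 K constant are assumptions chosen for the example, not GridVeda's actual 36 features.

```python
import math

def physics_features(oil_temp_c, loads, ref_temp_c=110.0, activation_k=15000.0):
    """Compute three illustrative physics features from one 15-minute reading.

    oil_temp_c: oil temperature in deg C; loads: the 6 load-channel values.
    ref_temp_c and activation_k are assumed constants for this sketch.
    """
    total_load = sum(abs(x) for x in loads)
    # Joule heating scales with current squared, so sum the squared loads.
    joule_heating = sum(x * x for x in loads)
    # Thermal stress proxy (assumption): hotter oil under heavier load scores higher.
    thermal_stress = oil_temp_c * total_load
    # Arrhenius-style aging acceleration relative to the reference temperature.
    aging_factor = math.exp(
        activation_k / (ref_temp_c + 273.0) - activation_k / (oil_temp_c + 273.0)
    )
    return {
        "joule_heating": joule_heating,
        "thermal_stress": thermal_stress,
        "aging_factor": aging_factor,
    }

features = physics_features(oil_temp_c=110.0, loads=[1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
```

At the reference temperature the aging factor is 1.0 by construction, and it grows exponentially as the oil runs hotter, which is why a feature like this can flag insulation wear earlier than the raw temperature alone.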

Quantum-Classical Fault Diagnosis

Implemented 6-qubit variational quantum circuit with 72 trainable parameters across 4 entangled layers
Trained with gradient-free Nelder-Mead optimization to avoid barren plateaus in 64-dimensional Hilbert space
Architected quantum-classical hybrid meta-ensemble merging Born rule measurement probabilities with IEEE C57.104 standards
Integrated tri-method plurality voting: quantum predictions + Rogers Ratios + Duval Triangle classification
Built a second weighted ensemble across XGBoost, LightGBM, CatBoost, and RandomForest with confidence-aware routing
Developed 8→4 class probability mapping using modular arithmetic aggregation of quantum measurement outcomes
Implemented strategic label transformation and prediction consensus mechanisms for robust classification
DGA Summary: 98.09% ± 0.80% accuracy, 96.99% ± 2.07% F1-macro, 98.08% ± 0.75% F1-weighted
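
One way to read the 8→4 class mapping above is as a modulo fold over measurement outcomes. The sketch below assumes the 64 basis-state probabilities are first folded into 8 bins by index mod 8, then into 4 classes by index mod 4; the helper name `aggregate_mod` and the exact folding order are hypothetical.

```python
def aggregate_mod(probs, n_classes):
    """Fold a probability vector into n_classes bins using index modulo n_classes."""
    out = [0.0] * n_classes
    for index, p in enumerate(probs):
        out[index % n_classes] += p
    return out

# Example: all probability mass on basis state 13 of a 6-qubit register.
basis_probs = [0.0] * 64
basis_probs[13] = 1.0

eight_class = aggregate_mod(basis_probs, 8)   # 13 % 8 == 5
four_class = aggregate_mod(eight_class, 4)    # 5 % 4 == 1
```

Because each fold only adds probabilities together, the output still sums to 1 and can be fed straight into a soft-voting step.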

Conversational Grid Interface

Nemotron Nano 4B provides plain-English explanations of fault diagnostics and risk scores through a live visual feedback loop on the active dashboard
Answers context-aware queries like "Why is T047 flagged as high risk?" by analyzing current screen state and transformer-specific DGA patterns in real-time
Offers interactive guidance that responds to displayed data, helping operators understand which gas concentrations and ratios drove specific fault predictions
Supports voice-enabled hands-free operation for field technicians to query diagnostics, request explanations, and navigate the system without manual input

Web-Grounded Spatial Intelligence

Perplexity auto-searches "transformer discharge failures [C2H2 elevated] Texas 2024" when faults are detected. Retrieves NERC reports and regional failure data at ~1,200 tok/s.
Cross-references DGA signatures against historical recalls and weather-correlated failures. Identifies similar fault progressions from past incidents.
Renders interactive 3D transformer models with real-time fault probability heat maps. Overlays risk zones on bushings, windings, tap changers, and other components.
Maps gas diffusion physics to spatial failure zones using thermal signatures. Localizes acetylene (>700°C arcing) to probable discharge points.
Spins up isolated virtual environments within the web app using three.js for fault simulation. Operators test "what-if" scenarios, review past occurrences worldwide, and model fault progression.
Executes sandboxed Python/NumPy environments for custom DGA scripts. Engineers run proprietary algorithms without leaving the browser.

Responsible AI

Built on fine-tuned open-source GPT-oss models to explain neural network decisions and fault predictions in plain language.
Breaks down how quantum ensemble, XGBoost, DGA methods, etc. reached specific diagnoses.
Provides interactive onboarding for new operators through adaptive tutorials on transformer diagnostics. Explains DGA interpretation, Rogers Ratios, Duval Triangle classification, etc. based on current workflow context.
Maintains audit trails of predictions, model weights and decision factors for regulatory compliance, with full traceability from input features through ensemble voting to final risk scores.

Edge AI Without Cloud

Runs entirely on RTX 5090 (dev) or Jetson Orin Nano Super (25W field deployment)
Works during storms/outages when connectivity fails
Connects directly to raw transformer sensor units
GPT-4 orchestrates training, bias monitoring, and human-in-the-loop safeguards when connectivity is available

How we built it

1. NVIDIA Edge AI Stack

Deployed Nemotron Nano 4B and GPT-oss on RTX 5090 alongside two gradient boosting ensembles, with Perplexity Sonar queried over the network when reachable
INT8 quantization + TensorRT optimization for 25W Jetson Orin field deployment
Ollama continuous batching hits 200-400ms latency; cuQuantum provides a 5-10× speedup for quantum circuit simulation
Zero cloud dependency for core detection

2. Real-Time Telemetry

FastAPI + WebSocket streams 180 data points (20 transformers × 9 channels) every 2 seconds
Next.js dashboard displays live health scores, risk gauges, AI predictions
Fault injection simulates thermal runaway, acetylene spikes, cascades for training
Push-based architecture for sub-second detection
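
The frame layout implied above (20 transformers × 9 channels = 180 data points per push) could look something like the sketch below. The field names, the numeric channel indices, and the random placeholder values are assumptions; the WebSocket transport itself is left out so the example stays standard-library only.

```python
import json
import random

N_TRANSFORMERS = 20
N_CHANNELS = 9

def build_frame(tick, rng):
    """Build one telemetry frame: one reading per transformer per channel."""
    readings = [
        {
            "transformer": f"T{t:03d}",                  # ID style as in "T047"
            "channel": c,                                # hypothetical channel index
            "value": round(rng.uniform(0.0, 1.0), 4),    # placeholder sensor value
        }
        for t in range(1, N_TRANSFORMERS + 1)
        for c in range(N_CHANNELS)
    ]
    return {"tick": tick, "readings": readings}

frame = build_frame(0, random.Random(0))
payload = json.dumps(frame)  # the string a server would push to dashboard clients
```

In a push-based setup the server would serialize a frame like this on each 2-second tick and send it to every connected client, so the dashboard never has to poll.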

3. Screen-Aware Conversational AI

HTML5 Canvas snapshots dashboard every 5s; Tesseract OCR extracts IDs and alerts
Nemotron processes visual + parsed JSON, system-prompted with IEEE C57.104 standards
Translates SHAP values to plain English, responds to queries like "Why is T047 high risk?"
Web Speech API for hands-free voice control
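
To make the SHAP-to-English step concrete, here is a minimal template-based sketch. It assumes per-feature contributions have already been computed elsewhere and arrive as a plain dictionary; the function name `explain` and the sample values are hypothetical, and in GridVeda the wording is produced by Nemotron rather than a fixed template.

```python
def explain(transformer_id, contributions, top_k=2):
    """Turn signed feature contributions into a one-sentence explanation."""
    # Rank features by absolute contribution and keep the strongest ones.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    parts = [
        f"{name} {'raised' if value > 0 else 'lowered'} the risk score"
        for name, value in ranked[:top_k]
    ]
    return f"{transformer_id}: " + "; ".join(parts) + "."

# Hypothetical contribution values for illustration only.
message = explain("T047", {"C2H2": 0.4, "H2": -0.1, "CO": 0.05})
```

A ranked summary like this could also serve as structured context in the LLM prompt, so the model explains the top drivers instead of guessing at them.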

4. Perplexity Sonar: Spatial Fault Visualization

Auto-triggers on faults, queries "transformer discharge failures [C2H2 elevated] Texas 2024" at ~1,200 tok/s
Retrieves NERC reports, recalls, weather events with citation tracking
Python parses CAD files (STEP/IGES) via Open CASCADE → OBJ → Three.js 3D rendering
Gas diffusion physics maps acetylene (>700°C arcing) to bushings/tap changers
WebGL volumetric heat maps fuse chemistry + Perplexity failure frequencies

5. Dual Gradient Boosting Ensembles

ETT-NN: XGBoost/LightGBM/CatBoost/RF (150 each) on 36 physics features—thermal stress, Joule heating, Arrhenius aging. RobustScaler preprocessing, 3-fold CV weighting, outputs 0-100% risk scores.

DGA-NN: XGBoost/LightGBM/CatBoost/RF (200 each) on gas concentrations + Rogers ratios + Duval percentages. StandardScaler normalization, soft voting for fault prediction, 2:1 meta-voting with quantum ensemble.
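
The soft voting and 2:1 meta-voting described above can be sketched as weighted averages of class-probability vectors. The sketch assumes the classical ensemble carries the weight of 2 and the quantum model the weight of 1; the source does not say which side gets the larger weight, and the probability values are made up.

```python
def soft_vote(model_probs, weights):
    """Weighted average of per-model class-probability vectors."""
    total = sum(weights)
    n_classes = len(model_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(n_classes)
    ]

def meta_vote(classical, quantum, classical_weight=2.0, quantum_weight=1.0):
    """Blend classical and quantum probabilities 2:1 (weight assignment assumed)."""
    return soft_vote([classical, quantum], [classical_weight, quantum_weight])

# Hypothetical two-class probabilities for illustration.
blended = meta_vote(classical=[0.9, 0.1], quantum=[0.3, 0.7])
predicted_class = max(range(len(blended)), key=blended.__getitem__)
```

Averaging probabilities instead of hard labels lets a confident model outweigh an uncertain one even when they disagree.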

6. Quantum Fault Classifier

6-qubit VQC: Hadamard → 9-feature encoding → 4 variational layers (72 params) → CNOT ring
Tri-method voting: Quantum + Rogers Ratio + Duval Triangle → plurality across 8 fault classes
cuQuantum parallelizes 64 state amplitudes on CUDA, 50-100ms inference
Consensus scoring: unanimous fault=60-90%, split=30-50%, normal=5-15%
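
For readers who want to see the circuit shape, the following is a small pure-Python state-vector sketch of a 6-qubit circuit with Hadamards, feature encoding, and 4 layers of 18 rotation angles (72 parameters) each followed by a CNOT ring. It does not use cuQuantum. The encoding of 9 features onto 6 qubits (RY for the first six, RZ for the remaining three) and the RX/RY/RZ ordering are assumptions, as are the random parameters.

```python
import cmath
import math
import random

N_QUBITS, N_LAYERS = 6, 4

def apply_1q(state, q, m):
    """Apply the 2x2 gate m to qubit q of the state vector, in place."""
    bit = 1 << q
    for i in range(len(state)):
        if not i & bit:
            a, b = state[i], state[i | bit]
            state[i] = m[0][0] * a + m[0][1] * b
            state[i | bit] = m[1][0] * a + m[1][1] * b

def apply_cnot(state, control, target):
    """Swap target-bit amplitudes wherever the control bit is 1."""
    cbit, tbit = 1 << control, 1 << target
    for i in range(len(state)):
        if i & cbit and not i & tbit:
            state[i], state[i | tbit] = state[i | tbit], state[i]

def hadamard():
    h = 1 / math.sqrt(2)
    return [[h, h], [h, -h]]

def rx(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -1j * s], [-1j * s, c]]

def ry(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s], [s, c]]

def rz(t):
    return [[cmath.exp(-0.5j * t), 0], [0, cmath.exp(0.5j * t)]]

def run_vqc(features, params):
    """Return Born-rule probabilities over the 64 basis states."""
    state = [0j] * (2 ** N_QUBITS)
    state[0] = 1 + 0j
    for q in range(N_QUBITS):
        apply_1q(state, q, hadamard())
    # Assumed encoding: features 0-5 as RY angles, features 6-8 as RZ angles.
    for k, x in enumerate(features):
        apply_1q(state, k % N_QUBITS, ry(x) if k < N_QUBITS else rz(x))
    p = iter(params)
    for _ in range(N_LAYERS):
        for q in range(N_QUBITS):          # 3 angles per qubit per layer
            apply_1q(state, q, rx(next(p)))
            apply_1q(state, q, ry(next(p)))
            apply_1q(state, q, rz(next(p)))
        for q in range(N_QUBITS):          # CNOT ring entangler
            apply_cnot(state, q, (q + 1) % N_QUBITS)
    return [abs(a) ** 2 for a in state]

rng = random.Random(0)
params = [rng.uniform(-math.pi, math.pi) for _ in range(N_QUBITS * 3 * N_LAYERS)]
probs = run_vqc([0.1 * k for k in range(9)], params)
```

The parameter count works out as 6 qubits × 3 rotations × 4 layers = 72, and the 64 output probabilities correspond to the 2^6 basis states. A gradient-free optimizer such as Nelder-Mead would adjust `params` to minimize a classification loss computed from `probs`.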

7. GPT-4 Responsible AI

Adaptive tutorials (physics for techs, architecture for engineers)
Layered explanations: voting analogies → circuit details → LaTeX derivations
Bias monitoring, A/B testing, human-in-the-loop enforcement for critical actions

8. Multi-Model Fusion

Parallel XGBoost/LightGBM/CatBoost/RandomForest ensembles for ETT anomaly detection
Quantum VQC (72 parameters, 6 qubits) combines with the classical models via tri-method plurality voting, alongside the second gradient boosting ensemble, Rogers Ratios, and Duval Triangle
DGA fault classification with weighted soft voting
Async parallel execution of quantum/boosting/LLM without blocking
TensorRT quantization for production deployment

Challenges we ran into

Running 5+ AI models simultaneously on 25W Jetson required aggressive memory management. Conflicts between cuQuantum state vectors, gradient boosting trees, and LLM layers forced us to build careful GPU allocation with TensorRT quantization to maintain edge operation.

Coordinating dual data streams, continuous ETT monitoring and on-demand DGA testing (H2, CH4, C2H2, C2H4, C2H6, CO, CO2 concentrations), into unified risk scores was challenging. Balancing quantum-classical ensemble weights through cross-validated F1 scores while maintaining high precision required iterative tuning. This was our first time integrating two completely different pipelines (time-series anomaly detection + gas chemistry classification) into one diagnostic system.

Physics-informed feature engineering meant translating Arrhenius aging, Joule heating, and gas diffusion into robust numerical features. Edge cases like division-by-zero in Rogers ratios needed epsilon regularization. We had to validate that our 36 ETT features actually captured fault mechanisms better than raw sensors.
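
The division-by-zero guard mentioned above can be sketched as follows, using the three gas ratios commonly associated with the Rogers method and the three relative percentages used by the Duval Triangle. The epsilon value, the function names, and the sample concentrations are assumptions for illustration.

```python
EPS = 1e-6  # assumed regularization constant

def rogers_ratios(gas, eps=EPS):
    """Gas ratios with an epsilon in the denominator to avoid division by zero."""
    return {
        "CH4/H2": gas["CH4"] / (gas["H2"] + eps),
        "C2H2/C2H4": gas["C2H2"] / (gas["C2H4"] + eps),
        "C2H4/C2H6": gas["C2H4"] / (gas["C2H6"] + eps),
    }

def duval_percentages(gas, eps=EPS):
    """Relative percentages of CH4, C2H4, and C2H2 for the Duval Triangle."""
    total = gas["CH4"] + gas["C2H4"] + gas["C2H2"] + eps
    return {name: 100.0 * gas[name] / total for name in ("CH4", "C2H4", "C2H2")}

# Hypothetical concentrations in ppm, plus an all-zero sample as the edge case.
sample = {"H2": 100.0, "CH4": 50.0, "C2H2": 20.0, "C2H4": 30.0, "C2H6": 10.0}
empty = {name: 0.0 for name in sample}

ratios = rogers_ratios(sample)
percentages = duval_percentages(sample)
safe_ratios = rogers_ratios(empty)   # returns zeros instead of raising
```

With the epsilon in place, an all-zero reading yields ratios of 0.0 instead of raising `ZeroDivisionError`, while ordinary readings change only by a negligible amount.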

Getting quantum inference under 100ms on edge hardware pushed cuQuantum hard. Hand-tuning 72 variational parameters across 4 layers and minimizing CNOT depth took extensive experimentation. Making the tri-method ensemble (quantum + Rogers + Duval) produce coherent predictions required principled tiebreaking rules.
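
A tiebreaking rule for the tri-method vote might look like the sketch below. It assumes a fixed method priority for three-way splits (Duval first) and maps the outcome onto the consensus bands from the build notes (unanimous fault 60-90%, split 30-50%, normal 5-15%); the priority order, the label strings, and how a score is placed within a band are not specified in the source.

```python
from collections import Counter

CONSENSUS_BANDS = {"unanimous_fault": (60, 90), "split": (30, 50), "normal": (5, 15)}

def plurality_vote(quantum, rogers, duval, priority=("duval", "rogers", "quantum")):
    """Return (label, votes_for_label); break three-way splits by method priority."""
    votes = {"quantum": quantum, "rogers": rogers, "duval": duval}
    label, count = Counter(votes.values()).most_common(1)[0]
    if count == 1:
        # No two methods agree: fall back to the highest-priority method (assumed).
        label = votes[priority[0]]
    return label, count

def consensus_band(label, count, normal_label="normal"):
    """Map the vote outcome to a risk-score band in percent."""
    if label == normal_label:
        return CONSENSUS_BANDS["normal"]
    if count == 3:
        return CONSENSUS_BANDS["unanimous_fault"]
    return CONSENSUS_BANDS["split"]

majority = plurality_vote("arcing", "arcing", "thermal")
three_way = plurality_vote("arcing", "thermal", "partial_discharge")
```

Making the fallback explicit keeps the ensemble deterministic: the same three inputs always produce the same label, which matters for audit trails.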

Real-time 3D fault visualization was complex—parsing CAD files, computing gas diffusion PDEs over voxel grids, fusing chemistry with Perplexity failure data via Bayesian inference, and rendering volumetric heat maps in Three.js with WebGL shaders while maintaining smooth framerates.

Ensuring zero-cloud resilience meant building graceful degradation when Perplexity/GPT-4 are unreachable. Nemotron handles core diagnostics offline while we cache Perplexity results in Redis with priority queuing for fault-triggered research.
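
The fallback path described above can be sketched with a plain dictionary standing in for Redis and a heap standing in for the priority queue. The function names, the fake fetchers, and the priority numbers are hypothetical; a real deployment would call the Perplexity API and a Redis client instead.

```python
import heapq

def research_fault(query, fetch, cache):
    """Try the live search; on network failure fall back to a cached answer."""
    try:
        result = fetch(query)
    except (ConnectionError, TimeoutError):
        cached = cache.get(query)
        source = "cache" if cached is not None else "unavailable"
        return {"source": source, "result": cached}
    cache[query] = result  # refresh the cache on every successful lookup
    return {"source": "live", "result": result}

def fake_online(query):
    """Stand-in for a web search call that succeeds."""
    return f"report for {query}"

def fake_offline(query):
    """Stand-in for a web search call made while the uplink is down."""
    raise ConnectionError("no uplink")

cache = {}
query = "transformer discharge failures"
live = research_fault(query, fake_online, cache)
fallback = research_fault(query, fake_offline, cache)
missing = research_fault("unseen query", fake_offline, cache)

# Lower number means higher priority, so fault-triggered research runs first.
pending = []
heapq.heappush(pending, (1, "routine trend lookup"))
heapq.heappush(pending, (0, "fault-triggered research"))
first_job = heapq.heappop(pending)[1]
```

The key property is that a network failure never raises into the diagnostic path: the caller always gets a result record and can tell from `source` whether the context is fresh, cached, or absent.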

Accomplishments that we're proud of

5 AI models + Neural Networks running simultaneously on a single RTX 5090 - quantum VQC, dual gradient boosting ensembles (ETT + DGA), and LLM architectures cooperating in real-time
100% NVIDIA-native edge story - Nemotron + cuQuantum VQC + gradient boosting ensembles all run without any cloud dependency, deployable on a $249 Jetson
Sub-second anomaly detection - 0.21ms quantum inference per sample, 50-200ms ETT ensemble across 20 transformers with ensemble confidence scoring
Perplexity Sonar at ~1,200 tok/s - enabling real-time incident research and 3D spatial fault visualization that's fast enough for grid decisions
Web-grounded spatial intelligence - Perplexity Sonar retrieves NERC reports and failure case studies, rendering 3D transformer models with physics-based fault probability heat maps
Voice-controlled grid monitoring - fully hands-free operation for field technicians
98% DGA fault classification accuracy with 97% F1-score - Quantum-classical hybrid ensemble achieves 98.09% ± 0.80% accuracy across 5-fold cross-validation, with 96.99% macro F1 and 98.08% weighted F1 on multi-class transformer diagnostics
Interactive Real-Time Dashboard: The web-based monitoring interface provides live transformer health visualization with color-coded risk indicators, gas concentration trends, and fault probability heat maps updated in real-time as new sensor data arrives. Built with React and D3.js, the dashboard displays ETT anomaly scores, DGA fault classifications, and ensemble confidence metrics across all monitored transformers simultaneously. Operators can drill down into individual units to view historical gas chemistry plots, Rogers Ratio trends, and Duval Triangle trajectories over time. The interface integrates the conversational AI overlay where Nemotron Nano 4B answers queries like "Why is T047 high risk?" by analyzing the current screen state, making complex diagnostics accessible to field technicians without deep expertise in quantum machine learning or IEEE standards.

What we learned

Ensemble AI Improves Reliability

No single model was consistently correct. Our quantum VQC occasionally misclassified edge cases, but the gradient boosting ensembles and the Rogers Ratio and Duval Triangle methods compensated. A weighted ensemble produced more stable, higher-confidence results than any standalone model.

Edge AI Is Essential for Infrastructure

In real grid failures, internet connectivity cannot be assumed. Deploying Nemotron and anomaly detection locally on NVIDIA Jetson eliminates cloud dependency and ensures continuous operation during storms or cascading outages.

Inference Speed Directly Impacts Safety

Reducing trend prediction latency from 30 seconds to under 1 second can materially change outcomes in a cascading grid event. Faster inference enables earlier load shedding and preventive intervention.

Web-Grounded Context Enhances Decision-Making

Local models understand transformer chemistry and fault theory, but integrating Perplexity Sonar adds real-world awareness—recent incidents, recalls, weather threats, and regulatory updates—improving operator situational awareness.

Modular Architecture Enables Rapid Iteration

Separating subsystems (telemetry ingestion, ensemble inference, agent interface, web intelligence layer) allowed us to experiment with model weighting, async fusion, and GPU optimization without destabilizing the full system.

Infrastructure AI Requires Security by Design

Because grid systems are critical infrastructure, we implemented strict input validation, authentication controls, and controlled model invocation to prevent misuse or unsafe command generation.

What's next for GridVeda

Live Edge Demonstrations with NVIDIA Hardware: We plan to optimize and deploy GridVeda on NVIDIA Jetson Orin Nano Super for fully autonomous substation deployment. This includes TensorRT quantization of Nemotron to achieve 2–3x faster inference and production-grade reliability, proving that edge AI can deliver enterprise performance in the field.

Expanded Web-Grounded Intelligence with Perplexity Sonar: We will deepen our integration with Perplexity's Sonar API to provide real-time incident correlation—automatically linking transformer anomalies to NERC reports, weather events, and regional outage data during live operations. Our goal is to showcase GridVeda as the first grid intelligence system that combines edge AI with continuously updated global infrastructure knowledge.

Utility Pilot Programs & Field Validation: We intend to partner with regional utility providers to deploy GridVeda alongside real SCADA feeds. Initial pilots will focus on early transformer degradation detection and substation-level anomaly triage, collecting operational data to validate failure prediction accuracy and refine our models against real-world grid conditions.

Conference & Research Publication: We are preparing a technical paper detailing our hybrid ensemble architecture (Quantum VQC + Ensemble-based NN + LLM) for submission to infrastructure resilience and applied AI conferences. Showcasing GridVeda at technical venues will help bridge academia, utilities, and industry while contributing to the broader research community.

Federated Learning Across Utilities: Next iterations will introduce federated training across multiple substations—allowing utilities to improve anomaly detection collectively without sharing sensitive operational data. This privacy-preserving approach enables grid-wide learning while respecting the security requirements of critical infrastructure. Integration with SCADA systems and IEC 61850 protocols will enable real-time data ingestion from substation sensors, replacing simulated datasets with live transformer telemetry for continuous model updates and immediate fault detection at scale.

From Hackathon to Company: Beyond TreeHacks, we are actively exploring pathways to turn GridVeda into a venture-backed startup. We plan to pursue the Human Capital Fellowship for long-term company building, Neo Accelerator for early-stage product-market validation, and strategic partnerships with NVIDIA and infrastructure-focused investors. Our vision is to evolve GridVeda from a 36-hour prototype into a deployable AI infrastructure platform protecting national energy systems.

Built With

gpt-oss
html
jetson-nano
nemotron
node.js
perplexity
python
react
sonar
three.js
vercel

Submitted to

TreeHacks 2026
- Winner [Stanford Ecopreneurship] Sustainability: Best prototyping process

Created by

Independently formulated idea and context. Base system infrastructure and UI. Pitching and marketing strategy.

Ishaan Busireddy
Rehaan Kadhar
Neil Chandran
Shreyan Paliwal

Updates

Neil Chandran started this project — Feb 15, 2026 12:28 PM EST
