Inspiration
Grid operators work in high-stakes environments where failures cascade fast. Today, 46% of U.S. distribution infrastructure is at or beyond its useful life, contributing to an annual economic loss of $150 billion. The DOE warns that without intervention, the risk of major outages could increase 30-fold by 2030. When a single transformer fails under these conditions, often due to the overloading seen in 34% of recent asset failures, it can knock out substations and leave communities dark for days.
Power interruptions are becoming more frequent and more severe. Since 2000, the number of major weather-related outages has increased dramatically, with extreme weather now responsible for over 80% of large-scale blackouts in the U.S. Recent events, from the 2021 Texas grid crisis to the 2023 North Carolina substation attacks and extreme weather-driven outages, reveal a common reality: we are still reacting to failures instead of predicting them.

Growing up across California, Oregon, and Maryland, our team has witnessed firsthand how fragile infrastructure can amplify disaster impacts, from wildfire-driven outages in the West to storm-related grid disruptions on the East Coast. These experiences reinforced the need for on-device intelligence that keeps operating even when connectivity is unreliable, especially during storms, heat waves, or grid stress events.
Our goal with GridVeda: AI-Powered Grid Intelligence is to give operators real-time, AI-driven decision support at the edge, detecting transformer degradation early, classifying fault types, estimating time-to-failure, and guiding mitigation, all without depending on the cloud.
What it does
GridVeda is an AI-powered early warning system for electrical transformers that runs on-site at substations, predicting failures before blackouts occur.
Real-Time Transformer Monitoring: Monitors 20 transformers simultaneously via two AI pipelines
Physics Informed ETT Anomaly Analysis
- ETT detector processes sensor readings every 15 minutes (oil temp + 6 load channels)
- Uses feature engineering to compute 36 physics features: thermal stress, Joule heating, insulation aging
- Ensemble of four tree-based models (LightGBM, CatBoost, XGBoost, Random Forest)
- Alerts operators to schedule gas testing when risk exceeds 50%, with Mild and High severity tiers
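The physics features named above could be derived along these lines. This is a minimal sketch, not the project's actual 36-feature set: the function name `physics_features`, the array layout, and the use of oil temperature as a stand-in for hot-spot temperature are all assumptions. The aging term uses the Arrhenius-type aging acceleration factor from IEEE C57.91, which equals 1 at the 110 °C reference.

```python
import numpy as np

def physics_features(oil_temp_c, loads):
    """Illustrative physics features from one window of ETT readings.

    oil_temp_c: shape (T,) oil temperatures in deg C, one per 15-minute step.
    loads:      shape (T, 6) readings from the six load channels.
    """
    oil_temp_c = np.asarray(oil_temp_c, dtype=float)
    loads = np.asarray(loads, dtype=float)

    # Joule heating scales with current squared; summed squared load is a proxy.
    joule = np.sum(loads ** 2, axis=1)

    # Thermal stress proxy: temperature change per 15-minute step.
    temp_rate = np.diff(oil_temp_c, prepend=oil_temp_c[0])

    # Aging acceleration factor (IEEE C57.91 form, 110 deg C reference).
    # Assumption: oil temperature is used in place of hot-spot temperature.
    aging = np.exp(15000.0 / 383.0 - 15000.0 / (oil_temp_c + 273.0))

    return {
        "joule_heating": joule,
        "temp_rate": temp_rate,
        "aging_factor": aging,
        "cumulative_aging": np.cumsum(aging),
    }
```

Features like these would then be scaled and passed to the ensemble; rolling statistics over several windows are a natural way to grow the set.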
Quantum-Classical Fault Diagnosis
- Implemented 6-qubit variational quantum circuit with 72 trainable parameters across 4 entangled layers
- Trained with gradient-free Nelder-Mead optimization to avoid barren plateaus in 64-dimensional Hilbert space
- Architected quantum-classical hybrid meta-ensemble merging Born rule measurement probabilities with IEEE C57.104 standards
- Integrated tri-method plurality voting: quantum predictions + Rogers Ratios + Duval Triangle classification
- Built a second weighted ensemble across XGBoost, LightGBM, CatBoost, and RandomForest with confidence-aware routing
- Developed 8→4 class probability mapping using modular arithmetic aggregation of quantum measurement outcomes
- Implemented strategic label transformation and prediction consensus mechanisms for robust classification
- DGA Summary: 98.09% ± 0.80% accuracy, 96.99% ± 2.07% F1-macro, 98.08% ± 0.75% F1-weighted
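One way to read the 8→4 class mapping above is as a modular sum over measurement outcomes. The sketch below is an assumption about how such an aggregation could work, not the project's confirmed scheme: the helper name `aggregate_mod` and the choice of grouping outcomes by index modulo the class count are hypothetical. The input is the Born-rule probability vector, i.e. the squared magnitudes of the 64 amplitudes of a 6-qubit state.

```python
import numpy as np

def aggregate_mod(probs, n_classes):
    """Sum the probabilities of all outcomes whose index is congruent mod n_classes."""
    probs = np.asarray(probs, dtype=float)
    out = np.zeros(n_classes)
    # np.add.at accumulates correctly when several outcomes map to the same class.
    np.add.at(out, np.arange(probs.size) % n_classes, probs)
    return out

# Hypothetical example: a uniform distribution over the 64 basis states.
p64 = np.full(64, 1.0 / 64)
p8 = aggregate_mod(p64, 8)   # 64 outcomes -> 8 fault classes
p4 = aggregate_mod(p8, 4)    # 8 classes -> 4 classes
```

Because each step only regroups probability mass, the result still sums to 1 and can be fed straight into a voting stage.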
Conversational Grid Interface
- Nemotron Nano 4B provides plain-English explanations of fault diagnostics and risk scores through live visual feedback loop of the active dashboard interface
- Answers context-aware queries like "Why is T047 flagged as high risk?" by analyzing current screen state and transformer-specific DGA patterns in real-time
- Offers interactive guidance that responds to displayed data, helping operators understand which gas concentrations and ratios drove specific fault predictions
- Supports voice-enabled hands-free operation for field technicians to query diagnostics, request explanations, and navigate the system without manual input
Web-Grounded Spatial Intelligence
- Perplexity auto-searches "transformer discharge failures [C2H2 elevated] Texas 2024" when faults are detected. Retrieves NERC reports and regional failure data at ~1,200 tok/s.
- Cross-references DGA signatures against historical recalls and weather-correlated failures. Identifies similar fault progressions from past incidents.
- Renders interactive 3D transformer models with real-time fault probability heat maps. Overlays risk zones on bushings, windings, tap changers, and other components.
- Maps gas diffusion physics to spatial failure zones using thermal signatures. Localizes acetylene (>700°C arcing) to probable discharge points.
- Spins up isolated virtual environments within the web app using three.js for fault simulation. Operators test "what-if" scenarios, review past occurrences worldwide, and model fault progression.
- Executes sandboxed Python/NumPy environments for custom DGA scripts. Engineers run proprietary algorithms without leaving the browser.
Responsible AI
- Built on fine-tuned open-source GPT-oss models to explain neural network decisions and fault predictions in plain language.
- Breaks down how quantum ensemble, XGBoost, DGA methods, etc. reached specific diagnoses.
- Provides interactive onboarding for new operators through adaptive tutorials on transformer diagnostics. Explains DGA interpretation, Rogers Ratios, Duval Triangle classification, etc. based on current workflow context.
- Maintains audit trails of predictions, model weights and decision factors for regulatory compliance.
- Full traceability from input features through ensemble voting to final risk scores.
Edge AI Without Cloud
- Runs entirely on RTX 5090 (dev) or Jetson Orin Nano Super (25W field deployment)
- Works during storms/outages when connectivity fails
- Connects directly to raw transformer sensor units
- GPT-4 orchestrates training, bias monitoring, human-in-the-loop safeguards
How we built it
1. NVIDIA Edge AI Stack
- Deployed Nemotron Nano 4B, Perplexity Sonar, GPT-oss on RTX 5090 alongside two gradient boosting ensembles
- INT8 quantization + TensorRT optimization for 25W Jetson Orin field deployment
- Ollama continuous batching hits 200-400ms latency, cuQuantum provides 5-10× quantum speedup
- Zero cloud dependency for core detection
2. Real-Time Telemetry
- FastAPI + WebSocket streams 180 data points (20 transformers × 9 channels) every 2 seconds
- Next.js dashboard displays live health scores, risk gauges, AI predictions
- Fault injection simulates thermal runaway, acetylene spikes, cascades for training
- Push-based architecture for sub-second detection
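The frame shape behind "180 data points every 2 seconds" can be sketched with the standard library alone. Everything here is illustrative: the names `build_frame` and `stream`, the `T000`-style IDs, and the numeric channel indices are assumptions, and `send` stands in for whatever the WebSocket layer exposes.

```python
import asyncio
import json
import time

N_TRANSFORMERS = 20
N_CHANNELS = 9  # channel meanings are not specified here; indices are placeholders

def build_frame(read_channel):
    """Collect one reading per (transformer, channel) pair into a JSON-ready dict."""
    return {
        "ts": time.time(),
        "readings": [
            {"transformer": f"T{t:03d}", "channel": c, "value": read_channel(t, c)}
            for t in range(N_TRANSFORMERS)
            for c in range(N_CHANNELS)
        ],
    }

async def stream(send, read_channel, period_s=2.0, n_frames=None):
    """Push a frame to `send` (any async callable) every `period_s` seconds."""
    sent = 0
    while n_frames is None or sent < n_frames:
        await send(json.dumps(build_frame(read_channel)))
        sent += 1
        await asyncio.sleep(period_s)
```

Pushing frames from the server, rather than having the dashboard poll, is what keeps detection latency bounded by the frame period instead of a polling interval.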
3. Screen-Aware Conversational AI
- HTML5 Canvas snapshots dashboard every 5s; Tesseract OCR extracts IDs and alerts
- Nemotron processes visual + parsed JSON, system-prompted with IEEE C57.104 standards
- Translates SHAP values to plain English, responds to queries like "Why is T047 high risk?"
- Web Speech API for hands-free voice control
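Turning attribution values into a sentence an operator can read might look like the sketch below. It assumes the per-feature contributions have already been computed (for example as SHAP values) and are passed in as a plain dict; the function name `explain` and the wording template are hypothetical, and in the system described above the LLM would do the final phrasing.

```python
def explain(transformer_id, contributions, top_k=3):
    """Summarize the largest signed feature contributions in plain English."""
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    parts = [
        f"{name} {'raised' if value > 0 else 'lowered'} the risk score by {abs(value):.2f}"
        for name, value in ranked[:top_k]
    ]
    return f"{transformer_id}: " + "; ".join(parts) + "."

# Hypothetical contributions for one transformer.
print(explain("T047", {"C2H2": 0.40, "H2": -0.10, "oil_temp": 0.05}, top_k=2))
```

A template like this is also useful as a deterministic fallback when the language model is unavailable.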
4. Perplexity Sonar: Spatial Fault Visualization
- Auto-triggers on faults, queries "transformer discharge failures [C2H2 elevated] Texas 2024" at ~1,200 tok/s
- Retrieves NERC reports, recalls, weather events with citation tracking
- Python parses CAD files (STEP/IGES) via Open CASCADE → OBJ → Three.js 3D rendering
- Gas diffusion physics maps acetylene (>700°C arcing) to bushings/tap changers
- WebGL volumetric heat maps fuse chemistry + Perplexity failure frequencies
5. Dual Gradient Boosting Ensembles
- ETT-NN: XGBoost/LightGBM/CatBoost/RF (150 each) on 36 physics features: thermal stress, Joule heating, Arrhenius aging. RobustScaler preprocessing, 3-fold CV weighting, outputs 0-100% risk scores.
- DGA-NN: XGBoost/LightGBM/CatBoost/RF (200 each) on gas concentrations + Rogers ratios + Duval percentages. StandardScaler normalization, soft voting for fault prediction, 2:1 meta-voting with quantum ensemble.
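The CV-weighted soft voting described above can be illustrated with NumPy only. This is a sketch under assumptions: the function names are hypothetical, the per-model probabilities are taken as given (in practice they would come from each fitted model), and weights are simply the cross-validation scores normalized to sum to 1.

```python
import numpy as np

def weighted_soft_vote(probas, cv_scores):
    """Blend per-model class probabilities using normalized CV scores as weights.

    probas:    shape (n_models, n_samples, n_classes)
    cv_scores: shape (n_models,), e.g. the mean cross-validated F1 of each model
    """
    probas = np.asarray(probas, dtype=float)
    weights = np.asarray(cv_scores, dtype=float)
    weights = weights / weights.sum()
    # Contract the model axis: result has shape (n_samples, n_classes).
    blended = np.tensordot(weights, probas, axes=1)
    return blended, blended.argmax(axis=1)

def risk_percent(blended, fault_class=1):
    """Express the blended probability of the fault class as a 0-100% risk score."""
    return 100.0 * blended[:, fault_class]
```

Soft voting keeps each model's confidence in play, so one strongly confident model can outweigh several lukewarm ones, which hard majority voting would not allow.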
6. Quantum Fault Classifier
- 6-qubit VQC: Hadamard → 9-feature encoding → 4 variational layers (72 params) → CNOT ring
- Tri-method voting: Quantum + Rogers Ratio + Duval Triangle → plurality across 8 fault classes
- cuQuantum parallelizes 64 state amplitudes on CUDA, 50-100ms inference
- Consensus scoring: unanimous fault=60-90%, split=30-50%, normal=5-15%
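To make the circuit layout concrete, here is a small NumPy statevector simulation of a circuit with the same shape: Hadamards, feature encoding, then 4 layers of per-qubit rotations followed by a CNOT ring. Several details are assumptions rather than the project's design: three rotations (RX, RY, RZ) per qubit per layer is one way to reach 6 × 3 × 4 = 72 parameters, and the 9 features are encoded as RY angles on all six qubits plus RZ angles on the first three. It does not use cuQuantum.

```python
import numpy as np

N_QUBITS = 6
N_LAYERS = 4
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

def rx(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]], dtype=complex)

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(theta):
    return np.array([[np.exp(-0.5j * theta), 0], [0, np.exp(0.5j * theta)]], dtype=complex)

def apply_1q(state, gate, qubit):
    """Apply a 2x2 gate to one qubit of a 64-amplitude statevector."""
    psi = state.reshape([2] * N_QUBITS)
    psi = np.tensordot(gate, psi, axes=([1], [qubit]))
    return np.moveaxis(psi, 0, qubit).reshape(-1)

def apply_cnot(state, control, target):
    """Flip the target qubit on the half of the state where control is 1."""
    psi = state.reshape([2] * N_QUBITS).copy()
    idx = [slice(None)] * N_QUBITS
    idx[control] = 1
    t_axis = target if target < control else target - 1
    psi[tuple(idx)] = np.flip(psi[tuple(idx)], axis=t_axis).copy()
    return psi.reshape(-1)

def vqc_probs(features, params):
    """Return the 64 Born-rule probabilities for 9 features and 72 parameters."""
    features = np.asarray(features, dtype=float)
    params = np.asarray(params, dtype=float).reshape(N_LAYERS, N_QUBITS, 3)
    state = np.zeros(2 ** N_QUBITS, dtype=complex)
    state[0] = 1.0
    for q in range(N_QUBITS):
        state = apply_1q(state, H, q)
    # Assumed encoding: features 0-5 as RY on qubits 0-5, features 6-8 as RZ on qubits 0-2.
    for i, x in enumerate(features):
        state = apply_1q(state, ry(x) if i < N_QUBITS else rz(x), i % N_QUBITS)
    for layer in params:
        for q in range(N_QUBITS):
            a, b, c = layer[q]
            for gate in (rx(a), ry(b), rz(c)):
                state = apply_1q(state, gate, q)
        for q in range(N_QUBITS):
            state = apply_cnot(state, q, (q + 1) % N_QUBITS)
    return np.abs(state) ** 2
```

A gradient-free optimizer such as `scipy.optimize.minimize` with `method="Nelder-Mead"` could then tune the 72 parameters against a classification loss computed from these probabilities.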
7. GPT-4 Responsible AI
- Adaptive tutorials (physics for techs, architecture for engineers)
- Layered explanations: voting analogies → circuit details → LaTeX derivations
- Bias monitoring, A/B testing, human-in-the-loop enforcement for critical actions
8. Multi-Model Fusion
- Parallel XGBoost/LightGBM/CatBoost/RandomForest ensembles for ETT anomaly detection
- Quantum VQC (72 parameters, 6 qubits) combines with classical gradient boosting via tri-method plurality voting, together with the second gradient boosting ensemble, Rogers Ratios, and Duval Triangle DGA fault classification, using weighted soft voting
- Async parallel execution of quantum/boosting/LLM without blocking
- TensorRT quantization for production deployment
Challenges we ran into
Running 5+ AI models simultaneously on 25W Jetson required aggressive memory management. Conflicts between cuQuantum state vectors, gradient boosting trees, and LLM layers forced us to build careful GPU allocation with TensorRT quantization to maintain edge operation.
Coordinating dual data streams (continuous ETT monitoring and on-demand DGA testing of H2, CH4, C2H2, C2H4, C2H6, CO, and CO2 concentrations) into unified risk scores was challenging. Balancing quantum-classical ensemble weights through cross-validated F1 scores while maintaining high precision required iterative tuning. It was our first time integrating two completely different pipelines (time-series anomaly detection and gas chemistry classification) into one diagnostic system.
Physics-informed feature engineering meant translating Arrhenius aging, Joule heating, and gas diffusion into robust numerical features. Edge cases like division-by-zero in Rogers ratios needed epsilon regularization. We had to validate that our 36 ETT features actually captured fault mechanisms better than raw sensors.
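The division-by-zero guard mentioned above is simple to show. The sketch below computes three gas ratios commonly used in Rogers-style interpretation and the three relative percentages used by the Duval Triangle, adding a small epsilon to each denominator. The function names and the epsilon value are assumptions, and gas concentrations are taken to be in ppm.

```python
import math

EPS = 1e-9  # assumed regularization constant; the project's value is not stated

def rogers_ratios(gas, eps=EPS):
    """Gas ratios with epsilon-regularized denominators (inputs in ppm)."""
    return {
        "CH4/H2": gas["CH4"] / (gas["H2"] + eps),
        "C2H2/C2H4": gas["C2H2"] / (gas["C2H4"] + eps),
        "C2H4/C2H6": gas["C2H4"] / (gas["C2H6"] + eps),
    }

def duval_percentages(gas, eps=EPS):
    """Relative percentages of CH4, C2H4 and C2H2, as used by the Duval Triangle."""
    total = gas["CH4"] + gas["C2H4"] + gas["C2H2"] + eps
    return {name: 100.0 * gas[name] / total for name in ("CH4", "C2H4", "C2H2")}

# Hypothetical sample where several gases are below the detection limit.
sample = {"H2": 0.0, "CH4": 50.0, "C2H2": 0.0, "C2H4": 0.0, "C2H6": 0.0}
ratios = rogers_ratios(sample)
shares = duval_percentages(sample)
```

With epsilon in the denominator a zero reading yields a very large but finite ratio, so downstream scalers and tree models never see `inf` or `NaN`; clipping the ratio to a ceiling is a reasonable extra step.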
Getting quantum inference under 100ms on edge hardware pushed cuQuantum hard. Hand-tuning 72 variational parameters across 4 layers and minimizing CNOT depth took extensive experimentation. Making the tri-method ensemble (quantum + Rogers + Duval) produce coherent predictions required principled tiebreaking rules.
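A tiebreaking rule for the tri-method vote could be as simple as a fixed priority order. The sketch below is one possible rule, not the project's: the function names, the priority order, and the label strings are assumptions. The confidence bands come from the consensus ranges listed earlier (unanimous fault 60-90%, split 30-50%, normal 5-15%); how a value is chosen within a band is not specified, so the function returns the band itself.

```python
from collections import Counter

def plurality_vote(quantum, rogers, duval, priority=("quantum", "duval", "rogers")):
    """Return (label, agreement) where agreement is how many methods chose the label."""
    votes = {"quantum": quantum, "rogers": rogers, "duval": duval}
    counts = Counter(votes.values())
    top = max(counts.values())
    winners = {label for label, n in counts.items() if n == top}
    if len(winners) == 1:
        return winners.pop(), top
    # Three-way split: fall back to the first method in the assumed priority order.
    for method in priority:
        if votes[method] in winners:
            return votes[method], top
    raise ValueError("priority must name at least one voting method")

def consensus_band(label, agreement, normal_label="normal"):
    """Map a vote outcome to a (low, high) risk band in percent."""
    if label == normal_label:
        return (5, 15)
    return (60, 90) if agreement == 3 else (30, 50)
```

With three voters a tie can only be a three-way split, so the priority order decides exactly those cases and nothing else.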
Real-time 3D fault visualization was complex—parsing CAD files, computing gas diffusion PDEs over voxel grids, fusing chemistry with Perplexity failure data via Bayesian inference, and rendering volumetric heat maps in Three.js with WebGL shaders while maintaining smooth framerates.
Ensuring zero-cloud resilience meant building graceful degradation when Perplexity/GPT-4 are unreachable. Nemotron handles core diagnostics offline while we cache Perplexity results in Redis with priority queuing for fault-triggered research.
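The fallback behavior described above can be sketched as a thin wrapper around the web lookup. This is illustrative only: `CachedResearch` and its `fetch` argument are hypothetical, an in-memory dict stands in for Redis, and priority queuing is left out.

```python
import time

class CachedResearch:
    """Try a live lookup first; fall back to the last cached answer when offline."""

    def __init__(self, fetch, ttl_s=3600.0):
        self._fetch = fetch    # hypothetical callable: query -> result, may raise
        self._ttl_s = ttl_s
        self._cache = {}       # in-memory stand-in for Redis

    def lookup(self, query):
        try:
            result = self._fetch(query)
        except (ConnectionError, TimeoutError):
            entry = self._cache.get(query)
            if entry is None:
                # Nothing cached: report that research is unavailable, do not fail.
                return {"result": None, "source": "unavailable"}
            stored_at, result = entry
            stale = (time.time() - stored_at) > self._ttl_s
            return {"result": result, "source": "cache", "stale": stale}
        self._cache[query] = (time.time(), result)
        return {"result": result, "source": "live"}
```

Returning the source alongside the result lets the dashboard tell operators whether they are looking at live or cached research, while core diagnostics continue regardless.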
Accomplishments that we're proud of
5 AI models + Neural Networks running simultaneously on a single RTX 5090 -- quantum VQC, dual gradient boosting ensembles (ETT + DGA), and LLM architectures cooperating in real-time
100% NVIDIA-native edge story - Nemotron + cuQuantum VQC + gradient boosting ensembles all run without any cloud dependency, deployable on a $249 Jetson
Sub-second anomaly detection - 0.21ms quantum inference per sample, 50-200ms ETT ensemble across 20 transformers with ensemble confidence scoring
Perplexity Sonar at ~1,200 tok/s - enabling real-time incident research and 3D spatial fault visualization that's fast enough for grid decisions
Web-grounded spatial intelligence - Perplexity Sonar retrieves NERC reports and failure case studies, rendering 3D transformer models with physics-based fault probability heat maps
Voice-controlled grid monitoring - fully hands-free operation for field technicians
98% DGA fault classification accuracy with 97% F1-score - Quantum-classical hybrid ensemble achieves 98.09% ± 0.80% accuracy across 5-fold cross-validation, with 96.99% macro F1 and 98.08% weighted F1 on multi-class transformer diagnostics
Interactive Real-Time Dashboard: The web-based monitoring interface provides live transformer health visualization with color-coded risk indicators, gas concentration trends, and fault probability heat maps updated in real-time as new sensor data arrives. Built with React and D3.js, the dashboard displays ETT anomaly scores, DGA fault classifications, and ensemble confidence metrics across all monitored transformers simultaneously. Operators can drill down into individual units to view historical gas chemistry plots, Rogers Ratio trends, and Duval Triangle trajectories over time. The interface integrates the conversational AI overlay where Nemotron Nano 4B answers queries like "Why is T047 high risk?" by analyzing the current screen state, making complex diagnostics accessible to field technicians without deep expertise in quantum machine learning or IEEE standards.
What we learned
Ensemble AI Improves Reliability
No single model was consistently correct. Our quantum VQC occasionally misclassified edge cases, but the Nemotron predictors compensated. A weighted ensemble produced more stable, higher-confidence results than any standalone model.
Edge AI Is Essential for Infrastructure
In real grid failures, internet connectivity cannot be assumed. Deploying Nemotron and anomaly detection locally on NVIDIA Jetson eliminates cloud dependency and ensures continuous operation during storms or cascading outages.
Inference Speed Directly Impacts Safety
Reducing trend prediction latency from 30 seconds to under 1 second can materially change outcomes in a cascading grid event. Faster inference enables earlier load shedding and preventive intervention.
Web-Grounded Context Enhances Decision-Making
Local models understand transformer chemistry and fault theory, but integrating Perplexity Sonar adds real-world awareness—recent incidents, recalls, weather threats, and regulatory updates—improving operator situational awareness.
Modular Architecture Enables Rapid Iteration
Separating subsystems (telemetry ingestion, ensemble inference, agent interface, web intelligence layer) allowed us to experiment with model weighting, async fusion, and GPU optimization without destabilizing the full system.
Infrastructure AI Requires Security by Design
Because grid systems are critical infrastructure, we implemented strict input validation, authentication controls, and controlled model invocation to prevent misuse or unsafe command generation.
What's next for GridVeda
Live Edge Demonstrations with NVIDIA Hardware: We plan to optimize and deploy GridVeda on NVIDIA Jetson Orin Nano Super for fully autonomous substation deployment. This includes TensorRT quantization of Nemotron to achieve 2–3x faster inference and production-grade reliability, proving that edge AI can deliver enterprise performance in the field.
Expanded Web-Grounded Intelligence with Perplexity Sonar: We will deepen our integration with Perplexity's Sonar API to provide real-time incident correlation—automatically linking transformer anomalies to NERC reports, weather events, and regional outage data during live operations. Our goal is to showcase GridVeda as the first grid intelligence system that combines edge AI with continuously updated global infrastructure knowledge.
Utility Pilot Programs & Field Validation: We intend to partner with regional utility providers to deploy GridVeda alongside real SCADA feeds. Initial pilots will focus on early transformer degradation detection and substation-level anomaly triage, collecting operational data to validate failure prediction accuracy and refine our models against real-world grid conditions.
Conference & Research Publication: We are preparing a technical paper detailing our hybrid ensemble architecture (Quantum VQC + Ensemble-based NN + LLM) for submission to infrastructure resilience and applied AI conferences. Showcasing GridVeda at technical venues will help bridge academia, utilities, and industry while contributing to the broader research community.
Federated Learning Across Utilities: Next iterations will introduce federated training across multiple substations—allowing utilities to improve anomaly detection collectively without sharing sensitive operational data. This privacy-preserving approach enables grid-wide learning while respecting the security requirements of critical infrastructure. Integration with SCADA systems and IEC 61850 protocols will enable real-time data ingestion from substation sensors, replacing simulated datasets with live transformer telemetry for continuous model updates and immediate fault detection at scale.
From Hackathon to Company: Beyond TreeHacks, we are actively exploring pathways to turn GridVeda into a venture-backed startup. We plan to pursue the Human Capital Fellowship for long-term company building, Neo Accelerator for early-stage product-market validation and strategic partnerships with NVIDIA and infrastructure-focused investors. Our vision is to evolve GridVeda from a 36-hour prototype into a deployable AI infrastructure platform protecting national energy systems.