Aegis

Inspiration

Power outages and natural disasters generate an overwhelming stream of text: incident notes, status updates, weather bulletins, and operator reports. We were inspired by a simple question:

Can we turn that noisy text into a reliable, real-time signal that helps people make better decisions under pressure?

We wanted to build something practical for grid operations, where minutes matter and clear prioritization is critical.

What it does

Aegis is an applied NLP decision-support pipeline for outage and disaster response.

Ingests live outage telemetry and disaster/event text feeds
Parses incident text into structured fields (event_type, cause, severity, region, confidence)
Converts those signals into a normalized risk score
Applies explicit decision logic to produce alert actions
Generates AI response suggestions for operators (assessment + actionable steps)

In short: text in -> risk signal -> decision out.

How we built it

We built Aegis as a full-stack pipeline.

Backend

FastAPI with modular services for:
- parsing
- scoring
- alerts
- replay consumption
- AI suggestions

NLP Parser

Rule-based extraction, with confidence scoring, of:
- event type
- cause
- severity
- region
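
A minimal sketch of what rule-based extraction with a confidence score could look like. The keyword tables, the `ParsedIncident` name, the region pattern, and the confidence heuristic (fraction of fields matched) are illustrative assumptions, not the actual Aegis rules.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword rules; the real Aegis rule set is not shown in this writeup.
EVENT_RULES = {"outage": r"\b(outage|blackout|power loss)\b",
               "storm": r"\b(storm|hurricane|tornado)\b",
               "wildfire": r"\b(wildfire|brush fire)\b"}
CAUSE_RULES = {"weather": r"\b(wind|lightning|ice|flood\w*)\b",
               "equipment": r"\b(transformer|substation|line down|equipment)\b"}
SEVERITY_RULES = {"high": r"\b(critical|severe|major|widespread)\b",
                  "medium": r"\b(moderate|partial)\b",
                  "low": r"\b(minor|isolated)\b"}
# Assumed convention: a region is a capitalized phrase following "in".
REGION_RULE = re.compile(r"\bin ([A-Z][a-z]+(?: [A-Z][a-z]+)*)")


@dataclass
class ParsedIncident:
    event_type: Optional[str]
    cause: Optional[str]
    severity: Optional[str]
    region: Optional[str]
    confidence: float


def _first_match(rules, text):
    """Return the first label whose pattern occurs in the text, else None."""
    for label, pattern in rules.items():
        if re.search(pattern, text, flags=re.IGNORECASE):
            return label
    return None


def parse_incident(text: str) -> ParsedIncident:
    fields = {
        "event_type": _first_match(EVENT_RULES, text),
        "cause": _first_match(CAUSE_RULES, text),
        "severity": _first_match(SEVERITY_RULES, text),
    }
    region = REGION_RULE.search(text)
    fields["region"] = region.group(1) if region else None
    # Confidence here is simply the share of fields that any rule matched.
    matched = sum(value is not None for value in fields.values())
    return ParsedIncident(confidence=matched / len(fields), **fields)
```

Under these assumed rules, `parse_incident("Major outage in Travis County after transformer failure")` fills all four fields and reports a confidence of 1.0, while a note that matches only a severity keyword reports 0.25.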

Risk Engine

A weighted scoring model combining:
- severity
- event type
- cause
- confidence
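
One way such a weighted model could be written is sketched below. The weights, the per-label point tables, and the choice to scale the weighted sum by confidence are all assumptions for illustration; the writeup does not state the real values or how confidence enters the formula.

```python
# Hypothetical weights and lookup tables; the actual Aegis values are not given here.
SEVERITY_POINTS = {"low": 0.2, "medium": 0.5, "high": 1.0}
EVENT_POINTS = {"outage": 0.8, "storm": 0.7, "wildfire": 1.0}
CAUSE_POINTS = {"weather": 0.6, "equipment": 0.8}
WEIGHTS = {"severity": 0.5, "event_type": 0.3, "cause": 0.2}  # sums to 1.0


def text_risk_score(severity, event_type, cause, confidence):
    """Return a 0-100 score: weighted sum of factor points, scaled by confidence."""
    raw = (WEIGHTS["severity"] * SEVERITY_POINTS.get(severity, 0.0)
           + WEIGHTS["event_type"] * EVENT_POINTS.get(event_type, 0.0)
           + WEIGHTS["cause"] * CAUSE_POINTS.get(cause, 0.0))
    # Clamp confidence so a malformed parser output cannot push the score out of range.
    confidence = max(0.0, min(1.0, confidence))
    return round(100 * raw * confidence)
```

Because each factor contributes a separate, named term, the same terms can be surfaced to operators as explanation factors.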

Decision Layer

Threshold-based alert policy producing:
- send_alert
- priority
- channel
- audience
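
The sketch below shows how a threshold policy can stay explicit and auditable by keeping it in a small table. The thresholds (80 and 50) and the priority, channel, and audience values are hypothetical; only the four output field names come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertDecision:
    send_alert: bool
    priority: str
    channel: str
    audience: str


# Hypothetical thresholds and routing values, highest band first.
POLICY = [
    (80, AlertDecision(True, "critical", "sms", "on_call_operators")),
    (50, AlertDecision(True, "elevated", "dashboard", "regional_operators")),
]
NO_ALERT = AlertDecision(False, "none", "none", "none")


def decide(risk_score: float) -> AlertDecision:
    """Return the decision for the first band whose threshold the score reaches."""
    for threshold, decision in POLICY:
        if risk_score >= threshold:
            return decision
    return NO_ALERT
```

Keeping the bands in data rather than nested conditionals makes it straightforward to log which threshold fired and to tune thresholds without touching the decision code.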

AI Suggestions

Gemini-powered tactical recommendations with:
- schema-constrained outputs
- deterministic fallback behavior
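
The fallback pattern can be sketched without any SDK calls. In the example below, `generate` stands in for whatever function calls the model and returns raw text; it, the `assessment`/`steps` response shape, and the fallback wording are assumptions. The sketch validates the response after generation, which is a simpler stand-in for constraining the output at the API level.

```python
import json
from typing import Callable

REQUIRED_KEYS = {"assessment": str, "steps": list}  # assumed response shape


def fallback_suggestion(severity: str) -> dict:
    """Rule-based suggestion used whenever the model output is unusable."""
    steps = {"high": ["Dispatch crews", "Notify regional operators"],
             "medium": ["Monitor feeder load", "Stage crews"]}
    return {"assessment": f"{severity} severity incident (rule-based fallback)",
            "steps": steps.get(severity, ["Continue monitoring"]),
            "source": "fallback"}


def suggest(generate: Callable[[str], str], prompt: str, severity: str) -> dict:
    try:
        data = json.loads(generate(prompt))
        if not isinstance(data, dict):
            raise ValueError("response is not a JSON object")
        for key, expected in REQUIRED_KEYS.items():
            if not isinstance(data.get(key), expected):
                raise ValueError(f"bad or missing field: {key}")
        if not data["steps"] or not all(isinstance(s, str) for s in data["steps"]):
            raise ValueError("steps must be a non-empty list of strings")
        return {"assessment": data["assessment"], "steps": data["steps"],
                "source": "model"}
    except Exception:
        # Any failure (network error, invalid JSON, schema mismatch) produces
        # the same deterministic output for the same severity.
        return fallback_suggestion(severity)
```

Tagging each result with its `source` lets the dashboard show operators whether a recommendation came from the model or from the rules.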

Replay Support

Timestamp-aware feed consumption with cursor reset to simulate information arriving over time.
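
A cursor-based replay can be sketched as follows. The `ReplayFeed` class, its method names, and the assumption that each event carries an ISO 8601 `timestamp` string are illustrative, not the actual service interface.

```python
from datetime import datetime


class ReplayFeed:
    """Releases recorded events in timestamp order as a simulated clock advances."""

    def __init__(self, events):
        # Each event is assumed to be a dict with an ISO 8601 "timestamp" key.
        self._events = sorted(
            events, key=lambda e: datetime.fromisoformat(e["timestamp"]))
        self._cursor = 0

    def consume_until(self, now: datetime):
        """Return events with timestamp <= now that have not been returned yet."""
        start = self._cursor
        while (self._cursor < len(self._events)
               and datetime.fromisoformat(
                   self._events[self._cursor]["timestamp"]) <= now):
            self._cursor += 1
        return self._events[start:self._cursor]

    def reset(self):
        """Move the cursor back to the start so the replay can run again."""
        self._cursor = 0
```

Because the cursor only moves forward, downstream scoring never sees an event before its recorded time, which is what keeps a replay-based evaluation free of look-ahead.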

Frontend

React dashboard with:
- risk score cards
- explanation factors
- region risk
- incident/disaster feeds
- replay controls
- inline AI recommendations

Risk Score Dynamics

To avoid pathological saturation, we map outage size to a score on a logarithmic scale:

$$ S(m) = \min\left(100,\ \mathrm{round}\left(12 \cdot \log_{10}(1 + m)\right)\right) $$

where m is the aggregated affected-customer count.
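
The transform translates directly into a few lines of Python. The function name `outage_score` and the rejection of negative counts are assumptions added for the example.

```python
import math


def outage_score(m: int) -> int:
    """S(m) = min(100, round(12 * log10(1 + m))) for m affected customers."""
    if m < 0:
        raise ValueError("affected-customer count must be non-negative")
    return min(100, round(12 * math.log10(1 + m)))


# Each tenfold increase in affected customers adds about 12 points:
# 0 -> 0, 9 -> 12, 999 -> 36, 99999 -> 60
samples = {m: outage_score(m) for m in (0, 9, 999, 99_999)}
```

With a coefficient of 12, the cap of 100 is only reached for counts on the order of 10^8, so realistic outage bursts stay well inside the scale.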

Temporal Score Updates

To smooth volatility and keep a single spike from pinning the risk score, we apply exponential smoothing:

$$ S_t = (1 - \alpha) S_{t-1} + \alpha S_{\text{new}} $$

where:

\(S_t\) = current risk score
\(S_{t-1}\) = previous score
\(S_{\text{new}}\) = newly computed score
\(\alpha\) = smoothing factor controlling responsiveness (between 0 and 1; larger values react faster to new data)

This produces a stable but responsive risk signal.
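
A short sketch of the update rule, showing how one spike is absorbed and then decays. The function name `smooth`, the default of 0.3 for the smoothing factor, and the sample values are assumptions chosen for illustration.

```python
def smooth(previous: float, new: float, alpha: float = 0.3) -> float:
    """S_t = (1 - alpha) * S_{t-1} + alpha * S_new; alpha=0.3 is an assumed default."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    return (1 - alpha) * previous + alpha * new


score = 20.0
history = []
for raw in (20, 90, 20, 20):  # a single spike to 90 in an otherwise flat feed
    score = smooth(score, raw)
    history.append(round(score, 2))
# history == [20.0, 41.0, 34.7, 30.29]
```

The spike lifts the smoothed score to 41 rather than 90, and the score then drifts back toward 20 as quieter readings arrive.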

Challenges we ran into

Calibrating score behavior so it is responsive but not noisy
Preventing score saturation at 100 for large outage bursts
Handling negation in language (e.g., “no failure”)
Resolving async UI state races between live feed updates and NLP updates
Making recommendations structured, robust, and explainable
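
For the negation challenge, one common rule-based approach is a fixed look-back window: a keyword counts as negated if a negation cue appears just before it. The sketch below assumes a small cue list and a two-token window; it is not necessarily how Aegis handles negation, and it will miss longer-range constructions.

```python
import re

NEGATORS = {"no", "not", "without", "never"}  # assumed cue list
WINDOW = 2  # assumed: how many preceding tokens can negate a keyword


def is_negated(text: str, keyword: str) -> bool:
    """True if every mention of `keyword` has a negation cue within WINDOW tokens before it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [i for i, tok in enumerate(tokens) if tok == keyword]
    if not hits:
        return False
    return all(NEGATORS & set(tokens[max(0, i - WINDOW):i]) for i in hits)
```

Here `is_negated("No failure reported at the substation", "failure")` is true, while `is_negated("Transformer failure confirmed", "failure")` is false, so the parser can skip the first mention instead of raising the risk score.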

Accomplishments that we're proud of

Delivered a complete end-to-end decision pipeline, not just a classifier
Built explicit, auditable alert logic with clear operator-facing outputs
Added replay-driven evaluation workflow for realistic time-order testing
Integrated AI recommendations while preserving deterministic fallback behavior
Designed a clear dashboard that ties evidence to decisions

What we learned

The hard part of applied NLP is decision reliability, not extraction alone
Deterministic fallbacks and transparent rules are crucial for operator trust
Time-ordered evaluation prevents leakage and improves realism
UX clarity (signal + rationale + action) is as important as model quality

What's next for Aegis

Baseline-vs-model evaluation metrics:
- precision
- recall
- false alarms
- decision utility
Confidence calibration and uncertainty reporting
Drift detection as incident language evolves
Policy simulation for threshold tuning under different risk appetites
Production-grade controls:
- audit trails
- security hardening
- governance