
peak-charge

Predicting high-demand EV charging sessions at workplace facilities before they occur.



The Problem

Workplace EV charging drives significant electricity demand charges — costs determined not by total energy consumed, but by the highest 15-minute interval of consumption in a billing period. When employees plug in simultaneously during morning arrival (8–10am), the resulting demand spike can dominate the monthly electricity bill.

High-demand sessions — the top 25% by energy delivered (above 13.7 kWh) — account for 50.3% of all energy consumed despite being only one quarter of all sessions. The challenge: these sessions need to be identified at plug-in time, before charging begins, so that interventions (scheduling nudges, rate adjustments, managed charging) can be deployed in time to matter.
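
To make the billing mechanism concrete, here is a minimal sketch of how a single 15-minute spike sets the demand charge. The interval readings and the $/kW tariff are hypothetical, not from the dataset:

```python
import numpy as np

# Hypothetical 15-minute average site load readings (kW) over a morning ramp.
load_kw = np.array([40.0, 55.0, 120.0, 80.0, 60.0])

# The demand charge is set by the single highest interval, not total energy.
peak_demand_kw = load_kw.max()
demand_rate = 18.0                          # hypothetical tariff, $/kW
demand_charge = peak_demand_kw * demand_rate

print(peak_demand_kw)    # 120.0
print(demand_charge)     # 2160.0 — one spike dominates the bill
```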


Solution

A Random Forest classifier trained on the NREL Workplace Charging Dataset that flags high-demand sessions at the moment of plug-in, using only information available before charging starts.

Two agentic POC systems built on top of the model:

| Agent | Problem | What it does |
| --- | --- | --- |
| session_monitoring_agent.py | Real-time demand flagging | Scores each plug-in event and fires alerts for high-demand sessions |
| model_maintenance_agent.py | Model health monitoring | Detects accuracy drift, retrains automatically, validates before swapping |

Results

| Metric | Value |
| --- | --- |
| AUC-ROC (5-fold TimeSeriesSplit CV) | 0.860 |
| AUC-ROC (last fold) | 0.87 |
| Recall at threshold 0.40 | 92.7% |
| Precision at threshold 0.40 | 54.2% |
| F1 score at threshold 0.40 | 0.684 |
| High-demand energy captured | 87.1% |
| Share of total site energy flaggable | 43.8% |

The classification threshold was lowered from 0.50 to 0.40 to prioritise recall — missing a high-demand session costs money in demand charges; a false alarm costs only an unnecessary nudge.
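
The effect of that trade-off can be seen with a small sketch on hypothetical predicted probabilities: lowering the cut-off converts borderline scores into alerts, raising recall at the cost of precision.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical scores; 1 = high-demand session.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.45, 0.44, 0.80, 0.42, 0.55, 0.20, 0.90, 0.41])

for thr in (0.50, 0.40):
    y_pred = (y_prob >= thr).astype(int)
    print(f"thr={thr}: recall={recall_score(y_true, y_pred):.2f} "
          f"precision={precision_score(y_true, y_pred):.2f}")
# thr=0.5: recall=0.50 precision=0.67
# thr=0.4: recall=1.00 precision=0.57
```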


Dataset

NREL Workplace Charging Dataset — publicly available from the National Renewable Energy Laboratory.

  • Source: https://data.openei.org/submissions/4538
  • Size: 40,179 sessions after cleaning (from 40,979 raw)
  • Period: November 2016 — October 2021
  • Sites: Single workplace facility, 141 charging stations

Key variables

| Variable | Type | Role |
| --- | --- | --- |
| kwh_requested | float | Driver's stated energy need at plug-in |
| max_charge_power | float | Charger's maximum delivery rate (kW) |
| miles_requested | int | Miles of range requested |
| planned_duration | float | Expected dwell time (minutes) |
| vehicle_model | categorical | EV make and model |
| hour / day_of_week / month | int | Temporal features from session start |
| afterPaid | bool | Free vs. paid charging regime |
| energy_charged | float | Target — actual kWh delivered |
| high_demand | binary | Target — 1 if energy_charged > 13.72 kWh (75th percentile) |

Project Structure

peak-charge/
│
├── README.md
│
├── data/
│   ├── sessions.csv              # Cleaned session data (200-session replay sample)
│   └── all_sessions.csv          # Full cleaned dataset for model maintenance
│
├── models/
│   ├── rf_model.pkl              # Trained Random Forest classifier
│   ├── rf_model_backup.pkl       # Auto-backup before each model swap
│   ├── feature_cols.pkl          # Ordered feature list
│   └── baseline_auc.pkl          # Current baseline AUC for drift detection
│
├── agents/
│   ├── session_monitoring_agent.py    # Problem 1 — real-time session flagging
│   └── model_maintenance_agent.py    # Problem 4 — drift detection + retraining
│
├── logs/
│   ├── session_log.csv           # Per-session scoring decisions
│   └── drift_log.jsonl           # Model health check audit trail
│
└── notebooks/
    └── peak_charge_analysis.ipynb    # Full EDA, modelling, and evaluation

Quickstart

Requirements

pip install pandas numpy scikit-learn joblib matplotlib seaborn

Note: Model files (*.pkl) are not committed to this repo. Run the notebook end-to-end to generate them locally before running the agents.

1. Train the model (notebook)

Open notebooks/peak_charge_analysis.ipynb and run all cells. This produces rf_model.pkl, feature_cols.pkl, baseline_auc.pkl, sessions.csv, and all_sessions.csv.

2. Run the session monitoring agent

# Default — replay 200 sessions at threshold 0.40
python agents/session_monitoring_agent.py
 
# Replay all sessions
python agents/session_monitoring_agent.py --n 6696
 
# Slower replay for demos
python agents/session_monitoring_agent.py --delay 0.1
 
# Custom threshold
python agents/session_monitoring_agent.py --threshold 0.50

Sample output:

#0071  Driver 16      Stn 07B   kWh_req  24.0  Score 0.800  → HIGH  ✓
 
  [HIGH DEMAND ALERT]
  Driver:   16
  Station:  07B
  Time:     2016-11-14 07:00:00
  kWh req:  24.0   Miles req: 80
  Score:    0.800  (threshold: 0.4)
  Action → Notify driver / flag for managed charging

3. Run the model maintenance agent

# Normal run — check model health across 6 time windows
python agents/model_maintenance_agent.py
 
# Simulate concept drift to see full retraining cycle
python agents/model_maintenance_agent.py --simulate-drift
 
# Custom evaluation window and drift threshold
python agents/model_maintenance_agent.py --window 90 --threshold 0.84
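
The check–retrain–validate cycle the maintenance agent performs can be sketched roughly as follows. This is a hypothetical simplification — the real script's internals, names, and defaults may differ:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def check_and_maybe_retrain(model, X_window, y_window, X_train, y_train,
                            threshold=0.84):
    """Evaluate on a recent window; retrain and validate if AUC drifts low."""
    auc = roc_auc_score(y_window, model.predict_proba(X_window)[:, 1])
    if auc >= threshold:
        return model, auc, False              # healthy — keep current model
    # Drift detected: fit a candidate on all data up to the window.
    candidate = RandomForestClassifier(n_estimators=300, random_state=42)
    candidate.fit(X_train, y_train)
    cand_auc = roc_auc_score(y_window, candidate.predict_proba(X_window)[:, 1])
    if cand_auc > auc:
        return candidate, cand_auc, True      # validated — swap in candidate
    return model, auc, False                  # candidate no better — keep old
```

A real run would also back up the outgoing model (rf_model_backup.pkl) and append the outcome to logs/drift_log.jsonl.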

Methodology

Feature engineering

All features are constructed from information available at plug-in time — no post-session data is used as input.

| Feature | Construction |
| --- | --- |
| hour | Extracted from session_start timestamp |
| day_of_week | 0 = Monday, 6 = Sunday |
| month | 1–12 |
| is_weekend | Binary flag |
| wait_minutes | Gap between request_entry_time and session_start, clipped 0–480 min |
| planned_duration | Gap between request_entry_time and expected_departure, clipped 0–1440 min |
| vehicle_model_encoded | Top 10 models kept, remainder grouped as "Other", label-encoded |
| max_charge_power | Median-imputed for 3,687 missing values |
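
The steps in the table can be sketched in pandas. The raw column names (session_start, request_entry_time, expected_departure, vehicle_model) follow the table above, but the exact input schema is an assumption:

```python
import pandas as pd

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    ts = pd.to_datetime(out["session_start"])
    out["hour"] = ts.dt.hour
    out["day_of_week"] = ts.dt.dayofweek                 # 0 = Monday
    out["month"] = ts.dt.month
    out["is_weekend"] = (out["day_of_week"] >= 5).astype(int)

    entry = pd.to_datetime(out["request_entry_time"])
    out["wait_minutes"] = ((ts - entry).dt.total_seconds() / 60).clip(0, 480)
    depart = pd.to_datetime(out["expected_departure"])
    out["planned_duration"] = ((depart - entry).dt.total_seconds() / 60).clip(0, 1440)

    # Keep the 10 most common vehicle models, group the rest as "Other".
    top10 = out["vehicle_model"].value_counts().nlargest(10).index
    model = out["vehicle_model"].where(out["vehicle_model"].isin(top10), "Other")
    out["vehicle_model_encoded"] = model.astype("category").cat.codes

    # Median-impute missing charger power.
    out["max_charge_power"] = out["max_charge_power"].fillna(
        out["max_charge_power"].median()
    )
    return out
```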

Target variable

q75 = df['energy_charged'].quantile(0.75)   # = 13.72 kWh
df['high_demand'] = (df['energy_charged'] > q75).astype(int)
# Class distribution: 75% low-demand, 25% high-demand

Model selection

| Model | CV AUC (mean ± std) | CV F1 |
| --- | --- | --- |
| Logistic Regression | 0.8625 ± 0.0158 | 0.6199 ± 0.0813 |
| Random Forest | 0.8600 ± 0.0449 | 0.6353 ± 0.0942 |

Random Forest selected as final model — higher last-fold AUC (0.87 vs 0.84), better recall on high-demand class (0.83 vs 0.78), and more robust to nonlinear feature interactions.

Validation strategy

TimeSeriesSplit with 5 folds — training always on the past, testing always on the future. Prevents data leakage from future sessions into training.

Fold 1: train=6,699   test=6,696   (earliest data)
Fold 2: train=13,395  test=6,696
Fold 3: train=20,091  test=6,696
Fold 4: train=26,787  test=6,696
Fold 5: train=33,483  test=6,696   (most recent data)
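
These split sizes fall directly out of scikit-learn's TimeSeriesSplit on the 40,179 chronologically ordered sessions:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Expanding-window splits: each fold trains on everything before its test block.
tscv = TimeSeriesSplit(n_splits=5)
for i, (train_idx, test_idx) in enumerate(tscv.split(np.arange(40_179)), 1):
    print(f"Fold {i}: train={len(train_idx):,}  test={len(test_idx):,}")
# Fold 1: train=6,699  test=6,696 ... Fold 5: train=33,483  test=6,696
```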

Key Findings

Feature importance (permutation, AUC-ROC drop)

| Feature | Importance | Interpretation |
| --- | --- | --- |
| kwh_requested | 0.0599 | Driver's stated need — strongest single signal |
| max_charge_power | 0.0550 | Hard ceiling on energy delivery rate |
| miles_requested | 0.0490 | Correlated with kwh_requested; confirms intent |
| planned_duration | 0.0069 | Longer stays allow more energy to accumulate |
| vehicle_model_encoded | 0.0131 | Tesla/Bolt = large battery; Volt/Spark = small |
| hour | 0.0100 | Some nonlinear time signal |
| afterPaid, is_weekend | ~0.000 | No predictive value — candidates for removal |
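
A self-contained sketch of the permutation-importance computation on synthetic stand-in data (the notebook runs the same call on the trained model and a held-out temporal fold):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Each feature is shuffled in turn; the resulting drop in AUC-ROC is
# that feature's importance.
X, y = make_classification(n_samples=500, n_features=5, random_state=42)
model = RandomForestClassifier(random_state=42).fit(X, y)

result = permutation_importance(model, X, y, scoring="roc_auc",
                                n_repeats=10, random_state=42)
for i, imp in enumerate(result.importances_mean):
    print(f"feature_{i}: {imp:.4f}")
```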

Vehicle model signal

Tesla Model 3 sessions are high-demand 85% of the time. Prius Prime, Audi A3 E-Tron, and Ford Fusion (PHEVs with small batteries) are near 0%. Vehicle type is a strong proxy for battery capacity and therefore demand potential.

Pricing regime

Sessions under paid charging (afterPaid=1) show only marginally higher demand rates than free sessions (27% vs 24%). Pricing regime is not a meaningful predictor.


Agentic AI Framework

This project demonstrates how prediction alone is insufficient — acting on predictions at scale requires agentic systems. Four problems and their agent-based solutions:

| Problem | Root cause | Agent solution | POC status |
| --- | --- | --- | --- |
| Predictions not acted on in real time | Human operators can't monitor 100s of sessions/day | Session monitoring agent | Built |
| One-size-fits-all interventions | Different drivers respond differently | Driver personalisation agent | Planned |
| Simultaneous arrival spikes | Individual scoring misses aggregate load | Load coordination agent | Deferred |
| Model degrades as fleet evolves | Static model, shifting vehicle population | Model maintenance agent | Built |

Limitations

  • Temporal drift: The dataset spans 2016–2021. The vehicle fleet has changed significantly since then — Tesla/Bolt share has grown, increasing average battery sizes and shifting the demand distribution.
  • Single-site data: All sessions come from one facility. Station-level features (station_encoded) may not generalise to other sites.
  • kwh_requested caveat: This is a strong predictor (r=0.456 with target) but also highly correlated with actual energy delivered. While it is genuinely available at plug-in time, it may partially reflect self-fulfilling driver behaviour.
  • No real charger integration: Both agents use CSV replay as a simulated event stream. Production deployment would require an OCPP webhook or charger management system API.

AI Assistance

This project was developed with AI assistance using Claude (Anthropic) as a pair-programming and analysis tool — a workflow sometimes referred to as vibe coding.

Claude was used throughout the project lifecycle: exploring and cleaning the dataset, engineering features, selecting and evaluating models, writing the agentic POC scripts, and drafting documentation. All outputs were reviewed, validated, and tested by the author at each step.

Guardrails kept in mind throughout:

  • No data leakage — features were explicitly restricted to information available at plug-in time; session outcomes were never used as predictors
  • Time-aware validation — TimeSeriesSplit was used throughout to ensure the model was always evaluated on future data, never on data it had already seen
  • Human-in-the-loop review — every model decision, threshold choice, and agent behaviour was examined and confirmed before being accepted
  • Honest evaluation — metrics were computed on a held-out temporal fold, not on training data; limitations are documented explicitly
  • Reproducibility — all random seeds are fixed; the full pipeline from raw data to trained model is contained in a single notebook

AI assistance accelerated development but did not replace critical thinking about the problem, the data, or the validity of results.


Acknowledgements

Dataset provided by the National Renewable Energy Laboratory (NREL) via the Open Energy Data Initiative (OEDI).

A. Burrell, N. Rustagi, et al. Workplace EV Charging Dataset. NREL, 2021. https://data.openei.org/submissions/4538
