Predicting high-demand EV charging sessions at workplace facilities before they occur.
Workplace EV charging drives significant electricity demand charges — costs determined not by total energy consumed, but by the highest 15-minute interval of consumption in a billing period. When employees plug in simultaneously during morning arrival (8–10am), the resulting demand spike can dominate the monthly electricity bill.
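Demand-charge billing can be sketched in a few lines: resample meter readings to 15-minute intervals and take the maximum interval average. The load series below is invented for illustration; real figures would come from the site meter.

```python
import pandas as pd

# Hypothetical per-minute site load (kW). Demand charges bill the
# highest 15-minute average, not total energy consumed.
load = pd.Series(
    [20] * 30 + [110] * 15 + [25] * 45,  # a 15-minute morning spike
    index=pd.date_range("2021-03-01 08:00", periods=90, freq="min"),
)

# Average kW within each 15-minute window; the billing peak is the
# maximum of those interval averages.
interval_avg = load.resample("15min").mean()
billing_peak_kw = interval_avg.max()
print(f"Billing peak: {billing_peak_kw:.0f} kW")
```

A single 15-minute spike sets the billing peak for the whole period, which is why flagging high-demand sessions before they start matters.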
High-demand sessions — the top 25% by energy delivered (above 13.7 kWh) — account for 50.3% of all energy consumed despite being only one quarter of all sessions. The challenge: these sessions need to be identified at plug-in time, before charging begins, so that interventions (scheduling nudges, rate adjustments, managed charging) can be deployed in time to matter.
A Random Forest classifier trained on the NREL Workplace Charging Dataset that flags high-demand sessions at the moment of plug-in, using only information available before charging starts.
Two agentic POC systems built on top of the model:
| Agent | Problem | What it does |
|---|---|---|
| `session_monitoring_agent.py` | Real-time demand flagging | Scores each plug-in event and fires alerts for high-demand sessions |
| `model_maintenance_agent.py` | Model health monitoring | Detects accuracy drift, retrains automatically, validates before swapping |
| Metric | Value |
|---|---|
| AUC-ROC (5-fold TimeSeriesSplit CV) | 0.860 |
| AUC-ROC (last fold) | 0.87 |
| Recall at threshold 0.40 | 92.7% |
| Precision at threshold 0.40 | 54.2% |
| F1 score at threshold 0.40 | 0.684 |
| High-demand energy captured | 87.1% |
| Share of total site energy flaggable | 43.8% |
The classification threshold was lowered from 0.50 to 0.40 to prioritise recall — missing a high-demand session costs money in demand charges; a false alarm costs only an unnecessary nudge.
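The tradeoff can be illustrated with synthetic scores (the labels and probabilities below are invented, not from the model): lowering the threshold converts near-miss sessions into flagged ones, raising recall at the cost of precision.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Invented labels and predicted probabilities, purely to illustrate
# how lowering the threshold trades precision for recall.
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0, 0, 0])
y_prob = np.array([0.90, 0.45, 0.42, 0.48, 0.20, 0.10, 0.80, 0.30, 0.44, 0.05])

results = {}
for threshold in (0.50, 0.40):
    y_pred = (y_prob >= threshold).astype(int)
    results[threshold] = (recall_score(y_true, y_pred),
                          precision_score(y_true, y_pred))
    print(f"threshold {threshold:.2f}: recall={results[threshold][0]:.2f} "
          f"precision={results[threshold][1]:.2f}")
```

On this toy data the 0.40 threshold catches every high-demand session while the 0.50 threshold misses half of them, at the cost of a couple of false alarms, which mirrors the asymmetric cost argument above.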
NREL Workplace Charging Dataset — publicly available from the National Renewable Energy Laboratory.
- Source: https://data.openei.org/submissions/4538
- Size: 40,179 sessions after cleaning (from 40,979 raw)
- Period: November 2016 — October 2021
- Sites: Single workplace facility, 141 charging stations
| Variable | Type | Role |
|---|---|---|
| `kwh_requested` | float | Driver's stated energy need at plug-in |
| `max_charge_power` | float | Charger's maximum delivery rate (kW) |
| `miles_requested` | int | Miles of range requested |
| `planned_duration` | float | Expected dwell time (minutes) |
| `vehicle_model` | categorical | EV make and model |
| `hour` / `day_of_week` / `month` | int | Temporal features from session start |
| `afterPaid` | bool | Free vs. paid charging regime |
| `energy_charged` | float | Target — actual kWh delivered |
| `high_demand` | binary | Target — 1 if `energy_charged` > 13.72 kWh (75th percentile) |
```
peak-charge/
│
├── README.md
│
├── data/
│   ├── sessions.csv                  # Cleaned session data (200-session replay sample)
│   └── all_sessions.csv              # Full cleaned dataset for model maintenance
│
├── models/
│   ├── rf_model.pkl                  # Trained Random Forest classifier
│   ├── rf_model_backup.pkl           # Auto-backup before each model swap
│   ├── feature_cols.pkl              # Ordered feature list
│   └── baseline_auc.pkl              # Current baseline AUC for drift detection
│
├── agents/
│   ├── session_monitoring_agent.py   # Problem 1 — real-time session flagging
│   └── model_maintenance_agent.py    # Problem 4 — drift detection + retraining
│
├── logs/
│   ├── session_log.csv               # Per-session scoring decisions
│   └── drift_log.jsonl               # Model health check audit trail
│
└── notebooks/
    └── peak_charge_analysis.ipynb    # Full EDA, modelling, and evaluation
```
```
pip install pandas numpy scikit-learn joblib matplotlib seaborn
```

Note: Model files (`*.pkl`) are not committed to this repo. Run the notebook end-to-end to generate them locally before running the agents.
Open `notebooks/peak_charge_analysis.ipynb` and run all cells. This produces `rf_model.pkl`, `feature_cols.pkl`, `baseline_auc.pkl`, `sessions.csv`, and `all_sessions.csv`.
```
# Default — replay 200 sessions at threshold 0.40
python agents/session_monitoring_agent.py

# Replay all sessions
python agents/session_monitoring_agent.py --n 6696

# Slower replay for demos
python agents/session_monitoring_agent.py --delay 0.1

# Custom threshold
python agents/session_monitoring_agent.py --threshold 0.50
```

Sample output:
```
#0071  Driver 16  Stn 07B  kWh_req 24.0  Score 0.800 → HIGH ✓

[HIGH DEMAND ALERT]
  Driver:   16
  Station:  07B
  Time:     2016-11-14 07:00:00
  kWh req:  24.0   Miles req: 80
  Score:    0.800  (threshold: 0.4)
  Action →  Notify driver / flag for managed charging
```
```
# Normal run — check model health across 6 time windows
python agents/model_maintenance_agent.py

# Simulate concept drift to see full retraining cycle
python agents/model_maintenance_agent.py --simulate-drift

# Custom evaluation window and drift threshold
python agents/model_maintenance_agent.py --window 90 --threshold 0.84
```

All features are constructed from information available at plug-in time — no post-session data is used as input.
| Feature | Construction |
|---|---|
| `hour` | Extracted from `session_start` timestamp |
| `day_of_week` | 0 = Monday, 6 = Sunday |
| `month` | 1–12 |
| `is_weekend` | Binary flag |
| `wait_minutes` | Gap between `request_entry_time` and `session_start`, clipped 0–480 min |
| `planned_duration` | Gap between `request_entry_time` and `expected_departure`, clipped 0–1440 min |
| `vehicle_model_encoded` | Top 10 models kept, remainder grouped as "Other", label-encoded |
| `max_charge_power` | Median-imputed for 3,687 missing values |
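The constructions in the table can be sketched as a single helper. Column names (`session_start`, `request_entry_time`, `expected_departure`, `vehicle_model`, `max_charge_power`) are taken from the table above; this is illustrative, not the project's actual pipeline code.

```python
import pandas as pd


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the plug-in-time feature construction."""
    out = df.copy()
    start = pd.to_datetime(out["session_start"])
    entry = pd.to_datetime(out["request_entry_time"])
    depart = pd.to_datetime(out["expected_departure"])

    # Temporal features from the session start timestamp.
    out["hour"] = start.dt.hour
    out["day_of_week"] = start.dt.dayofweek          # 0 = Monday
    out["month"] = start.dt.month
    out["is_weekend"] = (out["day_of_week"] >= 5).astype(int)

    # Time gaps, clipped to plausible ranges as in the table.
    out["wait_minutes"] = ((start - entry).dt.total_seconds() / 60).clip(0, 480)
    out["planned_duration"] = ((depart - entry).dt.total_seconds() / 60).clip(0, 1440)

    # Keep the 10 most common vehicle models, group the rest as "Other",
    # then label-encode.
    top10 = out["vehicle_model"].value_counts().nlargest(10).index
    grouped = out["vehicle_model"].where(out["vehicle_model"].isin(top10), "Other")
    out["vehicle_model_encoded"], _ = pd.factorize(grouped)

    # Median-impute missing charger power.
    out["max_charge_power"] = out["max_charge_power"].fillna(
        out["max_charge_power"].median()
    )
    return out
```

Every input the helper touches is known at plug-in time, so nothing here leaks session outcomes into the features.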
```python
q75 = df['energy_charged'].quantile(0.75)   # = 13.72 kWh
df['high_demand'] = (df['energy_charged'] > q75).astype(int)
# Class distribution: 75% low-demand, 25% high-demand
```

| Model | CV AUC (mean ± std) | CV F1 |
|---|---|---|
| Logistic Regression | 0.8625 ± 0.0158 | 0.6199 ± 0.0813 |
| Random Forest | 0.8600 ± 0.0449 | 0.6353 ± 0.0942 |
Random Forest selected as final model — higher last-fold AUC (0.87 vs 0.84), better recall on high-demand class (0.83 vs 0.78), and more robust to nonlinear feature interactions.
TimeSeriesSplit with 5 folds — training always on the past, testing always on the future. Prevents data leakage from future sessions into training.
```
Fold 1: train=6,699   test=6,696   (earliest data)
Fold 2: train=13,395  test=6,696
Fold 3: train=20,091  test=6,696
Fold 4: train=26,787  test=6,696
Fold 5: train=33,483  test=6,696   (most recent data)
```
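These fold sizes fall directly out of scikit-learn's `TimeSeriesSplit`; a minimal sketch that also checks the past-only property:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n_sessions = 40_179                      # cleaned session count
tscv = TimeSeriesSplit(n_splits=5)

fold_sizes = []
for train_idx, test_idx in tscv.split(np.arange(n_sessions)):
    # Every training index precedes every test index: no future leakage.
    assert train_idx.max() < test_idx.min()
    fold_sizes.append((len(train_idx), len(test_idx)))

print(fold_sizes)
```

Each successive fold absorbs the previous test window into its training set, which is why the training size grows by exactly one test-fold (6,696 sessions) per fold.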
| Feature | Importance | Interpretation |
|---|---|---|
| `kwh_requested` | 0.0599 | Driver's stated need — strongest single signal |
| `max_charge_power` | 0.0550 | Hard ceiling on energy delivery rate |
| `miles_requested` | 0.0490 | Correlated with `kwh_requested`; confirms intent |
| `planned_duration` | 0.0069 | Longer stays allow more energy to accumulate |
| `vehicle_model_encoded` | 0.0131 | Tesla/Bolt = large battery; Volt/Spark = small |
| `hour` | 0.0100 | Some nonlinear time signal |
| `afterPaid`, `is_weekend` | ~0.000 | No predictive value — candidates for removal |
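The importance column comes from the fitted forest's `feature_importances_` attribute. A toy reconstruction on synthetic data (feature names reused, values invented) shows the mechanics: a feature that drives the target dominates, while an uninformative one lands near zero.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-ins: the target here is driven entirely by
# kwh_requested, so its importance should dominate.
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "kwh_requested": rng.uniform(2, 30, 500),
    "hour": rng.integers(6, 20, 500),
    "noise": rng.normal(size=500),       # stands in for afterPaid / is_weekend
})
y = (X["kwh_requested"] > 13.72).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)
importances = pd.Series(rf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```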
Tesla Model 3 sessions are high-demand 85% of the time. Prius Prime, Audi A3 E-Tron, and Ford Fusion (PHEVs with small batteries) are near 0%. Vehicle type is a strong proxy for battery capacity and therefore demand potential.
Sessions under paid charging (afterPaid=1) show only marginally higher demand rates than free sessions (27% vs 24%). Pricing regime is not a meaningful predictor.
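Both the per-model and per-regime rates are plain group-by aggregations of the `high_demand` flag. A sketch on toy rows (the 85%, ~0%, and 27% vs 24% figures in the text come from the full dataset, not from these rows):

```python
import pandas as pd

# Invented sessions, purely to show the aggregation shape.
sessions = pd.DataFrame({
    "vehicle_model": ["Tesla Model 3", "Tesla Model 3", "Prius Prime",
                      "Prius Prime", "Chevy Bolt", "Chevy Bolt"],
    "afterPaid":   [1, 0, 1, 0, 1, 0],
    "high_demand": [1, 1, 0, 0, 1, 0],
})

# Share of high-demand sessions per vehicle model and pricing regime.
rate_by_model = sessions.groupby("vehicle_model")["high_demand"].mean()
rate_by_regime = sessions.groupby("afterPaid")["high_demand"].mean()
print(rate_by_model)
print(rate_by_regime)
```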
This project demonstrates how prediction alone is insufficient — acting on predictions at scale requires agentic systems. Four problems and their agent-based solutions:
| Problem | Root cause | Agent solution | POC status |
|---|---|---|---|
| Predictions not acted on in real time | Human operators can't monitor 100s of sessions/day | Session monitoring agent | Built |
| One-size-fits-all interventions | Different drivers respond differently | Driver personalisation agent | Planned |
| Simultaneous arrival spikes | Individual scoring misses aggregate load | Load coordination agent | Deferred |
| Model degrades as fleet evolves | Static model, shifting vehicle population | Model maintenance agent | Built |
- Temporal drift: The dataset spans 2016–2021. The vehicle fleet has changed significantly since then — Tesla/Bolt share has grown, increasing average battery sizes and shifting the demand distribution.
- Single-site data: All sessions come from one facility. Station-level features (`station_encoded`) may not generalise to other sites.
- `kwh_requested` caveat: This is a strong predictor (r = 0.456 with the target) but also highly correlated with actual energy delivered. While it is genuinely available at plug-in time, it may partially reflect self-fulfilling driver behaviour.
- No real charger integration: Both agents use CSV replay as a simulated event stream. Production deployment would require an OCPP webhook or a charger management system API.
This project was developed with AI assistance using Claude (Anthropic) as a pair-programming and analysis tool — a workflow sometimes referred to as vibe coding.
Claude was used throughout the project lifecycle: exploring and cleaning the dataset, engineering features, selecting and evaluating models, writing the agentic POC scripts, and drafting documentation. All outputs were reviewed, validated, and tested by the author at each step.
Guardrails kept in mind throughout:
- No data leakage — features were explicitly restricted to information available at plug-in time; session outcomes were never used as predictors
- Time-aware validation — `TimeSeriesSplit` was used throughout to ensure the model was always evaluated on future data, never on data it had already seen
- Human-in-the-loop review — every model decision, threshold choice, and agent behaviour was examined and confirmed before being accepted
- Honest evaluation — metrics were computed on a held-out temporal fold, not on training data; limitations are documented explicitly
- Reproducibility — all random seeds are fixed; the full pipeline from raw data to trained model is contained in a single notebook
AI assistance accelerated development but did not replace critical thinking about the problem, the data, or the validity of results.
Dataset provided by the National Renewable Energy Laboratory (NREL) via the Open Energy Data Initiative (OEDI).
A. Burrell, N. Rustagi, et al. Workplace EV Charging Dataset. NREL, 2021. https://data.openei.org/submissions/4538