An anonymized case study from a UK professional football academy environment.
The goal: improve player availability and support longer-horizon decisions (e.g., development planning and retain/release discussions) by combining physical benchmarks with contextual performance signals. The hard parts are messy data (gaps, changing definitions, small samples) and making the outputs usable for non-technical stakeholders.
- Sustained >90% player availability across high-intensity competition periods via improved monitoring and decision routines.
- Drove an aggregate ~12% improvement across key physical output metrics through data-led periodisation and longitudinal tracking.
- Reduced manual data entry by ~40% by automating ingestion/ETL from multiple external data sources.
- Longitudinal load and exposure (training + match) from wearable tracking systems.
- Availability and medical context (time-loss / restricted training windows).
- Tactical context signals (e.g., xG/xT/VAEP-style aggregates) used as covariates, not as the sole driver.
- Benchmarks and standards by position/age band.
- SQL/Postgres for modeling-ready tables and longitudinal views
- Python/R for feature engineering, modeling, and statistical evaluation
- Dashboards/reports: Shiny/Dash-style reporting surfaces for staff-facing outputs
- KPI redesign: defined actionable "availability" features to replace legacy, fragile KPIs.
- Feature engineering: acute vs chronic exposure, trend breaks, monotonic constraints where appropriate, and standardized benchmarking (percentiles / z-scores) for comparability.
- Modeling: started with transparent baselines (rules/linear/regularized) before evaluating non-linear alternatives.
- Validation: time-aware splits and leakage control; calibration checks to make probability outputs decision-ready.
- Retain/release: combined tactical-context aggregates with physical benchmarks to support objective conversations.
- Load management: translated model outputs into staff-friendly decision cards (what changed, why it matters, what to do next).
- Designed for sensitive, human-centric data: access control, redaction-by-default reporting, and audit-friendly outputs.
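A minimal pandas sketch of the acute vs chronic exposure and standardized benchmarking features described above. Column names (`player_id`, `date`, `load`, `position`) and the 7/28-day windows are illustrative assumptions, not the project's actual schema:

```python
import pandas as pd

def add_exposure_features(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch: assumes one row per player per day with columns
    player_id, date, load, position (all names hypothetical)."""
    df = df.sort_values(["player_id", "date"]).copy()
    g = df.groupby("player_id")["load"]
    # Acute (7-day) and chronic (28-day) rolling means per player.
    df["acute_7d"] = g.transform(lambda s: s.rolling(7, min_periods=3).mean())
    df["chronic_28d"] = g.transform(lambda s: s.rolling(28, min_periods=14).mean())
    # Acute:chronic ratio as a simple trend-break signal.
    df["acwr"] = df["acute_7d"] / df["chronic_28d"]
    # Within-position z-score for cross-player comparability.
    pos = df.groupby("position")["load"]
    df["load_z"] = (df["load"] - pos.transform("mean")) / pos.transform("std")
    return df
```

The `min_periods` guards keep early-season rows from producing misleadingly confident ratios; in practice the windows and grouping keys would follow the club's own load-monitoring conventions.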
```mermaid
flowchart LR
    A[Wearables + availability records + context metrics] --> B[Ingest + normalize]
    B --> C[Feature engineering]
    C --> D[Baselines + candidate models]
    D --> E[Time-aware validation + calibration]
    E --> F[Decision cards + dashboards]
```
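The time-aware validation and calibration stage in the pipeline above can be sketched with scikit-learn's `TimeSeriesSplit`, which keeps every training fold strictly before its test fold (the features and availability labels here are synthetic stand-ins, not project data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                          # stand-in features (e.g. load, ACWR)
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)   # stand-in availability label

scores = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    # Train only on the past; score only on the future (no leakage).
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    proba = model.predict_proba(X[test_idx])[:, 1]
    # Brier score penalizes miscalibrated probabilities, not just misrankings.
    scores.append(brier_score_loss(y[test_idx], proba))
```

A transparent baseline plus a proper scoring rule like the Brier score is usually enough to tell whether the probability outputs are decision-ready before reaching for non-linear models.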
- A decision-support view that combines (1) availability risk/likelihood, (2) drivers/attribution at feature-group level, and (3) suggested actions (load management, monitoring focus, or further review).
- A benchmarking layer for recruitment/development conversations (transparent percentiles and distributions).
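The benchmarking layer can be as simple as within-group percentile ranks; a sketch with hypothetical `position`/`age_band` grouping columns:

```python
import pandas as pd

def benchmark_percentiles(df: pd.DataFrame, metric: str) -> pd.DataFrame:
    """Sketch: percentile rank of each player's metric within
    position/age-band peers (column names hypothetical)."""
    df = df.copy()
    df[f"{metric}_pct"] = (
        df.groupby(["position", "age_band"])[metric]
        .rank(pct=True)   # 0-1 rank within each peer group
        .mul(100)
        .round(1)
    )
    return df
```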
Example reporting outputs (anonymized visuals):
- Benchmarking to determine asymmetry status: assets/benchmark_asymmetry.png
- Game/week difficulty context: assets/game_difficulty.png
- HR-integrated GPS training load monitoring: assets/hr_integrated_training_load.png
- Sprint exposure monitoring: assets/sprint_exposure.png
- Benchmarking cardiovascular capacities: assets/benchmark_cardio_capacity.png
- Standardised per-minute match report (apples-to-apples): assets/per_minute_match_report.png
Code, raw data, and internal identifiers are private. This repository documents the methodology, validation approach, and example artifacts without exposing proprietary details.





