ShelterTO is a resource-allocation platform for homeless shelters. The goal is to provide a quantitative framework for both running and funding shelters: forecasting near-term occupancy, giving early warnings of surges, and allocating donations to the shelters most in need of help.
Our vision was a website that acts as a hub for NGOs to allocate their resources: each NGO would have its own account, which would give it insights into how the shelters are likely to be operating in the near future.
These forecasts are then used to allocate donor money to the shelters and NGOs that are most in need and at risk.
If the provided link doesn't work, you can still use the website. Refer to the SETUP.md file for instructions.
- Shelter data: City of Toronto Daily Shelter & Overnight Service Occupancy (2024-2025 from SDSS; 2023 from CKAN API). Rows are at daily grain per shelter identity: `LOCATION_POSTAL_CODE`, `SECTOR`, `PROGRAM_MODEL`, `CAPACITY_TYPE`. We use `ACTUAL_CAPACITY`, `OCCUPIED_CAPACITY`, `OCCUPANCY_RATE`.
  - We only predict occupancy rate: this avoids the problem of room-based and bed-based shelters in one dataset.
- Refugee flow: Toronto Shelter System Flow (CKAN API), filtered to population group "refugees". Monthly series: `newly_identified`, `moved_to_housing`, `actively_homeless`. We forward-fill to make the data daily for the feature pipeline, as there is no future data.
- Weather: Open-Meteo. We use the archive API for past dates and the forecast API for future dates. Daily: `apparent_temperature_max/min`, `precipitation_sum`, `snowfall_sum`, all evaluated at the lat/lon of Bahen Centre.
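
The monthly-to-daily forward-fill described above can be sketched as follows. This is a minimal illustration using pandas, not the project's actual loader: the numbers are made up, and indexing each month by its first day is an assumption.

```python
import pandas as pd

# Hypothetical monthly refugee-flow values, indexed by the first day of each month.
monthly = pd.DataFrame(
    {"newly_identified": [100, 120], "moved_to_housing": [40, 55]},
    index=pd.to_datetime(["2025-01-01", "2025-02-01"]),
)

# Daily index covering the window the feature pipeline needs.
daily_index = pd.date_range("2025-01-01", "2025-02-03", freq="D")

# Forward-fill: each day takes the most recent monthly value at or before it.
daily = monthly.reindex(daily_index, method="ffill")
```

Every day in January carries the January values, and the first days of February carry the February values until a newer month becomes available.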
Pipeline: Ingest → align to daily grain → feature engineering → XGBoost → allocation → product.
- We do not predict OCCUPIED_CAPACITY or UNAVAILABLE_CAPACITY to avoid leakage; unit-level metrics are always Predicted_Units = Predicted_Rate × ACTUAL_CAPACITY.
- Features:
  - Weather: `apparent_temperature_max`, `apparent_temperature_min`, `precipitation_sum`, `snowfall_sum`.
  - Engineered: `thermal_stress` = apparent_temperature_min × (snowfall_sum + 1); `net_refugee_pressure` = newly_identified − moved_to_housing; `demographic_weight` = SECTOR_SHARES × SECTOR_GENDER_RATIO × net_refugee_pressure; `daylight_hours` from sunrise/sunset; and refugee columns forward-filled to daily.
  - Lags (per shelter group): `occupancy_rate_lag1`, `lag2`, `lag7` (strictly past days). 7-day rolling mean and std of occupancy rate, computed with `shift(1)` so the current day is never used.
  - Categoricals: SECTOR, PROGRAM_MODEL, PROGRAM_AREA, CAPACITY_TYPE.
- Training: Walk-forward time-series cross-validation (`TimeSeriesSplit`, n_splits=5). The final model is trained on the full training window with the early-stopping iteration fixed from CV. Model: XGBoost Regressor (n_estimators, max_depth=5, learning_rate=0.05, subsample/colsample 0.8, L1/L2 regularization). Predictions are clipped to [0, 1].
- Evaluation: When tested out-of-sample on the 2023 backtest data, we get an R^2 of 0.7800 and an MAE of less than 0.4 units, i.e. less than half a room or bed. The out-of-sample score suggests the model predicts well without overfitting to the training window.
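
The leakage-safe lag and rolling features described above might look like the sketch below. It assumes a `date` column and uses hypothetical names for the rolling columns (`occupancy_rate_roll7_mean`, `occupancy_rate_roll7_std`); the real `build_features` in `backend/backtest.py` may differ.

```python
import pandas as pd

GROUP_KEYS = ["LOCATION_POSTAL_CODE", "SECTOR", "PROGRAM_MODEL", "CAPACITY_TYPE"]

def add_lag_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add strictly-past lag and rolling features per shelter group."""
    out = df.sort_values(GROUP_KEYS + ["date"])
    rate = out.groupby(GROUP_KEYS)["OCCUPANCY_RATE"]
    new_cols = {f"occupancy_rate_lag{k}": rate.shift(k) for k in (1, 2, 7)}
    # shift(1) before rolling so the 7-day window ends at yesterday, never today.
    new_cols["occupancy_rate_roll7_mean"] = rate.transform(
        lambda s: s.shift(1).rolling(7).mean()
    )
    new_cols["occupancy_rate_roll7_std"] = rate.transform(
        lambda s: s.shift(1).rolling(7).std()
    )
    return out.assign(**new_cols)

# Hypothetical single-shelter history: nine days of rising occupancy.
demo = pd.DataFrame({
    "date": pd.date_range("2025-01-01", periods=9, freq="D"),
    "LOCATION_POSTAL_CODE": "M5A1A1",
    "SECTOR": "Men",
    "PROGRAM_MODEL": "Emergency",
    "CAPACITY_TYPE": "Bed Based Capacity",
    "OCCUPANCY_RATE": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
})
features = add_lag_features(demo)
```

Rows without seven prior days have missing rolling values; how the pipeline handles those rows is not specified here.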
- Predictions are aggregated to sector level (sum of predicted_units and actual capacity per day per SECTOR).
- Per-sector residual σ is computed from CV residuals (actual_occupancy − predicted_units) per sector.
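- Optimal allocation (sector-day): `optimal_allocation = predicted_units + 1.96 × σ_sector` (a buffer above the point prediction). The 1.96 multiplier is hard-coded, while σ_sector is estimated from our previous data (the CV residuals). Since 1.96 is the 97.5th percentile of the standard normal, this is the upper end of a two-sided 95% interval, i.e. a one-sided 97.5% bound.
- Surge warning: `surge_warning = (optimal_allocation > actual_capacity)` for that sector-day. Used to flag when the buffered demand could exceed available capacity.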
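
The allocation and surge rules above reduce to a few lines. The sketch below uses only the standard library; the residual values are invented, and using the population standard deviation (`pstdev`) is an assumption, since the text does not say which estimator is used.

```python
from statistics import NormalDist, pstdev

Z = 1.96  # hard-coded multiplier from the allocation rule

def sector_sigma(residuals):
    """Std of CV residuals (actual_occupancy - predicted_units) for one sector."""
    return pstdev(residuals)  # assumption: population std

def allocate(predicted_units, sigma, actual_capacity, z=Z):
    """Return (optimal_allocation, surge_warning) for one sector-day."""
    optimal = predicted_units + z * sigma
    return optimal, optimal > actual_capacity

# 1.96 is approximately the 97.5th percentile of the standard normal.
z_check = NormalDist().inv_cdf(0.975)

residuals = [-2.0, 1.0, 0.0, 3.0, -2.0]  # hypothetical CV residuals for one sector
sigma = sector_sigma(residuals)
optimal, surge = allocate(predicted_units=95.0, sigma=sigma, actual_capacity=100.0)
_, surge_tight = allocate(predicted_units=95.0, sigma=sigma, actual_capacity=98.0)
```

With the same prediction and σ, the warning fires only when the buffered demand exceeds the sector's capacity, so `surge` and `surge_tight` differ.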
- Inputs: The trained model, column list pulled from the SDSS spreadsheet, a shelter identified by postal code (first matching group in data: LOCATION_POSTAL_CODE, SECTOR, PROGRAM_MODEL, CAPACITY_TYPE), and a forecast_start_date.
- Data: Refugee data (CKAN) forward-filled over the 7-day window; weather from Open-Meteo forecast API for future dates (or archive for past). Shelter history from the same group in the occupancy dataset.
- Lags for forecast: For each of the 7 days we need lag1, lag2, lag7 and 7-day rolling mean/std. We maintain a short rolling occupancy series (last 7–14 days). It is initialised from historical occupancy up to the day before forecast_start_date. Optional anchor_rate: if "today's" actual rate is provided, it is appended so day 1 of the forecast uses it as lag1. For each forecast day we compute features from that series, predict rate, then append the predicted rate to the series and advance (drop oldest if length > 14). So day 2 uses day 1's prediction as lag1, etc.
- Output: One row per day: `date`, `predicted_occupancy_rate`, `predicted_units`, `optimal_allocation` (predicted_units + 1.96×σ_sector when `sector_sigmas` is provided), `surge_warning`, plus optional cost and volunteer metrics.
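
The rolling forecast loop described above can be sketched as follows. This is an illustration only: `forecast_rates` and `predict_rate` are hypothetical names, the feature dictionary stands in for the full feature row, and a toy function replaces the trained XGBoost model. It assumes at least 7 days of history.

```python
from collections import deque
from statistics import mean, pstdev

def forecast_rates(history, predict_rate, horizon=7, max_len=14, anchor_rate=None):
    """Roll a short occupancy series forward, feeding predictions back as lags."""
    series = deque(history[-max_len:], maxlen=max_len)  # oldest values drop off
    if anchor_rate is not None:
        series.append(anchor_rate)  # today's actual rate becomes lag1 for day 1
    preds = []
    for _ in range(horizon):
        window = list(series)[-7:]
        feats = {
            "lag1": series[-1],
            "lag2": series[-2],
            "lag7": series[-7],
            "roll_mean7": mean(window),
            "roll_std7": pstdev(window),  # assumption: population std
        }
        rate = min(1.0, max(0.0, predict_rate(feats)))  # clip to [0, 1]
        preds.append(rate)
        series.append(rate)  # the prediction becomes the next day's lag1
    return preds

def toy_model(feats):
    """Stand-in for the trained model: yesterday's rate plus a small drift."""
    return feats["lag1"] + 0.01

preds = forecast_rates([0.5] * 7, toy_model)
```

Because each prediction is appended to the series, day 2 is computed from day 1's output, which is the autoregressive behaviour the text describes.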
- Map of shelters: `frontend/update_shelters_map.py` reads all the shelter data and aggregates by postal code. The final result is a map of the average across a year. Filters by PROGRAM_AREA and SECTOR are injected into the HTML. Output: `frontend/public/toronto_shelters_map.html`, embedded in the site.
  - To make this map precise, we found the coordinates of every postal code listed in the data and wrote them to a JSON file, `postal_code_coords.json`. This lets us quickly look up locations without expensive API calls.
- Web app: React frontend with dashboard and map; an API used by the website for forecast and allocation summaries. We set up a sandbox Stripe wallet for payments.
- Sector-level variance: Uncertainty is approximated by the per-sector standard deviation of CV residuals. The 1.96σ buffer is hard-coded as the upper end of a two-sided 95% interval (a one-sided 97.5% bound): if residuals are normal, demand exceeds the buffered allocation with ~2.5% probability.
- No reallocation across sectors: Optimal allocation and surge are computed per sector; we do not model shifting capacity between sectors. For example, we do not model people in at-capacity women's shelters moving to emptier men's shelters.
- Monthly → daily: Refugee flow is monthly; we forward-fill to daily so all days in a month share the same newly_identified, moved_to_housing, actively_homeless until the next month.
- SECTOR_SHARES and SECTOR_GENDER_RATIO are fixed constants, as they change very little over time.
- Weather: Future days use Open-Meteo forecast API, as we cannot observe weather in the future.
- Refugee data: Latest available month is forward-filled for the whole 7-day window; we do not have daily refugee counts.
- Shelter identity: When there are multiple shelters with one postal code, their occupancies are summed and we take a weighted average of the occupancy rate. Multiple category tags are assigned to this combined shelter, if the shelters making it up have different ones.
- Autoregressive lags: Each day's prediction becomes the next day's lag1, so the model's outputs become its inputs. The predicted rates act as a proxy for the occupancy values that have not yet been observed.
- Room vs bed: We never multiply rooms by a factor to equate them with beds. Capacity and occupancy are used as reported, and our prediction is always a rate, which may then be multiplied by that shelter's ACTUAL_CAPACITY.
- Time: Model is trained on 2024–2025; 2023 is used only for backtesting to check overfitting and drift.
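
The shelter-identity rule above (several shelters sharing one postal code) can be illustrated with a short sketch. It assumes the weighted average is weighted by capacity, which makes the combined rate equal to total occupied over total capacity; `combine_shelters` is a hypothetical helper and the rows are made up.

```python
def combine_shelters(rows):
    """Merge shelters that share a postal code into one combined record."""
    total_capacity = sum(r["ACTUAL_CAPACITY"] for r in rows)
    total_occupied = sum(r["OCCUPIED_CAPACITY"] for r in rows)
    # A capacity-weighted mean of per-shelter rates equals occupied / capacity.
    rate = total_occupied / total_capacity if total_capacity else 0.0
    return {
        "ACTUAL_CAPACITY": total_capacity,
        "OCCUPIED_CAPACITY": total_occupied,
        "OCCUPANCY_RATE": rate,
        # Keep every distinct tag when the member shelters disagree.
        "SECTOR": sorted({r["SECTOR"] for r in rows}),
    }

# Two hypothetical shelters at the same postal code.
rows = [
    {"ACTUAL_CAPACITY": 50, "OCCUPIED_CAPACITY": 45, "SECTOR": "Women"},
    {"ACTUAL_CAPACITY": 150, "OCCUPIED_CAPACITY": 75, "SECTOR": "Men"},
]
combined = combine_shelters(rows)
```

Here the per-shelter rates are 0.9 and 0.5, and the combined rate is 0.6 rather than the unweighted 0.7, because the larger shelter counts for more.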
- Repo layout:
  - `backend/`: `backtest.py` (load data, build_features, train_model, evaluate_model, run_backtest, load_2023_data), `main-notebook.ipynb`, `data/` (Excel + optional Parquet).
  - `api_functions/`: `forecast.py` (forecast_7day, refugee/weather loaders), optional FastAPI `main.py`.
  - `frontend/`: Web app, `update_shelters_map.py` (regenerates the heatmap based on filters), `public/toronto_shelters_map.html` (displays the heatmap).
  - Project root: `postal_code_coords.json` (geocoding cache mapping postal codes to latitude and longitude, for precise and fast maps), `requirements.txt`.
- Run backtest (train on 2024–2025, evaluate on 2023): `python backend/backtest.py`
  Uses default data paths and APIs; 2023 shelter/refugee/weather data are fetched inside the script. It should return a result with an R^2 of 0.7800.
- Regenerate heatmap: from the project root, run `python frontend/update_shelters_map.py`
  Reads `backend/data/public_services_dataset.xlsx` and `postal_code_coords.json`, then writes `frontend/public/toronto_shelters_map.html`.
- 7-day forecast (in code):
from api_functions.forecast import forecast_7day
forecast_df = forecast_7day(model, feature_cols, postal_code='M5A1A1', forecast_start_date='2025-02-01')
Model and `feature_cols` come from `train_model()` / `run_backtest()` in `backend/backtest.py`.
- Dependencies: See `requirements.txt`.