Inspiration

Access to shelter is a critical lifeline, especially during extreme weather. Our project aims to better understand how qualitative features and daily patterns affect shelter occupancy, so that services can be planned more effectively, resources allocated efficiently, and vulnerable populations can be supported before crises hit. By turning data into actionable insight, we hope to help shelters stay one step ahead of demand.

What it does

The model takes relevant public service features (actual capacity, unavailable capacity, month, day of week, sector, overnight service type, and program model) and predicts a shelter's occupancy rate. We suspected a strong nonlinear relationship between our independent variables and our dependent variable.
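To give a feel for the feature set, here is a minimal sketch of how the qualitative columns can be one-hot encoded into a numeric matrix. The column names and sample values are illustrative assumptions, not the exact fields of the public dataset we used.

```python
import pandas as pd

# Hypothetical sample of the shelter dataset; column names and values
# are assumptions for illustration only.
df = pd.DataFrame({
    "actual_capacity": [60, 45, 80],
    "unavailable_capacity": [5, 0, 10],
    "month": [1, 7, 12],
    "day_of_week": ["Mon", "Sat", "Wed"],
    "sector": ["Men", "Women", "Families"],
    "overnight_service_type": ["Shelter", "Shelter", "Motel"],
    "program_model": ["Emergency", "Transitional", "Emergency"],
})

# One-hot encode the qualitative columns so a model can consume them.
X = pd.get_dummies(
    df,
    columns=["day_of_week", "sector", "overnight_service_type", "program_model"],
)
print(X.shape)  # purely numeric matrix, ready for training
```

Tree ensembles like gradient boosting handle these sparse dummy columns well, which is part of why they suited our data.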

How we built it

Our project consisted of four main steps:

1. Data preprocessing ─ removing NaN and empty values was crucial for proper model training, and feature engineering turned qualitative data into numerical data using one-hot vectors.

2. EDA ─ we experimented with various models, including linear and logistic regression, gradient boosting, and random forests, searching for the model that could most accurately take our 7 independent variables as input and predict the occupancy rate of a given location.

3. Model evaluation and selection ─ we measured goodness of fit using ROC-AUC scores and found that gradient boosting performed best, with a 0.965 ROC-AUC.

4. Insights ─ we focused on communicating exactly what the model tells us, and brainstormed practical steps the government can take to better allocate resources.
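The model-comparison step above can be sketched roughly as follows. This uses synthetic data as a stand-in for our 7 engineered features, and assumes the target was binarized (e.g. "at critical capacity" yes/no) so that ROC-AUC applies; it is an illustration of the evaluation loop, not the actual datathon code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 7 features and a binarized occupancy target.
X, y = make_classification(n_samples=1000, n_features=7, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each candidate and compare held-out ROC-AUC.
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: ROC-AUC = {auc:.3f}")
```

On our real data this loop is where gradient boosting pulled ahead of the other candidates.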

Challenges we ran into

We are a team of two math and stats students with some ML experience, and this was our first datathon. Although we knew some stats theory, learning various statistical software and methods on the fly was both challenging and fun. We faced challenges making our map visualization tool user friendly, as well as learning UI-demo software such as Streamlit in the final hour of the hack. At one point we tried to bring in external weather data to build a model predicting occupancy rate from weather conditions, but ran into data formatting problems. From this we learned that data practices that sound simple on paper can be difficult to implement on a time crunch.

Accomplishments that we're proud of

Using good data processing and documentation practices

Constant problem solving and picking up software/libraries we've never heard of

Focusing on interpretability and a user-friendly experience, so that the viewer could understand exactly what we were trying to do

Continuously uplifting each other and not giving up when the situation seemed lost

What we learned

We learned that Toronto's shelters operate at critical capacity 88% of the time, and that immediate action should be taken to allocate resources to those in need. A strong nonlinear pattern was present between our predictors and the dependent variable, with month and day of week strongly influencing occupancy rate; this can help the system plan ahead for seasonal peaks. Gradient-boosted methods proved more accurate than random forests, likely due to the complex structure of our dataset.
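The critical-capacity share is a simple computation over the daily occupancy rates. A minimal sketch, where both the sample rates and the 0.98 "critical" threshold are assumptions for illustration (the City's own definition may differ):

```python
import pandas as pd

# Hypothetical daily occupancy rates; the 0.98 threshold for "critical
# capacity" is an assumed cutoff, not an official definition.
rates = pd.Series([0.99, 1.00, 0.97, 0.99, 0.98, 1.00, 0.95, 0.99])

# Fraction of observed days at or above the critical threshold.
critical_share = (rates >= 0.98).mean()
print(f"{critical_share:.0%} of observed days at critical capacity")
```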

What's next for Group 23

Moving forward, we aim to:

Implement a web app UI for a more seamless user experience

Explore more location-based metrics, such as shelter-to-shelter walkability, which is crucial in winter

Add more graphs for easier-to-access insights

Use our model as a function to optimize over Toronto and find the locations most in need of extra shelters

Special shoutout to our mentor Vincent for your support; we couldn't have done it without you! This project was built with the help of ChatGPT and Claude models.
