Predicting Driver Signups

Introduction Slide
Exploratory Data Analysis
Data Manipulation/Wrangling
Predictive Model
Our Findings

Inspiration

Our team chose this project because we are passionate about using data to help businesses make informed decisions with real-world impact. This Data Challenge gave us the opportunity to apply data science techniques to a concrete problem for Uber: predicting which new driver signups would go on to become active drivers.

What it does

The final deliverable of our project is a business-oriented presentation for the Uber Driver’s team. We identified key factors that influence whether a driver goes from signing up to completing a first trip. Our findings are designed to help Uber improve its signup process and increase driver activation rates by highlighting which areas to focus on.

How we built it

To build our predictive model, we applied machine learning and data preprocessing techniques. We began by cleaning and transforming the data, creating new features that better captured time-related events. Then, we selected and trained various models to identify the most influential factors in the signup-to-driver conversion process.
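A minimal sketch of what time-related features of this kind could look like in pandas. The column names (signup_date, background_check_date, vehicle_added_date) and the sample rows are hypothetical, chosen only for illustration; they are not taken from the challenge dataset.

```python
import pandas as pd


def add_time_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive elapsed-time features from raw date columns.

    All column names here are hypothetical placeholders.
    """
    out = df.copy()
    date_cols = ("signup_date", "background_check_date", "vehicle_added_date")
    for col in date_cols:
        # Unparseable or missing values become NaT instead of raising.
        out[col] = pd.to_datetime(out[col], errors="coerce")

    # Days between signup and each later onboarding event (NaN if it never happened).
    out["days_to_background_check"] = (
        out["background_check_date"] - out["signup_date"]
    ).dt.days
    out["days_to_vehicle_added"] = (
        out["vehicle_added_date"] - out["signup_date"]
    ).dt.days

    # Whether the event happened at all is often informative on its own.
    out["background_check_done"] = out["background_check_date"].notna().astype(int)

    # Day of week of the signup (Monday=0 ... Sunday=6).
    out["signup_weekday"] = out["signup_date"].dt.dayofweek
    return out


# Made-up rows, for illustration only.
signups = pd.DataFrame(
    {
        "signup_date": ["2016-01-02", "2016-01-05", "2016-01-10"],
        "background_check_date": ["2016-01-04", None, "2016-01-10"],
        "vehicle_added_date": ["2016-01-09", None, None],
    }
)
features = add_time_features(signups)
print(features[["days_to_background_check", "days_to_vehicle_added", "signup_weekday"]])
```

Keeping a missing event as NaN, plus a separate 0/1 flag, lets a model distinguish "never happened" from "happened on day zero".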

Challenges we ran into

One of the major challenges we faced was the skewed nature of the dataset: around 90% of the signups never completed a first trip. This imbalance made it difficult to train accurate models, because a model can score well simply by predicting the majority class. To address this, we employed resampling techniques, threshold adjustments, and ensemble methods to improve model performance and reduce bias towards the non-driver class.
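Two of these ideas, resampling and threshold adjustment, can be sketched with scikit-learn alone. This is an illustrative example on synthetic data with roughly 10% positives, not the project's actual pipeline; the dataset, the logistic regression model, and the candidate thresholds are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic stand-in for the signup data: about 10% positive class.
X, y = make_classification(
    n_samples=2000, n_features=8, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Random oversampling: duplicate minority rows (with replacement) until the
# classes are balanced. Only the training split is resampled, so the test
# split keeps the original class ratio.
minority = y_train == 1
n_majority = int((~minority).sum())
X_min_up, y_min_up = resample(
    X_train[minority],
    y_train[minority],
    replace=True,
    n_samples=n_majority,
    random_state=0,
)
X_bal = np.vstack([X_train[~minority], X_min_up])
y_bal = np.concatenate([y_train[~minority], y_min_up])

model = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)

# Threshold adjustment: instead of the default 0.5 cut-off, compare recall on
# the positive class at several probability thresholds.
proba = model.predict_proba(X_test)[:, 1]
recall_by_threshold = {
    t: recall_score(y_test, (proba >= t).astype(int)) for t in (0.3, 0.5, 0.7)
}
for threshold, recall in recall_by_threshold.items():
    print(f"threshold={threshold:.1f} recall={recall:.2f}")
```

Lowering the threshold can only keep or raise recall, at the cost of more false positives, so the cut-off is best chosen against whatever the business cares about most.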

Accomplishments that we're proud of

We successfully overcame the class imbalance challenge by implementing effective resampling strategies and adjusting thresholds, leading to robust model performance.

What we learned

We gained hands-on experience in handling imbalanced datasets, experimenting with techniques like SMOTE and threshold adjustments to improve model performance on the minority class.
We learned how to select and apply the right predictive models for binary classification tasks, specifically focusing on models like Logistic Regression and XGBoost.
We also improved our ability to present data-driven findings in a way that resonates with non-technical stakeholders, honing our skills in communicating complex analysis in a business context.
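The sketch below illustrates why model comparison on imbalanced data should not rely on accuracy alone. It is an assumption-laden example on synthetic data: scikit-learn's GradientBoostingClassifier stands in for XGBoost, and a majority-class DummyClassifier is added as a baseline; none of these settings come from the project itself.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Synthetic binary classification data with about 10% positives.
X, y = make_classification(
    n_samples=2000, n_features=8, weights=[0.9, 0.1], random_state=0
)

candidates = {
    # Always predicts the majority class: high accuracy, zero recall.
    "baseline": DummyClassifier(strategy="most_frequent"),
    # class_weight="balanced" reweights errors inversely to class frequency.
    "logistic": LogisticRegression(class_weight="balanced", max_iter=1000),
    # Stand-in for a gradient-boosted tree model such as XGBoost.
    "boosting": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, model in candidates.items():
    # 5-fold stratified cross-validation, scoring accuracy and recall together.
    scores = cross_validate(model, X, y, cv=5, scoring=["accuracy", "recall"])
    results[name] = {
        "accuracy": scores["test_accuracy"].mean(),
        "recall": scores["test_recall"].mean(),
    }

for name, metrics in results.items():
    print(f"{name:9s} accuracy={metrics['accuracy']:.3f} recall={metrics['recall']:.3f}")
```

The baseline's accuracy looks strong only because most rows are negative, which is why recall on the positive class is reported alongside it.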

What's next for Predicting Driver Signups

Reflect on what we learned during this project and focus on applying it in future endeavours.

Built With

canva
deepnote
matplotlib
numpy
pandas
python
seaborn
sklearn

Submitted to

Soar Datathon 2025
- Winner: Best Data Visualization

Created by

Contributed to the data manipulation and the development of predictive models. I also framed our presentation findings to ensure insights were clear and actionable.

Allison Su
Assisted with manipulating the data and creating the predictive models. Also helped with understanding and analyzing the outcomes of the models and how well they performed.

Akriti Singh
Assisted with data exploration, validation and cleaning.

Ivan Gonzalez
I worked on the data visualization. I developed the original visuals using Python libraries (matplotlib) and then transferred the data onto Canva to create modern/interactive data visuals. I also helped with creating the predictive models.

Arianna Gonzalez

Updates

Akriti Singh started this project — Apr 13, 2025 12:43 PM EDT
