Ubering into Data

Inspiration

We chose StrataScratch's data challenge because we wanted to support Uber and its drivers, and, as beginners, we found it helpful to have a prompt to guide us.

What it does

Our project predicts whether, and when, a person who registers will begin driving, based on the other details of their registration.
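One way to frame the two prediction targets, shown as a minimal sketch on made-up rows: "if" becomes a yes/no label, and "when" becomes the number of days between signing up and a first completed trip. The column names `signup_date` and `first_completed_date` are assumptions for illustration; this writeup does not list the dataset's columns.

```python
import pandas as pd

# Made-up rows; signup_date and first_completed_date are assumed column names.
signups = pd.DataFrame({
    "signup_date": ["2016-01-02", "2016-01-05", "2016-01-10"],
    "first_completed_date": ["2016-01-20", None, "2016-01-12"],
})

for col in ["signup_date", "first_completed_date"]:
    signups[col] = pd.to_datetime(signups[col])  # None becomes NaT

# "If": did the person ever complete a first trip?
signups["started_driving"] = signups["first_completed_date"].notna()

# "When": days from signup to first trip (NaN for people who never drove).
signups["days_to_first_trip"] = (
    signups["first_completed_date"] - signups["signup_date"]
).dt.days

print(signups[["started_driving", "days_to_first_trip"]])
```

The "when" target only exists for people who actually started, so it would typically be modeled on that subset, while the "if" label uses every registration.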

How we built it

We programmed in Python, using numpy, pandas, and matplotlib to analyze and visualize the data and scikit-learn to train models. We did all of this in Deepnote, a collaborative notebook platform, so the whole team could work in the same notebooks.
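A minimal sketch of how these libraries can fit together, using synthetic data in place of the real registrations. The column names, the choice of `LogisticRegression`, and the imputation strategies are assumptions for illustration, not necessarily the model we built.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in for the signup data; all column names are assumptions.
X = pd.DataFrame({
    "city_name": rng.choice(["A", "B", "C"], size=n),
    "signup_os": rng.choice(["ios", "android", "windows"], size=n),
    "signup_channel": rng.choice(["paid", "organic", "referral"], size=n),
    "vehicle_year": rng.integers(2000, 2017, size=n).astype(float),
})
# Knock out some values (set them to NaN) to mimic nulls in real data.
X["signup_os"] = X["signup_os"].mask(rng.random(n) < 0.3)
X["vehicle_year"] = X["vehicle_year"].mask(rng.random(n) < 0.5)

# Synthetic label: referral signups are made more likely to start driving.
y = (rng.random(n) < np.where(X["signup_channel"] == "referral", 0.6, 0.3)).astype(int)

categorical = ["city_name", "signup_os", "signup_channel"]
numeric = ["vehicle_year"]

preprocess = ColumnTransformer([
    # Categorical nulls become their own "missing" category, then one-hot encode.
    ("cat", make_pipeline(
        SimpleImputer(strategy="constant", fill_value="missing"),
        OneHotEncoder(handle_unknown="ignore"),
    ), categorical),
    # Numeric nulls get the median, plus a flag column marking where they were.
    ("num", SimpleImputer(strategy="median", add_indicator=True), numeric),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression(max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Wrapping the preprocessing and the classifier in one `Pipeline` means the imputation values are learned from the training split only, so nothing from the test rows leaks into training.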

Challenges we ran into

As first-time datathon attendees, we had to learn a lot about different models and data analysis techniques. Data preprocessing was especially challenging: there were many null values that we did not know how to handle, and dropping every row that contained one would have discarded most of the dataset. The dataset was also fairly rigid, which limited our creativity in finding connections between variables.
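The trade-off with nulls can be shown on a tiny made-up frame (the column names are hypothetical). Dropping incomplete rows keeps only one of six records, while filling the gaps and recording where they were keeps every row. Filling with a placeholder label and the median is one common approach among several, not a claim about what worked best on the real data.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame; these column names are hypothetical.
df = pd.DataFrame({
    "signup_os": ["ios", None, "android", "ios", "ios", None],
    "vehicle_make": [None, "Toyota", None, "Honda", None, None],
    "vehicle_year": [np.nan, 2012, np.nan, 2015, np.nan, np.nan],
})

# Option 1: drop every row containing a null. Only 1 of 6 rows survives.
complete_rows = df.dropna()
print(f"dropna kept {len(complete_rows)} of {len(df)} rows")

# Option 2: keep every row, but record where values were missing
# (missingness itself may carry signal) and then fill the gaps.
filled = df.copy()
for col in df.columns:
    filled[f"{col}_missing"] = df[col].isna()

filled["signup_os"] = filled["signup_os"].fillna("unknown")
filled["vehicle_make"] = filled["vehicle_make"].fillna("unknown")
filled["vehicle_year"] = filled["vehicle_year"].fillna(filled["vehicle_year"].median())

print(filled)
```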

Accomplishments that we're proud of

Building and training a machine learning model without any provided skeleton code.

What we learned

Looking at the bigger picture: focusing not on the code or the model itself, but on what they tell us about the dataset and what our "client" should do with that information. We also learned to handle challenging datasets that mimic real-world scenarios.

What's next for Ubering into Data

Proposing to Uber how best to optimize the signup-to-driver pipeline, improving access to rideshare services for Uber customers. We could also apply more logic and model training to clean the data better, especially the signup_os column.
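As a sketch of what model training for data cleaning might look like, one option is classifier-based imputation: train a model on rows where `signup_os` is known and use it to predict the missing entries. This is a swapped-in technique for illustration. The data is synthetic, the feature columns `city_name` and `signup_channel` are hypothetical, the `DecisionTreeClassifier` is an arbitrary choice, and the OS is deliberately made to depend on the city so there is a pattern to learn. On real data, imputed values are only as reliable as the relationship between `signup_os` and the other columns.

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 300

# Synthetic data; feature names are hypothetical.
df = pd.DataFrame({
    "city_name": rng.choice(["A", "B", "C"], size=n),
    "signup_channel": rng.choice(["paid", "organic", "referral"], size=n),
})
# Make the OS depend on city so there is a learnable pattern, then hide ~20%.
df["signup_os"] = np.where(df["city_name"] == "A", "ios", "android")
df["signup_os"] = df["signup_os"].mask(rng.random(n) < 0.2)

features = ["city_name", "signup_channel"]
known = df["signup_os"].notna()

# Train only on rows where signup_os is present.
imputer_model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    DecisionTreeClassifier(random_state=0),
)
imputer_model.fit(df.loc[known, features], df.loc[known, "signup_os"])

# Keep the original column and write predictions into a separate one.
df["signup_os_imputed"] = df["signup_os"]
df.loc[~known, "signup_os_imputed"] = imputer_model.predict(df.loc[~known, features])

print(df.loc[~known].head())
```

Writing the predictions to a separate column keeps the original nulls visible, so a downstream model can still tell observed values from imputed ones.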