Inspiration

Our inspiration for the DunkMetrics project stemmed from the desire to leverage historical play-by-play data to predict score margins for the 2022-2023 NBA season. We aimed to explore the potential of data analytics in enhancing performance predictions in sports, particularly basketball.

What it does

Our project utilizes a combination of data engineering, data analysis, and data science techniques to achieve its objectives:

  1. Data Engineering: Ingests raw data from S3, processes it using Databricks Delta, and outputs refined datasets for downstream analytics.
  2. Data Analysis: Procures and examines datasets using SQL, constructs interactive dashboards, and derives key insights from the data.
  3. Data Science: Conducts exploratory data analysis, develops regression models for score margin predictions, evaluates model performance, and generates forecasts.

How we built it

  1. Data Engineering: We loaded raw data from S3 into Databricks Delta tables, processed it with a Delta Live Tables pipeline organized according to the Medallion Architecture (bronze, silver, and gold layers), and delivered cleaned and aggregated datasets.
  2. Data Analysis: Using SQL within the Databricks lakehouse, we compiled datasets, built interactive dashboards, and interpreted the data to extract key insights.
  3. Data Science: We performed exploratory data analysis and feature engineering, developed regression models using scikit-learn, assessed model reliability, and forecasted outcomes.
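
To make the bronze, silver, and gold stages of the data engineering step concrete, here is a minimal sketch in plain Python. It is not our Delta Live Tables pipeline: the row layout, the field names (`game_id`, `team`, `points`), and the helper functions `to_silver` and `to_gold` are all hypothetical, chosen only to illustrate how raw play-by-play rows might be cleaned and then aggregated into a per-game score margin.

```python
from collections import defaultdict

# Hypothetical raw play-by-play rows, standing in for files landed from S3 (bronze).
BRONZE = [
    {"game_id": "G1", "team": "home", "points": "2"},
    {"game_id": "G1", "team": "away", "points": "3"},
    {"game_id": "G1", "team": "home", "points": "3"},
    {"game_id": "G1", "team": "home", "points": None},  # malformed row
    {"game_id": "G2", "team": "away", "points": "2"},
    {"game_id": "G2", "team": "home", "points": "1"},
    {"game_id": "G2", "team": "away", "points": "x"},   # malformed row
]


def to_silver(bronze_rows):
    """Silver layer: drop rows whose points cannot be parsed and cast the rest to int."""
    silver_rows = []
    for row in bronze_rows:
        try:
            points = int(row["points"])
        except (TypeError, ValueError):
            continue  # skip malformed rows
        silver_rows.append(
            {"game_id": row["game_id"], "team": row["team"], "points": points}
        )
    return silver_rows


def to_gold(silver_rows):
    """Gold layer: one score margin (home points minus away points) per game."""
    margins = defaultdict(int)
    for row in silver_rows:
        sign = 1 if row["team"] == "home" else -1
        margins[row["game_id"]] += sign * row["points"]
    return dict(margins)


silver = to_silver(BRONZE)
gold = to_gold(silver)
print(gold)  # {'G1': 2, 'G2': -1}
```

In this toy example the gold output is the kind of per-game target a score margin regression model would be trained to predict.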

Challenges we ran into

  1. Data Processing Complexity: Managing and processing large volumes of historical play-by-play data posed challenges in terms of scalability and efficiency.
  2. Model Optimization: Getting good performance from the regression models required extensive experimentation and hyperparameter tuning.
  3. Interpreting Insights: Deriving actionable insights from the complex basketball data required domain expertise and meticulous analysis.

Accomplishments that we're proud of

  1. Mean Absolute Error (MAE) of 9: Our best-performing model achieved an MAE of approximately 9 points on the test set, meaning its predicted score margins were off by about 9 points on average.
  2. Interactive Dashboard: The development of an interactive dashboard provided a user-friendly interface for exploring individual and team metrics, enhancing the accessibility of our insights.
  3. Comprehensive Data Science Process: From exploratory data analysis to model development and evaluation, we covered the entirety of the data science process, producing comprehensive notebooks and comparison tables.
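
For readers unfamiliar with the metric, MAE is the average of the absolute differences between actual and predicted values. The sketch below computes it by hand on made-up score margins; the numbers and the helper name `mae` are illustrative assumptions, not our results. In practice, scikit-learn provides the same calculation as `sklearn.metrics.mean_absolute_error`.

```python
def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted| over all games."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up score margins (home minus away), for illustration only.
actual_margins = [5, -3, 12, -8]
predicted_margins = [2, 1, 4, -10]

# Absolute errors are 3, 4, 8, and 2, so the mean is 17 / 4.
example_mae = mae(actual_margins, predicted_margins)
print(example_mae)  # 4.25
```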

What we learned

  1. Data Engineering Best Practices: Through ingesting, processing, and outputting refined datasets, we gained insights into best practices for managing and transforming data at scale.
  2. Advanced Data Analysis Techniques: Leveraging SQL and graphical interfaces, we learned to compile datasets, construct dashboards, and interpret complex data effectively.
  3. Regression Modeling and Evaluation: Developing regression models, optimizing hyperparameters, and evaluating model performance provided valuable insights into predictive analytics in sports.

What's next for DunkMetrics

  1. Enhanced Predictive Models: We aim to further refine our regression models by incorporating additional features and exploring more advanced algorithms to improve prediction accuracy.
  2. Real-time Analytics: We plan to explore integrating real-time data streams for dynamic analysis and prediction of in-game scenarios.
  3. Collaboration and Deployment: We hope to collaborate with NBA teams or sports analytics firms to deploy our predictive models in real-world settings for decision support and performance optimization.
