Inspiration
Our inspiration for the DunkMetrics project stemmed from the desire to leverage historical play-by-play data to predict score margins for the 2022-2023 NBA season. We aimed to explore the potential of data analytics in enhancing performance predictions in sports, particularly basketball.
What it does
Our project utilizes a combination of data engineering, data analysis, and data science techniques to achieve its objectives:
- Data Engineering: Ingests raw data from S3, processes it using Databricks Delta, and outputs refined datasets for downstream analytics.
- Data Analysis: Procures and examines datasets using SQL, constructs interactive dashboards, and derives key insights from the data.
- Data Science: Conducts exploratory data analysis, develops regression models for score margin predictions, evaluates model performance, and generates forecasts.
How we built it
- Data Engineering: We loaded raw data from S3 into Databricks Delta tables, processed it with a Delta Live Tables pipeline based on the Medallion Architecture, and delivered cleaned and aggregated datasets.
- Data Analysis: Using SQL within the Databricks lakehouse, we compiled datasets, built dashboards, and interpreted the data to extract key insights.
- Data Science: We embarked on exploratory data analysis and feature engineering, developed regression models using scikit-learn, assessed model reliability, and forecasted outcomes.
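The bronze, silver, and gold stages of the Medallion Architecture can be illustrated with a minimal plain-Python sketch. This is not the project's Delta Live Tables pipeline: the event fields (`game_id`, `team`, `points`), the sample rows, and the function names are all hypothetical, chosen only to show how raw play-by-play rows might become one score margin per game.

```python
from collections import defaultdict

# Hypothetical raw play-by-play rows (bronze layer): kept as ingested,
# including a row with a missing value and a non-scoring event.
BRONZE = [
    {"game_id": "G1", "team": "home", "points": "2"},
    {"game_id": "G1", "team": "away", "points": "3"},
    {"game_id": "G1", "team": "home", "points": None},  # missing value
    {"game_id": "G1", "team": "home", "points": "3"},
    {"game_id": "G2", "team": "away", "points": "2"},
    {"game_id": "G2", "team": "home", "points": "1"},
    {"game_id": "G2", "team": "away", "points": "0"},   # non-scoring event
]


def to_silver(bronze_rows):
    """Silver layer: drop rows with missing points and cast points to int."""
    silver_rows = []
    for row in bronze_rows:
        if row["points"] is None:
            continue
        silver_rows.append({**row, "points": int(row["points"])})
    return silver_rows


def to_gold(silver_rows):
    """Gold layer: one value per game, home score minus away score."""
    totals = defaultdict(lambda: {"home": 0, "away": 0})
    for row in silver_rows:
        totals[row["game_id"]][row["team"]] += row["points"]
    return {game: t["home"] - t["away"] for game, t in totals.items()}


silver = to_silver(BRONZE)
gold = to_gold(silver)
print(gold)  # {'G1': 2, 'G2': -1}
```

Keeping each layer as a separate function mirrors the idea behind the architecture: the raw data stays untouched, cleaning rules live in one place, and the aggregated table is the only thing the modeling step needs to read.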
Challenges we ran into
- Data Processing Complexity: Managing and processing large volumes of historical play-by-play data posed challenges in terms of scalability and efficiency.
- Model Optimization: Tuning regression models for optimal performance required extensive experimentation and fine-tuning of hyperparameters.
- Interpreting Insights: Deriving actionable insights from the complex basketball data required domain expertise and meticulous analysis.
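The kind of hyperparameter search described under Model Optimization can be sketched as a small grid search. This is an illustration under stated assumptions, not the project's code: the project used scikit-learn (where `GridSearchCV` is the usual tool), while this sketch swaps in a hand-written one-feature ridge regression without an intercept so that it runs with no dependencies. The feature, the data values, and the candidate `alpha` grid are invented for the example.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge fit for y = w * x (no intercept).

    Minimizes sum((y - w*x)^2) + alpha * w^2, giving
    w = sum(x*y) / (sum(x^2) + alpha).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical data: x is some pre-game strength difference,
# y is the final score margin (home minus away).
train_x = [-4, -2, 0, 2, 4]
train_y = [-9, -3, 1, 5, 8]
valid_x = [-3, 1, 3]
valid_y = [-5, 2, 5]

# Candidate regularization strengths (an assumed grid).
alphas = [0.0, 1.0, 10.0, 100.0]

# Fit on the training split, score each candidate on the validation split.
results = {}
for alpha in alphas:
    w = fit_ridge_1d(train_x, train_y, alpha)
    preds = [w * x for x in valid_x]
    results[alpha] = mae(valid_y, preds)

best_alpha = min(results, key=results.get)
print(best_alpha, round(results[best_alpha], 3))  # 10.0 0.133
```

Scoring every candidate on data the model was not fitted on is the important part of the design: choosing `alpha` by training error would always favor the least regularized model.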
Accomplishments that we're proud of
- Mean Absolute Error (MAE) of 9: Our best-performing model reached an MAE of approximately 9 on the test set, meaning its predicted score margins were off by about 9 points on average.
- Interactive Dashboard: The development of an interactive dashboard provided a user-friendly interface for exploring individual and team metrics, enhancing the accessibility of our insights.
- Comprehensive Data Science Process: From exploratory data analysis to model development and evaluation, we covered the entirety of the data science process, producing comprehensive notebooks and comparison tables.
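To make the MAE figure above concrete, here is a short worked example of how the metric is computed and compared against a naive baseline. The margins below are made up for illustration and are not the project's data or results; the "always predict a tie" baseline is likewise an assumption added for comparison.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all games."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical final score margins (home minus away) for five games.
actual_margins = [12, -5, 3, -20, 8]

# Hypothetical model predictions for the same games.
predicted_margins = [4, 2, -6, -9, 15]

# Naive baseline: predict a margin of 0 (a tie) for every game.
baseline_margins = [0] * len(actual_margins)

model_mae = mean_absolute_error(actual_margins, predicted_margins)
baseline_mae = mean_absolute_error(actual_margins, baseline_margins)

print(model_mae)     # 8.4
print(baseline_mae)  # 9.6
```

An MAE is easiest to judge next to a baseline like this one: the gap between the two numbers shows how much the model adds over a trivial guess.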
What we learned
- Data Engineering Best Practices: Through ingesting, processing, and outputting refined datasets, we gained insights into best practices for managing and transforming data at scale.
- Advanced Data Analysis Techniques: Leveraging SQL and graphical interfaces, we learned to compile datasets, construct dashboards, and interpret complex data effectively.
- Regression Modeling and Evaluation: Developing regression models, optimizing hyperparameters, and evaluating model performance provided valuable insights into predictive analytics in sports.
What's next for DunkMetrics
- Enhanced Predictive Models: We aim to further refine our regression models by incorporating additional features and exploring more advanced algorithms to improve prediction accuracy.
- Real-time Analytics: We plan to explore integrating real-time data streams for dynamic analysis and prediction of in-game scenarios.
- Collaboration and Deployment: We hope to collaborate with NBA teams or sports analytics firms to deploy our predictive models in real-world settings for decision support and performance optimization.