- Context and Preamble Description
- File Descriptions
- Data Description
- Data Dictionary
- Hackathon Goal
- Guidance for Different Profiles
- Submission Outputs
- Evaluation Criteria
- Quick Links and Reference
You're a novice in an NBA fantasy league, competing against seasoned NBA enthusiasts. With limited basketball knowledge, you aim to predict last-quarter score margins using data science and statistical analysis, drawing on historical NBA play-by-play data from 1996 to 2023 to gain a competitive edge in the league.
- Play_by_play_YYYY-YY.parquet: Contains all play-by-play data for the specified season.
- Example_game_data.xlsx: Sample game data for reference.
- Each season file combines event-level play-by-play records with game-level data.
- Data_Dictionary_Hackathon.xlsx: Provides detailed information for each step of the hackathon process.
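Because the season files follow the `Play_by_play_YYYY-YY.parquet` pattern, the full list of expected filenames can be generated instead of typed by hand. This is a minimal sketch that assumes the files cover the 1996-97 through 2022-23 seasons (the text says 1996 to 2023) and treats 2022-23 as the prediction target, with every earlier season as history. `season_label`, `season_filenames`, and `split_history_and_target` are hypothetical helpers; actually loading the files would still need a Parquet reader such as `pandas.read_parquet` or Spark's `spark.read.parquet`.

```python
def season_label(start_year: int) -> str:
    """Format a season as YYYY-YY, e.g. 1999 -> '1999-00'."""
    return f"{start_year}-{(start_year + 1) % 100:02d}"


def season_filenames(first_start: int = 1996, last_start: int = 2022) -> list[str]:
    """One filename per season, assuming the Play_by_play_YYYY-YY.parquet pattern."""
    return [
        f"Play_by_play_{season_label(year)}.parquet"
        for year in range(first_start, last_start + 1)
    ]


def split_history_and_target(files: list[str], target_season: str = "2022-23"):
    """Separate the target season's file from the historical files used for training."""
    target = f"Play_by_play_{target_season}.parquet"
    history = [name for name in files if name != target]
    return history, target


files = season_filenames()
history, target = split_history_and_target(files)
print(len(files), files[0], target)
# 27 Play_by_play_1996-97.parquet Play_by_play_2022-23.parquet
```

Generating the names also makes a missing season easy to spot: any generated name with no matching file in the bucket points to a gap in the data.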
The goal is to predict NBA score margins for the 2022-2023 season using historical data. Along the way, participants work through the analysis from three perspectives: data engineer, data analyst, and data scientist.
- Data Engineer objective: Migrate raw Parquet files from an S3 bucket into Delta Live Tables within Unity Catalog in Databricks.
- Data Engineer tasks: Ingest the data, clean it, and aggregate it for downstream analysis.
- Data Analyst objective: Write SQL queries to build datasets and develop a lakehouse dashboard that yields insights into player stats, team stats, and seasonal trends.
- Data Scientist objective: Perform exploratory data analysis, build reusable feature-engineered datasets, and apply several regression modeling techniques to predict NBA score margins.
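The data engineer's ingest, clean, and aggregate steps correspond to the bronze, silver, and gold tables that the evaluation criteria check for. In Databricks these layers would be Delta Live Tables built over the S3 Parquet files; the plain-Python sketch below only illustrates the layering logic and does not use the DLT API. The column names (`game_id`, `period`, `home_score`, `away_score`), the sample rows, and the choice of gold aggregate (home-minus-away margin at each game's last recorded event) are assumptions for illustration, so check them against the actual data files.

```python
# Hypothetical raw rows as they might land from the Parquet files (strings, one malformed).
RAW_EVENTS = [
    {"game_id": "G1", "period": "1", "home_score": "2", "away_score": "0"},
    {"game_id": "G1", "period": "4", "home_score": "98", "away_score": "95"},
    {"game_id": "G1", "period": "4", "home_score": None, "away_score": "97"},  # missing value
    {"game_id": "G2", "period": "4", "home_score": "101", "away_score": "110"},
]

REQUIRED = ("game_id", "period", "home_score", "away_score")


def bronze(raw):
    """Bronze: keep the raw records exactly as ingested."""
    return list(raw)


def silver(bronze_rows):
    """Silver: drop rows missing a required field and cast numeric columns to int."""
    cleaned = []
    for row in bronze_rows:
        if any(row.get(col) is None for col in REQUIRED):
            continue
        cleaned.append({
            "game_id": row["game_id"],
            "period": int(row["period"]),
            "home_score": int(row["home_score"]),
            "away_score": int(row["away_score"]),
        })
    return cleaned


def gold(silver_rows):
    """Gold: one value per game, the home-minus-away margin at the last recorded event.

    Assumes rows arrive in chronological order within each game.
    """
    last_event = {}
    for row in silver_rows:
        last_event[row["game_id"]] = row  # later rows overwrite earlier ones
    return {gid: r["home_score"] - r["away_score"] for gid, r in last_event.items()}


margins = gold(silver(bronze(RAW_EVENTS)))
print(margins)  # {'G1': 3, 'G2': -9}
```

Keeping bronze untouched means a cleaning rule in silver can be changed and re-run later without re-ingesting from S3.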
- Data Engineer: Notebook containing the Delta Live Tables pipeline for data migration.
- Data Analyst: Lakehouse dashboard with visualizations of gold-level data that surface insights.
- Data Scientist: Notebook(s) containing exploratory data analysis, regression modeling, and visualizations of actual vs. predicted results.
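For the data analyst deliverable, each dashboard tile is typically backed by a SQL query over the gold tables. As a local stand-in for a Databricks SQL warehouse, the sketch below uses Python's built-in `sqlite3` with an in-memory table; the table name `game_margins`, its columns, and the three sample games are made up for illustration. The query produces a seasonal-trend view (games played, average home margin, home win rate per season), and since it uses only standard `COUNT`, `AVG`, `CASE`, and `GROUP BY`, it should carry over to Databricks SQL with little or no change.

```python
import sqlite3

# Made-up game-level rows: (season, home_team, away_team, home_score, away_score)
SAMPLE_GAMES = [
    ("2021-22", "BOS", "MIA", 110, 100),
    ("2021-22", "LAL", "GSW", 99, 105),
    ("2022-23", "DEN", "PHX", 120, 111),
]

SEASON_TRENDS_SQL = """
    SELECT season,
           COUNT(*) AS games,
           AVG(home_score - away_score) AS avg_home_margin,
           AVG(CASE WHEN home_score > away_score THEN 1.0 ELSE 0.0 END) AS home_win_rate
    FROM game_margins
    GROUP BY season
    ORDER BY season
"""


def season_trends(games):
    """Load games into an in-memory table and return one summary row per season."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE game_margins ("
            "season TEXT, home_team TEXT, away_team TEXT, "
            "home_score INTEGER, away_score INTEGER)"
        )
        conn.executemany("INSERT INTO game_margins VALUES (?, ?, ?, ?, ?)", games)
        return conn.execute(SEASON_TRENDS_SQL).fetchall()
    finally:
        conn.close()


for row in season_trends(SAMPLE_GAMES):
    print(row)
# ('2021-22', 2, 2.0, 0.5)
# ('2022-23', 1, 9.0, 1.0)
```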
Submissions will be evaluated on the following components:
- Data Engineering: Bronze table built, silver table built, gold tables built.
- Data Analyst: Use of SQL queries, descriptive/exploratory dashboard built, creativity, user experience.
- Data Scientist: Lowest MAE (mean absolute error) on the 2022-2023 score-margin predictions, with bonus points for additional features.
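Because the data scientist track is ranked by MAE, it helps to compute it the same way throughout: MAE = (1/n) * sum of |actual_i - predicted_i|, the average absolute gap between actual and predicted margins, in points. The sketch below pairs MAE with a deliberately simple baseline: a one-feature ordinary least squares fit that predicts a game's final margin from its margin after three quarters, trained on earlier seasons and scored on a held-out season. The feature, the target definition, the helper names (`fit_ols`, `predict`, `mean_absolute_error`), and all numbers are hypothetical; this is a starting point to beat, not the regression approach the hackathon prescribes.

```python
def fit_ols(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (single feature)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has no variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope  # (intercept, slope)


def predict(model, xs):
    """Apply a fitted (intercept, slope) model to new feature values."""
    intercept, slope = model
    return [intercept + slope * x for x in xs]


def mean_absolute_error(actual, predicted):
    """MAE: average absolute difference between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical (margin after Q3, final margin) pairs, home minus away.
train_q3, train_final = [-10, -2, 0, 4, 8], [-9, -1, 2, 5, 8]   # earlier seasons
holdout_q3, holdout_final = [6, -4], [9, -4]                    # held-out season

model = fit_ols(train_q3, train_final)
preds = predict(model, holdout_q3)
print("intercept, slope:", model)
print("holdout MAE:", round(mean_absolute_error(holdout_final, preds), 3))
```

Splitting by season rather than at random mirrors the actual task: the model is fit only on seasons that precede the one it is asked to predict.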