The Arc is a small Streamlit app that predicts an NBA player’s Year 5 points per game (PPG) using their Year 2 “sophomore season” stats.
The idea: the jump from Year 1 → Year 2 contains strong signals about a player’s long‑term ceiling. This app turns those “sophomore signals” into a simple Year 5 projection.
- Lets you search for an NBA player and:
  - See their Year 2 stats (PPG, RPG, APG, efficiency, minutes, etc.).
  - Get Year 5 PPG predictions from two different modeling paths.
  - Compare those predictions to the actual Year 5 PPG (when available).
- Groups players into data‑driven archetypes (via clustering) so you can see what “type” of player they are.
- Shows model performance:
  - Error distributions for each path.
  - Feature importance to see which stats drive the predictions.
  - Best and worst individual predictions.
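The archetype grouping could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the choice of k‑means and PCA, the function name `assign_archetypes`, and the synthetic input are all assumptions, since the text only says "clustering".

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def assign_archetypes(stats, n_archetypes=5, seed=0):
    """Cluster players on standardized stats and project them to 2D.

    k-means and PCA are assumed here; the project may use other methods.
    """
    # Standardize so high-magnitude stats (e.g. minutes) do not dominate distances.
    scaled = StandardScaler().fit_transform(stats)
    labels = KMeans(
        n_clusters=n_archetypes, n_init=10, random_state=seed
    ).fit_predict(scaled)
    # Two principal components give coordinates for a 2D scatter plot.
    coords = PCA(n_components=2, random_state=seed).fit_transform(scaled)
    return labels, coords


# Synthetic stand-in for a (players x stats) matrix of Year 2 numbers.
rng = np.random.default_rng(0)
year2_stats = rng.normal(size=(200, 6))
labels, coords = assign_archetypes(year2_stats)
```

Each entry of `labels` is an archetype index, and `coords` can feed a scatter plot colored by that index.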
The project runs two competing modeling strategies:
- Path 1
  - Uses a small, standard feature set:
    - Year 2 box score stats (points, rebounds, assists, minutes, shooting splits).
    - Simple growth metrics (deltas from Year 1).
    - Basic context (draft position, assist‑to‑turnover ratio).
  - Trains a regression model with default hyperparameters.
  - Goal: fast, interpretable, “good enough” baseline.
- Path 2
  - Uses all Path 1 features plus engineered features, such as:
    - Skill Diversity Index (improvement across multiple categories).
    - Usage‑to‑Efficiency ratio.
    - Draft overperformance.
    - Minutes trajectory.
    - Free‑throw improvement.
  - Adds hyperparameter tuning to squeeze out extra accuracy.
  - Goal: maximum accuracy and richer basketball intuition.
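To make the engineered features more concrete, here is one way they might be computed with pandas. The exact formulas are not given above, so every definition below is an assumption for illustration, as are the column names (`ppg_y1`, `usg_pct_y2`, `draft_pick`, and so on) and the function name `add_engineered_features`.

```python
import pandas as pd


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Append illustrative Path 2 features; all formulas are assumed."""
    out = df.copy()

    # Skill diversity: number of categories that improved from Year 1 to Year 2.
    deltas = pd.DataFrame(
        {s: out[f"{s}_y2"] - out[f"{s}_y1"] for s in ("ppg", "rpg", "apg")}
    )
    out["skill_diversity"] = (deltas > 0).sum(axis=1)

    # Usage-to-efficiency: usage rate divided by true shooting percentage.
    out["usage_to_efficiency"] = out["usg_pct_y2"] / out["ts_pct_y2"]

    # Draft overperformance: draft-slot rank minus scoring rank within the
    # dataset; positive means the player outscores their draft position.
    out["draft_overperformance"] = (
        out["draft_pick"].rank() - out["ppg_y2"].rank(ascending=False)
    )

    # Year-over-year changes in minutes and free-throw percentage.
    out["minutes_trajectory"] = out["mpg_y2"] - out["mpg_y1"]
    out["ft_improvement"] = out["ft_pct_y2"] - out["ft_pct_y1"]
    return out


# Two made-up players, purely to exercise the function.
players = pd.DataFrame({
    "ppg_y1": [8.0, 15.0], "ppg_y2": [14.0, 13.0],
    "rpg_y1": [3.0, 6.0], "rpg_y2": [4.5, 5.0],
    "apg_y1": [2.0, 4.0], "apg_y2": [2.0, 5.0],
    "mpg_y1": [20.0, 32.0], "mpg_y2": [30.0, 30.0],
    "ft_pct_y1": [0.70, 0.80], "ft_pct_y2": [0.78, 0.79],
    "usg_pct_y2": [22.0, 24.0], "ts_pct_y2": [0.55, 0.60],
    "draft_pick": [25, 3],
})
features = add_engineered_features(players)
```

In this toy data the late pick who improved in two categories gets a positive `draft_overperformance`, while the high pick whose scoring dipped gets a negative one.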
- Home
  - High‑level project overview and quick comparison of Path 1 vs Path 2 performance.
- Scouting Report
  - Select a player and see:
    - Year 2 stats.
    - Dual Path 1 vs Path 2 Year 5 predictions.
    - Actual Year 5 PPG and errors (when data exists).
    - Career trajectory chart and archetype radar.
- The DNA Explorer
  - View the 5 data‑driven archetypes.
  - See average stats per archetype.
  - Explore a 2D map of players colored by archetype.
- Model Analysis
  - Detailed methodology for both paths.
  - Side‑by‑side metrics (MAE, R², training time, overfitting checks).
  - Feature importance charts.
  - Error histograms and best/worst predictions.
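The side‑by‑side comparison on the Model Analysis page could be produced along these lines. This is a sketch under stated assumptions: the model type is not named above, so `RandomForestRegressor` is a stand‑in, the data is synthetic, the parameter grid is arbitrary, and both paths share one feature matrix for brevity (in the project, Path 2 also adds engineered features).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split


def evaluate(model, X_train, X_test, y_train, y_test):
    """Return test MAE, test R², and the train-minus-test R² gap."""
    train_r2 = r2_score(y_train, model.predict(X_train))
    test_pred = model.predict(X_test)
    test_r2 = r2_score(y_test, test_pred)
    return {
        "mae": mean_absolute_error(y_test, test_pred),
        "r2": test_r2,
        # A large positive gap is a simple warning sign of overfitting.
        "r2_gap": train_r2 - test_r2,
    }


# Synthetic data standing in for Year 2 features and Year 5 PPG.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 8))
y = 10 + 3 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=1.0, size=300)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Path 1 style: default hyperparameters.
baseline = RandomForestRegressor(random_state=42).fit(X_train, y_train)

# Path 2 style: a small grid search scored on MAE.
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [4, 8, None]},
    scoring="neg_mean_absolute_error",
    cv=3,
)
search.fit(X_train, y_train)
tuned = search.best_estimator_

results = {
    "path_1": evaluate(baseline, X_train, X_test, y_train, y_test),
    "path_2": evaluate(tuned, X_train, X_test, y_train, y_test),
}
# Per-feature importances, suitable for a bar chart.
importances = tuned.feature_importances_
```

The `results` dictionary maps directly onto a two‑row metrics table, and `y_test - tuned.predict(X_test)` gives the residuals for an error histogram.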
- Python
- Streamlit for the web app UI
- pandas / NumPy for data handling
- scikit‑learn for modeling + clustering
- Plotly for visualizations