A comprehensive music recommendation system implementing multiple approaches from baseline methods to advanced deep learning architectures.
This project implements a complete music recommendation pipeline with four different approaches, each building upon the previous one to create increasingly sophisticated recommendation systems.
```
Music Recommendation/
├── 0.data_exploration.py     # Data analysis and exploration
├── 1.KNN_based.py            # KNN-based recommendations
├── 2.RandomForest_based.py   # RandomForest ranking model
├── 3.SASRec_sequential.py    # SASRec sequential recommendations
├── 4.Two_tower.py            # Two-tower architecture with FAISS
├── requirements.txt          # Python dependencies
└── README.md
```
- Python 3.8+
- CUDA-compatible GPU (optional, for GPU acceleration)
- 8GB+ RAM recommended
- Clone or download the project
- Install dependencies:

  ```
  pip install -r requirements.txt
  ```

- Run any stage:

  ```
  python 0.data_exploration.py
  python 1.KNN_based.py
  python 2.RandomForest_based.py
  python 3.SASRec_sequential.py
  python 4.Two_tower.py
  ```
Purpose: Understand datasets and log insights
Features:
- Dataset statistics and distributions
- Audio feature analysis
- User behavior patterns
- Artist popularity analysis
- Weights & Biases integration
Output:
- Comprehensive data insights
- W&B logged metrics
- Dataset summaries
Purpose: Baseline recommendation methods
Methods:
- Popularity-based: Most popular artists
- Content-based kNN: Audio feature similarity
- Hybrid approach: Combines both methods
Features:
- Cosine similarity for audio features
- StandardScaler for feature normalization
- Comprehensive evaluation metrics
Output:
- Hit Rate and NDCG metrics
- Top popular artists
- Similar song recommendations
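The content-based kNN idea above can be sketched in a few lines of numpy: standardize the audio features (what StandardScaler does), then rank songs by cosine similarity. The feature matrix and song indices here are toy placeholders, not the project's actual data.

```python
import numpy as np

# Toy audio-feature matrix: rows = songs, cols = features (hypothetical values)
features = np.array([
    [0.8, 0.2, 0.5],   # song 0
    [0.7, 0.3, 0.4],   # song 1 (close to song 0)
    [0.1, 0.9, 0.9],   # song 2 (very different)
], dtype=float)

# Standardize each feature column (zero mean, unit variance), as StandardScaler does
scaled = (features - features.mean(axis=0)) / features.std(axis=0)

# Cosine similarity between every pair of songs
unit = scaled / np.linalg.norm(scaled, axis=1, keepdims=True)
sim = unit @ unit.T

def top_k_similar(song_idx, k=1):
    """Return indices of the k most similar songs, excluding the song itself."""
    order = np.argsort(-sim[song_idx])
    return [i for i in order if i != song_idx][:k]

print(top_k_similar(0))  # -> [1]: song 1 is closest to song 0
```

A hybrid variant would blend these similarity scores with a popularity score before ranking.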
Purpose: Learning-to-Rank approach
Features:
- Feature Engineering: User behavior + audio features
- Random Forest: 100 estimators for ranking
- Derived Features: Weight ratios, diversity metrics
- Feature Importance: Identifies key factors
Key Features:
- User activity patterns
- Artist popularity metrics
- Interaction strength analysis
- Relative activity measures
Output:
- RMSE, MAE, Correlation metrics
- Feature importance rankings
- Personalized recommendations
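The learning-to-rank stage can be illustrated with a minimal sklearn sketch: fit a 100-estimator Random Forest on interaction features, read off feature importances, and rank candidates by predicted relevance. The feature names and synthetic data below are assumptions for illustration, not the project's actual feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic interaction features (hypothetical columns):
# [user_play_count, artist_popularity, tempo] -> relevance score
X = rng.random((200, 3))
# Relevance driven mostly by the first two features, plus noise
y = 0.7 * X[:, 0] + 0.3 * X[:, 1] + 0.05 * rng.standard_normal(200)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Feature importance identifies which inputs drive the ranking
importance = dict(zip(["user_play_count", "artist_popularity", "tempo"],
                      model.feature_importances_))

# Rank unseen candidate items by predicted relevance
candidates = rng.random((5, 3))
scores = model.predict(candidates)
ranking = np.argsort(-scores)
```

Pointwise regression like this is the simplest learning-to-rank setup; the predicted score only needs to order candidates correctly, not match a true rating.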
Purpose: Sequential recommendation modeling
Architecture:
- SASRec Model: Self-Attentive Sequential Recommendation
- Transformer: 4 heads, 2 layers, 128 hidden size
- Embeddings: Item + Position embeddings
- Sequence Length: Up to 50 items per sequence
Features:
- GPU acceleration with mixed precision
- Sequence padding and truncation
- Cross-entropy loss with padding ignore
- Comprehensive evaluation (Hit Rate@K, NDCG@K)
Output:
- Sequential recommendations
- Training/validation loss curves
- Ranking metrics at multiple K values
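The core of SASRec is causal self-attention: each position in a listening sequence may attend only to itself and earlier items. This numpy sketch shows one attention head with the causal mask; it omits the learned Q/K/V projections, position embeddings, and multi-head stacking of the real model.

```python
import numpy as np

def causal_self_attention(x):
    """One simplified self-attention step with a causal mask, as in SASRec:
    position i may only attend to positions <= i."""
    seq_len, d = x.shape
    # For brevity, reuse the inputs as queries/keys/values (no learned projections)
    scores = x @ x.T / np.sqrt(d)
    # Causal mask: block attention to future positions
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[future] = -np.inf
    # Row-wise softmax
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x, weights

rng = np.random.default_rng(0)
seq = rng.standard_normal((4, 8))   # 4 items in a listening sequence, dim 8
out, attn = causal_self_attention(seq)

print(attn[0])  # -> [1. 0. 0. 0.]: the first item can only attend to itself
```

Training then pads/truncates sequences to length 50 and applies cross-entropy over next-item predictions, ignoring padding positions.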
Purpose: Advanced neural collaborative filtering
Architecture:
- User Tower: Neural network for user embeddings
- Item Tower: Neural network for item embeddings
- FAISS Retrieval: Fast similarity search
- Cold-Start Handling: New users/items support
Features:
- GPU Optimization: Mixed precision training
- Feature Engineering: Log-scaled heavy-tailed features
- Comprehensive Evaluation: Precision@K, Recall@K, NDCG@K
- Cold-Start Solutions: Average embeddings fallback
- FAISS Index: IVF-based fast retrieval
Output:
- High-quality embeddings
- Fast retrieval recommendations
- Comprehensive evaluation metrics
- Cold-start recommendations
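At serving time, the two towers reduce to a nearest-neighbor search: score every item embedding against the user embedding and take the top-k. This sketch uses exact numpy inner-product search with random stand-in embeddings; in the project, the vectors come from the trained towers and FAISS's IVF index approximates this search over much larger catalogs.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16

# Stand-ins for tower outputs; in the real model these come from the
# user and item neural networks
user_emb = rng.standard_normal(dim)
item_emb = rng.standard_normal((1000, dim))

# L2-normalize so inner product equals cosine similarity
item_emb /= np.linalg.norm(item_emb, axis=1, keepdims=True)
user_emb /= np.linalg.norm(user_emb)

# Exact inner-product retrieval. With FAISS installed, the equivalent would be
# roughly: index = faiss.IndexFlatIP(dim); index.add(item_emb);
#          _, ids = index.search(user_emb[None, :], 10)
scores = item_emb @ user_emb
top_k = np.argsort(-scores)[:10]
```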
- CUDA Support: Automatic GPU detection
- Mixed Precision: 16-bit training for speed
- Memory Optimization: Efficient tensor operations
- Batch Processing: Optimized batch sizes
- Precision@K: Accuracy of top-K recommendations
- Recall@K: Coverage of relevant items
- NDCG@K: Ranking quality measure
- Hit Rate@K: Success rate of recommendations
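These four metrics are small enough to define directly. A binary-relevance sketch (the song names are placeholders):

```python
import numpy as np

def hit_rate_at_k(recommended, relevant, k):
    """1.0 if any relevant item appears in the top-k, else 0.0."""
    return float(any(item in relevant for item in recommended[:k]))

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k that is relevant."""
    return sum(item in relevant for item in recommended[:k]) / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items recovered in the top-k."""
    return sum(item in relevant for item in recommended[:k]) / len(relevant)

def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG: each hit at rank i contributes 1/log2(i + 2)."""
    dcg = sum(1.0 / np.log2(i + 2)
              for i, item in enumerate(recommended[:k]) if item in relevant)
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal

recs = ["song_a", "song_b", "song_c", "song_d", "song_e"]
liked = {"song_b", "song_e"}
print(precision_at_k(recs, liked, 5))  # -> 0.4 (2 of 5 recommendations relevant)
print(recall_at_k(recs, liked, 5))     # -> 1.0 (both liked songs recovered)
```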
- New Users: Average user embedding fallback
- New Items: Average item embedding fallback
- Feature Engineering: Robust feature creation
- Graceful Degradation: System continues working
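The average-embedding fallback amounts to one branch at lookup time: unseen users (or items) get the mean of all trained embeddings instead of a learned vector. A sketch with hypothetical IDs and random stand-in embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
known_user_embeddings = rng.standard_normal((500, 16))   # trained user vectors
known_ids = {f"user_{i}": i for i in range(500)}         # user id -> row index

def embed_user(user_id, embeddings, id_to_row):
    """Return the trained embedding if the user was seen during training,
    otherwise fall back to the average of all known user embeddings."""
    if user_id in id_to_row:
        return embeddings[id_to_row[user_id]]
    return embeddings.mean(axis=0)   # cold-start fallback

vec = embed_user("brand_new_user", known_user_embeddings, known_ids)
```

The mean vector yields generic, popularity-flavored recommendations, which is why the system keeps working (graceful degradation) rather than failing on new users.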
The results below were recorded after relatively few training epochs (~10):
| Stage | Method | Hit Rate@5 | NDCG@5 | Training Time |
|---|---|---|---|---|
| 1 | KNN-based | ~0.15 | ~0.10 | < 1 min |
| 2 | RandomForest | ~0.25 | ~0.18 | ~2 min |
| 3 | SASRec | ~0.78 | ~0.54 | ~5 min |
| 4 | Two-Tower | ~0.85 | ~0.65 | ~8 min |
Performance may vary based on hardware and data
- Core: pandas, numpy, scikit-learn
- Deep Learning: torch, transformers
- Retrieval: faiss-cpu
- Tracking: wandb
- Visualization: matplotlib, seaborn, plotly
```
# Run data exploration
python 0.data_exploration.py

# Run KNN recommendations
python 1.KNN_based.py

# Run RandomForest ranking
python 2.RandomForest_based.py

# Run SASRec sequential
python 3.SASRec_sequential.py

# Run Two-Tower architecture
python 4.Two_tower.py
```
- Automatic Logging: Metrics, losses, and visualizations
- Experiment Tracking: Compare different runs
- Hyperparameter Tuning: Track parameter effects
- Model Comparison: Side-by-side performance analysis
- Progress Bars: Training progress visualization
- Console Output: Real-time metrics and status
- Error Handling: Graceful failure management
- SASRec: Self-Attentive Sequential Recommendation
- Two-Tower: Neural Collaborative Filtering
- FAISS: Facebook AI Similarity Search
- Last.fm Dataset: HetRec 2011 Challenge
- Spotify API: Audio Features Documentation