PyTorch implementation of music genre classification using MambaVision architecture
Video: YouTube
Our approach combines the sequential, time-dependent nature of music with the MambaVision architecture for music genre classification. Audio is converted to spectrograms, which are fed to MambaVision, a lightweight hybrid Mamba-Transformer vision backbone that splits its input into patches and extracts features from them. In our experiments, this gives better classification results than the Transformer and CNN baselines, including models pre-trained on other data types. For detailed results and the theoretical background, please refer to our complete work.
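
To make the spectrogram input concrete, here is a minimal sketch of turning a waveform into a log-magnitude spectrogram. The repository lists librosa among its dependencies; this sketch uses plain NumPy instead to stay self-contained, and the function name `log_spectrogram`, the `n_fft` and `hop_length` values, and the sample rate are illustrative assumptions rather than the project's actual settings.

```python
import numpy as np

def log_spectrogram(signal, n_fft=1024, hop_length=512, eps=1e-10):
    """Return a log-magnitude spectrogram of shape (n_fft // 2 + 1, n_frames)."""
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < n_fft:
        # Zero-pad short inputs so that at least one frame exists.
        signal = np.pad(signal, (0, n_fft - signal.size))
    window = np.hanning(n_fft)
    n_frames = 1 + (signal.size - n_fft) // hop_length
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop_length : i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequencies of a real signal.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Convert to decibels and put frequency on the first axis.
    return 20.0 * np.log10(magnitude + eps).T

# Example: one second of a 440 Hz tone at an assumed 22,050 Hz sample rate.
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
spec = log_spectrogram(np.sin(2 * np.pi * 440.0 * t))
print(spec.shape)
```

The resulting 2-D array can be treated like a single-channel image, which is what allows a vision backbone to be applied to audio.
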
The GTZAN dataset was used. It consists of 1,000 audio tracks, each 30 seconds long, divided into 10 genre classes.
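
As an illustration of the 10-class labelling, the sketch below maps a GTZAN file path to a genre name and an integer label. It assumes file names of the form `<genre>.<number>.wav` and an alphabetical label order; the helper `label_from_path` is hypothetical, and the label order used by this project's data loader may differ.

```python
from pathlib import PurePosixPath

# The ten GTZAN genre names, in alphabetical order (assumed label order).
GENRES = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]
GENRE_TO_INDEX = {name: idx for idx, name in enumerate(GENRES)}

def label_from_path(path):
    """Map a path such as 'genres_original/jazz/jazz.00042.wav' to (genre, index)."""
    # Assumes the file name starts with the genre name followed by a dot.
    genre = PurePosixPath(path).name.split(".")[0]
    if genre not in GENRE_TO_INDEX:
        raise ValueError(f"unknown genre in file name: {path!r}")
    return genre, GENRE_TO_INDEX[genre]

print(label_from_path("genres_original/jazz/jazz.00042.wav"))
```
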
| Library | Version |
|---|---|
| Python | 3.5.5 |
| torch | 2.1.1 |
| kornia | 0.7.3 |
| matplotlib | 3.7.2 |
| transformers | 4.42.3 |
| numpy | 1.23.5 |
| h5py | 3.10.0 |
| librosa | 0.10.2 |
| pandas | 2.1.1 |
| seaborn | 0.13.0 |

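
To check an environment against the versions listed above, a small script along these lines may help. It is a sketch, not part of the repository: the helper `find_mismatches` is hypothetical, it relies on the standard-library `importlib.metadata` module (Python 3.8 or newer), and it uses a strict equality check that ignores only local build tags such as `+cu118`.

```python
from importlib import metadata

# Pinned versions copied from the table above.
PINNED = {
    "torch": "2.1.1", "kornia": "0.7.3", "matplotlib": "3.7.2",
    "transformers": "4.42.3", "numpy": "1.23.5", "h5py": "3.10.0",
    "librosa": "0.10.2", "pandas": "2.1.1", "seaborn": "0.13.0",
}

def find_mismatches(pinned):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Compare only the public version, ignoring local build tags.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches

for name, (expected, installed) in find_mismatches(PINNED).items():
    print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```
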
| File name | Purpose |
|---|---|
| data_analysis.ipynb | notebook for analysing the models' results |
| genre_predictor.py | main script for predicting the genre of a specific song |
| models.py | contains all the models |
| Paras.py | initializes the parameters for the project |
| train_models.ipynb | notebook for training the different models |
| train.py | helper script for training the different models |
| Build Dataset.ipynb | notebook for step-by-step data preparation |
| data_loader.py | data loading script |
| music_dealer.py | loading script for your own data |
| util.py | utilities for data handling |

- Clone the repo: `git clone https://github.com/ovedtal1/MambaVision-Genre-Classification.git`
- Download the free GTZAN dataset from Kaggle: GTZAN
- Place the data in the main folder of the repo
- Run the 'Build Dataset.ipynb' step by step for dataset creation
- Follow the 'train_models.ipynb' for training the different models (MambaVision based, Transformer based & CNN)
- Analyze your trained models with the 'data_analysis.ipynb' notebook
- Run the 'genre_predictor.py' script with your own music and classify it!
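
The final classification step can be sketched as follows: given one row of model outputs (logits) per spectrogram segment of a song, average the per-segment probabilities and pick the most likely genre. This is an illustrative assumption about how a song-level prediction could be formed, not a description of what the prediction script actually does; `predict_genre`, the label order, and the example logits are all hypothetical.

```python
import numpy as np

# Assumed label order (alphabetical GTZAN genres).
GENRES = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def predict_genre(segment_logits):
    """Average per-segment probabilities and return (genre, mean probability)."""
    probs = softmax(np.asarray(segment_logits, dtype=np.float64))
    mean_probs = probs.mean(axis=0)
    best = int(np.argmax(mean_probs))
    return GENRES[best], float(mean_probs[best])

# Made-up logits for three segments of one song (shape: 3 x 10).
logits = np.zeros((3, 10))
logits[0, 5] = 2.0                      # segment 1 leans towards jazz
logits[1, 5], logits[1, 8] = 1.5, 1.0   # segment 2 is less certain
logits[2, 5], logits[2, 8] = 0.5, 2.0   # segment 3 leans towards reggae
genre, confidence = predict_genre(logits)
print(genre, round(confidence, 3))
```

Averaging probabilities rather than taking a majority vote lets confident segments outweigh uncertain ones; either choice is a design decision rather than something the source specifies.
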
- Compare MambaVision with more architectures
- Search for more custom augmentations
- Test the MambaVision architecture on different tasks
- Ali Hatamizadeh, Jan Kautz. MambaVision: A Hybrid Mamba-Transformer Vision Backbone
- Lianghui Zhu et al. Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
- Tri Dao, Albert Gu. Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
- Mathilde Caron et al. Emerging Properties in Self-Supervised Vision Transformers
- PyTorch implementation and pre-trained weights for MambaVision [NVIDIA Research]
- Self-Supervised Vision Transformers with DINO - PyTorch [Facebook Research]
- Yuval Hoffman, Roee Hadar. Music-Genre-Classification-using-Transformers project
