
MusicGenreClassification


Technion ECE 046211 - Deep Learning

Itai Allouche, Adam Katav

Background

For our final project in the Technion Deep Learning course (ECE 046211), we chose to classify music genres on the GTZAN dataset.
Our approach uses a pre-trained Wav2Vec2 transformer model.
Unlike most existing models, the transformer operates on the raw time-series audio, which is why we expected an improvement over existing methods.

The Model

We used the facebook/wav2vec2-large-100k-voxpopuli model from Hugging Face: Facebook's Wav2Vec2 model pre-trained on the 100k unlabeled subset of the VoxPopuli speech corpus.

Dataset

We used the well-known GTZAN dataset.
The dataset consists of 1000 audio tracks each 30 seconds long.
It contains 10 genres, each represented by 100 tracks:
The genres are: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock
The tracks are all 22050 Hz, mono, 16-bit audio files in .wav format.
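Given those specs, a quick back-of-the-envelope check in plain Python (constants taken from the description above):

```python
# Sanity-check the GTZAN format described above.
SAMPLE_RATE = 22050      # Hz, mono
TRACK_SECONDS = 30
BYTES_PER_SAMPLE = 2     # 16-bit PCM

samples_per_track = SAMPLE_RATE * TRACK_SECONDS
raw_pcm_bytes = samples_per_track * BYTES_PER_SAMPLE

print(samples_per_track)          # 661500 samples per track
print(raw_pcm_bytes / 1_000_000)  # ~1.32 MB of raw PCM per track, before the WAV header
```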

Agenda

| File | Purpose |
| --- | --- |
| img | Images for the README.md file |
| train_30s_model.py | Train the model on 30s tracks |
| train_15s_model.py | Train the model on 15s tracks |
| train_10s_model.py | Train the model on 10s tracks |
| eval_model.py | Evaluate the model |
| rolling_stones.wav | Example audio file |

Results

30s model

The model was trained on 30s tracks.
Performance:
87% accuracy on validation set
77% accuracy on test set

15s model


The model was trained on 15s tracks. Each 30s track was divided into two 15s sub-tracks.
Performance:
78.85% accuracy on validation set
75.5% accuracy on test set

10s model

The model was trained on 10s tracks. Each 30s track was divided into three 10s sub-tracks.
Performance:
78% accuracy on validation set
74.5% accuracy on test set
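The 15s and 10s variants rely on splitting each 30s track into fixed-length sub-tracks. A minimal NumPy sketch of that splitting (the actual logic lives in train_15s_model.py and train_10s_model.py and may differ in detail):

```python
import numpy as np

SAMPLE_RATE = 22050  # GTZAN sample rate

def split_track(waveform, sub_seconds):
    """Split a mono waveform into equal-length sub-tracks, dropping any trailing remainder."""
    sub_len = SAMPLE_RATE * sub_seconds
    n_subs = len(waveform) // sub_len
    return [waveform[i * sub_len:(i + 1) * sub_len] for i in range(n_subs)]

# A 30 s track yields two 15 s sub-tracks or three 10 s sub-tracks.
track = np.zeros(SAMPLE_RATE * 30, dtype=np.float32)
print(len(split_track(track, 15)))  # 2
print(len(split_track(track, 10)))  # 3
```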

Docker

The project is intended to run in the Hugging Face Docker image.
For instructions on installing Docker, see:
https://docs.docker.com/engine/install/

Training

Train 30s model

Replace train_30s_model.py with your chosen training script:

docker run --name gtzan --rm -it --ipc=host --gpus=all -v $PWD:/home huggingface/transformers-pytorch-gpu python3 /home/train_30s_model.py

This command spins up a Docker container from the official Hugging Face image, mounts the repository directory, and runs the training script.

Running

Run the model - from Hugging Face 🤗

Open the model on Hugging Face.

Note that the Hugging Face server supports tracks up to 2-3 minutes long.

Run the model - using Python

On GPU:

docker run --name gtzan --rm -it --ipc=host --gpus=all -v $PWD:/home huggingface/transformers-pytorch-gpu

On CPU:

docker run --name gtzan --rm -it -v $PWD:/home huggingface/transformers-pytorch-gpu

In the container, run the following either as a Python script file or in the interactive interpreter:

from transformers import pipeline
import torchaudio

MODEL_NAME = 'adamkatav/wav2vec2_100k_gtzan_30s_model'
SONG_IN_REPO_DIR_PATH = '/home/rolling_stones.wav'

pipe = pipeline(model=MODEL_NAME)
audio_array, sample_freq = torchaudio.load(SONG_IN_REPO_DIR_PATH)
# Wav2Vec2 expects 16 kHz input; Resample's new_freq defaults to 16000
resample = torchaudio.transforms.Resample(orig_freq=sample_freq)
audio_array = resample(audio_array)
# Downmix to mono and hand the pipeline a 1-D numpy array
audio_array = audio_array.mean(axis=0).squeeze().numpy()
output = pipe(audio_array)
print(output)

About

Music genre classification project in the Technion Deep Learning course.
