Omer Cohen • Jonathan Nir Shalit
Our Project for the Technion's EE 046211 course "Deep Learning"
As our final project in the Deep Learning course, we were asked to choose a problem and solve it using neural networks and deep learning techniques. We chose to implement a DL algorithm that classifies the genre of a music track.
The algorithm's input is a 30-second music track, and the output is one of the following genres: Blues, Rock, Classical, Reggae, Disco, Country, Hip-Hop, Metal, Jazz, and Pop. Throughout our work we experimented with several approaches to this problem, both via the data and via the architecture.
We used the widely used GTZAN dataset. The dataset includes 10 classes of music genres, each containing 100 tracks of 30 seconds. Therefore, we faced a low-amount-of-data problem.
To enlarge our dataset, we used data augmentations. To apply these augmentations easily, we used the Librosa package. We used the following augmentations:
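As a hedged sketch of this kind of waveform augmentation (the functions below are illustrative NumPy implementations of common audio augmentations, not the project's actual `audio_augmentation` module):

```python
import numpy as np

def add_noise(y, noise_level=0.005):
    """Additive Gaussian noise, scaled by the signal's standard deviation."""
    return y + noise_level * np.std(y) * np.random.randn(len(y))

def random_gain(y, low=0.8, high=1.2):
    """Random volume change by a uniform factor."""
    return y * np.random.uniform(low, high)

def time_shift(y, max_shift=4000):
    """Circularly shift the waveform by a random number of samples."""
    return np.roll(y, np.random.randint(-max_shift, max_shift))

y = np.random.randn(22050)  # stand-in for a loaded waveform (1 s at 22.05 kHz)
augmented = add_noise(random_gain(time_shift(y)))
```

Each augmentation returns a waveform of the same length, so augmented copies can be added to the dataset alongside the originals.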
Usage:

```python
import audio_augmentation
audio_augmentation.main_reduced()
```

Our first attempt to improve the model's performance was to work with the raw data and use a 1D convnet. We tried two architectures that yielded the same performance:
first:
second:
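As a minimal sketch of this approach (the layer sizes and kernel widths below are illustrative assumptions, not the project's exact architectures), a 1D convnet over the raw waveform might look like:

```python
import torch
import torch.nn as nn

class RawAudio1DConvNet(nn.Module):
    """Illustrative 1D convnet over raw audio; sizes are assumed, not the project's."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=4), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=32, stride=2), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=16, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # collapse the time axis regardless of input length
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):  # x: (batch, 1, n_samples)
        z = self.features(x).squeeze(-1)
        return self.classifier(z)

model = RawAudio1DConvNet()
logits = model(torch.randn(2, 1, 22050))  # two 1-second clips at 22.05 kHz
```

The adaptive pooling at the end makes the network independent of the clip length, which is convenient when experimenting with chopped sub-tracks of different lengths.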
We tested our model on the 10-class dataset and got a poor 10% accuracy (random prediction). We tried using chopped sub-tracks of different lengths, and still the performance did not improve. At this point we concluded:
- Working with the raw music signal (without any pre-processing) is more difficult and requires more sophisticated architectures.
- Working with 2D data allows us to use known computer-vision architectures and techniques.
Working with 2D input means transforming the data into the time-frequency space of a mel-spectrogram. We used Librosa tools to transform the data. Here is an illustration of the transform:

Usage:

```python
import feature_extraction
feature_extraction.main()
```

We used a resnet18 architecture with dropout. To tune our hyper-parameters we used Optuna. This model achieved 62.4% accuracy on the test set.
We tried to boost our performance by using an ensemble of classifiers. In this method we chop each track into sub-tracks and predict a label for each sub-track independently. We tried both 'soft' and 'hard' ensembles: 'soft' means summing the output vectors and then taking the arg-max as the final prediction; 'hard' means building a histogram of all the mini-predictions and taking the label that received the majority of the mini-predictions as the final prediction.
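The two voting schemes above can be sketched in NumPy (the per-sub-track scores below are made up for illustration, chosen so the two schemes disagree):

```python
import numpy as np

# rows: per-sub-track class scores (e.g. softmax outputs); columns: classes
scores = np.array([
    [0.9, 0.1, 0.0],   # this sub-track is very confident in class 0
    [0.3, 0.4, 0.3],   # these two lean slightly toward class 1
    [0.3, 0.4, 0.3],
])

# 'soft' ensemble: sum the score vectors, then arg-max
soft_pred = np.argmax(scores.sum(axis=0))        # -> class 0

# 'hard' ensemble: arg-max each sub-track first, then take the majority vote
votes = np.argmax(scores, axis=1)                # per-sub-track predictions
hard_pred = np.bincount(votes).argmax()          # -> class 1
```

Note that the two schemes can disagree: the soft ensemble weighs confidence, while the hard ensemble counts only votes.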
Using the soft ensemble, this method yielded the following performance:
8 classes:
10 classes: