
MusicGenreClassifier


Technion EE 046211 - Deep Learning

Omer Cohen, Jonathan Nir Shalit

Our Project for the Technion's EE 046211 course "Deep Learning"


Agenda

For our final project in the Deep Learning course, we were asked to choose a problem and solve it using neural networks and deep-learning techniques. We chose to implement a DL algorithm that classifies the genre of a music track.


The algorithm's input is a 30-second music track, and the output is one of the following genres: Blues, Rock, Classical, Reggae, Disco, Country, Hip-Hop, Metal, Jazz, and Pop. Throughout our work we experimented with several approaches to this problem, both via the data and via the architecture.

Dataset

We used the widely used GTZAN dataset, which includes 10 classes of music genres. Each class contains 100 tracks of 30 seconds each, so we faced a low-amount-of-data problem.

Data augmentation

To enlarge our dataset, we used data augmentation, implemented with the Librosa package.

Usage:

```python
import audio_augmentation
audio_augmentation.main_reduced()
```
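The exact augmentations applied by `audio_augmentation` aren't listed in this README; as an illustration only, here is a NumPy-only sketch of two common waveform augmentations (Librosa additionally provides time-stretch and pitch-shift via `librosa.effects`):

```python
import numpy as np

def add_noise(y, snr_db=20.0, rng=None):
    """Additive white Gaussian noise at a target signal-to-noise ratio (dB)."""
    rng = rng or np.random.default_rng(0)
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return y + rng.normal(0.0, np.sqrt(noise_power), size=y.shape)

def time_shift(y, max_frac=0.1, rng=None):
    """Circularly shift the waveform by up to max_frac of its length."""
    rng = rng or np.random.default_rng(0)
    max_shift = int(len(y) * max_frac)
    shift = rng.integers(-max_shift, max_shift + 1)
    return np.roll(y, shift)

# Librosa equivalents for tempo/pitch augmentation (not used in this sketch):
#   librosa.effects.time_stretch(y, rate=1.1)
#   librosa.effects.pitch_shift(y, sr=sr, n_steps=2)
y = np.sin(np.linspace(0, 100, 22050))  # one second of a synthetic tone
noisy = add_noise(y)
shifted = time_shift(y)
```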

1D-Classifier

Our first attempt was to work directly with the raw audio data and use a 1D convnet. We tried two architectures, which yielded the same performance:

first:

second:

We tested our model on the 10-class dataset and got a poor 10% accuracy (equivalent to random prediction). We tried chopping the tracks into sub-tracks of different lengths, but the performance still didn't improve. At this point we concluded:

  1. Working with the raw music signal (without any pre-processing) is more difficult and requires more sophisticated architectures.
  2. Working with 2D data allows us to use known computer-vision architectures and techniques.
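The two architecture diagrams are not reproduced in this README; as a rough illustration of the raw-waveform approach, here is a minimal PyTorch 1D convnet sketch (the layer sizes are illustrative, not the project's actual ones):

```python
import torch
import torch.nn as nn

class Raw1DClassifier(nn.Module):
    """Illustrative 1D convnet over raw audio samples."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            # Large first kernel with stride to quickly shrink the long signal.
            nn.Conv1d(1, 16, kernel_size=64, stride=4), nn.BatchNorm1d(16), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=32, stride=2), nn.BatchNorm1d(32), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=16, stride=2), nn.BatchNorm1d(64), nn.ReLU(),
            # Global pooling makes the head independent of the input length.
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):  # x: (batch, 1, samples)
        h = self.features(x).squeeze(-1)
        return self.classifier(h)

model = Raw1DClassifier()
logits = model(torch.randn(2, 1, 22050))  # 1 s of audio at 22.05 kHz
```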

2D-Classifier

Feature extraction

Working with 2D input means transforming the data into a time-frequency representation, the mel-spectrogram. We used Librosa tools to transform the data.

Usage:

```python
import feature_extraction
feature_extraction.main()
```

We used a resnet18 architecture with dropout. To tune our hyper-parameters we used Optuna. This model achieved 62.4% accuracy on the test set.

Ensemble

We tried to boost performance by using an ensemble of classifiers. In this method we chop each track into sub-tracks and predict a label for each sub-track independently. We tried both 'soft' and 'hard' ensembles: 'soft' means summing the output vectors and taking the arg-max as the final prediction; 'hard' means building a histogram of all the mini-predictions and taking the label that received the majority of them as the final prediction.
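The two voting schemes can be sketched with NumPy (the function names are ours, not the repository's):

```python
import numpy as np

def soft_vote(probs):
    """'Soft' ensemble: sum the per-sub-track output vectors, then arg-max."""
    # probs: (n_subtracks, n_classes) array of model outputs.
    return int(np.argmax(probs.sum(axis=0)))

def hard_vote(probs):
    """'Hard' ensemble: majority vote over each sub-track's own arg-max."""
    mini_preds = np.argmax(probs, axis=1)
    return int(np.bincount(mini_preds, minlength=probs.shape[1]).argmax())

# Three sub-tracks, two classes:
p = np.array([[0.6, 0.4],
              [0.1, 0.9],
              [0.45, 0.55]])
soft_vote(p)  # -> 1 (column sums 1.15 vs 1.85)
hard_vote(p)  # -> 1 (two of the three sub-tracks pick class 1)
```

The two can disagree when a minority of sub-tracks is very confident, which is why both were worth trying.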

With the soft ensemble, this method yielded the following performance:

8 classes:

10 classes:
