RanBS/audio-digits-generator

audio-digits-generator

Generating audio files of spoken digits using conditional generative architectures (cVAE, cGAN), and evaluating the results with Inception Score.

The GAN part is based on this repository.

[Figure: spectrograms]

Background

The purpose of the project is to create a generative model that can generate audio files. More specifically, we chose to focus on speech audio files of spoken digits. Training a model to generate audio directly from the raw time series can be challenging, so we decided to use the STFT representation of the audio signal instead. In general, the STFT of a signal is complex-valued, so we represent it as a 2-channel image, where the first channel is the amplitude and the second channel is the phase. We examined two main generative architectures: a conditional VAE and a conditional WGAN-GP. For each architecture, we experimented with the methods described below:

  • VAE:
    • Generating a spectrogram amplitude image only, conditioned on the label of a digit.
  • GAN:
    • Experiment 1 - Generating a spectrogram amplitude image only, conditioned on the label of a digit.
    • Experiment 2 - Generating a 2-channel image of the spectrogram's amplitude and phase, conditioned on the label of a digit.
    • Experiment 3 - Generating a spectrogram amplitude image only, conditioned on both the label of a digit and a phase image compatible with the label.
    • Experiment 4 - Same idea as Experiment 3, but with a regularization (explained in the PDF).
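
The 2-channel representation used throughout the experiments can be sketched as follows. This is only an illustration under stated assumptions: it uses a plain NumPy STFT rather than the repository's own pre-processing, and the function names `stft_two_channel` and `to_complex`, as well as the values of `n_fft`, `hop` and the sampling rate, are hypothetical choices, not the settings used in the project.

```python
import numpy as np

def stft_two_channel(signal, n_fft=256, hop=128):
    """Return an array of shape (2, n_fft // 2 + 1, n_frames).

    Channel 0 holds the STFT amplitude, channel 1 the STFT phase (radians).
    n_fft and hop are illustrative values, not the project's settings.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames (one frame per column).
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    spectrum = np.fft.rfft(frames, axis=0)  # complex, (n_fft // 2 + 1, n_frames)
    return np.stack([np.abs(spectrum), np.angle(spectrum)], axis=0)

def to_complex(two_channel):
    """Rebuild the complex STFT from the (amplitude, phase) channels."""
    amplitude, phase = two_channel
    return amplitude * np.exp(1j * phase)

# Toy input (assumption): one second of a 440 Hz tone sampled at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)

image = stft_two_channel(tone)
print(image.shape)  # (channels, frequency bins, time frames)
```

An amplitude-only model would consume `image[:1]`, while a model that also generates or is conditioned on phase would use both channels.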

In addition to the generative models, we trained a digit classifier based on the spectrogram amplitude, for performance measurement purposes.
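
To illustrate how a classifier can be used to score generated samples, here is a minimal sketch of the standard Inception Score, computed from class probabilities. It is an assumption that the probabilities would come from the digit classifier; the function name `inception_score` and the toy inputs are hypothetical, and the metrics actually computed by inception_scores_metrics.py may differ.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """Standard Inception Score: exp(mean_x KL(p(y|x) || p(y))).

    probs: array of shape (n_samples, n_classes); each row is the
    classifier's predicted distribution p(y|x) for one generated sample.
    """
    probs = np.asarray(probs, dtype=np.float64)
    marginal = probs.mean(axis=0, keepdims=True)  # p(y), averaged over samples
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))

# Toy cases (assumptions, not project results):
# confident and diverse predictions over 10 digits give the maximum score,
# while completely uninformative predictions give the minimum score.
confident = np.eye(10)
uniform = np.full((10, 10), 0.1)

score_confident = inception_score(confident)
score_uniform = inception_score(uniform)
print(score_confident, score_uniform)
```

With 10 classes the score ranges from 1 (the classifier cannot tell the samples apart) to 10 (every sample is classified with certainty and all digits are equally represented).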

Main Prerequisites

| Library  | Version |
|----------|---------|
| numpy    | 1.19.5  |
| torch    | 1.6.0   |
| librosa  | 0.8.0   |
| tqdm     | 4.53.0  |
| colorama | 0.4.4   |

Files in the repository

| File name | Purpose |
|-----------|---------|
| inception_scores_metrics.py | Evaluating a generative model's performance with Frechet Inception Score and 'Diversity Score' |
| pre_processing.py | Converting .wav files to .npy for fitting the networks |
| weights.txt | Link to download the weights |
| metrics results for the exps.txt | The performance of our trained generative models |
| dataset directory | The datasets required for training and evaluating the experiments (partial only) |

For each experiment, the following files are provided:

| File name | Purpose |
|-----------|---------|
| dataset.py | Dataset class that follows the PyTorch conventions |
| eval.py | Loads a trained model and generates .wav files |
| models.py | The model (Generator & Discriminator / VAE) |
| train.py | Trains the model |
| Samples directory | Examples of samples generated by the trained model |
| final_results_example.png | 10 spectrogram amplitudes generated from each label (each column is a specific label) |
| train_log.txt | The training progress |

The datasets (used for training the models, generating samples for the conditional GAN in Experiments 3 and 4, and evaluating the results) should be organized as in the provided dataset directory (which is partial only); see the explanation in How-to-use.

How-to-use

To run the models, download the model weights from the link specified in weights.txt. The weights have to be placed inside the relevant exp directory, with the same name as in the link.

The datasets can be found at this GitHub repository, and need to be pre-processed (i.e., converted to .npy) using pre_processing.py. The resulting files should then be placed inside the right sub-folders of the dataset directory:

| Sub-folder | Purpose |
|------------|---------|
| test_spectograms | .npy arrays with 2 channels (amplitude & phase) of the test set |
| train_spectograms | .npy arrays with 2 channels (amplitude & phase) of the train set |
| test_spectograms_amplitude | .npy arrays with 1 channel (amplitude only) of the test set |
| train_spectograms_amplitude | .npy arrays with 1 channel (amplitude only) of the train set |
| data_for_metrics | Contains a sub-folder for each exp and one for the real data. Each of these is separated into folders by label |
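
As a rough illustration of this layout, the sketch below saves one 2-channel array and its amplitude-only counterpart into the matching sub-folders. It is not the logic of pre_processing.py: the helper `save_example`, the file name `7_speaker_0`, the array shape, and the use of a temporary directory as the dataset root are all hypothetical.

```python
import tempfile
from pathlib import Path

import numpy as np

# Sub-folder names as listed in the table above: (2-channel, amplitude-only).
SUBFOLDERS = {
    "train": ("train_spectograms", "train_spectograms_amplitude"),
    "test": ("test_spectograms", "test_spectograms_amplitude"),
}

def save_example(root, split, name, two_channel):
    """Save a (2, freq, time) array and its amplitude-only (1, freq, time) slice."""
    full_dir, amp_dir = (Path(root) / d for d in SUBFOLDERS[split])
    full_dir.mkdir(parents=True, exist_ok=True)
    amp_dir.mkdir(parents=True, exist_ok=True)
    full_path = full_dir / f"{name}.npy"
    amp_path = amp_dir / f"{name}.npy"
    np.save(full_path, two_channel)      # amplitude & phase
    np.save(amp_path, two_channel[:1])   # amplitude only, channel axis kept
    return full_path, amp_path

# Stand-in data (assumption): random values in place of a real spectrogram.
root = tempfile.mkdtemp()
fake = np.random.rand(2, 129, 61).astype(np.float32)
full_path, amp_path = save_example(root, "train", "7_speaker_0", fake)
print(full_path, amp_path)
```

Keeping the channel axis in the amplitude-only file means both kinds of arrays can be loaded as image-like tensors with the same number of dimensions.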
