Generating audio files of spoken digits using conditional generative architectures (cVAE, cGAN), and evaluating the results with the Inception Score.

The GAN part is based on this repository.

The purpose of the project is to create a generative model that can generate audio files. More specifically, we chose to focus on speech recordings of digits. Training a model to generate audio directly from the raw time series can be challenging, so we decided to use the STFT representation of the audio signal instead. In general, the STFT of a signal is complex-valued, so we represent it as a 2-channel image, where the first channel is the amplitude and the second is the phase. We examined two main generative architectures: a conditional VAE and a conditional WGAN-GP. For each architecture, we experimented with the methods described below:
- VAE:
  - Generating a spectrogram amplitude image only, conditioned on the digit label.
- GAN:
  - Experiment 1 - Generating a spectrogram amplitude image only, conditioned on the digit label.
  - Experiment 2 - Generating a 2-channel image of the spectrogram's amplitude and phase, conditioned on the digit label.
  - Experiment 3 - Generating a spectrogram amplitude image only, conditioned on both the digit label and a phase image compatible with that label.
  - Experiment 4 - Same idea as Experiment 3, but with a regularization term (explained in the PDF).
In addition to the generative models, we trained a digit classifier based on the spectrogram amplitude, for performance-measurement purposes.
| Library | Version |
|---|---|
| numpy | 1.19.5 |
| torch | 1.6.0 |
| librosa | 0.8.0 |
| tqdm | 4.53.0 |
| colorama | 0.4.4 |
| File name | Purpose |
|---|---|
| inception_scores_metrics.py | Evaluates a generative model's performance with the Frechet Inception Score and a 'Diversity Score' |
| pre_processing.py | Converts .wav files to .npy arrays for fitting the networks |
| weights.txt | Link to download the trained weights |
| metrics results for the exps.txt | The performance of our trained generative models |
| dataset directory | The datasets required for training and evaluating the experiments (partial only) |
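The Frechet distance at the heart of the score computed by `inception_scores_metrics.py` fits a Gaussian to classifier features of real and generated samples and compares the two. A minimal sketch of that distance (the function name and feature source are assumptions; the actual script may differ):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_real, feats_fake):
    """Frechet distance between Gaussians fit to two feature matrices
    of shape (n_samples, n_features)."""
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):  # sqrtm can return tiny imaginary parts
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(s1 + s2 - 2 * covmean))
```

A lower distance means the generated feature distribution is closer to the real one; identical feature sets give a distance of (numerically) zero.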
For each experiment, the following files are provided:
| File name | Purpose |
|---|---|
| dataset.py | Dataset class that follows the PyTorch conventions |
| eval.py | Loads a trained model and generates .wav files |
| models.py | The model (Generator & Discriminator / VAE) |
| train.py | Trains the model |
| Samples directory | Examples of samples generated by the trained model |
| final_results_example.png | 10 spectrogram amplitudes generated from each label (each column is a specific label) |
| train_log.txt | The training progress |
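A Dataset class like the one in each experiment's `dataset.py` can be sketched as below. This is an illustrative minimal version; the class name and the assumption that the digit label is the first underscore-separated token of the file name (as in the Free Spoken Digit Dataset, e.g. `7_jackson_0`) are ours, not necessarily the project's:

```python
import os
import numpy as np
import torch
from torch.utils.data import Dataset

class SpectrogramDataset(Dataset):
    """Loads pre-processed .npy spectrogram arrays from a directory.
    Assumes file names start with the digit label, e.g. '7_jackson_0.npy'."""

    def __init__(self, root):
        self.paths = sorted(
            os.path.join(root, f) for f in os.listdir(root) if f.endswith(".npy")
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        path = self.paths[idx]
        spec = np.load(path).astype(np.float32)          # (C, H, W) array
        label = int(os.path.basename(path).split("_")[0])  # digit from file name
        return torch.from_numpy(spec), label
```

Such a class plugs directly into `torch.utils.data.DataLoader` for batching during training.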
The datasets (used for training the models, generating samples for the conditioned GAN in Experiments 3 and 4, and evaluating the results) should be separated as in the provided dataset directory (which is partial only) - see the explanation in How-to-use.
To run the models, one needs to download the model weights from the link specified in weights.txt. The weights have to be placed inside the relevant experiment directory, with the same name as in the link.
The datasets can be found at this GitHub, and need to be pre-processed (i.e. converted to .npy) using pre_processing.py.
The resulting files should then be placed inside the right sub-folders of the dataset directory:
| Sub-folder | Purpose |
|---|---|
| test_spectograms | .npy arrays with 2 channels (amplitude & phase) of the test set |
| train_spectograms | .npy arrays with 2 channels (amplitude & phase) of the train set |
| test_spectograms_amplitude | .npy arrays with 1 channel (amplitude only) of the test set |
| train_spectograms_amplitude | .npy arrays with 1 channel (amplitude only) of the train set |
| data_for_metrics | Contains a sub-folder for each experiment and one for the real data; each of these is separated into folders by label |
