Viseme-Classification

A pipeline covering dataset gathering, data annotation, model training, and model evaluation for viseme (visual phoneme) classification.

How can I train on my data if I have the wav files and the corresponding viseme tags?

If you want to train it in Python, make sure you have the following site packages:

numpy
sklearn
librosa
tensorflow (Optional)
keras (Optional)

For this question, see the sample in the 'Data Training' folder.
This is the tree structure of the 'Data Training' folder:

├── DataSet
│   ├── label
│   └── wav_data
├── dnn.py
├── example.py
├── explore_data.py
├── extract_mfcc.py
├── log_dir
├── mfcc14
├── mfcc5
├── mfcc9
├── model
├── Pipelines.py
├── __pycache__
│   └── explore_data.cpython-36.pyc
├── rebalance_samples.py
└── svm.py

In this directory, DataSet contains wav_data and the label data, labelled by Oculus OVRlipSync (Oculus OVRlipSync reference).

folder	contents
wav_data	7820 wav files
label	7820 label txt files

More about this dataset:

There are 7280 wav files and their corresponding viseme labels. (Data source: AISHELL dataset).
All wav files are 16-bit mono (1 channel) with a 16 kHz sample rate.
All wav files have been split into frames; the frame length is 16 ms and the frame shift (the step between the starts of consecutive frames) is 8 ms.
The labels are 15-dimensional, one dimension for each of the 15 visemes. For each viseme, see the Viseme Reference.
There is a Python interface for fast access to this dataset, named 'explore_data.py'.
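
To make the frame numbers concrete: at 16 kHz, a 16 ms frame is 256 samples and an 8 ms shift is 128 samples, so consecutive frames overlap by half. The sketch below is only an illustration of that arithmetic with NumPy. The helper name split_into_frames is hypothetical, and dropping the trailing partial frame is an assumption, not necessarily what 'explore_data.py' does.

```python
import numpy as np

SAMPLE_RATE = 16000           # 16 kHz, as stated above
FRAME_DURATION = 0.016        # 16 ms frame length
FRAME_SHIFT_DURATION = 0.008  # 8 ms frame shift


def split_into_frames(signal, sample_rate=SAMPLE_RATE,
                      frame_duration=FRAME_DURATION,
                      frame_shift_duration=FRAME_SHIFT_DURATION):
    """Split a 1-D signal into overlapping frames (hypothetical helper)."""
    frame_length = int(round(sample_rate * frame_duration))       # 256 samples
    frame_shift = int(round(sample_rate * frame_shift_duration))  # 128 samples
    if len(signal) < frame_length:
        return np.empty((0, frame_length), dtype=signal.dtype)
    # Assumption: a trailing partial frame is dropped rather than padded.
    n_frames = 1 + (len(signal) - frame_length) // frame_shift
    # Index matrix: row i holds the sample indices that belong to frame i.
    starts = frame_shift * np.arange(n_frames)[:, None]
    offsets = np.arange(frame_length)[None, :]
    return signal[starts + offsets]


# One second of silence as a placeholder for real 16-bit audio
one_second = np.zeros(SAMPLE_RATE, dtype=np.int16)
frames = split_into_frames(one_second)
print(frames.shape)
```

Under these assumptions, one second of audio yields 124 frames of 256 samples each.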

Here is the recommended way to train on your data: run Pipelines.py

import os

if __name__ == "__main__":
    # Get the data and save it into a pickle (in case the file is too big)
    os.system("python example.py")
    # Extract MFCC features (librosa is required) and save them into a pickle
    os.system("python extract_mfcc.py")
    # Rebalance the samples and save them into a pickle
    os.system("python rebalance_samples.py")
    # Run the SVM (sklearn is required)
    os.system("python svm.py")
    # Run the DNN (tensorflow and Keras are required); this step is optional
    os.system("python dnn.py")

The reason I use a pipeline is that the dataset is too big for memory, so it is better to run the operations separately to save time.
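
One caveat with os.system is that it does not stop when a step fails, so a later step could run on a stale pickle. A possible variant, shown here only as a sketch, uses subprocess.run with check=True to stop at the first failing step. The script names come from the directory tree above; build_commands and run_pipeline are hypothetical helper names, not part of this repository.

```python
import subprocess
import sys

# Script names are taken from the directory tree above; dnn.py is optional.
STEPS = ["example.py", "extract_mfcc.py", "rebalance_samples.py", "svm.py"]


def build_commands(steps=STEPS, run_dnn=False):
    """Return one command (argument list) per pipeline step."""
    scripts = list(steps) + (["dnn.py"] if run_dnn else [])
    # sys.executable is the interpreter running this script, so every step
    # uses the same Python environment.
    return [[sys.executable, script] for script in scripts]


def run_pipeline(commands):
    """Run the commands in order and stop at the first failure."""
    for command in commands:
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the plan only. Call run_pipeline(build_commands()) from inside
    # the 'Data Training' folder to actually execute the steps.
    for command in build_commands(run_dnn=True):
        print(" ".join(command))
```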

STEP ONE: Get Wav Frame Data and labels Data

Here is what 'example.py' does, as an example of using 'explore_data.py':

import os
import pickle

import numpy as np
from explore_data import PixelShiftSound

if __name__ == "__main__":
    # Attention: indice2 must be bigger than indice1. Both range over [0, 7280]
    # and select how many files you want to use in training.
    ps = PixelShiftSound(sample_rate=16000, frame_duration=0.016, frame_shift_duration=0.008, indice1=0, indice2=2000)
    wav_data, wav_label = ps.get_all_wav_data()
    print("Wav Frame data:", wav_data.shape)
    print("Wav Frame label:", wav_label.shape)
    # Save the frames and their labels into a pickle file for the next step
    data_dict = {"Data": wav_data, "Label": wav_label}
    with open("data_pickle", "wb") as f1:
        pickle.dump(data_dict, f1)
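
Before moving on, it can help to read the pickle back and check that frames and labels line up. The following is a minimal sketch: load_frames is a hypothetical helper, and the stand-in array shapes (256 samples per frame, 15 label dimensions) are assumptions derived from the dataset description, not output from the real data.

```python
import os
import pickle
import tempfile

import numpy as np


def load_frames(pickle_path):
    """Load the {"Data": ..., "Label": ...} dict written by example.py."""
    with open(pickle_path, "rb") as f:
        data_dict = pickle.load(f)
    wav_data, wav_label = data_dict["Data"], data_dict["Label"]
    # Every frame needs exactly one label row.
    if len(wav_data) != len(wav_label):
        raise ValueError("frame count and label count differ")
    return wav_data, wav_label


# Round trip on stand-in arrays; the shapes are assumptions for illustration.
stand_in = {"Data": np.zeros((10, 256), dtype=np.int16),
            "Label": np.zeros((10, 15), dtype=np.float32)}
path = os.path.join(tempfile.mkdtemp(), "data_pickle")
with open(path, "wb") as f:
    pickle.dump(stand_in, f)

wav_data, wav_label = load_frames(path)
print(wav_data.shape, wav_label.shape)
```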

STEP TWO : Extract MFCC Features

Run extract_mfcc.py.
On line 109, change the pickle file name to match the pickle file generated by 'example.py':

data_pickle_files=["data_pickle"]  # Fill in the pickle files generated by example.py

This code generates an MFCC pickle file on line 189:

    # Save into a pickle file
    dct={"mfcc":trainig_data,"labels":useful_label_data_total}
    with open("mfcc14",'wb') as f1:
        pickle.dump(dct,f1)

STEP THREE : Balance the sample data:

The data produced by the previous steps is imbalanced across categories, so we downsample every category to the size of the smallest one.
Run rebalance_samples.py.

This code generates a rebalanced pickle file on line 86:

    # You can save it into a pickle, if you like
    smoteen_dict = {"mfcc":trainig_data_balance,"labels":trainig_labels_balance}
    with open("mfcc5",'wb') as f1:
        pickle.dump(smoteen_dict,f1)
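
The downsampling idea described in this step can be sketched with NumPy alone. This follows the text description (reduce every category to the size of the smallest one) and is not taken from rebalance_samples.py. The helper name downsample_to_smallest, the random selection of which samples to keep, and the use of integer class ids are all assumptions.

```python
import numpy as np


def downsample_to_smallest(features, labels, seed=0):
    """Keep an equal number of samples per class (hypothetical helper).

    features: (n_samples, n_features); labels: (n_samples,) integer class ids.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.min()  # size of the smallest category
    keep = []
    for cls in classes:
        idx = np.flatnonzero(labels == cls)
        # Sample without replacement so no frame is duplicated.
        keep.append(rng.choice(idx, size=target, replace=False))
    keep = np.sort(np.concatenate(keep))
    return features[keep], labels[keep]


# Toy data: class 0 has 6 samples, class 1 has 3, class 2 has 2
labels = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2])
features = np.arange(len(labels) * 4, dtype=float).reshape(len(labels), 4)
balanced_x, balanced_y = downsample_to_smallest(features, labels)
print(np.bincount(balanced_y))  # -> [2 2 2]
```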

Final Step : training with SVM or DNN.

In this example, we use an SVM with an RBF kernel and C=8, and a simple DNN built with Keras. Training set : testing set = 8 : 2.
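
As a rough illustration of this setup, the sketch below trains scikit-learn's SVC with an RBF kernel and C=8 on an 8 : 2 split. The data is synthetic (5 well-separated classes with 13 features each), chosen only so the example runs on its own. The stratified split and the fixed random seed are assumptions, not settings taken from svm.py.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the balanced MFCC features: 5 classes, 13 features.
rng = np.random.default_rng(0)
n_classes, n_per_class, n_features = 5, 100, 13
centers = rng.normal(scale=5.0, size=(n_classes, n_features))
X = np.concatenate([c + rng.normal(size=(n_per_class, n_features)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)

# Training set : testing set = 8 : 2
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# RBF kernel with C=8, as described above; other parameters stay at defaults.
clf = SVC(kernel="rbf", C=8)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print("test accuracy:", accuracy)
```

Accuracy on this toy data says nothing about accuracy on real viseme frames; it only shows that the split and the classifier are wired together correctly.
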
To run svm.py:

You need to set the number of categories to 9 or 5:
5 for: ["aa","E","ih","oh","ou"]
9 for: ["CH","SS","nn","RR","aa","E","ih","oh","ou"]

You can change them on line 42:

    pickle_file='mfcc5'
    category_required =5
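
To show what restricting the 15 visemes to 5 or 9 categories could look like, here is a small NumPy sketch. It is not the logic in svm.py. The viseme order below is the one listed in the Oculus OVRLipSync documentation, and it is an assumption that the label files use the same order. select_categories is a hypothetical helper, and taking the strongest of the 15 values as the frame's viseme is also an assumption.

```python
import numpy as np

# Viseme order as listed in the Oculus OVRLipSync documentation; whether the
# label files use this order is an assumption.
VISEMES = ["sil", "PP", "FF", "TH", "DD", "kk", "CH", "SS",
           "nn", "RR", "aa", "E", "ih", "oh", "ou"]
CATEGORIES = {
    5: ["aa", "E", "ih", "oh", "ou"],
    9: ["CH", "SS", "nn", "RR", "aa", "E", "ih", "oh", "ou"],
}


def select_categories(features, labels_15, category_required):
    """Keep frames whose strongest viseme is in the subset; return new class ids."""
    wanted = [VISEMES.index(name) for name in CATEGORIES[category_required]]
    dominant = labels_15.argmax(axis=1)  # strongest viseme index per frame
    mask = np.isin(dominant, wanted)
    # Map each kept viseme index to a class id in 0..category_required-1.
    remap = {old: new for new, old in enumerate(wanted)}
    new_labels = np.array([remap[i] for i in dominant[mask]], dtype=int)
    return features[mask], new_labels


# Toy example: four frames labelled sil, aa, ou, CH
labels_15 = np.eye(15)[[0, 10, 14, 6]]
features = np.arange(8, dtype=float).reshape(4, 2)
x5, y5 = select_categories(features, labels_15, 5)
print(y5)  # -> [0 4]
```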

To run dnn.py, the steps are the same:

You need to set the number of categories to 9 or 5:
5 for: ["aa","E","ih","oh","ou"]
9 for: ["CH","SS","nn","RR","aa","E","ih","oh","ou"]

You can change them on line 42:

    pickle_file='mfcc5'
    category_required =5
