Viseme-Classification

A pipeline covering dataset gathering, data annotation, model training, and model evaluation for viseme (visual phoneme) classification.

How can I train on the data if I have the wav files and the corresponding viseme tags?

If you want to train in Python, make sure the following packages are installed:

  • numpy
  • sklearn
  • librosa
  • tensorflow (Optional)
  • keras (Optional)

For this, see the sample in the 'Data Triaining' folder.
This is the tree structure of that folder:

├── DataSet
│   ├── label
│   └── wav_data
├── dnn.py
├── example.py
├── explore_data.py
├── extract_mfcc.py
├── log_dir
├── mfcc14
├── mfcc5
├── mfcc9
├── model
├── Pipelines.py
├── __pycache__
│   └── explore_data.cpython-36.pyc
├── rebalance_samples.py
└── svm.py

In this directory, DataSet contains wav_data (the audio) and label (the labels produced by Oculus OVRlipSync) (Oculus OVRlipSync reference).

Folder     Contents
wav_data   7280 wav files
label      7280 label txt files

More about this dataset:

  • There are 7280 wav files and their corresponding viseme labels. (Data source: AISHELL dataset).
  • All wav files are 16-bit mono (a single channel) with a 16 kHz sample rate.
  • All wav files have been split into frames; the frame length is 16 ms, and the frame shift (the step between the starts of consecutive frames) is 8 ms.
  • Each label is 15-dimensional, one dimension per viseme. For the individual visemes, see the Viseme Reference.
  • A Python interface for fast access to this dataset is provided in 'explore_data.py'.
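
The frame parameters above translate into sample counts as follows. This is a minimal sketch, not the code in 'explore_data.py': the helper name `split_into_frames` is hypothetical, and dropping a trailing partial frame is an assumption.

```python
import numpy as np

SAMPLE_RATE = 16000      # Hz, as stated above
FRAME_DURATION = 0.016   # seconds (16 ms)
FRAME_SHIFT = 0.008      # seconds (8 ms)


def split_into_frames(signal, sample_rate=SAMPLE_RATE,
                      frame_duration=FRAME_DURATION, frame_shift=FRAME_SHIFT):
    """Hypothetical helper: cut a 1-D signal into overlapping frames."""
    frame_len = int(round(sample_rate * frame_duration))  # 256 samples
    hop = int(round(sample_rate * frame_shift))           # 128 samples
    if len(signal) < frame_len:
        return np.empty((0, frame_len), dtype=signal.dtype)
    # Assumption: a trailing partial frame is dropped rather than padded.
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop: i * hop + frame_len]
                     for i in range(n_frames)])


one_second = np.zeros(SAMPLE_RATE, dtype=np.int16)
frames = split_into_frames(one_second)
print(frames.shape)  # (124, 256)
```

With a 16 ms frame and an 8 ms shift, consecutive frames overlap by half, so one second of audio yields 124 frames of 256 samples each.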

Here is the recommended way to train on your data: run Pipelines.py

import os

if __name__ == "__main__":
    # Get the data and save it into a pickle (in case the file is too big)
    os.system("python example.py")
    # Extract MFCCs (librosa is required) and save them into a pickle
    os.system("python extract_mfcc.py")
    # Rebalance the samples and save them into a pickle
    os.system("python rebalance_samples.py")
    # Run the SVM (sklearn is required)
    os.system("python svm.py")
    # Run the DNN (tensorflow and keras are required)
    os.system("python dnn.py")  # Optional

I use a pipeline because the dataset is too big for memory; it is better to run the operations separately, which also saves time.
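
One possible variation, shown only as a sketch: running the same scripts through `subprocess` so that the pipeline stops at the first failing step instead of continuing with a missing pickle. The names `STEPS`, `build_command`, and `run_pipeline` are hypothetical and not part of the repository.

```python
import subprocess
import sys

# Script order taken from the pipeline above; dnn.py is optional and left out.
STEPS = ["example.py", "extract_mfcc.py", "rebalance_samples.py", "svm.py"]


def build_command(script):
    """Build the command line for one step, using the current interpreter."""
    return [sys.executable, script]


def run_pipeline(steps=STEPS):
    """Run each script in order; check=True raises on the first failure."""
    for script in steps:
        print("Running", script)
        subprocess.run(build_command(script), check=True)
```

Calling `run_pipeline()` from the folder that contains the scripts would run the four steps in sequence. Each step still communicates with the next one only through the pickle files on disk.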

  • STEP ONE: Get the wav frame data and the label data

Here is what 'example.py' does. It is an example of using 'explore_data.py':

import os
import pickle
from explore_data import PixelShiftSound
import numpy as np

if __name__ == "__main__":
    # Note: indice2 must be greater than indice1. Both lie in [0, 7280];
    # together they determine how many files are used for training.
    ps = PixelShiftSound(sample_rate=16000, frame_duration=0.016, frame_shift_duration=0.008, indice1=0, indice2=2000)
    wav_data, wav_label = ps.get_all_wav_data()
    print("Wav Frame data:", wav_data.shape)
    print("Wav Frame label:", wav_label.shape)
    # Save frames and labels together so the later steps can reload them
    data_dict = {"Data": wav_data, "Label": wav_label}
    with open("data_pickle", 'wb') as f1:
        pickle.dump(data_dict, f1)
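
To check what the step above wrote, the pickle can be read back. This is a minimal sketch: `load_frames` is a hypothetical helper, and the stand-in arrays (10 frames of 256 samples, 15-dim labels) are made up so the snippet runs without the real dataset.

```python
import os
import pickle
import tempfile

import numpy as np


def load_frames(path):
    """Load the dict written by example.py and return (data, labels)."""
    with open(path, "rb") as f:
        data_dict = pickle.load(f)
    return data_dict["Data"], data_dict["Label"]


# Stand-in file with made-up contents, only so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "data_pickle")
    demo = {"Data": np.zeros((10, 256), dtype=np.int16),
            "Label": np.zeros((10, 15), dtype=np.float32)}
    with open(demo_path, "wb") as f:
        pickle.dump(demo, f)
    wav_data, wav_label = load_frames(demo_path)

print(wav_data.shape, wav_label.shape)  # (10, 256) (10, 15)
```
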
  • STEP TWO : Extract MFCC Features

Run extract_mfcc.py.
First, on line 109, set the pickle file name to match the one generated by 'example.py':


data_pickle_files=["data_pickle"]  # Fill in the pickle files generated by example.py

This code generates an MFCC pickle file on line 189:

    # Save into a pickle file
    dct={"mfcc":trainig_data,"labels":useful_label_data_total}
    with open("mfcc14",'wb') as f1:
         pickle.dump(dct,f1)
  • STEP THREE : Balance the sample data:

The data obtained by following the steps above has imbalanced sample counts across categories, so we downsample every category to the size of the smallest one.
Run rebalance_samples.py.

This code generates a rebalanced pickle file on line 86:

    # You can save it into a pickle, if you like
    smoteen_dict = {"mfcc":trainig_data_balance,"labels":trainig_labels_balance}
    with open("mfcc5",'wb') as f1:
        pickle.dump(smoteen_dict,f1)
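
The downsampling idea described in this step can be sketched with NumPy alone. This is not the code in rebalance_samples.py: the helper name `downsample_to_smallest` is hypothetical, and it assumes integer class ids per sample. If the labels are one-hot 15-dim vectors (also an assumption), `np.argmax(labels, axis=1)` would give such ids.

```python
import numpy as np


def downsample_to_smallest(features, class_ids, seed=0):
    """Randomly keep the same number of samples from every class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(class_ids, return_counts=True)
    target = counts.min()  # size of the smallest class
    keep = []
    for c in classes:
        idx = np.flatnonzero(class_ids == c)
        keep.append(rng.choice(idx, size=target, replace=False))
    keep = np.sort(np.concatenate(keep))  # preserve the original order
    return features[keep], class_ids[keep]


# Toy data: class 0 has 6 samples, class 1 has 3, class 2 has 1.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2])
X_bal, y_bal = downsample_to_smallest(X, y)
print(np.bincount(y_bal))  # [1 1 1]
```
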
  • FINAL STEP: Train with an SVM or a DNN.

In this example, we use an SVM with an RBF kernel and C=8, and a simple DNN built with Keras. Training set : testing set = 8 : 2.
To run svm.py:

You need to set the number of categories: 9 or 5
5 for: ["aa","E","ih","oh","ou"]
9 for: ["CH","SS","nn","RR","aa","E","ih","oh","ou"]

You can change them on line 42:

    pickle_file='mfcc5'
    category_required =5
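
For orientation, the SVM setup described above (RBF kernel, C=8, 8:2 split) looks roughly like this in scikit-learn. This is a sketch on synthetic data, not the contents of svm.py: the 14-dim random features stand in for the MFCCs, and the stratified split and fixed seeds are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for balanced MFCC features: 5 classes, 14 dims each.
n_per_class, n_classes, n_features = 40, 5, 14
X = np.concatenate([
    rng.normal(loc=3.0 * c, scale=1.0, size=(n_per_class, n_features))
    for c in range(n_classes)
])
y = np.repeat(np.arange(n_classes), n_per_class)

# Training set : testing set = 8 : 2
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = SVC(kernel="rbf", C=8)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print("test accuracy:", accuracy)
```

The synthetic classes are well separated, so the accuracy printed here says nothing about performance on real viseme data.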

To run dnn.py, the steps are the same:

You need to set the number of categories: 9 or 5
5 for: ["aa","E","ih","oh","ou"]
9 for: ["CH","SS","nn","RR","aa","E","ih","oh","ou"]

You can change them on line 42:

    pickle_file='mfcc5'
    category_required =5
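
The 5- and 9-category settings restrict training to a subset of the 15 visemes. A minimal sketch of such a filter follows; `select_categories` is a hypothetical helper, and the snippet assumes that class ids index the viseme list in the order used by the Oculus lip-sync documentation. The scripts in this repository may select categories differently.

```python
import numpy as np

# Assumption: the dataset's 15 label dimensions follow this viseme order.
VISEMES = ["sil", "PP", "FF", "TH", "DD", "kk", "CH", "SS",
           "nn", "RR", "aa", "E", "ih", "oh", "ou"]


def select_categories(features, class_ids, wanted):
    """Keep only samples of the wanted visemes and renumber them from 0."""
    wanted_ids = [VISEMES.index(name) for name in wanted]
    mask = np.isin(class_ids, wanted_ids)
    remap = {old: new for new, old in enumerate(wanted_ids)}
    new_ids = np.array([remap[int(c)] for c in class_ids[mask]], dtype=int)
    return features[mask], new_ids


five = ["aa", "E", "ih", "oh", "ou"]
X = np.arange(12).reshape(6, 2)
y = np.array([0, 10, 11, 6, 14, 10])  # sil, aa, E, CH, ou, aa
X5, y5 = select_categories(X, y, five)
print(y5)  # [0 1 4 0]
```
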
