A pipeline covering dataset gathering, data annotation, model training, and model evaluation for viseme (visual phoneme) classification
To train the models in Python, make sure the following packages are installed:
- numpy
- sklearn (scikit-learn)
- librosa
- tensorflow (optional, for the DNN)
- keras (optional, for the DNN)
As an example, here is the tree structure of the sample 'Data Training' folder:
```
├── DataSet
│   ├── label
│   └── wav_data
├── dnn.py
├── example.py
├── explore_data.py
├── extract_mfcc.py
├── log_dir
├── mfcc14
├── mfcc5
├── mfcc9
├── model
├── Pipelines.py
├── __pycache__
│   └── explore_data.cpython-36.pyc
├── rebalance_samples.py
└── svm.py
```
In this directory, 'DataSet' contains the 'wav_data' folder and the 'label' folder, whose viseme labels were generated by Oculus OVRLipSync (Oculus OVRLipSync reference).
| Folder | Contents |
|---|---|
| wav_data | 7820 wav files |
| label | 7820 label txt files |
- There are 7280 wav files and their corresponding viseme labels (data source: AISHELL dataset).
- All wav files are 16-bit mono (one channel) with a 16 kHz sample rate.
- Each wav file is split into frames with a frame length of 16 ms and a frame shift (the step between the starts of consecutive frames) of 8 ms.
- Each label is a 15-dimensional vector, one dimension per viseme. See the Viseme Reference for the definition of each viseme.
- A Python interface for fast access to this dataset is provided in 'explore_data.py'.
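As a rough illustration of the framing described above: at 16 kHz, a 16 ms frame is 256 samples and an 8 ms shift is 128 samples, so consecutive frames overlap by half. The NumPy sketch below slices a mono signal into such frames. It is not the code in 'explore_data.py'; the helper name `split_into_frames` is hypothetical, and dropping trailing samples that do not fill a whole frame is an assumption.

```python
import numpy as np

SAMPLE_RATE = 16000            # 16 kHz, as stated for the dataset
FRAME_DURATION = 0.016         # 16 ms frame length
FRAME_SHIFT_DURATION = 0.008   # 8 ms frame shift


def split_into_frames(signal, sample_rate=SAMPLE_RATE,
                      frame_duration=FRAME_DURATION,
                      frame_shift_duration=FRAME_SHIFT_DURATION):
    """Hypothetical helper: slice a 1-D mono signal into overlapping frames.

    Returns an array of shape (num_frames, frame_length). Samples at the end
    that do not fill a complete frame are dropped (an assumption).
    """
    frame_length = int(round(sample_rate * frame_duration))       # 256 samples
    frame_shift = int(round(sample_rate * frame_shift_duration))   # 128 samples
    if len(signal) < frame_length:
        return np.empty((0, frame_length), dtype=signal.dtype)
    num_frames = 1 + (len(signal) - frame_length) // frame_shift
    # Start index of every frame, expanded into a (num_frames, frame_length) index grid
    starts = np.arange(num_frames) * frame_shift
    indices = starts[:, None] + np.arange(frame_length)[None, :]
    return signal[indices]


if __name__ == "__main__":
    # One second of 16-bit silence stands in for a real wav file
    one_second = np.zeros(SAMPLE_RATE, dtype=np.int16)
    frames = split_into_frames(one_second)
    print(frames.shape)  # (124, 256)
```

One second of audio therefore yields 1 + (16000 - 256) // 128 = 124 frames.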
```python
import os

if __name__ == "__main__":
    # Load the framed wav data and labels and save them into a pickle file
    # (the data is too big to keep in memory between steps)
    os.system("python example.py")
    # Extract MFCC features (librosa is required) and save them into a pickle file
    os.system("python extract_mfcc.py")
    # Rebalance the samples and save them into a pickle file
    os.system("python rebalance_samples.py")
    # Run the SVM (scikit-learn is required)
    os.system("python svm.py")
    # Run the DNN (TensorFlow and Keras are required)
    os.system("python dnn.py")  # Optional
```
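`os.system` returns an exit status, but the script above ignores it, so a failed step does not stop later steps from running on stale or missing pickle files. A possible variant, sketched here as an assumption rather than the repository's 'Pipelines.py', runs each step with `subprocess.run(..., check=True)` using the current interpreter (`sys.executable`), so the pipeline halts at the first failing script. The `run_pipeline` helper and the `STEPS` list are hypothetical names.

```python
import subprocess
import sys

# Scripts in the order the pipeline runs them; append "dnn.py" if TensorFlow/Keras are installed
STEPS = ["example.py", "extract_mfcc.py", "rebalance_samples.py", "svm.py"]


def run_pipeline(commands):
    """Run each command in order; raise CalledProcessError as soon as one fails."""
    for command in commands:
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)


# Example, run from the directory that contains the scripts:
# run_pipeline([[sys.executable, script] for script in STEPS])
```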
I use a pipeline because the dataset is too big to fit in memory. Running the steps separately, with each step saving its output to a pickle file, also saves time.
Here is what 'example.py' does. It is an example of how to use 'explore_data.py':
```python
import pickle

from explore_data import PixelShiftSound

if __name__ == "__main__":
    # Note: indice2 must be larger than indice1. Both range from 0 to 7280 and
    # select which (and how many) wav files are used for training.
    ps = PixelShiftSound(sample_rate=16000, frame_duration=0.016,
                         frame_shift_duration=0.008, indice1=0, indice2=2000)
    wav_data, wav_label = ps.get_all_wav_data()
    print("Wav Frame data:", wav_data.shape)
    print("Wav Frame label:", wav_label.shape)

    # Save the frames and labels into a pickle file for extract_mfcc.py
    data_dict = {"Data": wav_data, "Label": wav_label}
    with open("data_pickle", "wb") as f1:
        pickle.dump(data_dict, f1)
```
Next, run 'extract_mfcc.py'. Before running it, set the pickle file name on line 109 to the name of the pickle file generated by 'example.py':
```python
data_pickle_files = ["data_pickle"]  # Fill in the pickle file(s) generated by example.py
```
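Because `data_pickle_files` is a list, pickle files from several runs of 'example.py' (for example, different `indice1`/`indice2` ranges) can presumably be combined. The sketch below is a hypothetical illustration of that loading step, not the code in 'extract_mfcc.py': it assumes each pickle holds the `{"Data": ..., "Label": ...}` dictionary written by 'example.py' and that the first axis of both arrays indexes frames. `load_frame_pickles` is a hypothetical name.

```python
import pickle

import numpy as np


def load_frame_pickles(pickle_files):
    """Load the {"Data", "Label"} dicts written by example.py and stack them.

    Assumes the first axis of both arrays indexes frames, so the files can be
    concatenated along axis 0.
    """
    data_parts, label_parts = [], []
    for path in pickle_files:
        with open(path, "rb") as f:
            d = pickle.load(f)
        data_parts.append(np.asarray(d["Data"]))
        label_parts.append(np.asarray(d["Label"]))
    return np.concatenate(data_parts, axis=0), np.concatenate(label_parts, axis=0)


# Example:
# data_pickle_files = ["data_pickle"]
# wav_data, wav_label = load_frame_pickles(data_pickle_files)
```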
At line 189, the script saves the MFCC features into a pickle file:
```python
# Save into a pickle file
dct = {"mfcc": trainig_data, "labels": useful_label_data_total}
with open("mfcc14", "wb") as f1:
    pickle.dump(dct, f1)
```
The data produced by the steps above is imbalanced across categories, so we downsample every category to the size of the smallest one.
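A minimal NumPy sketch of that downsampling, assuming one integer class label per sample: every class is randomly reduced to the size of the smallest class. It illustrates random undersampling in general and is not necessarily what 'rebalance_samples.py' does; the function name `downsample_to_smallest` and the fixed random seed are assumptions.

```python
import numpy as np


def downsample_to_smallest(features, labels, seed=0):
    """Randomly keep the same number of samples for every class.

    features: array of shape (n_samples, n_features)
    labels:   1-D array of integer class labels, shape (n_samples,)
    The per-class count is the size of the smallest class.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    n_keep = counts.min()
    keep = []
    for c in classes:
        idx = np.flatnonzero(labels == c)
        keep.append(rng.choice(idx, size=n_keep, replace=False))
    keep = np.concatenate(keep)
    rng.shuffle(keep)  # mix classes so the output is not ordered by class
    return features[keep], labels[keep]
```

Downsampling discards data from the larger classes; the trade-off is a classifier that is not biased toward the most frequent visemes.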
Run 'rebalance_samples.py'. At line 86, it saves the rebalanced data into a pickle file:
```python
# Optionally save the rebalanced data into a pickle file
smoteen_dict = {"mfcc": trainig_data_balance, "labels": trainig_labels_balance}
with open("mfcc5", "wb") as f1:
    pickle.dump(smoteen_dict, f1)
```
In this example, we train an SVM with an RBF kernel and C=8, as well as a simple DNN built with Keras. The data is split into training and test sets at a ratio of 8:2.
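For reference, an RBF-kernel SVM with C=8 and an 80/20 train/test split can be set up with scikit-learn as below. This is a sketch, not the contents of 'svm.py': `make_classification` generates random stand-in features in place of the MFCC pickle, and the feature count (13), the `random_state` values, and the stratified split are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data: random features for 5 classes instead of the real MFCC pickle
X, y = make_classification(n_samples=1000, n_features=13, n_informative=8,
                           n_classes=5, random_state=0)

# Training set : test set = 8 : 2
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# SVM with an RBF kernel and C=8
clf = SVC(kernel="rbf", C=8)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```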
To run 'svm.py', set the number of categories to 5 or 9:
- 5: ["aa","E","ih","oh","ou"]
- 9: ["CH","SS","nn","RR","aa","E","ih","oh","ou"]

Change both settings at line 42:
```python
pickle_file = 'mfcc5'
category_required = 5
```
To run 'dnn.py', the steps are the same: set the number of categories to 5 or 9 (with the same categories as for 'svm.py') at line 42:
```python
pickle_file = 'mfcc5'
category_required = 5
```
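The 5- and 9-category settings keep only a subset of the 15 visemes. Assuming the 15 label columns follow the Oculus Lipsync viseme order (sil, PP, FF, TH, DD, kk, CH, SS, nn, RR, aa, E, ih, oh, ou) and that each frame's class is the column with the largest value, a subset could be selected and relabelled as below. Both the column order and the argmax step are assumptions, and `select_visemes` is a hypothetical helper, not the code in 'svm.py' or 'dnn.py'.

```python
import numpy as np

# Assumed column order of the 15-dim labels (Oculus Lipsync viseme set)
VISEMES = ["sil", "PP", "FF", "TH", "DD", "kk", "CH", "SS", "nn",
           "RR", "aa", "E", "ih", "oh", "ou"]

CATEGORIES = {
    5: ["aa", "E", "ih", "oh", "ou"],
    9: ["CH", "SS", "nn", "RR", "aa", "E", "ih", "oh", "ou"],
}


def select_visemes(features, labels_15d, category_required=5):
    """Keep frames whose dominant viseme is in the chosen subset.

    Returns the kept features and integer labels 0..category_required-1,
    numbered in the order of CATEGORIES[category_required].
    """
    wanted_columns = [VISEMES.index(v) for v in CATEGORIES[category_required]]
    dominant = np.argmax(labels_15d, axis=1)   # dominant viseme column per frame
    mask = np.isin(dominant, wanted_columns)
    # Lookup table from the 15 columns to subset class ids (-1 = not selected)
    lookup = np.full(len(VISEMES), -1, dtype=int)
    lookup[wanted_columns] = np.arange(len(wanted_columns))
    return features[mask], lookup[dominant[mask]]
```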
