Interspeech 2024
TL;DR: We show that better detection of deepfake speech from codec-based TTS systems can be achieved by training models on speech re-synthesized with neural audio codecs. We also release the CodecFake dataset for this purpose.
See [here]. If you prefer ZIP files, please use this commit (3abd4aa).
Install the dependencies in requirements.txt before running any scripts. We also list our experiment environment below for those who want to reproduce our setup as closely as possible.
pip install -r requirements.txt
- Our environment (for GPU training)
- Python 3.8.18
- GCC 11.2.0
- GPU: 1 NVIDIA Tesla V100 32GB
- gpu-driver: 470.161.03
- Training
About 32 GB of GPU RAM is required to train AASIST with a batch size of 32.
The available codec names are listed in the script.
bash train.sh <codec_name>
- Evaluation
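To train detectors for several codec systems in one go, the single-codec command can be wrapped in a loop. This is a minimal sketch: the codec names below are illustrative placeholders, not the actual names listed in train.sh, and the loop only echoes each command as a dry run.

```shell
# Dry-run sketch: queue one training job per codec system.
# codec_a/codec_b/codec_c are placeholders; the real names are in train.sh.
CODECS="codec_a codec_b codec_c"

launch_all_training() {
    for codec in $CODECS; do
        # Drop the leading 'echo' to actually launch training.
        echo bash train.sh "$codec"
    done
}

launch_all_training
```

Removing the `echo` runs the jobs sequentially; with a 32 GB GPU per job, the loop body could instead submit to a scheduler.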
First, add the paths to your trained checkpoints in the script, then choose the subsets to evaluate on.
bash eval_all.sh
This repository is built on top of several open source projects.
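The two edits to the evaluation script (checkpoint paths and subsets) can be sketched as shell variables. This is an assumed layout, not the actual contents of eval_all.sh: the variable names, checkpoint path, and subset names are all hypothetical placeholders, and the loop echoes commands rather than running them.

```shell
# Hypothetical sketch of the two things eval_all.sh needs:
# a trained checkpoint and a list of subsets to score.
CKPT="exp/aasist_codec_a/best.pth"   # placeholder path to a trained model
SUBSETS="codec_a codec_b"            # placeholder subset names

print_eval_plan() {
    for subset in $SUBSETS; do
        # Echo only; the real invocation is defined inside eval_all.sh.
        echo "evaluate checkpoint=$CKPT subset=$subset"
    done
}

print_eval_plan
```

Evaluating every trained model on every subset yields the full cross-codec generalization matrix reported in the paper.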