Skip to content

Audio-Foundation-Models/ConversationTTS

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ConversationTTS: A Speech Foundation Model for Multilingual Conversational Text-to-Speech

Introduction

We release the training and inference code for ConversationTTS. Also, we release the first checkpoint, which trained on 1.5 epoch on about 20w hours speech data.

V1: 1B-20w-1.5epoch

wget https://huggingface.co/AudioFoundation/SpeechFoundation/resolve/main/ckpt1.checkpoint

Data

We use large-scale TTS data, such as Emili-Yodas, wenetspeech, MLS, People speech. We collect a lot of podcast dataset, including English, Chinese, Cantonese. We use different speaker label (e.g. [1], [2]) to indicates different speaker. The first version is only trained on 20w hours data. We will update the checkpoints trained on more then 50w hours data.

Usage

⚡ Quick Start

🛠️ Local Deployment

Install and Run CapSpeech locally.

Development

Please refer to the following documents to prepare the data, train the model, and evaluate its performance.

Main Contributors

  • [Dongchao Yang]
  • [Dading Cong]
  • [Jiankun Zhao]
  • [Yuanyuan Wang]
  • [Helin Wang]

Citation

If you find this work useful, please consider contributing to this repo and cite this work:

License

All datasets, listening samples, source code, pretrained checkpoints, and the evaluation toolkit are licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
See the LICENSE file for details.

Acknowledgements

This implementation is based on UniAudio, CSM, Moshi, RSTNet. We appreciate their awesome work.

🌟 Like This Project?

If you find this repo helpful or interesting, consider dropping a ⭐ — it really helps and means a lot!

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published