GitHub - Audio-Foundation-Models/ConversationTTS

ConversationTTS: A Speech Foundation Model for Multilingual Conversational Text-to-Speech

Introduction

We release the training and inference code for ConversationTTS, together with the first checkpoint, which was trained for 1.5 epochs on about 200,000 hours (20w) of speech data.

V1: 1B-20w-1.5epoch

wget https://huggingface.co/AudioFoundation/SpeechFoundation/resolve/main/ckpt1.checkpoint
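For environments without wget, the same download can be sketched in Python using only the standard library. This is a minimal sketch, not part of the repository: the URL is the one from the command above, while the helper names `local_name` and `fetch_checkpoint` and the skip-if-present behaviour are assumptions made for illustration.

```python
import urllib.request
from pathlib import Path

# URL copied verbatim from the wget command above.
CKPT_URL = "https://huggingface.co/AudioFoundation/SpeechFoundation/resolve/main/ckpt1.checkpoint"


def local_name(url: str) -> str:
    """Return the last path component of a URL (the name wget would save under)."""
    return url.rstrip("/").rsplit("/", 1)[-1]


def fetch_checkpoint(url: str = CKPT_URL, dest_dir: str = ".") -> Path:
    """Hypothetical helper: download the checkpoint unless the file already exists."""
    dest = Path(dest_dir) / local_name(url)
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(dest))
    return dest


if __name__ == "__main__":
    # Only runs when executed as a script, so importing this file never hits the network.
    print(fetch_checkpoint())
```

Skipping the download when the file is already present avoids re-fetching a large checkpoint on every run; it does not verify that an existing file is complete.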

Data

We use large-scale TTS datasets such as Emilia-YODAS, WenetSpeech, MLS, and People's Speech. We also collected a large amount of podcast data in English, Chinese, and Cantonese. Speaker labels (e.g. [1], [2]) indicate the different speakers in a conversation. The first version is trained on only 200,000 hours (20w) of data. We will release updated checkpoints trained on more than 500,000 hours (50w) of data.
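To illustrate the speaker-label convention, the sketch below turns a list of (speaker, text) turns into a single labelled transcript. The source only states that labels such as [1] and [2] mark different speakers; placing the label before each utterance, joining turns with spaces, and the function name `format_conversation` are assumptions, so check the repository's inference scripts for the exact input format.

```python
from typing import Iterable, Tuple


def format_conversation(turns: Iterable[Tuple[int, str]]) -> str:
    """Prefix each utterance with its speaker label, e.g. "[1] Hello"."""
    parts = []
    for speaker, text in turns:
        if speaker < 1:
            # Assumption: speaker ids start at 1, matching the [1], [2] examples.
            raise ValueError("speaker ids are assumed to start at 1")
        parts.append(f"[{speaker}] {text.strip()}")
    # Assumption: turns are joined with a single space.
    return " ".join(parts)


dialogue = [
    (1, "Did you listen to the new episode?"),
    (2, "Not yet, is it any good?"),
    (1, "It is worth the hour."),
]
print(format_conversation(dialogue))
```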

Usage

⚡ Quick Start

🛠️ Local Deployment

Install and run ConversationTTS locally.

💿 Installation & Usage: 📄 Instructions

Development

Please refer to the following documents to prepare the data, train the model, and evaluate its performance.

Main Contributors

[Dongchao Yang]
[Dading Cong]
[Jiankun Zhao]
[Yuanyuan Wang]
[Helin Wang]

Citation

If you find this work useful, please consider contributing to this repo and citing this work:

License

All datasets, listening samples, source code, pretrained checkpoints, and the evaluation toolkit are licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
See the LICENSE file for details.

Acknowledgements

This implementation is based on UniAudio, CSM, Moshi, and RSTNet. We appreciate their awesome work.

🌟 Like This Project?

If you find this repo helpful or interesting, consider dropping a ⭐ — it really helps and means a lot!

