[NAACL 2025] WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching

Our audio demo page is available at https://luotianze666.github.io/WaveFM.

Our checkpoints are available in this repository.

Basic Usage

1. Get Mel-Spectrograms

You can extract Mel-spectrograms for both the training and inference processes using the following command.

python dataset.py -i your_audio_path -o your_mel_saving_path

2. Train WaveFM

First modify the paths for training in src/params.py as needed, and then use python src/train.py to train WaveFM.

If trainCheckPointPath exists, the training process will resume from the corresponding checkpoints. Otherwise, new checkpoints will be saved to this path. During the training process, checkpoints are saved every trainCheckPointSavingStep steps.

Note: A coefficient $\frac{1}{\min(0.1,1-t)}$ is applied to our mean square loss function, where $t$ is uniformly sampled from $[0,1]$. Therefore, $7.5 \times 10^{-5}$ is not an actually small initial learning rate. Increasing the learning rate (e.g., to $1 \times 10^{-4}$) might improve performance but might also introduce instability during the training process.

# parameters for training
trainInitialLR = 7.5e-5,
trainFinalLR = 5e-6,
trainBetas = (0.9, 0.99),
trainWeightDecay = 5e-4,
trainSteps = 1000000,
trainBatch = 24,
trainCheckPointSavingStep = 10000,
trainAudiosPath = "Your Audios",
trainMelsPath = "Your Mels",
trainCheckPointPath = "./checkpoints/WaveFM_0",
trainGPUs = [0], # [] represents CPU

3. Use Checkpoints For Inference

First modify the paths for inference in src/params.py as needed, and then use python src/inference.py to run inference for WaveFM.

Note: The inference time displayed on the tqdm bar doesn't include the file loading time. Additionally, the inference speed can be much faster for longer audios due to a higher degree of parallelism.

# parameters for inference
inferenceSteps = 6,
inferenceMelsPath = "Your Mels For Inference",
inferenceSavingPath = "./generation",
inferenceCheckPointPath = "./checkpoints/WaveFM_1000000", 
                        # "./distilled_checkpoints/Distilled_WaveFM_25000",
inferenceWithGPU = True, # use cuda:0 or CPU

4. Distill WaveFM

First modify the paths for distillation in src/params.py as needed, and then use python src/distillation.py to distill WaveFM.

distillDelta is the time step length used in the distillation process, and distillModelPath specifies the saving path of the teacher model.

If distillCheckPointPath exists, the distillation process will resume from the corresponding checkpoints. Otherwise, new checkpoints will be saved to this path. During the distillation process, checkpoints are saved every distillCheckPointSavingStep steps.

Note: In our experiments, a distillation process of 25,000 steps is sufficient. Further increasing the distillation steps does not improve performance.

# parameters for distillation
distillInitialLR = 2e-5,
distillFinalLR = 5e-6,
distillBetas = (0.8, 0.95),
distillWeightDecay = 1e-2,
distillDeltaT = 0.01,
distillSteps = 25000,
distillBatch = 24,
distillCheckPointSavingStep = 5000,
distillModelPath = "./checkpoints/WaveFM_1000000",
distillCheckPointPath = "./distilled_checkpoints/Distilled_WaveFM_0",
distillAudiosPath = "Your Audios",
distillMelsPath = "Your Mels",
distillGPUs = [0], # [] represents CPU

Package Requirements

WaveFM has been tested on python 3.9 with the following requirements, and it should also work fine with the latest version.

torch           2.0.1+cu118
torchaudio      2.0.2

BibTex

@inproceedings{luo2025wavefm,
  title={Wavefm: A high-fidelity and efficient vocoder based on flow matching},
  author={Luo, Tianze and Miao, Xingchen and Duan, Wenbo},
  booktitle={Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)},
  pages={2187--2198},
  year={2025}
}

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
checkpoints		checkpoints
docs		docs
src		src
Readme.md		Readme.md
WaveFM A High-Fidelity and Efficient Vocoder Based on Flow Matching.pdf		WaveFM A High-Fidelity and Efficient Vocoder Based on Flow Matching.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

[NAACL 2025] WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching

Basic Usage

1. Get Mel-Spectrograms

2. Train WaveFM

3. Use Checkpoints For Inference

4. Distill WaveFM

Package Requirements

BibTex

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

[NAACL 2025] WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching

Basic Usage

1. Get Mel-Spectrograms

2. Train WaveFM

3. Use Checkpoints For Inference

4. Distill WaveFM

Package Requirements

BibTex

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages