Integrate w2v-bert2-LoRA-adapter-MFA model #439
Conversation
Please fix flake8 errors.

Thanks for your feedback! I've just pushed a new commit that addresses all the flake8 style issues.
Perhaps you would prefer not to expose your local directories here
wsstriving left a comment
Can you also update the results and the pretrained model pages?
Okay, I will update later.
To keep this PR simple, there is no need to modify unrelated files.
I didn't intentionally modify those two files; it seems the formatting tool automatically updated them during the commit. I will submit a clean PR later.
Force-pushed from db87092 to 7588656
Update Summary:

Note on ONNX: I attempted to export the model to ONNX using `wespeaker/bin/export_onnx.py`, but it failed due to the complexity of the W2V-BERT architecture (specifically the dynamic axes in the MFA adapter layers). Therefore, I marked the Runtime Model column as `-`.
examples/voxceleb/v2/run_w2v.sh (outdated)

@@ -0,0 +1,273 @@
#!/bin/bash
# Copyright 2025 Your Name/Org (your_email@example.com)

Change to your name and email address.
wespeaker/frontend/w2vbert.py (outdated)

@@ -0,0 +1,388 @@
# Copyright (c) 2025 Your Name/Org (your_email@example.com)

Change to your name and email address.
@@ -0,0 +1,126 @@
# Copyright (c) 2025 Your Name/Org

Change to your name and email address.
Good job!

Please fix flake8 errors.

Merged.
@cdliang11 maybe we also want to support this model in the CLI mode? @shangguanqituan We also need to decide which checkpoint to upload in this mode: the one you trained or the one provided by the original paper.

Of course, supporting this model in CLI mode is feasible. @shangguanqituan, please select the model checkpoint to adopt.
Let's use the checkpoint provided by the original paper: w2v_bert2_voxblink_official_LM.pth.
Overview

This pull request (PR) integrates the model proposed in the paper "Enhancing Speaker Verification with W2V-BERT 2.0 and Knowledge Distillation Guided Structured Pruning" into the `wespeaker` framework. We have implemented the full three-stage training pipeline for the `w2v-bert2-lora-adapter-mfa` model and ensured its compatibility with the existing `wespeaker` ecosystem. This introduces a powerful new model to the framework and opens up new possibilities for future research and applications.

Key Features and Changes
Model Integration:
- A new `w2vbert2` frontend in `wespeaker/frontend`, which incorporates LoRA (Low-Rank Adaptation) for efficient fine-tuning.
- A new speaker model module in `wespeaker/models`, specifically designed to process the multi-layer hidden states from the `w2vbert2` frontend.

New Training Pipeline:
- A training recipe script, `run_w2v.sh`, to facilitate the reproduction of the original paper's training process.

Framework Adaptations:
- The `w2vbert2` frontend now returns all Transformer layer hidden states (`all_hidden_states`) as a tuple. Since other frontends return a single `last_hidden_state` tensor, we have added conditional logic in the `executor` and `extract` modules to ensure smooth pipeline execution.
- Set `find_unused_parameters=True` for `DistributedDataParallel` in `train.py` to resolve gradient synchronization issues.
- Added two learning-rate schedulers, `WarmupLR_withStepDecay` and `WarmupCosineScheduler`, to `scheduler.py` to meet the specific requirements for reproducing the paper's results.
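The conditional handling of the two frontend output types could look roughly like this sketch (a pure-Python stand-in: the function name `select_features` and the duck-typed inputs are illustrative, not the actual `executor`/`extract` code):

```python
def select_features(frontend_out):
    """Normalize a frontend's output for the downstream speaker model.

    The w2vbert2 frontend returns a tuple of per-layer hidden states so
    the MFA module can aggregate them; conventional frontends return a
    single last_hidden_state tensor.
    """
    if isinstance(frontend_out, tuple):
        # Multi-layer case: keep every hidden state for MFA aggregation.
        return list(frontend_out)
    # Single-tensor case: wrap it so downstream code sees a uniform list.
    return [frontend_out]
```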
Pre-trained Model Compatibility:
- Updated the loading logic in `checkpoint.py` to handle the classifier dimension mismatch (`5994*3` vs. `5994`) that occurs because the original paper's third training stage does not use `speed_perturb` data augmentation. The code can now intelligently slice the weights to the correct dimension, enabling successful loading of the official model.

We believe this integration will greatly enrich the `wespeaker` model zoo and provide the community with a powerful new tool. We look forward to your feedback and review!
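A rough sketch of the weight-slicing idea from the checkpoint-compatibility change above (the function name, the dictionary key, and the plain-list stand-in for a weight tensor are all illustrative assumptions; the real logic lives in `checkpoint.py`):

```python
def slice_classifier_weight(state_dict, key, expected_rows):
    """Trim an oversized classifier weight to the expected number of rows.

    With speed_perturb augmentation the label space is tripled
    (5994 * 3 = 17982 classes); a model trained without it only needs
    the first 5994 rows, so the surplus rows are dropped.
    """
    weight = state_dict[key]  # stand-in: a list of rows instead of a tensor
    if len(weight) > expected_rows:
        # Keep only the rows for the original (non-augmented) classes.
        state_dict[key] = weight[:expected_rows]
    return state_dict
```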