
Integrate w2v-bert2-LoRA-adapter-MFA model#439

Merged
cdliang11 merged 3 commits into wenet-e2e:master from shangguanqituan:feature/w2v-integration
Dec 2, 2025

Conversation

@shangguanqituan
Collaborator

Overview

This pull request integrates the model proposed in the paper "Enhancing Speaker Verification with W2V-BERT 2.0 and Knowledge Distillation Guided Structured Pruning" into the wespeaker framework.

We have successfully implemented the full three-stage training pipeline for the w2v-bert2-lora-adapter-mfa model and ensured its compatibility with the existing wespeaker ecosystem. This not only introduces a powerful new model to the framework but also opens up new possibilities for future research and applications.

Key Features and Changes

  1. Model Integration:

    • Frontend: Added a w2vbert2 frontend in wespeaker/frontend, which incorporates LoRA (Low-Rank Adaptation) for efficient fine-tuning.
    • Model: Implemented an Adapter-MFA (Multi-Factor Attention) module in wespeaker/models. This module serves as the speaker model and is specifically designed to process the multi-layer hidden states from the w2vbert2 frontend.
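As a rough illustration of the kind of multi-layer aggregation described above (a hypothetical sketch, not the actual Adapter-MFA code; tensors are stood in for by flat lists of floats, and the function name is illustrative):

```python
import math

def aggregate_layers(all_hidden_states, layer_weights):
    # Softmax-normalize one scalar weight per Transformer layer, then
    # take the weighted sum of the per-layer hidden states.
    exps = [math.exp(w) for w in layer_weights]
    total = sum(exps)
    norm = [e / total for e in exps]
    dim = len(all_hidden_states[0])
    return [
        sum(norm[l] * all_hidden_states[l][i]
            for l in range(len(all_hidden_states)))
        for i in range(dim)
    ]
```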
  2. New Training Pipeline:

    • Added complete three-stage training configuration files (YAML) and a corresponding execution script, run_w2v.sh, to facilitate the reproduction of the original paper's training process.
  3. Framework Adaptations:

    • Dataflow Handling:
      • To meet the input requirements of the MFA module, the w2vbert2 frontend now returns all Transformer layer hidden states (all_hidden_states) as a tuple.
      • To handle this new return type (which differs from other frontends that return a last_hidden_state tensor), we have added conditional logic in the executor and extract modules to ensure smooth pipeline execution.
    • DistributedDataParallel (DDP) Configuration: Due to the increased complexity of gradient computation introduced by LoRA and MFA, we found it necessary to set find_unused_parameters=True for DistributedDataParallel in train.py to resolve gradient synchronization issues.
    • Learning Rate Schedulers: Added two new schedulers, WarmupLR_withStepDecay and WarmupCosineScheduler, to scheduler.py to meet the specific requirements for reproducing the paper's results.
    • ASP Compatibility: The ASP (Attentive Statistics Pooling) module has been slightly modified to be compatible with the new model's outputs, without affecting the functionality of existing models in the framework.
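The conditional dataflow handling described above can be sketched as follows (a simplified stand-in, not the actual executor code; the helper name is hypothetical and plain Python values stand in for tensors):

```python
def collect_hidden_states(frontend_out):
    # The w2vbert2 frontend returns a tuple of all layer hidden states,
    # while other frontends return a single last_hidden_state tensor.
    # Normalize both cases to a list of per-layer outputs.
    if isinstance(frontend_out, tuple):
        return list(frontend_out)
    return [frontend_out]
```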
  4. Pre-trained Model Compatibility:

    • To allow users to easily load the official pre-trained checkpoints (ckpt) from the paper, we have modified checkpoint.py.
    • This change addresses a classifier dimension mismatch (5994*3 vs. 5994) that occurs because the original paper's third training stage does not use speed_perturb data augmentation. The code can now intelligently slice the weights to the correct dimension, enabling successful loading of the official model.
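One plausible reading of the slicing logic (a hypothetical sketch; the real change lives in checkpoint.py and operates on PyTorch tensors, stood in for here by lists of rows):

```python
def match_classifier_rows(weight_rows, target_num_rows):
    # When the row counts differ by the speed_perturb factor of 3
    # (e.g. 5994 * 3 vs 5994), slice the larger weight down to the
    # expected size so the state dict can be loaded.
    if len(weight_rows) == target_num_rows * 3:
        return weight_rows[:target_num_rows]
    return weight_rows
```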

We believe this integration will greatly enrich the wespeaker model zoo and provide the community with a powerful new tool. We look forward to your feedback and review!
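The WarmupCosineScheduler mentioned under Framework Adaptations can be approximated as follows (a minimal sketch assuming linear warmup followed by cosine decay; the function name and signature are illustrative, not the actual scheduler.py API):

```python
import math

def warmup_cosine_lr(step, max_steps, warmup_steps, base_lr, final_lr=0.0):
    # Linear warmup from 0 up to base_lr, then cosine decay to final_lr.
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return final_lr + 0.5 * (base_lr - final_lr) * (1.0 + math.cos(math.pi * progress))
```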

@JiJiJiang
Collaborator

Please fix flake8 errors.

@shangguanqituan
Collaborator Author

Thanks for your feedback! I've just pushed a new commit that addresses all the flake8 style issues.

Collaborator

Perhaps you would prefer not to expose your local directories here

@wsstriving
Collaborator

Can you also update the results and the pretrained model pages?

@shangguanqituan
Collaborator Author

Okay, I will update later.

Collaborator

Why modify this file?

Collaborator

To keep the PR simple, irrelevant parts should not be modified.

Collaborator Author

I didn't intentionally modify those two files; it seems the formatting tool updated them automatically during the commit. I will submit a clean PR later.

@shangguanqituan
Collaborator Author

Update Summary:

  1. Code Reorganization: I have performed a clean commit to strictly limit changes to relevant files. This ensures that no unrelated files (e.g., whitespace changes in other scripts) are touched, addressing the previous feedback.
  2. Updated Results: I have updated the README.md with the latest experimental results, including:
  • Reproduction Results: Trained on VoxCeleb (from scratch).
  • Verification Results: Inference using the official checkpoint to verify correctness.
  3. Pretrained Models: I have uploaded the checkpoints to ModelScope and updated the model list in README.md.
  • Reproduced Models: Trained on VoxCeleb.
  • Official Models: Trained on VoxCeleb + VoxBlink.

Note on ONNX: I attempted to export the model to ONNX using wespeaker/bin/export_onnx.py, but it failed due to the complexity of the W2V-BERT architecture (specifically dynamic axes in MFA adapter layers). Therefore, I marked the Runtime Model column as -.

@@ -0,0 +1,273 @@
#!/bin/bash

# Copyright 2025 Your Name/Org (your_email@example.com)
Collaborator

Change to your name and email address

@@ -0,0 +1,388 @@
# Copyright (c) 2025 Your Name/Org (your_email@example.com)
Collaborator

Change to your name and email address

@@ -0,0 +1,126 @@
# Copyright (c) 2025 Your Name/Org
Collaborator

Change to your name and email address

@cdliang11
Collaborator

Good job!

@cdliang11
Collaborator

Please fix flake8 errors

@cdliang11 cdliang11 merged commit 7d7b707 into wenet-e2e:master Dec 2, 2025
4 checks passed
@cdliang11
Collaborator

Merged.

@wsstriving
Collaborator

@cdliang11 maybe we also want to support this model in the CLI mode? @shangguanqituan We also need to decide which checkpoint to upload in this mode, the one you trained or the one provided by the original paper

@cdliang11
Collaborator

> @cdliang11 maybe we also want to support this model in the CLI mode? @shangguanqituan We also need to decide which checkpoint to upload in this mode, the one you trained or the one provided by the original paper

Of course, supporting this model in CLI mode is feasible. @shangguanqituan, please select the model checkpoint to adopt.

@shangguanqituan
Collaborator Author

Let's use the checkpoint provided by the original paper — w2v_bert2_voxblink_official_LM.pth.
It’s listed in pretrained.md.
