[Feature] Adding Qwen3-asr Model Support #22073
|
Hi @adityavaid, thanks for your PR. Would you mind sharing the manual test scripts and/or commands you were using? thx! |
|
@mickqian addressed all comments |
|
How about the numbers for Transformers? |
I added those in another section, using a separate script to run the Transformers test in my setup. Should I add the script here for review? |
|
/tag-and-rerun-ci |
|
/rerun-failed-ci |
|
May I have the accuracy comparison results?
Dataset: Qwen3-ASR-0.6B
Qwen3-ASR-1.7B
Commands
Transformers
git clone https://github.com/QwenLM/Qwen3-ASR
cd Qwen3-ASR/
pip install -e .
pip install datasets evaluate jiwer librosa soundfile
SGLang
pip install datasets evaluate jiwer librosa soundfile openai
# 0.6B
python -m sglang.launch_server --model Qwen/Qwen3-ASR-0.6B --port 30000 --host 0.0.0.0 --mem-fraction-static 0.85
python benchmark/asr/bench_sglang.py --base-url http://localhost:30000 --model Qwen/Qwen3-ASR-0.6B --api-type transcription --output results-0.6B.json
# 1.7B
python -m sglang.launch_server --model Qwen/Qwen3-ASR-1.7B --port 30000 --host 0.0.0.0 --mem-fraction-static 0.85
python benchmark/asr/bench_sglang.py --base-url http://localhost:30000 --model Qwen/Qwen3-ASR-1.7B --api-type transcription --output results-1.7B.json |
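For readers who want to sanity-check the accuracy numbers by hand: ASR accuracy comparisons are usually reported as word error rate (WER), which is what packages such as jiwer compute. The block below is a minimal pure-Python sketch of that metric, not the logic of benchmark/asr/bench_sglang.py. The function name `word_error_rate` and the lowercase-and-split normalization are assumptions made for illustration; real benchmarks often apply a heavier text normalizer first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: lowercasing and whitespace splitting is the only
    normalization applied (real benchmarks usually normalize more).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Comparing the transcript from each backend against the same reference text with this function gives one WER value per backend, which can then be averaged over a dataset.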
|
@mickqian |
Implement streaming transcription with chunk-based processing and prefix rollback, based on the Qwen3-ASR paper (arXiv:2601.21337).
New files:
- streaming_asr.py: StreamingASRState, split_audio_chunks, build_streaming_prompt
Modified files:
- serving_transcription.py: route streaming requests through chunked ASR pipeline for Qwen3-ASR model family
- hf_transformers_utils.py: add Qwen3ASRConfig to _CONFIG_REGISTRY
Depends on sgl-project#22073 for Qwen3-ASR model support.
Ref: sgl-project#22025 (streaming input), vllm-project/vllm#35908 (related RFC)
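To make the idea in this commit message concrete, here is a rough sketch of chunk-based processing with prefix rollback as I understand it: audio is cut into fixed-length chunks, and after each decode the last few tokens are held back as unstable, so the next decode continues from the shorter, committed prefix and can revise the tail. This is not the code in streaming_asr.py. `chunk_samples`, `RollbackState`, and the `rollback_tokens` parameter are hypothetical names chosen for illustration, and tokens are plain strings here.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


def chunk_samples(samples: Sequence[float], sample_rate: int,
                  chunk_seconds: float) -> List[Sequence[float]]:
    """Split raw samples into consecutive chunks of chunk_seconds each."""
    chunk_len = int(sample_rate * chunk_seconds)
    if chunk_len <= 0:
        raise ValueError("chunk length must be at least one sample")
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


@dataclass
class RollbackState:
    """Hypothetical state holder: committed prefix plus a revisable tail."""
    rollback_tokens: int = 2
    committed: List[str] = field(default_factory=list)  # reused as decode prefix
    pending: List[str] = field(default_factory=list)    # may change next step

    def update(self, continuation: List[str]) -> None:
        """Record tokens decoded after the committed prefix for the audio so far."""
        full = self.committed + continuation
        # Hold back the last rollback_tokens tokens, but never un-commit.
        keep = max(len(full) - self.rollback_tokens, len(self.committed))
        self.committed = full[:keep]
        self.pending = full[keep:]

    def text(self) -> str:
        """Best current transcript: committed prefix followed by the tail."""
        return " ".join(self.committed + self.pending)
```

In a real pipeline each `update` call would receive the model output for the audio accumulated so far, decoded with `committed` as the forced prefix. Here the caller supplies the tokens directly.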
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Motivation
Issue: #22025
This PR adds support so users can serve Qwen3-ASR via the existing /v1/audio/transcriptions endpoint.
References
New files
- python/sglang/srt/configs/qwen3_asr.py: (Qwen3ASRConfig, Qwen3ASRThinkerConfig) handling the nested thinker_config → audio_config / text_config layout
- python/sglang/srt/models/qwen3_asr.py: Qwen3OmniMoeAudioEncoder + Qwen3ForCausalLM, with weight-loading prefix remapping (thinker.* → internal names)
- python/sglang/srt/multimodal/processors/qwen3_asr.py: <|audio_start|> / <|audio_pad|> / <|audio_end|> token handling
- test/manual/test_qwen3_asr.py
Modified files
- configs/__init__.py: Qwen3ASRConfig
- configs/model_config.py: Add Qwen3ASRForConditionalGeneration to multimodal_model_archs and is_audio_model(). Fix is_audio_understandable_model to check nested thinker_config.audio_config
- disaggregation/encode_server.py: qwen3_asr branch in _get_feat_extract_output_lengths (same formula as qwen3_omni_moe)
- serving_transcription.py: Strip <asr_text> prefix from output, skip Whisper-specific timestamp parsing in verbose_json
Accuracy Tests
Qwen/Qwen3-ASR-0.6B and Qwen/Qwen3-ASR-1.7B
Speed Tests and Profiling
Qwen3-ASR-1.7B Benchmark
Qwen3-ASR-0.6B Benchmark
Benchmark Results: SGLang vs. Transformers
Audio samples used for testing:
AUDIO_EN = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav"
AUDIO_ZH = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_zh.wav"
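For anyone who wants to try one of these samples against a locally launched server, a request through the OpenAI Python client could look roughly like the sketch below. It assumes the server from the launch commands is listening on port 30000, that no API key is configured (the `"EMPTY"` value is a placeholder), and that the `openai` package is installed. `transcribe` and `demo` are hypothetical helper names, not part of this PR.

```python
import urllib.request

AUDIO_EN = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav"


def transcribe(client, audio_bytes: bytes, model: str,
               filename: str = "audio.wav") -> str:
    """Send one audio clip to the transcription endpoint and return its text."""
    response = client.audio.transcriptions.create(
        model=model,
        file=(filename, audio_bytes),  # (name, content) tuple accepted by the client
    )
    return response.text


def demo(base_url: str = "http://localhost:30000/v1",
         model: str = "Qwen/Qwen3-ASR-0.6B") -> str:
    """Download the English sample and transcribe it (needs a running server)."""
    # Imported here so that transcribe() itself has no hard dependency on openai.
    from openai import OpenAI

    # Assumption: the server was started without an API key.
    client = OpenAI(base_url=base_url, api_key="EMPTY")
    with urllib.request.urlopen(AUDIO_EN) as resp:
        audio_bytes = resp.read()
    return transcribe(client, audio_bytes, model, filename="asr_en.wav")
```

Calling `print(demo())` after the server is up should print the transcript. Passing `model="Qwen/Qwen3-ASR-1.7B"` targets the larger model when that one is being served.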
Model: Qwen/Qwen3-ASR-0.6B
Model: Qwen/Qwen3-ASR-1.7B
Checklist
Review and Merge Process
/tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci