
[Feature] Adding Qwen3-asr Model Support#22073

Merged
mickqian merged 12 commits into sgl-project:main from adityavaid:adddingQwen3AsrModelSupport
Apr 7, 2026

Conversation

@adityavaid
Contributor

@adityavaid adityavaid commented Apr 3, 2026

Motivation

Issue: #22025

This PR adds support so users can serve Qwen3-ASR via the existing /v1/audio/transcriptions endpoint.

References

New files

| File | Purpose |
| --- | --- |
| `python/sglang/srt/configs/qwen3_asr.py` | Config classes (`Qwen3ASRConfig`, `Qwen3ASRThinkerConfig`) handling the nested `thinker_config` → `audio_config` / `text_config` layout |
| `python/sglang/srt/models/qwen3_asr.py` | Model class reusing `Qwen3OmniMoeAudioEncoder` + `Qwen3ForCausalLM`, with weight-loading prefix remapping (`thinker.*` → internal names) |
| `python/sglang/srt/multimodal/processors/qwen3_asr.py` | Multimodal processor with `<|audio_start|>`/`<|audio_pad|>`/`<|audio_end|>` token handling |
| `test/manual/test_qwen3_asr.py` | Manual test: launches the server, sends audio, validates output |
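The weight-loading prefix remap noted for `models/qwen3_asr.py` can be sketched as below. This is a minimal illustration assuming a plain `thinker.` prefix strip, not the actual sglang loader code:

```python
# Minimal sketch (an assumption, not the actual sglang implementation) of
# the "thinker.* -> internal names" remapping applied while loading
# weights: checkpoint parameter names carry a "thinker." prefix that the
# internal modules do not use, so it is stripped before lookup.
def remap_weight_name(name: str, prefix: str = "thinker.") -> str:
    """Strip the checkpoint prefix so the name matches internal modules."""
    return name[len(prefix):] if name.startswith(prefix) else name
```

Names without the prefix (e.g. already-remapped or shared weights) pass through unchanged.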

Modified files

| File | What changed |
| --- | --- |
| `configs/__init__.py` | Export `Qwen3ASRConfig` |
| `configs/model_config.py` | Add `Qwen3ASRForConditionalGeneration` to `multimodal_model_archs` and `is_audio_model()`; fix `is_audio_understandable_model` to check the nested `thinker_config.audio_config` |
| `disaggregation/encode_server.py` | Add a `qwen3_asr` branch in `_get_feat_extract_output_lengths` (same formula as `qwen3_omni_moe`) |
| `serving_transcription.py` | Detect the model family at init. For Qwen3-ASR: build a chat-template prompt, strip the `<asr_text>` prefix from output, and skip Whisper-specific timestamp parsing in `verbose_json` |

Accuracy Tests

  • Verified that existing Whisper tests still pass (no behavior change on the Whisper path)
  • Tested with both Qwen/Qwen3-ASR-0.6B and Qwen/Qwen3-ASR-1.7B
  • Added unit tests for qwen3_asr
  • Server launched successfully with:

```shell
python3 -m sglang.launch_server \
  --model Qwen/Qwen3-ASR-0.6B \
  --port 30000 \
  --host 0.0.0.0 \
  --served-model-name qwen3-asr \
  --mem-fraction-static 0.85
```
```shell
curl -s http://localhost:30000/v1/audio/transcriptions \
  -F file=@/tmp/test.flac \
  -F model=qwen3-asr \
  -F response_format=verbose_json | python3 -m json.tool
```

```json
{
    "task": "transcribe",
    "language": null,
    "duration": 10.44,
    "text": "He hoped there would be stew for dinner\u2014turnips and carrots and bruised potatoes and fat mutton pieces\u2014to be ladled out in thick peppered flour-fatted sauce.",
    "segments": [],
    "usage": {
        "type": "duration",
        "seconds": 11
    }
}
```
```shell
curl -s http://localhost:30000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "<|im_start|>user\n<|audio_start|><|audio_pad|><|audio_end|><|im_end|>\n<|im_start|>assistant\n",
    "audio_data": "https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.flac",
    "modalities": ["audio"],
    "sampling_params": {"temperature": 0, "max_new_tokens": 256}
  }'
```

```json
{"text":"language English<asr_text>He hoped there would be stew for dinner—turnips and carrots and bruised potatoes and fat mutton pieces—to be ladled out in thick peppered flour-fatted sauce.","output_ids":[11528,6364,151704,1519,25189,1052,1035,387,60343,369,13856,2293,412,3077,323,61417,323,42000,4056,34167,323,8664,296,959,9666,49517,387,57625,832,700,304,12045,24353,291,19828,2220,12127,19187,13,151645],"meta_info":{"id":"7d1491d1ad264b039ce40452cb269e28","finish_reason":{"type":"stop","matched":151645},"prompt_tokens":146,"weight_version":"default","total_retractions":0,"completion_tokens":40,"cached_tokens":0,"cached_tokens_details":null,"dp_rank":null,"e2e_latency":1.1166456790015218,"response_sent_to_client_ts":1775261852.3463674}}
```
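The `/generate` prompt above follows a fixed chat-template shape. As a sketch, the single `<|audio_pad|>` is a placeholder that the multimodal processor later expands to match the audio feature length:

```python
# Sketch of how the /generate prompt string above is assembled: one audio
# placeholder span inside a standard chat turn. The expansion of
# <|audio_pad|> to the real feature length happens in the multimodal
# processor, not here.
def build_asr_prompt() -> str:
    audio_span = "<|audio_start|><|audio_pad|><|audio_end|>"
    return (
        "<|im_start|>user\n"
        f"{audio_span}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
```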

Speed Tests and Profiling

Qwen3-ASR-1.7B Benchmark

```
------------------------------
Results for Qwen/Qwen3-ASR-1.7B:
Total Requests: 50
WER: 25.2366
Average Latency: 0.4370s
Median Latency: 0.3654s
95th Latency: 0.8698s
Throughput: 5.17 req/s
Token Throughput: 117.69 tok/s
Total Test Time: 9.6776s
------------------------------

==================== Sample Predictions ====================
Sample 1:
  REF: um, on the use of taxonomy, i, you know, i think it's, it's early days for us to, to make any, um, clear indications to the market about, uh, the proportion that would fall under that, um, requirement.
  PRED: on the eu taxonomy, i think it's early days for us to make any clear indications to the market about the proportion that would fall under that requirement. so.
----------------------------------------
Sample 2:
  REF: so within fiscal year 2021, say 120, a hundred depending on what the micro will do, and next year, uh, it's not necessarily payable in q1, is we'll look at what the cash flows for 2022 look like.
  PRED: so within fiscal year 2021, say 120, depending on what the macro will do, and next year, it's not necessarily payable in q1, is we'll look at what the cash flows for 2022 look like and.
----------------------------------------
Sample 3:
  REF: we talked about 4.7 gigawatts.
  PRED: we talked about 4.7 gigawatts.
----------------------------------------
Sample 4:
  REF: and, you know, depending on that working capital build, we'll, we'll see what that yields.
  PRED: and depending on that working capital build, we'll see what that yields.
----------------------------------------
Sample 5:
  REF: so on, on sinopec, what we have agreed with sinopec way back then is that free cash flows after paying all capexs are distributed out 30, 70%.
  PRED: so, on sanopek, what we have agreed with sanopek way back then is that free cash flows, after paying all capexes, are distributed out 30-70%.
----------------------------------------
============================================================
```

Qwen3-ASR-0.6B Benchmark

```
------------------------------
Results for Qwen/Qwen3-ASR-0.6B:
Total Requests: 50
WER: 23.6593
Average Latency: 0.3149s
Median Latency: 0.2252s
95th Latency: 1.1177s
Throughput: 6.99 req/s
Token Throughput: 152.87 tok/s
Total Test Time: 7.1565s
------------------------------

==================== Sample Predictions ====================
Sample 1:
  REF: um, on the use of taxonomy, i, you know, i think it's, it's early days for us to, to make any, um, clear indications to the market about, uh, the proportion that would fall under that, um, requirement.
  PRED: on the eu taxonomy, i think it's early days for us to make any clear indications to the market about the proportion that would fall under that requirement.
----------------------------------------
Sample 2:
  REF: so within fiscal year 2021, say 120, a hundred depending on what the micro will do, and next year, uh, it's not necessarily payable in q1, is we'll look at what the cash flows for 2022 look like.
  PRED: so within fiscal year 2021, say 120, depending on what the micro will do, and next year, it's not necessarily payable in q1. is we'll look at what the cash flows for 2022 look like and.
----------------------------------------
Sample 3:
  REF: we talked about 4.7 gigawatts.
  PRED: we talked about 4.7 gigawatts.
----------------------------------------
Sample 4:
  REF: and, you know, depending on that working capital build, we'll, we'll see what that yields.
  PRED: and depending on that working capital build, we'll see what that yields.
----------------------------------------
Sample 5:
  REF: so on, on sinopec, what we have agreed with sinopec way back then is that free cash flows after paying all capexs are distributed out 30, 70%.
  PRED: so on on sinopac, what we have agreed with sinopac way back then is that free cash flows after paying all capexes are distributed out 30-70.
----------------------------------------
============================================================
```
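For reference, the WER figures in these benchmarks are word-level edit distance divided by reference length (the benchmark scripts use `jiwer` per the install commands below, but the metric itself fits in a few lines of plain Python; multiply by 100 for the percentage form reported above):

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between the first i ref words and first j hyp words
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(h) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(r)][len(h)] / max(len(r), 1)
```

Note that different normalizations (punctuation, casing, filler words such as "um"/"uh") can shift WER by several points, which is why the REF/PRED pairs above disagree mostly on fillers.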

Benchmark Results: SGLang vs. Transformers

Audio samples used for testing:

```
AUDIO_EN = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav"
AUDIO_ZH = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_zh.wav"
```

Model: Qwen/Qwen3-ASR-0.6B

| Metric | SGLang | Transformers (HF) |
| --- | --- | --- |
| EN transcription | "Oh yeah, yeah. He wasn't even that big when I started listening to him. But and …" | "Hmm. Oh yeah, yeah. He wasn't even that big when I started listening to him, but and ..." |
| ZH transcription | "甚至出现交易几乎停滞的情况。" | "甚至出现交易几乎停滞的情况。" |
| EN avg latency (5 runs) | 0.246 s | 1.113 s |
| ZH avg latency (5 runs) | 0.088 s | 0.258 s |

Model: Qwen/Qwen3-ASR-1.7B

| Metric | SGLang | Transformers (HF) |
| --- | --- | --- |
| EN transcription | "Uh huh. Oh yeah, yeah. He wasn't even that big when I started listening to him, …" | "Hmm. Oh yeah, yeah. He wasn't even that big when I started listening to him, but and..." |
| ZH transcription | "甚至出现交易几乎停滞的情况。" | "甚至出现交易几乎停滞的情况。" |
| EN avg latency (5 runs) | 0.472 s | 1.158 s |
| ZH avg latency (5 runs) | 0.123 s | 0.272 s |

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include `/tag-and-rerun-ci`, `/tag-run-ci-label`, `/rerun-failed-ci`
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

@adityavaid adityavaid changed the title [Feat] Adding Qwen3_asr Model Support [Feat] Adding Qwen3-asr Model Support Apr 3, 2026
@adityavaid adityavaid changed the title [Feat] Adding Qwen3-asr Model Support [Feature] Adding Qwen3-asr Model Support Apr 3, 2026
@AgainstEntropy
Collaborator

Hi @adityavaid, thanks for your PR.
I tested with

```shell
python -m sglang.launch_server --model Qwen/Qwen3-ASR-1.7B --trust-remote-code --port 30010 --host 0.0.0.0 --served-model-name qwen3-asr
```

and it raised the following error:

```
AttributeError: Qwen2Tokenizer has no attribute tokenizer. Did you mean: '_tokenizer'?
```

Would you mind sharing the manual test scripts and/or commands you were using? Thanks!

@adityavaid
Contributor Author

@mickqian addressed all comments

@mickqian
Collaborator

mickqian commented Apr 5, 2026

how about numbers of transformers?

@adityavaid
Contributor Author

adityavaid commented Apr 5, 2026

> how about numbers of transformers?

Added that in another section; I used a separate script to run the Transformers test in my setup. Should I add the script here for review?

@mickqian
Collaborator

mickqian commented Apr 5, 2026

/tag-and-rerun-ci

@github-actions github-actions Bot added the run-ci label Apr 5, 2026
@adityavaid
Contributor Author

/rerun-failed-ci

@mickqian
Collaborator

mickqian commented Apr 6, 2026

may I have accuracy comparison result?

@AgainstEntropy
Collaborator

> may I have accuracy comparison result?

Dataset: D4nt3/esb-datasets-earnings22-validation-tiny-filtered (511 samples, validation split)
GPU: 1× H200

Qwen3-ASR-0.6B

| Metric | Transformers (bs=4) | SGLang (concurrency=4) |
| --- | --- | --- |
| WER | 23.56 | 23.58 |
| Avg Latency | 0.3276 s | 0.4352 s |
| Median Latency | 0.2837 s | 0.3483 s |
| P95 Latency | 0.5034 s | 0.6952 s |
| Throughput | 3.05 req/s | 8.50 req/s |
| Token Throughput | 59.17 tok/s | 162.84 tok/s |
| Total Time | 167.50 s | 60.15 s |

Qwen3-ASR-1.7B

| Metric | Transformers (bs=4) | SGLang (concurrency=4) |
| --- | --- | --- |
| WER | 23.92 | 24.48 |
| Avg Latency | 0.3389 s | 0.4348 s |
| Median Latency | 0.2946 s | 0.3909 s |
| P95 Latency | 0.5268 s | 0.6906 s |
| Throughput | 2.95 req/s | 8.45 req/s |
| Token Throughput | 58.01 tok/s | 165.80 tok/s |
| Total Time | 173.26 s | 60.51 s |

Note: Latency is measured differently in the two setups: Transformers reports batch_time / batch_size (amortized), while SGLang reports per-request end-to-end time (including network and server queuing). Throughput and Total Time are more suitable for direct comparison.
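To make the note above concrete, here is a tiny illustration (with hypothetical numbers) of the two latency definitions and why throughput is the safer comparison:

```python
# Amortized batch latency (as reported for Transformers): one wall-clock
# batch time divided by the batch size. Per-request end-to-end latency
# (as reported for SGLang) additionally includes network and queuing
# time, so the two columns are not directly comparable.
def amortized_latency(batch_time_s: float, batch_size: int) -> float:
    return batch_time_s / batch_size

def throughput_req_per_s(total_requests: int, total_time_s: float) -> float:
    return total_requests / total_time_s
```

For example, a 4-sample batch taking 1.6 s amortizes to 0.4 s per request, even though each request actually waited 1.6 s of wall clock; throughput over the whole run has no such ambiguity.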

Commands

Transformers

```shell
git clone https://github.com/QwenLM/Qwen3-ASR
cd Qwen3-ASR/
pip install -e .
pip install datasets evaluate jiwer librosa soundfile
```

bench_transformers.py

```shell
# 0.6B
python bench_transformers.py --model Qwen/Qwen3-ASR-0.6B --batch-size 4 --output results-0.6B.json

# 1.7B
python bench_transformers.py --model Qwen/Qwen3-ASR-1.7B --batch-size 4 --output results-1.7B.json
```

SGLang

```shell
pip install datasets evaluate jiwer librosa soundfile openai

# 0.6B
python -m sglang.launch_server --model Qwen/Qwen3-ASR-0.6B --port 30000 --host 0.0.0.0 --mem-fraction-static 0.85
python benchmark/asr/bench_sglang.py --base-url http://localhost:30000 --model Qwen/Qwen3-ASR-0.6B --api-type transcription --output results-0.6B.json

# 1.7B
python -m sglang.launch_server --model Qwen/Qwen3-ASR-1.7B --port 30000 --host 0.0.0.0 --mem-fraction-static 0.85
python benchmark/asr/bench_sglang.py --base-url http://localhost:30000 --model Qwen/Qwen3-ASR-1.7B --api-type transcription --output results-1.7B.json
```

@adityavaid
Contributor Author

@mickqian
I added the accuracy benchmark WER in the description.
Could you approve again? I have addressed all the other comments.

SammLSH added a commit to SammLSH/sglang that referenced this pull request Apr 6, 2026
Implement streaming transcription with chunk-based processing and
prefix rollback, based on the Qwen3-ASR paper (arXiv:2601.21337).

New files:
- streaming_asr.py: StreamingASRState, split_audio_chunks, build_streaming_prompt

Modified files:
- serving_transcription.py: route streaming requests through chunked
  ASR pipeline for Qwen3-ASR model family
- hf_transformers_utils.py: add Qwen3ASRConfig to _CONFIG_REGISTRY

Depends on sgl-project#22073 for Qwen3-ASR model support.
Ref: sgl-project#22025 (streaming input), vllm-project/vllm#35908 (related RFC)
@mickqian mickqian merged commit f6e8567 into sgl-project:main Apr 7, 2026
267 of 325 checks passed
SammLSH added a commit to SammLSH/sglang that referenced this pull request Apr 7, 2026
Fridge003 pushed a commit that referenced this pull request Apr 7, 2026
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>