
Support Multi Detokenizer based on Multi Tokenizer #9970

Open
LLLL114 wants to merge 50 commits into sgl-project:main from LLLL114:multi_detokenizer_manager

Conversation

Contributor

@LLLL114 LLLL114 commented Sep 3, 2025

Motivation

Enable multi detokenizer based on Multi Tokenizer.

Modifications

  1. Add --detokenizer-worker-num to set the number of detokenizer workers.
  2. To minimize code changes and save sockets, reuse most of the structure of MultiTokenizerMixin; the tokenizer worker count must be divisible by the detokenizer worker count.
  3. Add MultiDetokenizerRouter to route requests from the scheduler to the multiple detokenizers.
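The routing in (3) stays deterministic by hashing the originating worker's IPC name rather than round-robining. A minimal sketch (the function name is illustrative, not the exact code in this PR; `zlib.crc32` is the hash mentioned in the backport commit message):

```python
import zlib

def route_to_detokenizer(http_worker_ipc: str, num_detokenizers: int) -> int:
    """Pick a detokenizer index for a scheduler output batch.

    Hashing the HTTP/tokenizer worker's IPC name pins all outputs of one
    worker to one detokenizer, so per-request stream order is preserved
    and routing is stable across runs.
    """
    return zlib.crc32(http_worker_ipc.encode()) % num_detokenizers
```

With this scheme, each detokenizer serves a fixed subset of tokenizer workers, which is why the tokenizer count must divide evenly by the detokenizer count.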

Benchmarking and Profiling

Set detokenizer-worker-num to 1, 4, and 8.
Command:

SGLANG_USE_MODELSCOPE=true \
python -m sglang.launch_server \
    --model-path /root/.cache/modelscope/hub/models/Qwen/Qwen2.5-0.5B --disaggregation-mode null \
    --port $PORT --base-gpu-id $BASEGPUID \
    --trust-remote-code --tp-size 1 --dp-size 8 --tokenizer-worker-num 8 --detokenizer-worker-num 1 \
    --disable-radix-cache
detokenizer-worker-num = 1
============ Serving Benchmark Result ============
Backend:                                 sglang    
Traffic request rate:                    inf       
Max request concurrency:                 1500      
Successful requests:                     10000     
Benchmark duration (s):                  133.01    
Total input tokens:                      20707676  
Total generated tokens:                  5122821   
Total generated tokens (retokenized):    5120982   
Request throughput (req/s):              75.18     
Input token throughput (tok/s):          155689.50 
Output token throughput (tok/s):         38515.64  
Total token throughput (tok/s):          194205.15 
Concurrency:                             1412.65   
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   18789.19  
Median E2E Latency (ms):                 17832.40  
---------------Time to First Token----------------
Mean TTFT (ms):                          4460.66   
Median TTFT (ms):                        4030.85   
P99 TTFT (ms):                           13525.59  
---------------Inter-Token Latency----------------
Mean ITL (ms):                           28.02     
Median ITL (ms):                         0.01      
P95 ITL (ms):                            102.88    
P99 ITL (ms):                            181.98    
Max ITL (ms):                            7381.73   
==================================================
detokenizer-worker-num = 4
============ Serving Benchmark Result ============
Backend:                                 sglang    
Traffic request rate:                    inf       
Max request concurrency:                 1500      
Successful requests:                     10000     
Benchmark duration (s):                  129.37    
Total input tokens:                      20707676  
Total generated tokens:                  5122821   
Total generated tokens (retokenized):    5121155   
Request throughput (req/s):              77.30     
Input token throughput (tok/s):          160069.65 
Output token throughput (tok/s):         39599.24  
Total token throughput (tok/s):          199668.89 
Concurrency:                             1326.15   
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   17155.94  
Median E2E Latency (ms):                 15163.17  
---------------Time to First Token----------------
Mean TTFT (ms):                          2551.11   
Median TTFT (ms):                        1784.67   
P99 TTFT (ms):                           17076.68  
---------------Inter-Token Latency----------------
Mean ITL (ms):                           28.57     
Median ITL (ms):                         0.01      
P95 ITL (ms):                            146.90    
P99 ITL (ms):                            481.66    
Max ITL (ms):                            6877.82   
==================================================
detokenizer-worker-num = 8
============ Serving Benchmark Result ============
Backend:                                 sglang    
Traffic request rate:                    inf       
Max request concurrency:                 1500      
Successful requests:                     10000     
Benchmark duration (s):                  128.83    
Total input tokens:                      20707676  
Total generated tokens:                  5122821   
Total generated tokens (retokenized):    5121141   
Request throughput (req/s):              77.62     
Input token throughput (tok/s):          160741.49 
Output token throughput (tok/s):         39765.44  
Total token throughput (tok/s):          200506.93 
Concurrency:                             1337.05   
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   17224.71  
Median E2E Latency (ms):                 15310.77  
---------------Time to First Token----------------
Mean TTFT (ms):                          2456.57   
Median TTFT (ms):                        1451.91   
P99 TTFT (ms):                           24556.52  
---------------Inter-Token Latency----------------
Mean ITL (ms):                           28.88     
Median ITL (ms):                         0.01      
P95 ITL (ms):                            167.74    
P99 ITL (ms):                            456.69    
Max ITL (ms):                            6719.15   
==================================================

Checklist

Summary by CodeRabbit

  • New Features
    • Supports multiple detokenizer workers coordinated via a router for parallel detokenization.
    • Adds CLI flags to configure tokenizer and detokenizer worker counts, with validation for compatible ratios.
  • Performance
    • Increased throughput and scalability for tokenization/detokenization in multi-worker configurations.
  • Tests
    • Test suite updated to run with multiple detokenizer workers to reflect new parallel setup.

Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
@whybeyoung
Collaborator

Thanks for your contribution~ LGTM

Comment thread python/sglang/srt/entrypoints/engine.py
LLLL114 and others added 4 commits September 5, 2025 10:31
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
@miter6
Contributor

miter6 commented Sep 7, 2025

@LLLL114 Hi,
Does Multi Tokenizer and Detokenizer support PD?

@LLLL114
Contributor Author

LLLL114 commented Sep 7, 2025

@LLLL114 Hi,

Does Multi Tokenizer and Detokenizer support PD?

Sure, you can try it with --tokenizer-worker-num and --detokenizer-worker-num.

Comment thread python/sglang/srt/managers/multi_tokenizer_mixin.py Outdated
Comment thread python/sglang/srt/server_args.py
LLLL114 and others added 3 commits September 9, 2025 16:24
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
@whybeyoung
Collaborator

@hnyls2002 can you review it?

Comment thread python/sglang/srt/managers/multi_tokenizer_mixin.py Outdated
Comment thread python/sglang/srt/managers/multi_tokenizer_mixin.py Outdated
whybeyoung and others added 3 commits September 28, 2025 18:30
Signed-off-by: ybyang <ybyang7@iflytek.com>
Signed-off-by: ybyang <ybyang7@iflytek.com>
@whybeyoung whybeyoung force-pushed the multi_detokenizer_manager branch from 6f774ce to 6023602 Compare September 28, 2025 12:40
raise ValueError(f"Unknown req type: {type(req)}")


class MultiDetokenizerRouter:
Contributor

The router might become a bottleneck.

Collaborator

For us, we haven't encountered the bottleneck yet. What's your case? (Maybe our hardware can't reach such high concurrency.)

@merrymercy merrymercy requested a review from zhyncs as a code owner November 29, 2025 07:06
whybeyoung added a commit to whybeyoung/sglang that referenced this pull request Apr 28, 2026
Backport sgl-project#9970 with adaptations for the V4 PD branch.

Add a new --detokenizer-worker-num CLI flag that scales the detokenizer
out of the single-process bottleneck. When > 1, N DetokenizerManager
processes each listen on a private IPC socket and a new
MultiDetokenizerRouter process owns the public detokenizer IPC and
fans out scheduler outputs by hashing http_worker_ipc (zlib.crc32, so
routing is deterministic across runs). Stream-order is preserved
because all outputs of the same HTTP/tokenizer worker pin to the same
detokenizer.

* server_args: new field + CLI arg + divisibility check
  (tokenizer_worker_num must be a multiple of detokenizer_worker_num);
  skip_tokenizer_init forces it back to 1.
* multi_tokenizer_mixin:
  - SocketMapping.send_output gains an optional is_tokenizer flag (only
    affects log labelling).
  - multi_http_worker_event_loop now also handles BaseReq, since the
    detok router may forward single requests downstream.
  - new MultiDetokenizerRouter class + run_multi_detokenizer_router_process.
* detokenizer_manager: skip the unused send_to_tokenizer socket in
  multi-tokenizer mode (results go through SocketMapping instead).
* engine: factor detok launch into _launch_detokenizer_subprocesses,
  spawning N detok workers + 1 router when detokenizer_worker_num > 1.
* test_multi_tokenizer: exercise --detokenizer-worker-num 4.
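The server_args divisibility check described above can be sketched as a standalone function (hypothetical name and signature; the real check lives inside sglang's ServerArgs validation):

```python
def resolve_detokenizer_worker_num(tokenizer_worker_num: int,
                                   detokenizer_worker_num: int,
                                   skip_tokenizer_init: bool = False) -> int:
    """Validate and normalize the detokenizer worker count.

    Mirrors the constraints above: tokenizer_worker_num must be a multiple
    of detokenizer_worker_num, and skip_tokenizer_init forces the count
    back to 1.
    """
    if skip_tokenizer_init:
        return 1
    if detokenizer_worker_num < 1:
        raise ValueError("detokenizer-worker-num must be >= 1")
    if tokenizer_worker_num % detokenizer_worker_num != 0:
        raise ValueError(
            f"tokenizer-worker-num ({tokenizer_worker_num}) must be a "
            f"multiple of detokenizer-worker-num ({detokenizer_worker_num})"
        )
    return detokenizer_worker_num
```

Under this rule, the benchmark configurations above (tokenizer-worker-num 8 with detokenizer-worker-num 1, 4, or 8) are all valid, while a count of 3 would be rejected.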
whybeyoung added a commit to whybeyoung/sglang that referenced this pull request Apr 29, 2026