Support Multi Detokenizer based on Multi Tokenizer #9970

Open

LLLL114 wants to merge 50 commits into sgl-project:main from
Conversation
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Collaborator

Thanks for your contribution~ LGTM
lw9527 reviewed on Sep 4, 2025
Contributor

@LLLL114 Hi,
Contributor, Author

Sure, you can try with `tokenizer-worker-num` and `detokenizer-worker-num`.
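Per the commit message later in this thread, the two flags are related: `tokenizer_worker_num` must be a multiple of `detokenizer_worker_num`. A minimal validation sketch of that constraint (the function name is hypothetical, not the actual server_args code):

```python
# Hypothetical sketch of the worker-count divisibility check described in
# this PR: the tokenizer worker count must be a multiple of the
# detokenizer worker count so tokenizer workers map evenly onto detokenizers.
def check_worker_counts(tokenizer_worker_num: int, detokenizer_worker_num: int) -> None:
    if tokenizer_worker_num < 1 or detokenizer_worker_num < 1:
        raise ValueError("worker counts must be >= 1")
    if tokenizer_worker_num % detokenizer_worker_num != 0:
        raise ValueError(
            f"tokenizer-worker-num ({tokenizer_worker_num}) must be a multiple "
            f"of detokenizer-worker-num ({detokenizer_worker_num})"
        )
```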
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Collaborator

@hnyls2002 can you review it?
hnyls2002 requested changes on Sep 23, 2025
Signed-off-by: ybyang <ybyang7@iflytek.com>
Force-pushed from 6f774ce to 6023602
Signed-off-by: ybyang <ybyang7@iflytek.com>
a4zhangfei reviewed on Nov 18, 2025
raise ValueError(f"Unknown req type: {type(req)}")

class MultiDetokenizerRouter:
Contributor

The router might become a bottleneck.
Collaborator

For us, we haven't hit that bottleneck yet. What's your case? (Maybe our hardware can't reach that high a concurrency.)
whybeyoung added a commit to whybeyoung/sglang that referenced this pull request on Apr 28, 2026
Backport sgl-project#9970 with adaptations for the V4 PD branch.

Add a new --detokenizer-worker-num CLI flag that scales the detokenizer out of the single-process bottleneck. When > 1, N DetokenizerManager processes each listen on a private IPC socket and a new MultiDetokenizerRouter process owns the public detokenizer IPC and fans out scheduler outputs by hashing http_worker_ipc (zlib.crc32, so routing is deterministic across runs). Stream order is preserved because all outputs of the same HTTP/tokenizer worker pin to the same detokenizer.

* server_args: new field + CLI arg + divisibility check (tokenizer_worker_num must be a multiple of detokenizer_worker_num); skip_tokenizer_init forces it back to 1.
* multi_tokenizer_mixin:
  - SocketMapping.send_output gains an optional is_tokenizer flag (only affects log labelling).
  - multi_http_worker_event_loop now also handles BaseReq, since the detok router may forward single requests downstream.
  - new MultiDetokenizerRouter class + run_multi_detokenizer_router_process.
* detokenizer_manager: skip the unused send_to_tokenizer socket in multi-tokenizer mode (results go through SocketMapping instead).
* engine: factor detok launch into _launch_detokenizer_subprocesses, spawning N detok workers + 1 router when detokenizer_worker_num > 1.
* test_multi_tokenizer: exercise --detokenizer-worker-num 4.
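The commit message describes the routing decision: hash `http_worker_ipc` with `zlib.crc32` so every output from the same HTTP/tokenizer worker lands on the same detokenizer, deterministically across runs. A minimal sketch of that idea (the function name is illustrative, not the actual router code):

```python
# Sketch of the deterministic fan-out described above: outputs belonging to
# the same HTTP/tokenizer worker always hash to the same detokenizer index,
# so per-stream ordering is preserved. Function name is hypothetical.
import zlib


def pick_detokenizer(http_worker_ipc: str, num_detokenizers: int) -> int:
    """Map a tokenizer worker's IPC name to a fixed detokenizer index."""
    # zlib.crc32 is stable across runs and Python processes (unlike the
    # salted built-in hash()), so routing is deterministic, as the commit
    # message notes.
    return zlib.crc32(http_worker_ipc.encode()) % num_detokenizers
```

Because the mapping depends only on the IPC name and the worker count, restarting the router cannot reshuffle in-flight streams between detokenizers.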
whybeyoung added a commit to whybeyoung/sglang that referenced this pull request on Apr 29, 2026
Motivation
Enable multiple detokenizer workers on top of the existing multi-tokenizer support.
Modifications
* Add `detokenizer-worker-num` to set the number of detokenizer workers in `MultiTokenizerMixin`; the tokenizer worker num must be a multiple of the detokenizer worker num.
* Add `MultiDetokenizerRouter` to route requests from the scheduler to the multiple detokenizers.

Benchmarking and Profiling
Set `detokenizer-worker-num` to 1, 4, and 8.
cmd:
Checklist