Added the ability to Modify the Context Length #210

Merged
comaniac merged 2 commits into sgl-project:main from psych0v0yager:variable_ctx
Feb 21, 2024

Conversation

@psych0v0yager (Contributor) commented Feb 20, 2024

Fixes issue #159

You can now specify the context length the model should use with the new --context-length flag.

For example, Mixtral 8x7B AWQ with the default context length:

python -m sglang.launch_server --model-path /path/to/bagel_mixtral_AWQ --port 30000 --tp 2

Rank 1: max_total_num_token=135505, max_prefill_num_token=32768, context_len=32768, model_mode=[]
Rank 0: max_total_num_token=135505, max_prefill_num_token=32768, context_len=32768, model_mode=[]

With the --context-length adjustment:

python -m sglang.launch_server --model-path /path/to/bagel_mixtral_AWQ --port 30000 --tp 2 --context-length 8192

Rank 0: max_total_num_token=135505, max_prefill_num_token=22584, context_len=8192, model_mode=[]
Rank 1: max_total_num_token=135505, max_prefill_num_token=22584, context_len=8192, model_mode=[]
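To illustrate the behavior described above, here is a minimal sketch of how a context-length override flag can be wired in and take precedence over the value baked into a model's config. This is illustrative only; the function and parameter names below are hypothetical and are not sglang's actual internals.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical launcher arguments mirroring the commands shown above.
    parser = argparse.ArgumentParser(description="Illustrative launch script")
    parser.add_argument("--model-path", type=str, required=True)
    parser.add_argument("--port", type=int, default=30000)
    parser.add_argument("--tp", type=int, default=1)
    parser.add_argument(
        "--context-length",
        type=int,
        default=None,
        help="Override the model's default context length (e.g. 8192).",
    )
    return parser


def resolve_context_len(config_context_len: int, override: Optional[int]) -> int:
    # If the user passed --context-length, it wins; otherwise fall back to
    # the model config's value (32768 for the Mixtral example above).
    return override if override is not None else config_context_len
```

Under this sketch, launching without the flag resolves to the model's 32768-token default, while passing `--context-length 8192` yields the reduced `context_len=8192` seen in the second log output.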

Contributor

@comaniac comaniac left a comment

LGTM

@comaniac comaniac linked an issue Feb 21, 2024 that may be closed by this pull request
@comaniac comaniac merged commit 9de9a46 into sgl-project:main Feb 21, 2024

Development

Successfully merging this pull request may close these issues.

initialise model with max_model_len

2 participants