Fix FA3 swa spec pg_size > 1 #20369

Merged: ispobock merged 1 commit into main from fix-swa-spec-page on Mar 12, 2026

@ispobock (Collaborator) commented Mar 11, 2026

Motivation

close #20334

Modifications

Accuracy Tests

python3 -m sglang.launch_server \
  --model-path XiaomiMiMo/MiMo-V2-Flash \
  --tp-size 4 \
  --trust-remote-code \
  --mem-fraction-static 0.7 \
  --model-loader-extra-config '{"enable_multithread_load": "true","num_threads": 64}' \
  --attention-backend fa3 \
  --speculative-algorithm EAGLE \
  --speculative-num-step 3 \
  --speculative-eagle-topk=1 \
  --speculative-num-draft-tokens=4 \
  --page-size 64 \
  --cuda-graph-max-bs 64
python3 benchmark/gsm8k/bench_sglang.py --parallel 1400 --num-questions 1400

Accuracy: 0.827
Invalid: 0.000
Latency: 86.098 s
Output throughput: 2088.815 token/s
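
As a rough sanity check on the numbers above, the report can be parsed and the total number of generated tokens estimated. This is only a sketch: the `parse_report` helper is hypothetical (not part of sglang), and it assumes that the reported latency is the wall-clock time of the whole run and that output throughput is total output tokens divided by that latency.

```python
import re

# Benchmark output copied verbatim from the accuracy test above.
REPORT = """\
Accuracy: 0.827
Invalid: 0.000
Latency: 86.098 s
Output throughput: 2088.815 token/s
"""


def parse_report(text: str) -> dict[str, float]:
    """Hypothetical helper: map each 'Name: value [unit]' line to a float."""
    metrics = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+):\s*([0-9.]+)", line)
        if match:
            metrics[match.group(1).strip()] = float(match.group(2))
    return metrics


metrics = parse_report(REPORT)

# Assumption: throughput (token/s) * latency (s) approximates total output tokens.
approx_output_tokens = metrics["Output throughput"] * metrics["Latency"]

# 1400 comes from --num-questions in the benchmark command.
approx_tokens_per_question = approx_output_tokens / 1400

print(f"~{approx_output_tokens:,.0f} output tokens in total")
print(f"~{approx_tokens_per_question:.0f} output tokens per question")
```

Under those assumptions the run produced on the order of 180k output tokens, which is a quick way to compare runs with different `--page-size` settings on equal footing.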

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist (Contributor) commented

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@ispobock changed the title from "Fix FA33 swa spec pg_size > 1" to "Fix FA3 swa spec pg_size > 1" on Mar 11, 2026
@ispobock (Collaborator, Author) commented

/tag-and-rerun-ci

@ispobock (Collaborator, Author) commented

/tag-and-rerun-ci

@hnyls2002 (Collaborator) commented

Great Job

@ispobock merged commit ae7c239 into main on Mar 12, 2026
272 of 300 checks passed
@ispobock deleted the fix-swa-spec-page branch on March 12, 2026 at 03:42
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 12, 2026
liubiyongge pushed a commit to liubiyongge/sglang that referenced this pull request Mar 13, 2026
whybeyoung pushed a commit to whybeyoung/sglang that referenced this pull request Mar 14, 2026
whybeyoung pushed a commit to whybeyoung/sglang that referenced this pull request Mar 14, 2026
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Mar 15, 2026
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026

Development

Successfully merging this pull request may close these issues.

[Bug] CUDA Graph + MTP + page_size = 64 MiMo-V2-Flash precision issue
