[AMD] fix amd ci dpskv32 by yctseng0211 · Pull Request #17432 · sgl-project/sglang

yctseng0211 · 2026-01-20T16:09:05Z

Motivation

Fix the runtime error from PR #17205:
https://github.com/sgl-project/sglang/actions/runs/21157007917/job/60858903195?pr=17205#step:6:17583

  File "/sglang-checkout/python/sglang/srt/layers/attention/nsa/nsa_indexer.py", line 1008, in forward_cuda
    weights = self._get_logits_head_gate(x_for_gate, q_scale)
  File "/sglang-checkout/python/sglang/srt/layers/attention/nsa/nsa_indexer.py", line 230, in _get_logits_head_gate
    weights, _ = self.weights_proj(x)
  File "/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/sglang-checkout/python/sglang/srt/layers/linear.py", line 260, in forward
    output = self.quant_method.apply(self, x, bias)
  File "/sglang-checkout/python/sglang/srt/layers/quantization/unquant.py", line 143, in apply
    return F.linear(x, layer.weight, bias)
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16
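
For readers unfamiliar with this failure mode, the traceback above can be reproduced outside sglang. The snippet below is a minimal sketch, not the change made in this PR: it assumes a float32 activation reaching a bfloat16 weight, and `linear_with_matching_dtype` is a hypothetical helper name introduced only for illustration. It shows one common remedy, casting the input to the weight's dtype before `F.linear`.

```python
import torch
import torch.nn.functional as F


def linear_with_matching_dtype(x, weight, bias=None):
    """Hypothetical helper: cast the activation to the weight dtype, then project."""
    if x.dtype != weight.dtype:
        x = x.to(weight.dtype)
    return F.linear(x, weight, bias)


# Assumed shapes for illustration only: 2 tokens, hidden size 8, 4 output heads.
weight = torch.randn(4, 8, dtype=torch.bfloat16)
x = torch.randn(2, 8, dtype=torch.float32)

# Calling F.linear directly with mismatched dtypes raises a RuntimeError.
try:
    F.linear(x, weight)
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

# After the cast, the projection runs and the output follows the weight dtype.
out = linear_with_matching_dtype(x, weight)
```

Casting toward the weight dtype keeps the layer's parameters untouched; whether that or upcasting the weight is preferable depends on the precision the gate needs.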

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

Review Process

Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
After green CI and required approvals, ask Merge Oncalls to merge.


michaelzhang-ai · 2026-01-21T22:56:09Z

The run https://github.com/sgl-project/sglang/actions/runs/21209864077?pr=17432 passed, and the PR is ready to merge. It fixes the current dpv32 issue and substantially improves mi35x queue time. @HaiShaw cc: @yctseng0211

hubertlu-tw

LGTM
Thanks for the fix.

HaiShaw · 2026-01-22T04:34:45Z

Only changed to AMD path

Co-authored-by: michaelzhang-ai <michaelzhang.ai@users.noreply.github.com>

fix amd ci dpskv32

66ac70c

yctseng0211 added the amd label Jan 20, 2026

temp reset the needs of stage c test

2851b87

yctseng0211 marked this pull request as ready for review January 20, 2026 16:13

yctseng0211 requested review from Fridge003, Kangyan-Zhou, Qiaolin-Yu, hebiao064, ispobock and merrymercy as code owners January 20, 2026 16:13

yctseng0211 added the run-ci label Jan 20, 2026

Increase server launch timeout to 3600 for AMD CI and update test registration times to 5400 for DeepSeek V3.2 tests.

9a275c1

github-actions Bot added the deepseek label Jan 21, 2026

michaelzhang-ai and others added 5 commits January 20, 2026 19:47

Skip DP tests for AMD CI in DeepSeek V3.2 to optimize execution time.

6f2836b

revert the needs

b0758a0

set estimated time

a746ef1

adjust thres for TP-MTP, disable basic for saving time

78976cc

set partitions as 2

a6f77cc

yctseng0211 mentioned this pull request Jan 21, 2026

[AMD] CI - migrate perf test and fix stage-b-test-1-gpu-amd #17340

Merged

5 tasks

michaelzhang-ai approved these changes Jan 21, 2026


hubertlu-tw mentioned this pull request Jan 21, 2026

[Bug] failed to launch deepseek v3.2 with lmsysorg/sglang:v0.5.6-rocm700-mi35x #14500

Closed

5 tasks

hubertlu-tw approved these changes Jan 22, 2026


HaiShaw requested changes Jan 22, 2026


Comment thread python/sglang/srt/layers/attention/nsa/nsa_indexer.py Outdated

Comment thread python/sglang/srt/layers/attention/nsa/nsa_indexer.py Outdated

conditional on _is_hip

63b7b7e
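
The diff for this commit is not shown here, so the following is only a sketch of what "conditional on _is_hip" could look like, assuming the guarded change is a dtype cast in front of the gate projection. The local `_is_hip` flag is derived from `torch.version.hip`, which is `None` on non-ROCm builds; sglang's own helper may be defined differently, and `gate_projection` is a hypothetical name.

```python
import torch
import torch.nn.functional as F

# True only on ROCm (HIP) builds of PyTorch.
_is_hip = torch.version.hip is not None


def gate_projection(x, weight, is_hip=_is_hip):
    """Hypothetical sketch: apply the dtype cast only on the AMD path."""
    if is_hip and x.dtype != weight.dtype:
        # AMD-only branch; other platforms keep their existing behavior.
        x = x.to(weight.dtype)
    return F.linear(x, weight)


weight = torch.randn(4, 8, dtype=torch.bfloat16)

# Simulated AMD path: float32 input is cast before the projection.
hip_out = gate_projection(torch.randn(2, 8), weight, is_hip=True)

# Non-AMD path: the input already matches the weight dtype, nothing changes.
other_out = gate_projection(
    torch.randn(2, 8, dtype=torch.bfloat16), weight, is_hip=False
)
```

Passing `is_hip` as a parameter is only for demonstration, so both branches can be exercised on any machine.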

hubertlu-tw mentioned this pull request Jan 22, 2026

[AMD] Support ds3.2 on gfx942 platform #17504

Merged

5 tasks

HaiShaw approved these changes Jan 22, 2026


HaiShaw merged commit 17807ca into sgl-project:main Jan 22, 2026
33 of 74 checks passed

Johnsonms pushed a commit to Johnsonms/sglang that referenced this pull request Feb 14, 2026

[AMD] fix amd ci dpskv32 (sgl-project#17432)

63dcd0b

Co-authored-by: michaelzhang-ai <michaelzhang.ai@users.noreply.github.com>

HaiShaw merged 9 commits into sgl-project:main from yctseng0211:fix_dpsk_0120

4 participants
