
Fix for T4 GPUs #16

Merged
Ying1123 merged 4 commits into main from fix
Jan 16, 2024
Conversation

@Ying1123
Contributor

No description provided.

@Ying1123 Ying1123 merged commit ffe4aae into main Jan 16, 2024
@Ying1123 Ying1123 deleted the fix branch January 16, 2024 23:49
@Ying1123 Ying1123 mentioned this pull request Jan 17, 2024
timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
yanbing-j pushed a commit to yanbing-j/sglang that referenced this pull request Mar 19, 2025
pi314ever pushed a commit to pi314ever/sglang that referenced this pull request Apr 23, 2025
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request May 27, 2025
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request May 28, 2025
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request May 28, 2025
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request Jun 3, 2025
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request Jun 6, 2025
pengxin99 pushed a commit to pengxin99/sglang that referenced this pull request Jun 19, 2025
yichiche pushed a commit to yichiche/sglang that referenced this pull request Jul 30, 2025
yichiche pushed a commit to yichiche/sglang that referenced this pull request Aug 7, 2025
yichiche pushed a commit to yichiche/sglang that referenced this pull request Aug 11, 2025
amd-youchen pushed a commit to amd-youchen/sglang that referenced this pull request Nov 18, 2025
[Feature] Accelerate VisionAttention by precompute H2D part in every …
apinge pushed a commit to apinge/sglang that referenced this pull request Nov 18, 2025
[Feature] Accelerate VisionAttention by precompute H2D part in every …
nithinsubbiah pushed a commit to nithinsubbiah/sglang that referenced this pull request Nov 21, 2025
Signed-off-by: Stanley Winata <stanley.winata@amd.com>

[Wave] Add wave extend attention kernel

Signed-off-by: Harsh Menon <harsh@nod-labs.com>

[Wave] Adding logit_cap and layer scaling to API

Also add support for the Wave backend to the model
runner, and use Triton decode kernels for now.

[Wave] Run chunked prefill for perf comparison on Wave test

We need to rename the non-chunked (regular) prefill version, because otherwise
rpd will treat it as the same kernel.

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

[Wave] Cache the function that loads the wave kernel

Also maintain a global kernel hash to avoid
recomputing the hash on every call.
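The caching pattern described above — memoize the kernel loader and keep a global table of kernel hashes so each hash is computed at most once — can be sketched as follows. This is an illustrative sketch, not the actual sglang/Wave code; `load_wave_kernel`, `kernel_hash`, and the placeholder "compile" step are hypothetical names.

```python
import functools
import hashlib

# Global hash table: each kernel source is hashed at most once.
_KERNEL_HASHES: dict[str, str] = {}

def kernel_hash(source: str) -> str:
    """Return the cached hash for a kernel source, computing it only on first use."""
    h = _KERNEL_HASHES.get(source)
    if h is None:
        h = hashlib.sha256(source.encode()).hexdigest()
        _KERNEL_HASHES[source] = h
    return h

@functools.lru_cache(maxsize=None)
def load_wave_kernel(source: str):
    """Load/compile a kernel once per unique source (placeholder compile step)."""
    return ("compiled", kernel_hash(source))
```

Because `load_wave_kernel` is wrapped in `lru_cache`, repeated calls with the same source return the already-compiled object without re-entering the loader, and `kernel_hash` avoids rehashing on every call.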

[Wave] Don't specify block size and enable buffer ops

[Wave] Enable wave runtime and update scheduling API

[Wave] Update API to use wave_compile & WaveCompileOptions

[Wave] Update wave backend and extend attention to latest

[Wave] Add speculative decode kernel

Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>

cache kernels using lru_cache

Update WaveBackend to use Wave Decode (sgl-project#6)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Revert "Update WaveBackend to use Wave Decode (sgl-project#6)" (sgl-project#7)

This reverts commit eac4599.

Wave Backend decode (sgl-project#8)

* align shapes

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

* fix

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

---------

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Wave backend fixes (sgl-project#10)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

More fixes to Wave decode (sgl-project#12)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

is_causal

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Enable the grok in3 model (sgl-project#14)

Set unique cache dir for each worker (sgl-project#16)
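A per-worker cache directory, as in the commit above, keeps concurrent workers from clobbering one another's compiled-kernel caches. A minimal sketch of the idea (the function name and layout are assumptions, not the sglang implementation):

```python
import os
import tempfile

def worker_cache_dir(base: str, worker_id: int) -> str:
    """Return a subdirectory of `base` unique to this worker, creating it if needed."""
    path = os.path.join(base, f"worker_{worker_id}")
    os.makedirs(path, exist_ok=True)
    return path
```

Each worker then points its kernel cache at its own directory, so two workers compiling the same kernel never write to the same files.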

update kernel (sgl-project#18)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

updated spec decode test as per wave

Signed-off-by: xintin <gaurav.verma@amd.com>

fix extend (sgl-project#23)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Refactor paged decode intermediate arrays shapes (sgl-project#24)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

remove dyn symbols (sgl-project#26)

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

cleanup shapes (sgl-project#27)

Some fields were removed from `paged_decode_attention_shape`.

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Remove `mha` param from Wave decode attention kernel (sgl-project#28)

Depends on iree-org/iree-turbine#1039

Signed-off-by: Paul Zhang <paul.zhang@amd.com>

nfc: fix problems reported by linting

update references from iree.turbine to wave_lang
Garrybest pushed a commit to Garrybest/sglang that referenced this pull request Jan 9, 2026
…ate_sgl-jax_discussions

bugfix: use sgl-jax discussions address
wzrf pushed a commit to wzrf/sglang-fusionrag that referenced this pull request Feb 8, 2026
alphabetc1 pushed a commit to alphabetc1/sglang that referenced this pull request Mar 14, 2026
docs: polish sglang launch & add python request examples
KHAEntertainment referenced this pull request in Clarit-AI/Engram Mar 30, 2026
5 adverse-condition tests (4/5 PASS): client disconnect, SIGKILL mid-inference
(with startup preload verification), SIGKILL during write, SIGTERM, abort+save.
New bug #16: SIGTERM graceful shutdown hangs >60s.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
wisclmy0611 pushed a commit that referenced this pull request Apr 7, 2026
* feat: added classification-model.mdx

* fix: modified the description
prakashkagitha added a commit to prakashkagitha/sglang that referenced this pull request May 9, 2026
Replace MTP/EAGLE speculative decoding benchmarks with standard baseline
results using plain sglang serve (no speculative flags), per sgl-cookbook
issue sgl-project#16. All 4 variants (35B-A3B FP8/BF16 and 27B FP8/BF16) benchmarked
on 1× H100 NVL across Chat (1K/1K), Reasoning (1K/8K), and Summarization
(8K/1K) scenarios at concurrency 1, 16, and 64/100.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
