
Update Readme #11

Merged
merrymercy merged 3 commits into main from fix on Jan 16, 2024

Conversation

@merrymercy
Contributor

No description provided.

merrymercy merged this pull request into main on Jan 16, 2024.
merrymercy deleted the fix branch on January 16, 2024 at 10:46.
merrymercy added a commit that referenced this pull request Jan 16, 2024
Ying1123 pushed a commit that referenced this pull request Sep 13, 2024
timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request Mar 14, 2025
* switch to weight_packed_linear if cpu_has_amx_support

* add self.use_intel_amx_backend
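
The two bullets describe gating a linear layer on CPU capability and caching that choice on the module. A minimal sketch of the pattern, assuming a Linux-style /proc/cpuinfo probe and a placeholder matmul for the packed-weight kernel (only the names weight_packed_linear and use_intel_amx_backend come from the commit message; everything else here is illustrative):

```python
# Illustrative sketch only, not the actual sglang code.
import pathlib

import torch


def cpu_has_amx_support() -> bool:
    # Assumption: AMX is advertised as the "amx_tile" CPU flag on Linux;
    # the real sglang probe may differ.
    try:
        return "amx_tile" in pathlib.Path("/proc/cpuinfo").read_text()
    except OSError:
        return False


def weight_packed_linear(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # Placeholder for the AMX packed-weight kernel; functionally a plain
    # matmul in this sketch.
    return x @ w.t()


class AmxAwareLinear(torch.nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(out_features, in_features))
        # Cache the backend decision once, as in the commit.
        self.use_intel_amx_backend = cpu_has_amx_support()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.use_intel_amx_backend:
            return weight_packed_linear(x, self.weight)
        return torch.nn.functional.linear(x, self.weight)
```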
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request Mar 14, 2025
* switch to weight_packed_linear if cpu_has_amx_support

* add self.use_intel_amx_backend
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request Mar 14, 2025
* switch to weight_packed_linear if cpu_has_amx_support

* add self.use_intel_amx_backend
yanbing-j pushed a commit to yanbing-j/sglang that referenced this pull request Mar 18, 2025
* switch to weight_packed_linear if cpu_has_amx_support

* add self.use_intel_amx_backend
NorthmanPKU added a commit to NorthmanPKU/sglang that referenced this pull request May 16, 2025
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request May 27, 2025
* switch to weight_packed_linear if cpu_has_amx_support

* add self.use_intel_amx_backend
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request May 28, 2025
* switch to weight_packed_linear if cpu_has_amx_support

* add self.use_intel_amx_backend
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request May 28, 2025
* switch to weight_packed_linear if cpu_has_amx_support

* add self.use_intel_amx_backend
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request Jun 3, 2025
* switch to weight_packed_linear if cpu_has_amx_support

* add self.use_intel_amx_backend
sleepcoo pushed a commit to shuaills/sglang that referenced this pull request Jun 24, 2025
siuhunh pushed a commit to xing-wenjin/sglang that referenced this pull request Jul 23, 2025
[bugfix] rotary_embedding: fix precision error
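
The usual shape of this kind of fix is to build the rotary sin/cos tables in float32 and cast only at the end, rather than doing the trig in half precision. A hedged sketch (names and defaults are illustrative, not the exact sglang code):

```python
import torch


def rotary_tables(seq_len: int, head_dim: int, base: float = 10000.0,
                  out_dtype: torch.dtype = torch.bfloat16):
    # Do all trig in float32; casting earlier is where precision
    # errors typically creep in.
    exponents = torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim
    inv_freq = 1.0 / (base ** exponents)
    t = torch.arange(seq_len, dtype=torch.float32)
    freqs = torch.outer(t, inv_freq)  # (seq_len, head_dim // 2)
    return freqs.cos().to(out_dtype), freqs.sin().to(out_dtype)
```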
yichiche pushed a commit to yichiche/sglang that referenced this pull request Jul 30, 2025
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
yichiche pushed a commit to yichiche/sglang that referenced this pull request Aug 7, 2025
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
yichiche pushed a commit to yichiche/sglang that referenced this pull request Aug 11, 2025
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
Xia-Weiwen pushed a commit to Xia-Weiwen/sglang that referenced this pull request Sep 9, 2025
kalyank007 pushed a commit to kalyank007/sglang that referenced this pull request Nov 7, 2025
…gl-project#10739 (sgl-project#11)

* [Intel XPU]Add XPU device support to Triton attention kernel tests

* Update test_triton_attention_kernels.py

* Update test_triton_attention_kernels.py

---------

Co-authored-by: svc_repro_tool <svc_repro_tool@habana.ai>
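
A plausible pytest pattern for what this commit does, running the same kernel test on whichever accelerators the local PyTorch build exposes (the device list and test body are illustrative, not the actual test file):

```python
import pytest
import torch

# Collect available accelerators; torch.xpu exists only in newer builds.
DEVICES = [d for d in ("cuda", "xpu")
           if getattr(torch, d, None) is not None
           and getattr(torch, d).is_available()]


@pytest.mark.parametrize("device", DEVICES or ["cpu"])
def test_kernel_runs_on_device(device):
    x = torch.randn(16, 64, device=device)
    y = torch.softmax(x, dim=-1)  # stand-in for the Triton attention kernel
    assert torch.isfinite(y).all()
```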
amd-youchen pushed a commit to amd-youchen/sglang that referenced this pull request Nov 13, 2025
[script] add Qwen3-VL README and run scripts
yhyang201 pushed a commit that referenced this pull request Dec 13, 2025
* fix: skip embed init for mm_only mode

* fix: skip send health-check-req to encoder with epd mode
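
Both fixes amount to early-exit guards on a mode flag. A hypothetical sketch (mm_only, epd_mode, and the helper names are assumptions, not verified sglang identifiers):

```python
def init_embeddings(config, model):
    if config.mm_only:
        return  # multimodal-only mode: no text embedding table to initialize
    model.init_text_embeddings()


def send_health_checks(config, encoder, workers):
    for w in workers:
        w.send_health_check()
    if config.epd_mode:
        return  # EPD mode: the encoder takes no health-check requests
    encoder.send_health_check()
```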
triple-mu pushed a commit to triple-mu/sglang that referenced this pull request Jan 1, 2026
…3f8fc50101861f

parent 45081af
author Your Name <you@example.com> 1767269369 +0800
committer Your Name <you@example.com> 1767269369 +0800

rebase

# This is the commit message sgl-project#11:

clear cache once
triple-mu pushed a commit to triple-mu/sglang that referenced this pull request Jan 1, 2026
parent 45081af
author Your Name <you@example.com> 1767269369 +0800
committer Your Name <you@example.com> 1767269369 +0800

rebase

# This is the commit message sgl-project#11:

clear cache once

# This is the commit message sgl-project#12:

simplified VAE cache logic for qwenimage and wan

# This is the commit message sgl-project#14:

remove duplicated code
tpoisonooo pushed a commit to tpoisonooo/sglang that referenced this pull request Feb 12, 2026
MatejKosec added a commit to MatejKosec/sglang that referenced this pull request Feb 25, 2026
- Validate alloc reply_id matches request_id (sgl-project#3)
- Remove dead variable num_gen_tokens (sgl-project#4)
- Move inline imports to top level (sgl-project#5)
- Replace hasattr guards with proper None checks (sgl-project#6)
- Demote per-request logs to DEBUG, keep milestones at INFO (sgl-project#11)
- Remove unused tree_cache param from start_kv_return_receiver (sgl-project#14)
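
Two of the listed items are general patterns worth spelling out. A short sketch with illustrative call sites (the real sglang code differs):

```python
import logging

logger = logging.getLogger(__name__)


def release_tree_cache(req):
    # Item sgl-project#6: hasattr() guards hide typos and conflate
    # "attribute missing" with "attribute is None"; check None directly.
    if req.tree_cache is not None:
        req.tree_cache.release()


def on_kv_transfer_done(req_id):
    # Item sgl-project#11: per-request chatter goes to DEBUG...
    logger.debug("kv transfer finished for request %s", req_id)


def on_receiver_started(port):
    # ...while one-time milestones stay at INFO.
    logger.info("kv return receiver listening on port %d", port)
```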
MatejKosec added a commit to MatejKosec/sglang that referenced this pull request Feb 26, 2026
- Validate alloc reply_id matches request_id (sgl-project#3)
- Remove dead variable num_gen_tokens (sgl-project#4)
- Move inline imports to top level (sgl-project#5)
- Replace hasattr guards with proper None checks (sgl-project#6)
- Demote per-request logs to DEBUG, keep milestones at INFO (sgl-project#11)
- Remove unused tree_cache param from start_kv_return_receiver (sgl-project#14)
Estrella-xx added a commit to Estrella-xx/sglang that referenced this pull request Mar 13, 2026
wisclmy0611 pushed a commit that referenced this pull request Apr 7, 2026
Co-authored-by: longGGGGGG <553746008@qq.com>
rucnyz added a commit to rucnyz/sglang that referenced this pull request Apr 30, 2026
sgl-project#10 Sweep 1: 3 seeds × 5 ratios. Std 3-5% of mean across all ratios;
swing 1.71× (4711→8075) reproduces within noise of original 1.91×.
Variance bands now in paper Table 1.

sgl-project#11 Setting 4 fallback rule:
- Implementation: SGLANG_XPOOL_QDEPTH_TRIGGER added to
  cross_pool_planner.py (gated, legacy preserved).
- Unit tests: 5/5 PASS.
- E2E: both arms fired 21 transfers on Phase 1+2+3 (workload doesn't
  dual-saturate; KV stays <1%). Honest finding documented in §6.4.
- Deeper fix (per-pool admission signal everywhere) is follow-up.

SETTINGS.md scoreboard reflects both items DONE.
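
The gating described above is the standard env-var feature flag: legacy behavior by default, the new rule only when the variable is set. A sketch keeping the SGLANG_XPOOL_QDEPTH_TRIGGER name from the commit but assuming its threshold semantics:

```python
import os

_QDEPTH_TRIGGER = os.environ.get("SGLANG_XPOOL_QDEPTH_TRIGGER")


def qdepth_fallback_enabled(queue_depth: int) -> bool:
    if _QDEPTH_TRIGGER is None:
        return False  # gate closed: legacy planner path, unchanged
    return queue_depth >= int(_QDEPTH_TRIGGER)
```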
rucnyz added a commit to rucnyz/sglang that referenced this pull request Apr 30, 2026
…s 28 xfers

v9 pool-binding-shift trace produces real differentiation:
- Phase B (KV-bound 8K random): L1+L2 -37% mean TTFT vs stock
- Phase C (mixed 4K random):     L1+L2 -38% median E2E vs stock
- Cross-pool transfers: stock 0, L1-only 0, L2-only 0, L1+L2 28

Two surprising findings documented:
1. Layer 2 alone fires zero transfers — Layer 1 retention is what
   makes Layer 2 cross the firing threshold.
2. Phase A regresses with L1 (-20% TPS) because K_big=8192 hurts on
   prefix-friendly GSP. Consistent with A2's K_big=0-wins finding.
   Adaptive K_big control marked as follow-up.

Settings status: Setting 1 marked **DONE v6 NULL + v9 PASS**.
All 4 user-requested follow-ups (sgl-project#9 Q3.A 4-arm, sgl-project#10 Sweep 1
multi-seed, sgl-project#11 Setting 4 fallback rule, sgl-project#12 Setting 1 v9 trace)
now complete.
lujangus added a commit to tails-mpt/sglang that referenced this pull request May 1, 2026
Replaces the sparse_attn_v4 stub (which raised NotImplementedError)
with a correct Python reference implementation. Direct port of V4
reference inference/kernel.py:sparse_attn_kernel (lines 277-352) +
sparse_attn dispatcher (line 355).

What this enables:
- V4Attention forward path runs end-to-end (was raising at the
  sparse_attn call site)
- HCA layers (compress_ratio=128) use deterministic-stride topk
  (get_compress_topk_idxs) and now compute attention correctly via the
  Python reference
- Window-only layers (compress_ratio=0) compute attention correctly
- CSA layers (compress_ratio=4) currently fall through to the HCA path
  (deterministic stride) until the NSA Indexer is wired
  (TODO(phase1-nsa))

What this does NOT enable:
- Performance: Python reference is slow. NOT for production use.
  Phase 5 launches require the NSA tilelang kernel + attn_sink extension
  per architecture-notes.md "Open risks sgl-project#11"
- Numerical agreement vs V4 HF reference: that's a separate validation
  task on a real GPU with a loaded V4 checkpoint
- CSA quality: until NSAIndexer wiring lands, CSA layers use
  deterministic stride (HCA's behavior) which approximates but doesn't
  match the V4 reference's learned-index Indexer

Algorithm details (matches V4 reference exactly; a short sketch follows this list):
- Mask topk_idxs == -1 to -inf scores
- Compute scaled QK scores: einsum("bshd,bskd->bshk", q, kv)
- Numerically-stable softmax with attn_sink contribution to denominator:
    scores_max = scores.amax(dim=K)
    sum_exp = sum(exp(scores - scores_max))
    sink_term = exp(attn_sink - scores_max)  # per-head sink in denominator
    weights = exp(scores - scores_max) / (sum_exp + sink_term)
- Output = einsum("bshk,bskd->bshd", weights, kv) — sink slot has v=0
- Handles all-invalid-row case (output all zeros, sink absorbs mass)
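
A minimal PyTorch rendering of those steps, assuming q: (B, S, H, D), per-token gathered kv: (B, S, K, D), topk_idxs: (B, S, K), and a per-head attn_sink: (H,); the shapes and function name are illustrative, not the sglang signature:

```python
import torch


def sparse_attn_reference(q, kv, topk_idxs, attn_sink, scale):
    # Mask invalid (-1) slots to -inf so they get zero softmax weight.
    invalid = (topk_idxs == -1).unsqueeze(2)                 # (B, S, 1, K)
    scores = torch.einsum("bshd,bskd->bshk", q, kv) * scale  # (B, S, H, K)
    scores = scores.masked_fill(invalid, float("-inf"))
    # Stable softmax with the sink in the denominator. Taking the max
    # against attn_sink also handles the all-invalid row: every exp
    # underflows to 0, the sink term is 1, and the output is exactly zero.
    m = torch.maximum(scores.amax(dim=-1, keepdim=True),
                      attn_sink.view(1, 1, -1, 1))
    exp_scores = torch.exp(scores - m)
    sum_exp = exp_scores.sum(dim=-1, keepdim=True)
    sink_term = torch.exp(attn_sink.view(1, 1, -1, 1) - m)
    weights = exp_scores / (sum_exp + sink_term)
    # The sink slot has v = 0, so it never appears in the output sum.
    return torch.einsum("bshk,bskd->bshd", weights, kv)
```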

V4Attention.forward updated:
- The CSA Indexer branch now CALLS self.indexer(x, qr, start_pos, offset)
  when self.indexer is not None. Currently always None (TODO(phase1-nsa))
  so falls through to the deterministic-stride branch.
- Comments updated to make the CSA-quality fallback explicit and
  cross-reference architecture-notes.md "Open risks sgl-project#11".

Tests added:
- test_sparse_attn_v4_basic_shape: shape contract (B, S, H, D output;
  no NaN, no Inf)
- test_sparse_attn_v4_invalid_indices_zero_contribution: validates the
  -1 mask handling. Single-valid-idx case: output == that kv. All-
  invalid case: output == zeros (sink absorbs all softmax mass).

test_v4attention_forward_shape stays skipped (depends on
DeepseekV4ForCausalLM trunk + load_weights — separate from sparse_attn).
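
A sketch of the second test's two assertions, reusing the hypothetical sparse_attn_reference from the sketch above (sizes are arbitrary):

```python
import torch


def test_invalid_indices_zero_contribution():
    B, S, H, K, D = 1, 2, 2, 4, 8
    q, kv = torch.randn(B, S, H, D), torch.randn(B, S, K, D)
    sink = torch.full((H,), -1e9)  # negligible sink weight for this check
    # All indices invalid: the sink absorbs all mass, output is zero.
    idx = torch.full((B, S, K), -1, dtype=torch.long)
    out = sparse_attn_reference(q, kv, idx, sink, scale=D ** -0.5)
    assert torch.equal(out, torch.zeros_like(out))
    # Single valid index: every head's output equals that kv slot.
    idx[..., 0] = 0
    out = sparse_attn_reference(q, kv, idx, sink, scale=D ** -0.5)
    assert torch.allclose(out, kv[:, :, 0:1, :].expand(B, S, H, D))
```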