Update Readme #11

Merged: merrymercy merged 3 commits into main on Jan 16, 2024

Conversation
merrymercy added a commit that referenced this pull request on Jan 16, 2024
Ying1123 pushed a commit that referenced this pull request on Sep 13, 2024
timethink pushed a commit to timethink/sglang that referenced this pull request on Mar 9, 2025
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request on Mar 14, 2025
* switch to weight_packed_linear if cpu_has_amx_support
* add self.use_intel_amx_backend
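The commit message above describes gating a packed-weight linear kernel on CPU AMX support and caching the decision on the layer as `self.use_intel_amx_backend`. A minimal sketch of that dispatch pattern, assuming hypothetical stand-in functions rather than the actual sglang API:

```python
def cpu_has_amx_support() -> bool:
    # Stand-in probe; a real implementation would query CPU feature flags
    # (e.g. via torch or /proc/cpuinfo). Hard-coded False here.
    return False

def weight_packed_linear(x, w):
    # Placeholder for the AMX-optimized packed-weight kernel.
    return [sum(a * b for a, b in zip(x, col)) for col in zip(*w)]

def naive_linear(x, w):
    # Reference fallback path: plain matrix-vector product.
    return [sum(a * b for a, b in zip(x, col)) for col in zip(*w)]

class LinearLayer:
    def __init__(self, w):
        self.w = w
        # Decide the backend once at construction, mirroring the
        # self.use_intel_amx_backend flag named in the commit message.
        self.use_intel_amx_backend = cpu_has_amx_support()

    def forward(self, x):
        if self.use_intel_amx_backend:
            return weight_packed_linear(x, self.w)
        return naive_linear(x, self.w)
```

Caching the probe result on the module avoids re-querying CPU features on every forward call.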
yanbing-j pushed a commit to yanbing-j/sglang that referenced this pull request on Mar 18, 2025
* switch to weight_packed_linear if cpu_has_amx_support
* add self.use_intel_amx_backend
This was referenced Apr 16, 2025
NorthmanPKU added a commit to NorthmanPKU/sglang that referenced this pull request on May 16, 2025
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request on May 27, 2025
* switch to weight_packed_linear if cpu_has_amx_support
* add self.use_intel_amx_backend
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request on May 28, 2025
* switch to weight_packed_linear if cpu_has_amx_support
* add self.use_intel_amx_backend
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request on Jun 3, 2025
* switch to weight_packed_linear if cpu_has_amx_support
* add self.use_intel_amx_backend
sleepcoo pushed a commit to shuaills/sglang that referenced this pull request on Jun 24, 2025
finish training logic
siuhunh pushed a commit to xing-wenjin/sglang that referenced this pull request on Jul 23, 2025
[bugfix] rotary_embedding: fix precision error
yichiche pushed a commit to yichiche/sglang that referenced this pull request on Jul 30, 2025
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
yichiche pushed a commit to yichiche/sglang that referenced this pull request on Aug 7, 2025
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
yichiche pushed a commit to yichiche/sglang that referenced this pull request on Aug 11, 2025
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
Xia-Weiwen pushed a commit to Xia-Weiwen/sglang that referenced this pull request on Sep 9, 2025
kalyank007 pushed a commit to kalyank007/sglang that referenced this pull request on Nov 7, 2025
…gl-project#10739 (sgl-project#11)
* [Intel XPU] Add XPU device support to Triton attention kernel tests
* Update test_triton_attention_kernels.py
* Update test_triton_attention_kernels.py
Co-authored-by: svc_repro_tool <svc_repro_tool@habana.ai>
amd-youchen pushed a commit to amd-youchen/sglang that referenced this pull request on Nov 13, 2025
[script] add Qwen3-VL README and run scripts
yhyang201 pushed a commit that referenced this pull request on Dec 13, 2025
* fix: skip embed init for mm_only mode
* fix: skip sending the health-check request to the encoder in EPD mode
triple-mu pushed a commit to triple-mu/sglang that referenced this pull request on Jan 1, 2026
…3f8fc50101861f
parent 45081af
author Your Name <you@example.com> 1767269369 +0800
committer Your Name <you@example.com> 1767269369 +0800
rebase
# This is the commit message sgl-project#11: clear cache once
triple-mu pushed a commit to triple-mu/sglang that referenced this pull request on Jan 1, 2026
parent 45081af
author Your Name <you@example.com> 1767269369 +0800
committer Your Name <you@example.com> 1767269369 +0800
rebase
# This is the commit message sgl-project#11: clear cache once
# This is the commit message sgl-project#12: simplified VAE cache logic for qwenimage and wan
# This is the commit message sgl-project#14: remove duplicated code
tpoisonooo pushed a commit to tpoisonooo/sglang that referenced this pull request on Feb 12, 2026
MatejKosec added a commit to MatejKosec/sglang that referenced this pull request on Feb 25, 2026
- Validate alloc reply_id matches request_id (sgl-project#3)
- Remove dead variable num_gen_tokens (sgl-project#4)
- Move inline imports to top level (sgl-project#5)
- Replace hasattr guards with proper None checks (sgl-project#6)
- Demote per-request logs to DEBUG, keep milestones at INFO (sgl-project#11)
- Remove unused tree_cache param from start_kv_return_receiver (sgl-project#14)
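One of the cleanups listed above, replacing hasattr guards with explicit None checks, can be illustrated with a hypothetical snippet (the class and attribute names here are illustrative, not the actual sglang code):

```python
class KVReceiver:
    def __init__(self, tree_cache=None):
        # The attribute always exists; "absent" is modeled as None.
        # Before the cleanup, code elsewhere guarded with
        # `if hasattr(self, "tree_cache")`, which also silently passes
        # over typos and hides the attribute's lifecycle.
        self.tree_cache = tree_cache

    def cached_tokens(self):
        # After the cleanup: an explicit None check. The attribute is
        # guaranteed to exist, so a misspelling now raises AttributeError
        # instead of quietly taking the fallback branch.
        if self.tree_cache is None:
            return 0
        return len(self.tree_cache)
```

The None check also plays better with static type checkers, which can narrow `Optional[...]` but cannot reason about `hasattr`.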
MatejKosec added a commit to MatejKosec/sglang that referenced this pull request on Feb 26, 2026
- Validate alloc reply_id matches request_id (sgl-project#3)
- Remove dead variable num_gen_tokens (sgl-project#4)
- Move inline imports to top level (sgl-project#5)
- Replace hasattr guards with proper None checks (sgl-project#6)
- Demote per-request logs to DEBUG, keep milestones at INFO (sgl-project#11)
- Remove unused tree_cache param from start_kv_return_receiver (sgl-project#14)
Estrella-xx added a commit to Estrella-xx/sglang that referenced this pull request on Mar 13, 2026
wisclmy0611 pushed a commit that referenced this pull request on Apr 7, 2026
Co-authored-by: longGGGGGG <553746008@qq.com>
rucnyz added a commit to rucnyz/sglang that referenced this pull request on Apr 30, 2026
sgl-project#10 Sweep 1: 3 seeds × 5 ratios. Std 3-5% of mean across all ratios; swing 1.71× (4711→8075) reproduces within noise of the original 1.91×. Variance bands now in paper Table 1.
sgl-project#11 Setting 4 fallback rule:
- Implementation: SGLANG_XPOOL_QDEPTH_TRIGGER added to cross_pool_planner.py (gated, legacy preserved).
- Unit tests: 5/5 PASS.
- E2E: both arms fired 21 transfers on Phase 1+2+3 (workload doesn't dual-saturate; KV stays <1%). Honest finding documented in §6.4.
- Deeper fix (per-pool admission signal everywhere) is follow-up.
SETTINGS.md scoreboard reflects both items DONE.
rucnyz added a commit to rucnyz/sglang that referenced this pull request on Apr 30, 2026
…s 28 xfers
v9 pool-binding-shift trace produces real differentiation:
- Phase B (KV-bound 8K random): L1+L2 -37% mean TTFT vs stock
- Phase C (mixed 4K random): L1+L2 -38% median E2E vs stock
- Cross-pool transfers: stock 0, L1-only 0, L2-only 0, L1+L2 28
Two surprising findings documented:
1. Layer 2 alone fires zero transfers: Layer 1 retention is what makes Layer 2 cross the firing threshold.
2. Phase A regresses with L1 (-20% TPS) because K_big=8192 hurts on prefix-friendly GSP. Consistent with A2's K_big=0-wins finding. Adaptive K_big control marked as follow-up.
Settings status: Setting 1 marked **DONE v6 NULL + v9 PASS**. All 4 user-requested follow-ups (sgl-project#9 Q3.A 4-arm, sgl-project#10 Sweep 1 multi-seed, sgl-project#11 Setting 4 fallback rule, sgl-project#12 Setting 1 v9 trace) now complete.
lujangus added a commit to tails-mpt/sglang that referenced this pull request on May 1, 2026
Replaces the sparse_attn_v4 stub (which raised NotImplementedError) with a correct Python reference implementation. Direct port of the V4 reference inference/kernel.py:sparse_attn_kernel (lines 277-352) plus the sparse_attn dispatcher (line 355).
What this enables:
- V4Attention forward path runs end-to-end (it was raising at the sparse_attn call site)
- HCA layers (compress_ratio=128) use deterministic-stride topk (get_compress_topk_idxs) and now compute attention correctly via the Python reference
- Window-only layers (compress_ratio=0) compute attention correctly
- CSA layers (compress_ratio=4) currently fall through to the HCA path (deterministic stride) until the NSA Indexer is wired (TODO(phase1-nsa))
What this does NOT enable:
- Performance: the Python reference is slow. NOT for production use. Phase 5 launches require the NSA tilelang kernel plus the attn_sink extension per architecture-notes.md "Open risks sgl-project#11"
- Numerical agreement vs the V4 HF reference: that is a separate validation task on a real GPU with a loaded V4 checkpoint
- CSA quality: until the NSAIndexer wiring lands, CSA layers use deterministic stride (HCA's behavior), which approximates but does not match the V4 reference's learned-index Indexer
Algorithm details (matches the V4 reference exactly):
- Mask topk_idxs == -1 to -inf scores
- Compute scaled QK scores: einsum("bshd,bskd->bshk", q, kv)
- Numerically stable softmax with the attn_sink contribution in the denominator:
  scores_max = scores.amax(dim=K)
  sum_exp = sum(exp(scores - scores_max))
  sink_term = exp(attn_sink - scores_max)  # per-head sink in denominator
  weights = exp(scores - scores_max) / (sum_exp + sink_term)
- Output = einsum("bshk,bskd->bshd", weights, kv); the sink slot has v=0
- Handles the all-invalid-row case (output all zeros, sink absorbs the mass)
V4Attention.forward updated:
- The CSA Indexer branch now calls self.indexer(x, qr, start_pos, offset) when self.indexer is not None. It is currently always None (TODO(phase1-nsa)), so it falls through to the deterministic-stride branch.
- Comments updated to make the CSA-quality fallback explicit and cross-reference architecture-notes.md "Open risks sgl-project#11".
Tests added:
- test_sparse_attn_v4_basic_shape: shape contract (B, S, H, D output; no NaN, no Inf)
- test_sparse_attn_v4_invalid_indices_zero_contribution: validates the -1 mask handling. Single-valid-idx case: output == that kv. All-invalid case: output == zeros (the sink absorbs all softmax mass).
- test_v4attention_forward_shape stays skipped (depends on the DeepseekV4ForCausalLM trunk plus load_weights, separate from sparse_attn).
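The algorithm details in the commit message above can be sketched in NumPy. This is an illustrative re-derivation from the message's formulas, not the actual code being merged; one deliberate deviation is that the running max is taken jointly over the scores and the sink logit, which keeps the all-invalid row NaN-free while producing the same zero output the message describes.

```python
import numpy as np

def sparse_attn_reference(q, kv, topk_idxs, attn_sink, scale):
    """Masked attention with a per-head sink slot in the softmax denominator.

    Shapes (following the einsum strings in the commit message):
      q:         (B, S, H, D) queries
      kv:        (B, S, K, D) gathered slots (used as both keys and values
                 here for brevity)
      topk_idxs: (B, S, H, K) selected indices; -1 marks an invalid slot
      attn_sink: (H,) per-head sink logit
    """
    # Scaled QK scores: einsum("bshd,bskd->bshk", q, kv)
    scores = np.einsum("bshd,bskd->bshk", q, kv) * scale
    # Mask topk_idxs == -1 slots to -inf so they get zero softmax weight.
    scores = np.where(topk_idxs == -1, -np.inf, scores)
    sink = attn_sink[None, None, :, None]  # broadcast to (1, 1, H, 1)
    # Joint max over scores and sink: stays finite even when every slot
    # in a row is invalid (all -inf), avoiding NaN from exp(-inf - -inf).
    scores_max = np.maximum(scores.max(axis=-1, keepdims=True), sink)
    exp_scores = np.exp(scores - scores_max)
    sum_exp = exp_scores.sum(axis=-1, keepdims=True)
    sink_term = np.exp(sink - scores_max)  # sink contributes only to the denominator
    weights = exp_scores / (sum_exp + sink_term)  # sink slot carries v = 0
    # Output = einsum("bshk,bskd->bshd", weights, kv)
    return np.einsum("bshk,bskd->bshd", weights, kv)
```

With an all-invalid row, exp_scores is all zeros while sink_term is 1, so the weights and the output are exactly zero: the sink absorbs the entire softmax mass, as the commit's test for the -1 mask expects.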
No description provided.