Tweak mem fraction #20
Merged
merrymercy merged 2 commits into main, Jan 17, 2024
Conversation
timethink pushed a commit to timethink/sglang that referenced this pull request on Mar 9, 2025
chunyuan-w pushed a commit to chunyuan-w/sglang that referenced this pull request on Mar 25, 2025
* Add fused biased_grouped_topk
* add record function
NorthmanPKU pushed a commit to NorthmanPKU/sglang that referenced this pull request on May 16, 2025
* feat: automated pip install, but manual environmental variable
* fix: small bug
* feat: added requirements.txt, but encountered some other issues
* fix: run_time lib configed correctly, but _cython does not exist
* fix: worked?
* fix: no .vscode
* fix: "re-Deleted .vscode directory"
chunyuan-w pushed a commit to chunyuan-w/sglang that referenced this pull request on May 28, 2025
* Add fused biased_grouped_topk
* add record function
yanbing-j added a commit to yanbing-j/sglang that referenced this pull request on Jun 3, 2025
* Add fused biased_grouped_topk
* add record function
yanbing-j added a commit to yanbing-j/sglang that referenced this pull request on Jun 4, 2025
* Add fused biased_grouped_topk
* add record function
yanbing-j added a commit to yanbing-j/sglang that referenced this pull request on Jun 10, 2025
* Add fused biased_grouped_topk
* add record function
yanbing-j added a commit to yanbing-j/sglang that referenced this pull request on Jun 18, 2025
* Add fused biased_grouped_topk
* add record function
blzheng referenced this pull request in blzheng/sglang on Jun 23, 2025
yichiche pushed a commit to yichiche/sglang that referenced this pull request on Jul 30, 2025
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
yichiche pushed a commit to yichiche/sglang that referenced this pull request on Aug 7, 2025
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
yichiche pushed a commit to yichiche/sglang that referenced this pull request on Aug 11, 2025
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
zhuyijie88 added a commit to zhuyijie88/sglang that referenced this pull request on Sep 1, 2025
zhuyijie88 pushed a commit to zhuyijie88/sglang that referenced this pull request on Sep 4, 2025
* sync models and datasets every 6 hours
Signed-off-by: mywaaagh_admin <pkwarcraft@gmail.com>
key4ng pushed a commit to key4ng/sglang that referenced this pull request on Nov 9, 2025
key4ng pushed a commit to key4ng/sglang that referenced this pull request on Nov 9, 2025
amd-youchen pushed a commit to amd-youchen/sglang that referenced this pull request on Nov 24, 2025
…sed-new [feat] enable fused qknorm and rope
CloudRipple added a commit to CloudRipple/sglang that referenced this pull request on Jan 25, 2026
…and revert necessary changes (sgl-project#20)
* refactor: enhance KL divergence method in DiagonalGaussianDistribution for flexible dimension handling and clean up DAC class by removing unused code
* refactor: update DacVAE architecture, configuration and its customized loader.
* Revert "fix: update adjust_frames parameter to False for improved multi-GPU compatibility"
* revert changes in base pipeline configs
* revert changes in configs/sample/__init__.py
* [Feature] Remove weight norm in DAC
* [Fix] Use legacy weight norm, which can be removed
* [Fix] remove weight norm at the right place
* [Chore] update test script
* Revert "[Fix] remove weight norm at the right place" This reverts commit 3a0accbae41650e926c5828025323a12454827a4.
* Revert "[Fix] Use legacy weight norm, which can be removed" This reverts commit eb93f20f134888adba4a5124fa1d167b93d180e7.
* Revert "[Feature] Remove weight norm in DAC" This reverts commit aaa64abbc25112a706bf3d3604ffeac390a1d8a8.
* [Feature] Remove all weight norm from DAC modeling
---------
Co-authored-by: CloudRipple <yiyangzhang25@m.fudan.edu.cn>
sywangyi pushed a commit to sywangyi/sglang that referenced this pull request on Feb 26, 2026
sywangyi added a commit to sywangyi/sglang that referenced this pull request on Feb 27, 2026
* port layernorm 3d
* apply layernorm
* support for bias
* fix
* intf fix
* add support for CPU
* fix tp=3/6 padding issue in encoder vision
* fix tp=3/6 padding issue in qwen3-omni
* refactor code
* add mrope
* change attention_mask shape to use flash attn
* add kernel apply_rotary_pos_emb_cpu
* replace nn.Linear with ReplicatedLinear
* enable torch.compile
* construct mask using query.dtype instead of bool on CPU
* add fast path for sparse attention
* fix double free segfault by wrong setting of BLOCK_M
* improve extend kernel performance for long context length
* update test_extend.py
* update comment
* fix topk softmax performance issue
* port optimization for image preprocessor in Qwen2VLImageProcessorFast
* apply optimization for image preprocessor
* update docker file
* optimize conv3d used in patch embedding
* resolve conflict
* apply optimized conv3d
* apply optimization for flash_attn_varlen_func (sgl-project#19)
* port optimization for flash_attn_varlen_func
* apply flash_attn_varlen_func
* remove contiguous before rope (sgl-project#20)
* Revert "resolve conflict" This reverts commit 7622f6d.
* fix after rebase
* Update pyproject_cpu.toml
* Update xeon.Dockerfile
* minor fix after rebase
* rope: add support for bf16 sincos (sgl-project#102)
* format
* Update xeon.Dockerfile
* odd tp for cpu
* Apply linear_gelu_linear and fix numa memory bind (sgl-project#22)
* [CPU] Optimize small oc GEMM for Qwen3-next on CPU (sgl-project#12446) Co-authored-by: Zheng, Beilei <beilei.zheng@intel.com>
* port linear_gelu_linear kernel
* apply linear_gelu_linear for TP=1
* fix numa memory bind
* apply parallel partition patch
* Revert "Fix: test_vlm_offline_throughput output throughput (sgl-project#13279)" (sgl-project#101) This reverts commit 7ee3e36.
* fix input dtype mismatch issue
* apply optimized layernorm
---------
Co-authored-by: Zheng, Beilei <beilei.zheng@intel.com>
Co-authored-by: ZailiWang <zaili.wang@intel.com>
Co-authored-by: mingfeima <mingfei.ma@intel.com>
Co-authored-by: jianan-gu <jianan.gu@intel.com>
SCDESPERTATE pushed a commit to SCDESPERTATE/sglang that referenced this pull request on Mar 12, 2026
support Qwen3.5
alphabetc1 pushed a commit to alphabetc1/sglang that referenced this pull request on Mar 14, 2026
Add Acknowledgment To the Team mates
mmangkad pushed a commit to mmangkad-dev/sglang that referenced this pull request on Apr 3, 2026
The Gemma 4 audio/vision towers rename their weights from `qkv_proj.weight` to `qkv_proj.linear.weight`; the Gemma 4 vision pipeline has some misc changes
wisclmy0611 pushed a commit that referenced this pull request on Apr 7, 2026
* converted supported-models/reward-models.mdx
* fixed reward models
* converting rerank models
lujangus added a commit to tails-mpt/sglang that referenced this pull request on Apr 30, 2026
…e Squeeze)
Adds two server-args flags and routes them into the Eagle3 verifier path:
--speculative-verify-mode rejection_sampling | typical_acceptance
--speculative-typical-acceptance-alpha float (default 0.8)
Defaults to "rejection_sampling" so existing deployments are unchanged.
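For illustration, assuming the new flags are wired into sglang's standard `launch_server` entrypoint (the flag names come from this commit message; the model path and the `--speculative-algorithm EAGLE3` setting are placeholders for whatever the deployment already uses), enabling the relaxed mode would look roughly like:

```shell
# Hypothetical invocation sketch; only the two --speculative-verify-mode /
# --speculative-typical-acceptance-alpha flags are introduced by this commit.
python -m sglang.launch_server \
  --model-path <model> \
  --speculative-algorithm EAGLE3 \
  --speculative-verify-mode typical_acceptance \
  --speculative-typical-acceptance-alpha 0.8
```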
When set to "typical_acceptance", the Eagle3 verifier in eagle_info.py
(single-layer) and eagle_info_v2.py (multi-layer) routes the alpha value
into both threshold_single and threshold_acc that the existing CUDA kernel
already consumes:
// sgl-kernel/csrc/speculative/speculative_sampling.cuh:80
if (coin <= prob_acc / threshold_acc || target_prob_single >= threshold_single)
accept;
When alpha=1.0 this reduces to strict rejection sampling. When alpha<1.0
this is exactly Medusa's typical-acceptance algorithm. No CUDA changes.
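As a sketch of the behavior described above (plain Python, not the CUDA kernel; the function name is invented here, and the variable names follow the quoted kernel condition, with alpha routed into both thresholds as the commit describes):

```python
def accept_token(coin: float, prob_acc: float,
                 target_prob_single: float, alpha: float) -> bool:
    """Mirror of the quoted kernel condition at speculative_sampling.cuh:80.

    coin               -- uniform random sample in [0, 1)
    prob_acc           -- accumulated acceptance probability for the token
    target_prob_single -- target-model probability of this single token
    alpha              -- routed into both threshold_acc and threshold_single
    """
    threshold_acc = alpha
    threshold_single = alpha
    return coin <= prob_acc / threshold_acc or target_prob_single >= threshold_single

# alpha = 1.0: the first clause is plain rejection sampling (coin <= prob_acc),
# and the second clause only fires when the target is fully certain.
assert accept_token(coin=0.3, prob_acc=0.5, target_prob_single=0.2, alpha=1.0)
assert not accept_token(coin=0.9, prob_acc=0.5, target_prob_single=0.2, alpha=1.0)

# alpha < 1.0 loosens both clauses: prob_acc is scaled up by 1/alpha, and a
# merely "typical" token with target probability >= alpha is accepted outright.
assert accept_token(coin=0.9, prob_acc=0.5, target_prob_single=0.85, alpha=0.8)
```

This makes the claimed limit explicit: at alpha = 1.0 the second clause is vacuous and the rule collapses to the strict rejection-sampling test, while smaller alpha trades some distributional fidelity for a higher acceptance rate.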
Pre-flight context (Crucible umbrella):
experiments/MiniMax-M2.5/squeeze/relaxed/B1-typical-acceptance/preflight.md
Plan reference (Crucible umbrella):
docs/plans/minimax-squeeze.md §184-191 (Track B B1)
Squeeze pipeline coordination (Crucible umbrella):
CLAUDE.md rules sgl-project#20/sgl-project#21/sgl-project#22
Branch policy: this commit lives on `squeeze-relaxed` only. The umbrella
crucible repo's sglang submodule pointer continues to track origin/main of
sglang. The flag will be merged to main after the alpha-sweep bench
validates per the squeeze plan's quality floor (≤3% per-dataset drift
from Exp F lossless baseline).
No description provided.