Improve error message & Add vicuna template #57
Merged
merrymercy merged 1 commit into main on Jan 20, 2024
timethink pushed a commit to timethink/sglang that referenced this pull request on Mar 9, 2025
NorthmanPKU pushed a commit to NorthmanPKU/sglang that referenced this pull request on May 16, 2025
…roject#50) * [Transpiler] Add config class and interface for the transpiler * [Misc] Add a static checking for STensor::num_elements * [Transpiler] Implement all kernel operators and their runtime * Format code * Add support for broadcast in ElemwiseBinaryKernel * Refine Python interface for transpiler & Add layout resolve for DTensor * Finish layout resolution for threadblock level ops * Add basic support for threadblock level transpiling * Merge Python JIT frontend (sgl-project#40). Add support for Python JIT frontend * temp frontend * bug fix * bug fix * frontend with output shapes/strides * add jit demo * Add checking for MIRAGE_ROOT --------- Co-authored-by: Shengyu Liu <interestingLSY@gmail.com> * Add unit test for the transpiler * Add document for the transpiler * Add threadblock level matmul operator * Add threadblock-level reduction op * clang format * nits * Remove nonexist examples * Merge from main * Add tb scheduling & Add support for forloop accumulator * Add support for tb operator fusion * TB_FORLOOP_ACCUM_OP->TB_FORLOOP_ACCUM_NO_RED_OP * Refine documents * nits * nits * Add support for chunked copy and async copy * fix Python JIT compilation errors * [CUDA Transpiler] Fix Python JIT compilation errors (sgl-project#51) * fix Python JIT compilation errors * Remove __getattr__ from wrapper --------- Co-authored-by: SpiritedAwayCN <541845219@qq.com> * checkpoint * Bugfix in async copy * Optimize matmul * Add support for output chunked copy * Python interface for creating threadblock graph * rename python objects * Optimize ClearAccumulatorKernel * Add testcases for IO * Optimize threadblock input ops * support customized * Small optimization * Change memory alignment to 128B for dtensor and stensors * bug fixes * remove the Py prefix for tensor objects in Python * fix typo * fix typo * Modify lib.h to adapt to PR sgl-project#53 * Bugfix in testcase * Support in-register accumulation for matmul * Rewrite tb scheduling * Add support for advanced memory 
planning interface & algos * Allocate software pipeline buffers in memory planner too * Code formatting * Fix test script * Add doc for TB scheduling and memory planning * Add doc for register-backed accumulator * Refine tb elementwise binary operator for broadcast support * Add test for tb elementwise binary operator with broadcast * Fix a subtle bug in matmul * Add some comments * Refine TB input and output ops (do not rely on stride) * Refine TB reduction kernel to avoid using stride * nits * Refine matmul operator: do not rely on stride * Slightly reorganize procedure in Transpiler * Change matmul perf args to align with tb matmul perf * Rename a file * Add support for swizzling (XOR and SHIFT) (has a bug) * Add doc for swizzling * Add a test for the SHIFT swizzling * Fix issue (sgl-project#56) * Format boolean variable as `true` and `false` for better readability * nits * nits * Hotfix CuTe's bug Ref: NVIDIA/cutlass#1766 * Bump Cutlass's version and remove the workaround in the last commit * Remove debug info * Update doc * Bugfix * [Transpiler] Add more Python examples for transpiler testing (sgl-project#57) * bug fixes * fix compile issue * minor updates --------- Co-authored-by: interestingLSY <interestingLSY@gmail.com> Co-authored-by: Chun'an Shi <44977219+SpiritedAwayCN@users.noreply.github.com> Co-authored-by: SpiritedAwayCN <541845219@qq.com>
chunyuan-w pushed a commit to chunyuan-w/sglang that referenced this pull request on May 20, 2025
* fix * fix AWQ for DSv3 * don't use absorb MLA for AWQ * lint * more fixes * add w4a16 kernel * remove unnecessary name * add note * use prefetch. simplify impl * clean up. add brgemm (WIP) * fix brgemm * fix mismatch BLOCK_N * use at::quint4x2 to signify type better * change type of zero point back to uint8 * add FusedMoE interface * use FusedMoE kernel * fix types * fix MoE * update deepseek.cpp
chunyuan-w pushed a commit to chunyuan-w/sglang that referenced this pull request on Jun 17, 2025
* fix * fix AWQ for DSv3 * don't use absorb MLA for AWQ * lint * more fixes * add w4a16 kernel * remove unnecessary name * add note * use prefetch. simplify impl * clean up. add brgemm (WIP) * fix brgemm * fix mismatch BLOCK_N * use at::quint4x2 to signify type better * change type of zero point back to uint8 * add FusedMoE interface * use FusedMoE kernel * fix types * fix MoE * update deepseek.cpp
blzheng added a commit to blzheng/sglang that referenced this pull request on Aug 9, 2025
key4ng pushed a commit to key4ng/sglang that referenced this pull request on Nov 9, 2025
amd-youchen pushed a commit to amd-youchen/sglang that referenced this pull request on Dec 8, 2025
[Acc] update hash function to speed multimodal preprocess
Garrybest pushed a commit to Garrybest/sglang that referenced this pull request on Jan 9, 2026
vschandramourya pushed a commit to vschandramourya/sglang that referenced this pull request on Feb 3, 2026
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
lujangus pushed a commit to tails-mpt/sglang that referenced this pull request on Mar 31, 2026
…hidden states generation (sgl-project#57) * add local data path support and more assistant * small refactor * separate out the data-preprocess logic
No description provided.