Improve error message & Add vicuna template#57

Merged
merrymercy merged 1 commit into main from msg
Jan 20, 2024

Conversation

@merrymercy
Contributor

No description provided.

@merrymercy merrymercy merged commit f30abd0 into main Jan 20, 2024
@merrymercy merrymercy deleted the msg branch January 20, 2024 01:03
timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025
NorthmanPKU pushed a commit to NorthmanPKU/sglang that referenced this pull request May 16, 2025
…roject#50)

* [Transpiler] Add config class and interface for the transpiler

* [Misc] Add a static checking for STensor::num_elements

* [Transpiler] Implement all kernel operators and their runtime

* Format code

* Add support for broadcast in ElemwiseBinaryKernel

* Refine Python interface for transpiler & Add layout resolve for DTensor

* Finish layout resolution for threadblock level ops

* Add basic support for threadblock level transpiling

* Merge Python JIT frontend (sgl-project#40). Add support for Python JIT frontend

* temp frontend

* bug fix

* bug fix

* frontend with output shapes/strides

* add jit demo

* Add checking for MIRAGE_ROOT

---------

Co-authored-by: Shengyu Liu <interestingLSY@gmail.com>

* Add unit test for the transpiler

* Add document for the transpiler

* Add threadblock level matmul operator

* Add threadblock-level reduction op

* clang format

* nits

* Remove nonexist examples

* Merge from main

* Add tb scheduling & Add support for forloop accumulator

* Add support for tb operator fusion

* TB_FORLOOP_ACCUM_OP->TB_FORLOOP_ACCUM_NO_RED_OP

* Refine documents

* nits

* nits

* Add support for chunked copy and async copy

* fix Python JIT compilation errors

* [CUDA Transpiler] Fix Python JIT compilation errors (sgl-project#51)

* fix Python JIT compilation errors

* Remove __getattr__ from wrapper

---------

Co-authored-by: SpiritedAwayCN <541845219@qq.com>

* checkpoint

* Bugfix in async copy

* Optimize matmul

* Add support for output chunked copy

* Python interface for creating threadblock graph

* rename python objects

* Optimize ClearAccumulatorKernel

* Add testcases for IO

* Optimize threadblock input ops

* support customized

* Small optimization

* Change memory alignment to 128B for dtensor and stensors

* bug fixes

* remove the Py prefix for tensor objects in Python

* fix typo

* fix typo

* Modify lib.h to adapt to PR sgl-project#53

* Bugfix in testcase

* Support in-register accumulation for matmul

* Rewrite tb scheduling

* Add support for advanced memory planning interface & algos

* Allocate software pipeline buffers in memory planner too

* Code formatting

* Fix test script

* Add doc for TB scheduling and memory planning

* Add doc for register-backed accumulator

* Refine tb elementwise binary operator for broadcast support

* Add test for tb elementwise binary operator with broadcast

* Fix a subtle bug in matmul

* Add some comments

* Refine TB input and output ops (do not rely on stride)

* Refine TB reduction kernel to avoid using stride

* nits

* Refine matmul operator: do not rely on stride

* Slightly reorganize procedure in Transpiler

* Change matmul perf args to align with tb matmul perf

* Rename a file

* Add support for swizzling (XOR and SHIFT) (has a bug)

* Add doc for swizzling

* Add a test for the SHIFT swizzling

* Fix issue (sgl-project#56)

* Format boolean variable as `true` and `false` for better readability

* nits

* nits

* Hotfix CuTe's bug

Ref: NVIDIA/cutlass#1766

* Bump Cutlass's version and remove the workaround in the last commit

* Remove debug info

* Update doc

* Bugfix

* [Transpiler] Add more Python examples for transpiler testing (sgl-project#57)

* bug fixes

* fix compile issue

* minor updates

---------

Co-authored-by: interestingLSY <interestingLSY@gmail.com>
Co-authored-by: Chun'an Shi <44977219+SpiritedAwayCN@users.noreply.github.com>
Co-authored-by: SpiritedAwayCN <541845219@qq.com>
chunyuan-w pushed a commit to chunyuan-w/sglang that referenced this pull request May 20, 2025
* fix

* fix AWQ for DSv3

* don't use absorb MLA for AWQ

* lint

* more fixes

* add w4a16 kernel

* remove unnecessary name

* add note

* use prefetch. simplify impl

* clean up. add brgemm (WIP)

* fix brgemm

* fix mismatch BLOCK_N

* use at::quint4x2 to signify type better

* change type of zero point back to uint8

* add FusedMoE interface

* use FusedMoE kernel

* fix types

* fix MoE

* update deepseek.cpp
chunyuan-w pushed a commit to chunyuan-w/sglang that referenced this pull request Jun 17, 2025
* fix

* fix AWQ for DSv3

* don't use absorb MLA for AWQ

* lint

* more fixes

* add w4a16 kernel

* remove unnecessary name

* add note

* use prefetch. simplify impl

* clean up. add brgemm (WIP)

* fix brgemm

* fix mismatch BLOCK_N

* use at::quint4x2 to signify type better

* change type of zero point back to uint8

* add FusedMoE interface

* use FusedMoE kernel

* fix types

* fix MoE

* update deepseek.cpp
blzheng added a commit to blzheng/sglang that referenced this pull request Aug 9, 2025
key4ng pushed a commit to key4ng/sglang that referenced this pull request Nov 9, 2025
amd-youchen pushed a commit to amd-youchen/sglang that referenced this pull request Dec 8, 2025
[Acc] update hash function to speed multimodal preprocess
Garrybest pushed a commit to Garrybest/sglang that referenced this pull request Jan 9, 2026
vschandramourya pushed a commit to vschandramourya/sglang that referenced this pull request Feb 3, 2026
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
lujangus pushed a commit to tails-mpt/sglang that referenced this pull request Mar 31, 2026
…hidden states generation (sgl-project#57)

* add local data path support and more assistant

* small refactor

* separate out the data-preprocess logic

1 participant