Support v1/chat/completions #50
Conversation
For the chat template, can we use the Hugging Face tokenizer by default? https://huggingface.co/docs/transformers/main/en/chat_templating#how-do-i-use-chat-templates
Good idea. Let me add this.
And this template is actually incorrect for this model, so you will get the following response in the unit test. The following response is expected with the ChatML template. Another issue is
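With the Hugging Face route, the server would call `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` and let each model ship its own template. As a minimal sketch of what the ChatML template mentioned above expands to (plain Python so no model download is needed; the helper name is illustrative, not part of the PR):

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render messages in the ChatML layout: each turn is wrapped in
    <|im_start|>{role} ... <|im_end|>, and an open assistant turn is
    appended when we want the model to generate a reply."""
    prompt = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        prompt += "<|im_start|>assistant\n"
    return prompt

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
]
prompt = render_chatml(messages)
```

If the tokenizer's built-in template differs from this layout, the model sees a prompt format it was not trained on, which is exactly how the wrong unit-test response above arises.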
@@ -0,0 +1,381 @@
# Adapted from
# https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py
We can consider importing instead of copying later.
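Until that import happens, the copied file carries FastChat's conversation machinery. A minimal sketch of the structure being copied (field names are a simplified assumption, not the exact upstream definition in `fastchat/conversation.py`):

```python
from dataclasses import dataclass, field

@dataclass
class Conversation:
    """Simplified stand-in for FastChat's Conversation: a system prompt,
    the role names, the accumulated turns, and a separator string."""
    system: str
    roles: tuple
    messages: list = field(default_factory=list)
    sep: str = "\n"

    def append_message(self, role, message):
        self.messages.append([role, message])

    def get_prompt(self):
        # Join the system prompt and each "ROLE: message" turn.
        parts = [self.system] + [f"{r}: {m}" for r, m in self.messages]
        return self.sep.join(parts) + self.sep
```

Importing from FastChat instead would avoid this copy drifting out of sync with upstream template fixes.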
…roject#50)

* [Transpiler] Add config class and interface for the transpiler
* [Misc] Add a static checking for STensor::num_elements
* [Transpiler] Implement all kernel operators and their runtime
* Format code
* Add support for broadcast in ElemwiseBinaryKernel
* Refine Python interface for transpiler & Add layout resolve for DTensor
* Finish layout resolution for threadblock level ops
* Add basic support for threadblock level transpiling
* Merge Python JIT frontend (sgl-project#40). Add support for Python JIT frontend
* temp frontend
* bug fix
* bug fix
* frontend with output shapes/strides
* add jit demo
* Add checking for MIRAGE_ROOT

---------

Co-authored-by: Shengyu Liu <interestingLSY@gmail.com>

* Add unit test for the transpiler
* Add document for the transpiler
* Add threadblock level matmul operator
* Add threadblock-level reduction op
* clang format
* nits
* Remove nonexist examples
* Merge from main
* Add tb scheduling & Add support for forloop accumulator
* Add support for tb operator fusion
* TB_FORLOOP_ACCUM_OP->TB_FORLOOP_ACCUM_NO_RED_OP
* Refine documents
* nits
* nits
* Add support for chunked copy and async copy
* fix Python JIT compilation errors
* [CUDA Transpiler] Fix Python JIT compilation errors (sgl-project#51)
* fix Python JIT compilation errors
* Remove __getattr__ from wrapper

---------

Co-authored-by: SpiritedAwayCN <541845219@qq.com>

* checkpoint
* Bugfix in async copy
* Optimize matmul
* Add support for output chunked copy
* Python interface for creating threadblock graph
* rename python objects
* Optimize ClearAccumulatorKernel
* Add testcases for IO
* Optimize threadblock input ops
* support customized
* Small optimization
* Change memory alignment to 128B for dtensor and stensors
* bug fixes
* remove the Py prefix for tensor objects in Python
* fix typo
* fix typo
* Modify lib.h to adapt to PR sgl-project#53
* Bugfix in testcase
* Support in-register accumulation for matmul
* Rewrite tb scheduling
* Add support for advanced memory planning interface & algos
* Allocate software pipeline buffers in memory planner too
* Code formatting
* Fix test script
* Add doc for TB scheduling and memory planning
* Add doc for register-backed accumulator
* Refine tb elementwise binary operator for broadcast support
* Add test for tb elementwise binary operator with broadcast
* Fix a subtle bug in matmul
* Add some comments
* Refine TB input and output ops (do not rely on stride)
* Refine TB reduction kernel to avoid using stride
* nits
* Refine matmul operator: do not rely on stride
* Slightly reorganize procedure in Transpiler
* Change matmul perf args to align with tb matmul perf
* Rename a file
* Add support for swizzling (XOR and SHIFT) (has a bug)
* Add doc for swizzling
* Add a test for the SHIFT swizzling
* Fix issue (sgl-project#56)
* Format boolean variable as `true` and `false` for better readability
* nits
* nits
* Hotfix CuTe's bug Ref: NVIDIA/cutlass#1766
* Bump Cutlass's version and remove the workaround in the last commit
* Remove debug info
* Update doc
* Bugfix
* [Transpiler] Add more Python examples for transpiler testing (sgl-project#57)
* bug fixes
* fix compile issue
* minor updates

---------

Co-authored-by: interestingLSY <interestingLSY@gmail.com>
Co-authored-by: Chun'an Shi <44977219+SpiritedAwayCN@users.noreply.github.com>
Co-authored-by: SpiritedAwayCN <541845219@qq.com>
* remove vllm as a dependency revert registry
* Isort format. Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
* Fix isort. Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
* Fix skip vllm import logic. Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
* Fix more vllm imports. Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
* Update to a newer vllm-hpu-extension version that does not rely on vllm. Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
* Add ray dep. Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
* Add einops. Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

---------

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
[Feature] [ROCM] Support Add & LayerNorm fused for Qwen3-VL VIT
Close #26
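The fused Add & LayerNorm pattern named in the feature title computes LayerNorm(x + residual) in a single pass and hands the summed activations back as the next residual, saving one trip through memory. A NumPy sketch of the math only (shapes, epsilon, and the function name are illustrative assumptions; this is not the ROCm kernel):

```python
import numpy as np

def fused_add_layernorm(x, residual, weight, bias, eps=1e-6):
    """Compute LayerNorm(x + residual) and return both the normalized
    output and the summed activations (the new residual stream)."""
    added = x + residual                       # residual add, done once
    mean = added.mean(axis=-1, keepdims=True)
    var = added.var(axis=-1, keepdims=True)
    normed = (added - mean) / np.sqrt(var + eps) * weight + bias
    return normed, added

x = np.random.randn(2, 4).astype(np.float32)
res = np.random.randn(2, 4).astype(np.float32)
w = np.ones(4, dtype=np.float32)
b = np.zeros(4, dtype=np.float32)
out, new_res = fused_add_layernorm(x, res, w, b)
```

In a fused kernel the add and the normalization share one read of the activations, which is where the speedup for the ViT blocks comes from.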