
Add an option to use dummy weights #33

Merged
WoosukKwon merged 1 commit into main from dummy
Apr 9, 2023
Conversation

@WoosukKwon
Collaborator

No description provided.
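The PR carries no description, but its feature — an option to skip loading real checkpoint weights and instead fill every parameter with small placeholder values, so a model can be profiled for memory and latency without downloading its weights — can be sketched in plain Python. This is a minimal illustrative sketch; the function and parameter names below are assumptions for illustration, not vLLM's actual internals.

```python
import random

def load_dummy_weights(param_shapes, low=-1e-3, high=1e-3, seed=0):
    """Hypothetical dummy loader: instead of reading a checkpoint from disk,
    return parameter tensors (as flat lists) filled with small random values.
    The resulting model produces garbage outputs but has the right shapes
    and memory footprint, which is all that profiling needs."""
    rng = random.Random(seed)
    weights = {}
    for name, shape in param_shapes.items():
        numel = 1
        for dim in shape:
            numel *= dim
        weights[name] = [rng.uniform(low, high) for _ in range(numel)]
    return weights

# Usage: shapes for a toy two-layer model (names are illustrative).
shapes = {"layer1.weight": (4, 8), "layer2.weight": (2, 4)}
weights = load_dummy_weights(shapes)
```

Keeping the values small (here in [-1e-3, 1e-3]) avoids activation overflow during the dummy forward passes used for profiling.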

@WoosukKwon WoosukKwon merged commit ee88a7e into main Apr 9, 2023
@WoosukKwon WoosukKwon deleted the dummy branch April 9, 2023 06:36
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
tianyil1 pushed a commit to tianyil1/vllm that referenced this pull request Jun 5, 2024
* Bucketing/Warmup WIP

* Cleanup

* Revert "Fix model_output_idx on HPU (vllm-project#27)"

This reverts commit 90dfa92.

* Rework selected_token_indices fix to also work with block_size padding

* Simple prompt attention POC

* Remove cumsum

* MQA/GQA support for simple prompt_attention

* Cleanup

* Fix typo

* Restore profiling runs
dllehr-amd pushed a commit to dllehr-amd/vllm that referenced this pull request Jul 22, 2024
…ernel tuning script for rocm.

Merge pull request vllm-project#33 - tuned moe configs v2
bigPYJ1151 pushed a commit to bigPYJ1151/vllm that referenced this pull request Jul 31, 2024
@alixiaodi alixiaodi mentioned this pull request Aug 2, 2024
zyongye pushed a commit to zyongye/vllm that referenced this pull request Aug 5, 2025
zyongye pushed a commit to zyongye/vllm that referenced this pull request Aug 6, 2025
heheda12345 added a commit to heheda12345/vllm that referenced this pull request Sep 29, 2025
…oject#26)

* indexer metadata to separate prefill and decode

* deep_gemm prefill kernel

* decode kernel, can run for single batch

* bug fixing insert decode k into kv before gemm

* don't use tilelang quant function

* faster non-looping torch for kv cache insertion

* add chunked prefill impl

* change quant kernel back to tilelang for promotion

* fix format (vllm-project#31)

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

* update unit tests

* Fp8 indexer prefill (vllm-project#33)

* init

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

* can run

---------

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

* remove debug comment

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

* cleanup

* further cleanup

---------

Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
yma11 pushed a commit to yma11/vllm that referenced this pull request Dec 8, 2025
* Support cpu kv-cache offload on XPU platform

Signed-off-by: chzhang <chaojun.zhang@intel.com>

* Support cpu kv-cache offload on XPU platform

Signed-off-by: chzhang <chaojun.zhang@intel.com>

---------

Signed-off-by: chzhang <chaojun.zhang@intel.com>
GuoRen868 pushed a commit to GuoRen868/vllm that referenced this pull request Jan 27, 2026
