
Add an option to use dummy weights #33

Merged
WoosukKwon merged 1 commit into main from dummy
Apr 9, 2023
Conversation

@WoosukKwon
Collaborator

No description provided.
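The PR carries no description, but its feature — an option to skip loading real checkpoint weights and instead fill every parameter with small placeholder values, so a model can be profiled for memory and latency without downloading its weights — can be sketched in plain Python. This is a minimal illustrative sketch; the function and parameter names below are assumptions for illustration, not vLLM's actual internals.

```python
import random

def load_dummy_weights(param_shapes, low=-1e-3, high=1e-3, seed=0):
    """Hypothetical dummy loader: instead of reading a checkpoint from disk,
    return parameter tensors (as flat lists) filled with small random values.
    The resulting model produces garbage outputs but has the right shapes
    and memory footprint, which is all that profiling needs."""
    rng = random.Random(seed)
    weights = {}
    for name, shape in param_shapes.items():
        numel = 1
        for dim in shape:
            numel *= dim
        weights[name] = [rng.uniform(low, high) for _ in range(numel)]
    return weights

# Usage: shapes for a toy two-layer model (names are illustrative).
shapes = {"layer1.weight": (4, 8), "layer2.weight": (2, 4)}
weights = load_dummy_weights(shapes)
```

Keeping the values small (here in [-1e-3, 1e-3]) avoids activation overflow during the dummy forward passes used for profiling.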

@WoosukKwon WoosukKwon merged commit ee88a7e into main Apr 9, 2023
@WoosukKwon WoosukKwon deleted the dummy branch April 9, 2023 06:36
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
tianyil1 pushed a commit to tianyil1/vllm that referenced this pull request Jun 5, 2024
* Bucketing/Warmup WIP

* Cleanup

* Revert "Fix model_output_idx on HPU (vllm-project#27)"

This reverts commit 90dfa92.

* Rework selected_token_indices fix to also work with block_size padding

* Simple prompt attention POC

* Remove cumsum

* MQA/GQA support for simple prompt_attention

* Cleanup

* Fix typo

* Restore profiling runs
dllehr-amd pushed a commit to dllehr-amd/vllm that referenced this pull request Jul 22, 2024
…ernel tuning script for rocm.

Merge pull request vllm-project#33 - tuned moe configs v2
bigPYJ1151 pushed a commit to bigPYJ1151/vllm that referenced this pull request Jul 31, 2024
@alixiaodi alixiaodi mentioned this pull request Aug 2, 2024
zyongye pushed a commit to zyongye/vllm that referenced this pull request Aug 5, 2025
zyongye pushed a commit to zyongye/vllm that referenced this pull request Aug 6, 2025
heheda12345 added a commit to heheda12345/vllm that referenced this pull request Sep 29, 2025
…oject#26)

* indexer metadata to separate prefill and decode

* deep_gemm prefill kernel

* decode kernel, can run for single batch

* bug fixing insert decode k into kv before gemm

* don't use tilelang quant function

* faster non-looping torch for kv cache insertion

* add chunked prefill impl

* change quant kernel back to tilelang for promotion

* fix format (vllm-project#31)

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

* update unit tests

* Fp8 indexer prefill (vllm-project#33)

* init

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

* can run

---------

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

* remove debug comment

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

* cleanup

* further cleanup

---------

Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
yma11 pushed a commit to yma11/vllm that referenced this pull request Dec 8, 2025
* Support cpu kv-cache offload on XPU platform

Signed-off-by: chzhang <chaojun.zhang@intel.com>

* Support cpu kv-cache offload on XPU platform

Signed-off-by: chzhang <chaojun.zhang@intel.com>

---------

Signed-off-by: chzhang <chaojun.zhang@intel.com>
GuoRen868 pushed a commit to GuoRen868/vllm that referenced this pull request Jan 27, 2026
