[DO NOT MERGE] Hao integration by zhisbug · Pull Request #31 · vllm-project/vllm

zhisbug · 2023-04-07T10:21:24Z

No description provided.

zhuohan123 · 2023-05-24T04:42:47Z

Changes in this PR have been added to the latest main branch.

Enabled int8 weights by default

…i_docker Docker.ubi: add missing package git

Within the existing `decoding` request parameter section:

```protobuf
enum ResponseFormat {
  // Plain text, no constraints
  TEXT = 0;
  // Valid json
  JSON = 1;
}

message StringChoices {
  repeated string choices = 1;
}

// Mutually-exclusive guided decoding options
oneof guided {
  // Output will be in the specified format
  ResponseFormat format = 3;
  // Output will follow the provided JSON schema
  string json_schema = 4;
  // Output will follow the provided regex pattern
  string regex = 5;
  // Output will be exactly one of the specified choices
  StringChoices choice = 6;
  // Output will follow the provided context free grammar
  string grammar = 7;
}
```

Signed-off-by: Nick Hill <nickhill@us.ibm.com>
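The "mutually-exclusive" constraint on the guided decoding options can be illustrated with a small client-side check. The following is only a sketch, not code from the referenced commit: the `GuidedOptions` class and its `selected()` method are hypothetical names, and raising an error on conflict is an assumption made for illustration (a protobuf `oneof` itself keeps only the most recently set member rather than rejecting the message).

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import List, Optional


class ResponseFormat(Enum):
    TEXT = 0  # Plain text, no constraints
    JSON = 1  # Valid json


@dataclass
class GuidedOptions:
    """Hypothetical mirror of the `guided` oneof; at most one field may be set."""

    format: Optional[ResponseFormat] = None
    json_schema: Optional[str] = None
    regex: Optional[str] = None
    choice: Optional[List[str]] = None
    grammar: Optional[str] = None

    def selected(self) -> Optional[str]:
        """Return the name of the single option that is set, or None if none is."""
        set_names = [f.name for f in fields(self) if getattr(self, f.name) is not None]
        if len(set_names) > 1:
            # Assumption: treat conflicting options as an error instead of
            # silently keeping the last one, as a protobuf oneof would.
            raise ValueError(f"guided options are mutually exclusive, got: {set_names}")
        return set_names[0] if set_names else None
```

For example, `GuidedOptions(regex=r"\d+").selected()` names the `regex` option, while setting both `regex` and `grammar` raises `ValueError`.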

…on_opts vLLM lm head optimization (tpp)

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

Co-authored-by: dengyunyang <dengyunyang@huawei.com>

…oject#26)

* indexer medatata to separate prefill and decode
* deep_gemm prefill kernel
* decode kernel, can run for single batch
* bug fixing insert decode k into kv before gemm
* don't use tilelang quant function
* faster non-looping torch for kv cache insertion
* add chunked prefill impl
* change quant kernel back to tilelang for promotion
* fix format (vllm-project#31)
  Signed-off-by: Chen Zhang <zhangch99@outlook.com>
* update unit tests
* Fp8 indexer prefill (vllm-project#33)
* init
  Signed-off-by: Chen Zhang <zhangch99@outlook.com>
* can run

---------

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

* remove debug comment
  Signed-off-by: Chen Zhang <zhangch99@outlook.com>
* cleanup
* further cleanup

---------

Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>

…t in TTModelRunner::_make_sampler_output as expected by vLLM downstream

…-rebase DBO+aclgrph

…reset on `apt-get` (vllm-project#30784)" (vllm-project#31) This reverts commit 2a60ac9.

* [Docker][Dev] Fix libnccl-dev version for the CUDA 13.0.1 devel image
  [Docker][Dev] Fix libnccl-dev version conflict for the CUDA 13.0.1 devel image
  Further update
* feat: Support FA4 for mm-encoder-attn-backend for qwen models
* feat: Kernel warmup for vit fa4
* fix: Fix some minor conflicts due to the introduction of flash_attn.cute
* Revert "[Docker][Dev] Fix libnccl-dev version for the CUDA 13.0.1 devel image"
  This reverts commit ab76b28.
* chore: Update requirements and revert README.md
* chore: Install git for flash_attn cute installation
* lint: Fix linting
* Revert "[Improvement] Persist CUDA compat libraries paths to prevent reset on `apt-get` (vllm-project#30784)" (vllm-project#31)
  This reverts commit 2a60ac9.

---------

Co-authored-by: Shang Wang <shangw@nvidia.com>

zhisbug added 9 commits April 4, 2023 14:06

changes

440915b

merge main

5eadcff

update stop_str

c858c58

recover

3f23520

fix a stop_str name

6442e3e

update

7b121fa

update

b85250e

not using fast tokenizer

0bdd814

add support for koala and alpaca

8e56ab6

zhuohan123 closed this May 24, 2023

zhuohan123 deleted the hao-integration branch June 18, 2023 07:25

shanshanpt mentioned this pull request Nov 17, 2023

Run long conetxt error : CUDA error: an illegal memory access was encountered #1700

Closed

junior-zsy mentioned this pull request Nov 20, 2023

Error with 32k Long Text in chatglm2-6b-32k Model #1725

Closed

slyalin pushed a commit to slyalin/vllm that referenced this pull request Apr 22, 2024

Merge pull request vllm-project#31 from slyalin/int8_enabled_by_default

469a4d0

Enabled int8 weights by default

z103cb pushed a commit to dtrifiro/vllm that referenced this pull request May 21, 2024

Merge pull request vllm-project#31 from z103cb/ibm_main_add_git_to_ub…

38eed8a

…i_docker Docker.ubi: add missing package git

ZHJ19970917 mentioned this pull request Jul 14, 2024

[Bug]: When using qwen-32b-chat-awq with multi-threaded access, errors occur after approximately several hundred visits.”vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already.“ #6421

Closed

bigPYJ1151 pushed a commit to bigPYJ1151/vllm that referenced this pull request Jul 31, 2024

Merge pull request vllm-project#31 from intel-sandbox/jianan/generati…

25e4d7b

…on_opts vLLM lm head optimization (tpp)

alixiaodi mentioned this pull request Aug 2, 2024

[Bug]: #7072

Closed

surak mentioned this pull request Apr 1, 2025

[Bug]: building docker from Dockerfile #15872

Closed

1 task

hao-cold mentioned this pull request May 13, 2025

[Bug]: CUDA error: an illegal instruction was encountered #18045

Closed

1 task

markmc mentioned this pull request May 21, 2025

[Bug][Failing Test]: Distributed Comm Ops - distributed/test_shm_broadcast.py #18492

Closed

1 task

zerosurplus mentioned this pull request Jun 16, 2025

[Bug]: torch.distributed.DistNetworkError: The client socket has timed out after 600000ms while trying to connect to (172.17.0.9, 46229). #19670

Open

1 task

xiaomofang mentioned this pull request Jul 31, 2025

[Bug]: There is an issue with speculative inference in Eagle mode, where the context length of vLLM inference is constrained by the draft model. #21986

Closed

1 task

zyongye pushed a commit to zyongye/vllm that referenced this pull request Aug 5, 2025

Move responses_api.py to examples (vllm-project#31)

e0bf571

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

zyongye pushed a commit to zyongye/vllm that referenced this pull request Aug 6, 2025

Move responses_api.py to examples (vllm-project#31)

19e469f

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

Bounty-hunter added a commit to Bounty-hunter/vllm that referenced this pull request Sep 25, 2025

setting keepalive time (vllm-project#31)

3f3c455

Co-authored-by: dengyunyang <dengyunyang@huawei.com>

Michel-debug mentioned this pull request Oct 23, 2025

[Bug]: qwen3-vl-2b after ms-swift fine-tuning lance errors #27405

Closed

1 task

inkcherry pushed a commit to inkcherry/vllm that referenced this pull request Nov 6, 2025

fix mori fp8 issue (vllm-project#31)

f1a87bf

iwooook pushed a commit to moreh-dev/vllm that referenced this pull request Nov 29, 2025

fixing vllm-project#31 by converting SamplerOutput output_token to in…

d236ccf

…t in TTModelRunner::_make_sampler_output as expected by vLLM downstream

chopper0126 pushed a commit to chopper0126/vllm that referenced this pull request Jan 7, 2026

Merge pull request vllm-project#31 from chopper0126/backup-after-pull…

21481b7

…-rebase DBO+aclgrph

soodoshll pushed a commit to soodoshll/vllm that referenced this pull request Jan 30, 2026

Revert "[Improvement] Persist CUDA compat libraries paths to prevent …

9a4fc64

…reset on `apt-get` (vllm-project#30784)" (vllm-project#31) This reverts commit 2a60ac9.

HervorTao mentioned this pull request Feb 3, 2026

[Bug]: [CPU Backend] AttributeError: '_OpNamespace' '_C_utils' object has no attribute 'init_cpu_threads_env' #33675

Closed

1 task

LironKesem mentioned this pull request Mar 12, 2026

[Bug] DGX Spark (sm_121): CUTLASS can_implement() rejects sm_120f binaries #36835

Closed

1 task

Copilot AI mentioned this pull request Mar 20, 2026

Fix XPU segfault when tensor_parallel_size exceeds available devices hongbolv/vllm#5

Closed

zhisbug wants to merge 9 commits into main from hao-integration