Add a script for serving experiments & Collect system stats in scheduler by WoosukKwon · Pull Request #30 · vllm-project/vllm

WoosukKwon · 2023-04-06T09:46:25Z

Example usage:

Generating a single completion:
    python benchmark/benchmark_text_completion.py --dataset alpaca_opt_text_completion.pkl --model facebook/opt-13b --request-rate 1.0 --duration 3600 --n1 1.0
Generating two completions in parallel:
    python benchmark/benchmark_text_completion.py --dataset alpaca_opt_text_completion.pkl --model facebook/opt-13b --request-rate 1.0 --duration 3600 --n2 1.0
Generating two completions with beam search:
    python benchmark/benchmark_text_completion.py --dataset alpaca_opt_text_completion.pkl --model facebook/opt-13b --request-rate 1.0 --duration 3600 --n2-beam 1.0
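
The commands above pass a request rate (requests per second) and a duration (seconds) to the benchmark, and the PR's commit list mentions a trace generator, but the page does not show how arrival times are produced. The following is a minimal sketch under two assumptions that are not confirmed by the PR: arrivals follow a Poisson process, and flags such as --n1, --n2, and --n2-beam give the relative share of each sampling configuration. The names `TraceEntry` and `generate_trace` are hypothetical and do not come from the repository.

```python
import random
from dataclasses import dataclass


@dataclass
class TraceEntry:
    """One simulated request (hypothetical structure, not from the PR)."""
    arrival_time: float  # seconds since the start of the experiment
    sampling: str        # label of the sampling configuration, e.g. "n1"


def generate_trace(request_rate, duration, mix, seed=0):
    """Return a list of TraceEntry objects sorted by arrival time.

    Assumption: arrivals form a Poisson process, so the gaps between
    consecutive requests are exponentially distributed with mean
    1 / request_rate. `mix` maps a sampling label to its relative weight.
    """
    if request_rate <= 0 or duration <= 0:
        raise ValueError("request_rate and duration must be positive")
    if not mix or sum(mix.values()) <= 0:
        raise ValueError("mix must contain at least one positive weight")

    rng = random.Random(seed)  # seeded so a trace can be regenerated
    labels = list(mix)
    weights = [mix[label] for label in labels]

    trace = []
    t = rng.expovariate(request_rate)  # time of the first arrival
    while t < duration:
        label = rng.choices(labels, weights=weights, k=1)[0]
        trace.append(TraceEntry(arrival_time=t, sampling=label))
        t += rng.expovariate(request_rate)  # gap to the next arrival
    return trace


if __name__ == "__main__":
    # Mirrors "--request-rate 1.0 --duration 3600 --n1 1.0" from the examples.
    trace = generate_trace(request_rate=1.0, duration=3600, mix={"n1": 1.0})
    print(len(trace), "requests; first arrival at", round(trace[0].arrival_time, 3))
```

Seeding the generator keeps a trace reproducible across runs, which matters when the same workload is replayed against different configurations.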

Some PRs for plugin support have not been merged by vllm yet. This PR adds a monkey patch to vllm-ascend so that vllm-ascend works with vllm directly. The patch code should be removed once vllm supports the related functionality natively. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

New Industry Use Cases (vllm-project#21-30):
- vllm-project#21 Game Development: AI game testing + balance tuning
- vllm-project#22 Construction: Vision AI safety inspection
- vllm-project#23 Agriculture/Smart Farm: Crop monitoring + pest detection
- vllm-project#24 Government/Public: Document automation + citizen services
- vllm-project#25 Energy/Utilities: Grid monitoring + anomaly detection
- vllm-project#26 Environment/Sustainability: Carbon tracking + ESG reporting
- vllm-project#27 Fashion/Apparel: Trend analysis + inventory optimization
- vllm-project#28 Sports/Fitness: Performance analytics + tactical analysis
- vllm-project#29 Automotive/Mobility: Autonomous driving simulation
- vllm-project#30 Space/Aerospace: Satellite image analysis

Advanced Architecture Patterns:
1. Event-Driven Pattern: Webhook → Event Bus → Agent triggers
2. Streaming Pattern: Large dataset processing with chunking
3. Batch Processing Pattern: Celery-based parallel processing
4. Circuit Breaker Pattern: Fault tolerance + auto recovery
5. CQRS + Event Sourcing: Command/Query separation
6. Saga Pattern: Distributed transaction management

Guide now covers:
- 30+ industry-specific MCP implementations
- 6 production-ready architecture patterns
- Real-world scalability solutions
- Enterprise integration strategies
- Total: 8,672 lines (from 7,249)

* add implementation Signed-off-by: Max Hu <hyoung2991@gmail.com>
* add impl Signed-off-by: Max Hu <hyoung2991@gmail.com>
* add flashinfer
* fix tp Signed-off-by: Max Hu <hyoung2991@gmail.com>
* Temporary change for ViT
* fix workspace_buffer device.
* change max_seqlen to 128k.
* remove duplicate multiplier.
* fix accuracy and refactor
* more fix
* change dockerfile
* format Signed-off-by: Max Hu <hyoung2991@gmail.com>
* fix version Signed-off-by: Max Hu <hyoung2991@gmail.com>
* change python version
* remove qwen25 transformer support
* change dockerfile
* add build versions
* chagne version
* change version
* change
* change
* change
* change
* change
* build image
* change back
* change to 10.0f
* fix fi import Signed-off-by: Max Hu <hyoung2991@gmail.com>
* change to build in dev image Signed-off-by: Max Hu <hyoung2991@gmail.com>
* change location Signed-off-by: Max Hu <hyoung2991@gmail.com>
* change location Signed-off-by: Max Hu <hyoung2991@gmail.com>
* change Signed-off-by: Max Hu <hyoung2991@gmail.com>
* change cubin and jitcache to wheels Signed-off-by: Max Hu <hyoung2991@gmail.com>
* change Signed-off-by: Max Hu <hyoung2991@gmail.com>
* add comment Signed-off-by: Max Hu <hyoung2991@gmail.com>
---------
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Co-authored-by: Anerudhan Gopal <agopal@nvidia.com>
Co-authored-by: Baorun Mu <bmu@nvidia.com>

WoosukKwon added 16 commits April 6, 2023 01:02

Add trace generator

ebab871

Add functions to collect stats

3de69a4

Add main experiment script

3bc7fd8

Add to gitignore

b5d6073

Save more info

7ff16a5

Minor

a0aad23

Minor

59b4155

Add timestamps & num_preemption

ccb9826

Save arrival & finish time

1a6f707

Add script to visualize stats

ae21da2

More colors

980f9c9

Minor

6b4db61

Minor

195e7fb

Refactor

f5350bd

Add png to gitignore

4f38abb

Fix output_dir

c504646

WoosukKwon requested a review from zhuohan123 April 6, 2023 09:46

WoosukKwon added 13 commits April 7, 2023 00:33

Fix output_dir

02b4cc4

Fix bug

c0381db

OPT-60B -> OPT-66B

435c09a

n8 -> n3

5110a64

Add max-num-sequences

ea1729c

Add max-num-sequences

e9427b6

Shorten sampling dir name

a3ab3f6

max_num_seqs 128 -> 256

11576fc

Minor

3353a23

Merge branch 'main' into experiment

a832499

Fix bug in trace generator

69618a3

Add n6 & n6-beam

fe15d81

Merge branch 'main' into experiment

16efbce

shanshanpt mentioned this pull request Nov 17, 2023

Run long conetxt error : CUDA error: an illegal memory access was encountered #1700

Closed

junior-zsy mentioned this pull request Nov 20, 2023

Error with 32k Long Text in chatglm2-6b-32k Model #1725

Closed

hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024

Collect system stats in scheduler & Add scripts for experiments (vllm…

52d027d

…-project#30)

slyalin pushed a commit to slyalin/vllm that referenced this pull request Apr 19, 2024

Merge pull request vllm-project#30 from ilya-lavrenov/revert-20-produ…

388450f

…ce_artifacts Revert "Produce artifacts for bare metal installation in Dockerfile.openvino"

dtrifiro pushed a commit to dtrifiro/vllm that referenced this pull request May 21, 2024

✨ log all errored requests (vllm-project#30)

066041a

This PR logs all errors during validation or generation for a request like TGIS does. Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>

z103cb pushed a commit to dtrifiro/vllm that referenced this pull request May 21, 2024

Merge pull request vllm-project#30 from dtrifiro/dockerfile-build-ext…

8103d10

…ensions Dockerfile.ubi: get rid of prebuilt-wheel stage

tianyil1 pushed a commit to tianyil1/vllm that referenced this pull request Jun 5, 2024

Merge pull request vllm-project#30 from HabanaAI/private/kzawora/cums…

2664659

…um_wa WA: Disable cumsum in HPU _prepare_prompt

fxmarty pushed a commit to fxmarty/vllm-public that referenced this pull request Jun 12, 2024

Merge pull request vllm-project#30 from ROCm/charlifu/fp8_wo_upstream

f3e1926

fix the model loading fp8

ZHJ19970917 mentioned this pull request Jul 14, 2024

[Bug]: When using qwen-32b-chat-awq with multi-threaded access, errors occur after approximately several hundred visits.”vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already.“ #6421

Closed

alixiaodi mentioned this pull request Aug 2, 2024

[Bug]: #7072

Closed

murray-z mentioned this pull request Aug 29, 2024

[Bug]: RuntimeError: operator torchvision::nms does not exist #7940

Closed

1 task

hao-cold mentioned this pull request May 13, 2025

[Bug]: CUDA error: an illegal instruction was encountered #18045

Closed

1 task

markmc mentioned this pull request May 21, 2025

[Bug][Failing Test]: Distributed Comm Ops - distributed/test_shm_broadcast.py #18492

Closed

1 task

zerosurplus mentioned this pull request Jun 16, 2025

[Bug]: torch.distributed.DistNetworkError: The client socket has timed out after 600000ms while trying to connect to (172.17.0.9, 46229). #19670

Open

1 task

xiaomofang mentioned this pull request Jul 31, 2025

[Bug]: There is an issue with speculative inference in Eagle mode, where the context length of vLLM inference is constrained by the draft model. #21986

Closed

1 task

zyongye added a commit to zyongye/vllm that referenced this pull request Aug 5, 2025

ux log (vllm-project#30)

258cf6f

Signed-off-by: simon-mo <xmo@berkeley.edu> Co-authored-by: simon-mo <xmo@berkeley.edu>

zyongye added a commit to zyongye/vllm that referenced this pull request Aug 6, 2025

ux log (vllm-project#30)

a220fa2

Signed-off-by: simon-mo <xmo@berkeley.edu> Co-authored-by: simon-mo <xmo@berkeley.edu>

heheda12345 pushed a commit to heheda12345/vllm that referenced this pull request Sep 29, 2025

Merge pull request vllm-project#30 from vllm-model-0920/fix-unify-kv-…

08bd3e3

…cache [NIXL] fix unify kv cache spec

Michel-debug mentioned this pull request Oct 23, 2025

[Bug]: qwen3-vl-2b after ms-swift fine-tuning lance errors #27405

Closed

1 task

inkcherry pushed a commit to inkcherry/vllm that referenced this pull request Nov 6, 2025

Add PTPC support for MORI v1 and force EPLB (vllm-project#30)

716a17a

chopper0126 pushed a commit to chopper0126/vllm that referenced this pull request Feb 2, 2026

Merge pull request vllm-project#30 from lyqdgcb/my_multistream

ab21ac9

add multistream and core limitation of communication stream

HervorTao mentioned this pull request Feb 3, 2026

[Bug]: [CPU Backend] AttributeError: '_OpNamespace' '_C_utils' object has no attribute 'init_cpu_threads_env' #33675

Closed

1 task

LironKesem mentioned this pull request Mar 12, 2026

[Bug] DGX Spark (sm_121): CUTLASS can_implement() rejects sm_120f binaries #36835

Closed

1 task

Copilot AI mentioned this pull request Mar 20, 2026

Fix XPU segfault when tensor_parallel_size exceeds available devices hongbolv/vllm#5

Closed

WoosukKwon merged 36 commits into main from experiment

WoosukKwon commented Apr 6, 2023
