Add ninja to dependency by WoosukKwon · Pull Request #21 · vllm-project/vllm

WoosukKwon · 2023-04-02T01:59:32Z

The compilation time of flash-attn can be drastically reduced if ninja is installed. Related issue: Dao-AILab/flash-attention#150
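
A quick way to confirm that ninja is actually visible to the build before compiling flash-attn is sketched below. This is only an illustration: the helper name `ninja_version` is hypothetical and is not part of this PR or of vLLM. It uses only the Python standard library and the `ninja --version` command.

```python
import shutil
import subprocess
from typing import Optional


def ninja_version() -> Optional[str]:
    """Return the ninja version string, or None if ninja is not usable.

    Hypothetical helper for illustration; not part of vLLM or flash-attn.
    """
    exe = shutil.which("ninja")  # look for the ninja executable on PATH
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        # The executable exists but could not be run successfully.
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = ninja_version()
    if version is None:
        print("ninja not found; the extension build may be much slower")
    else:
        print(f"ninja {version} found")
```

If the helper returns `None`, installing ninja (for example with `pip install ninja`) before building should let the build pick it up.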

* remove duplicated code

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

* remove more

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

---------

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

New Industry Use Cases (vllm-project#21-30):
- vllm-project#21 Game Development: AI game testing + balance tuning
- vllm-project#22 Construction: Vision AI safety inspection
- vllm-project#23 Agriculture/Smart Farm: Crop monitoring + pest detection
- vllm-project#24 Government/Public: Document automation + citizen services
- vllm-project#25 Energy/Utilities: Grid monitoring + anomaly detection
- vllm-project#26 Environment/Sustainability: Carbon tracking + ESG reporting
- vllm-project#27 Fashion/Apparel: Trend analysis + inventory optimization
- vllm-project#28 Sports/Fitness: Performance analytics + tactical analysis
- vllm-project#29 Automotive/Mobility: Autonomous driving simulation
- vllm-project#30 Space/Aerospace: Satellite image analysis

Advanced Architecture Patterns:
1. Event-Driven Pattern: Webhook → Event Bus → Agent triggers
2. Streaming Pattern: Large dataset processing with chunking
3. Batch Processing Pattern: Celery-based parallel processing
4. Circuit Breaker Pattern: Fault tolerance + auto recovery
5. CQRS + Event Sourcing: Command/Query separation
6. Saga Pattern: Distributed transaction management

Guide now covers:
- 30+ industry-specific MCP implementations
- 6 production-ready architecture patterns
- Real-world scalability solutions
- Enterprise integration strategies
- Total: 8,672 lines (from 7,249)

- Add _lora_slots field on LoRAModelManager, decoupled from lora_config so dynamic scaling does not mutate the original config object
- Add _evict_adapters_to_fit() hook on base class (raises on overflow); LRUCacheLoRAModelManager overrides it with LRU eviction and rebuilds _active_adapters cache with new capacity (cachetools maxsize read-only)
- Implement resize_lora_slots() on base class: validates, evicts, calls reallocate_lora_weights() on all modules, empty_cache() once, resizes lora_index_to_id, updates _lora_slots
- Step 7 (re-load surviving adapters) intentionally omitted; weights are preserved via GPU-to-GPU copy in reallocate_lora_weights(); comment notes what to do if a remote weight store is introduced in future
- Add tests/lora/test_lora_model_manager_resize.py: 6 CPU-only unit tests covering validation, no-op, grow, LRU shrink, base-class overflow raise, and empty_cache() called exactly once

Closes vllm-project#11
Closes vllm-project#21

AI assistance was used; all changed lines reviewed by the submitter.

Co-authored-by: Claude
Signed-off-by: Yue Zhu <Yue.Zhu@ibm.com>

Add ninja to dependency

86983b7

WoosukKwon merged commit 2c5cd0d into main Apr 2, 2023

WoosukKwon deleted the ninja branch April 2, 2023 02:00

shanshanpt mentioned this pull request Nov 17, 2023

Run long conetxt error : CUDA error: an illegal memory access was encountered #1700

Closed

junior-zsy mentioned this pull request Nov 20, 2023

Error with 32k Long Text in chatglm2-6b-32k Model #1725

Closed

hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024

Add ninja to dependency (vllm-project#21)

f4354de

slyalin pushed a commit to slyalin/vllm that referenced this pull request Apr 3, 2024

Merge pull request vllm-project#21 from luo-cheng2021/luocheng/var_bl…

ee5c232

…ock_size [CPU] Support for larger block_size

tdg5 pushed a commit to tdg5/vllm that referenced this pull request Apr 25, 2024

Merge pull request vllm-project#21 from tdg5/exp-2

36cf873

Fix more logging lint errors

z103cb referenced this pull request in z103cb/opendatahub_vllm May 7, 2024

fix: Missed TLS config logic from internal fork (opendatahub-io#21)

7df0eb8

Signed-off-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: Daniel Clark <daniel.clark@ibm.com>

yuhuixu1993 mentioned this pull request Jun 2, 2024

[Bug]: loading squeezellm model #5190

Closed

alixiaodi mentioned this pull request Aug 2, 2024

[Bug]: #7072

Closed

wuhuikx pushed a commit to wuhuikx/vllm that referenced this pull request Mar 27, 2025

[Misc] version control by setuptools_scm (vllm-project#21)

c59375c

make package version control by setuptools_scm to keep the same with vllm Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

hao-cold mentioned this pull request May 13, 2025

[Bug]: CUDA error: an illegal instruction was encountered #18045

Closed

1 task

markmc mentioned this pull request May 21, 2025

[Bug][Failing Test]: Distributed Comm Ops - distributed/test_shm_broadcast.py #18492

Closed

1 task

zerosurplus mentioned this pull request Jun 16, 2025

[Bug]: torch.distributed.DistNetworkError: The client socket has timed out after 600000ms while trying to connect to (172.17.0.9, 46229). #19670

Open

1 task

xiaomofang mentioned this pull request Jul 31, 2025

[Bug]: There is an issue with speculative inference in Eagle mode, where the context length of vLLM inference is constrained by the draft model. #21986

Closed

1 task

zyongye pushed a commit to zyongye/vllm that referenced this pull request Aug 5, 2025

Support Responses Streaming (vllm-project#21)

b775a39

zyongye pushed a commit to zyongye/vllm that referenced this pull request Aug 6, 2025

Support Responses Streaming (vllm-project#21)

696cfb8

heheda12345 pushed a commit to heheda12345/vllm that referenced this pull request Sep 29, 2025

support mtp with indexer kv (vllm-project#21)

6a29a01

Co-authored-by: Lucia Fang <fanglu@meta.com>

Michel-debug mentioned this pull request Oct 23, 2025

[Bug]: qwen3-vl-2b after ms-swift fine-tuning lance errors #27405

Closed

1 task

inkcherry pushed a commit to inkcherry/vllm that referenced this pull request Nov 6, 2025

debug (vllm-project#21)

e96685a

Co-authored-by: root <root@smc300x-ccs-aus-gpue77e.prov.aus.ccs.cpe.ice.amd.com>

acodercat mentioned this pull request Nov 10, 2025

[Bugfix] Add strong reference to CUDA pluggable allocator callbacks #23477

Merged

4 tasks

sriumcp mentioned this pull request Jan 28, 2026

[Bugfix] Fix OpenTelemetry trace context propagation between API and engine (issue #21) #33272

Closed

Lrcx mentioned this pull request Jan 29, 2026

[Bug]: Crash when using presence_penalty with Qwen3-VL in v0.11.0 #33338

Open

1 task

HervorTao mentioned this pull request Feb 3, 2026

[Bug]: [CPU Backend] AttributeError: '_OpNamespace' '_C_utils' object has no attribute 'init_cpu_threads_env' #33675

Closed

1 task

LironKesem mentioned this pull request Mar 12, 2026

[Bug] DGX Spark (sm_121): CUTLASS can_implement() rejects sm_120f binaries #36835

Closed

1 task

mahaocong90 mentioned this pull request Mar 17, 2026

[Bug]: QWEN 3.5-397B-A17B report "RPC call to sample_tokens timed out" #37250

Closed

1 task

Copilot AI mentioned this pull request Mar 20, 2026

Fix XPU segfault when tensor_parallel_size exceeds available devices hongbolv/vllm#5

Closed

WoosukKwon merged 1 commit into main from ninja
