Merged
Conversation
WoosukKwon approved these changes on Mar 28, 2023.
WoosukKwon (Collaborator) left a comment: LGTM! Thanks for your effort.
xiangyuT pushed a commit to xiangyuT/vllm that referenced this pull request (Oct 25, 2023): * Add underlying functions * tests done
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request (Feb 13, 2024).
slyalin pushed a commit to slyalin/vllm that referenced this pull request (Mar 22, 2024): …sthrough Passthrough trust_remote_code
ykim362 pushed a commit to ykim362/vllm that referenced this pull request (Jun 17, 2024): Wenxh/fp8 on a100 v1 pr [Closed]
zeroorhero pushed a commit to zeroorhero/vllm that referenced this pull request (Sep 23, 2024): Kuntai disagg refactor
wuhuikx pushed a commit to wuhuikx/vllm that referenced this pull request (Mar 27, 2025):

### What this PR does / why we need it?
This PR adds Chinese documents for vllm-ascend for Chinese-speaking developers.

### Does this PR introduce _any_ user-facing change?
Changes as follows:
- add README.zh.md
- add environment.zh.md
- add CONTRIBUTING.zh.md

### How was this patch tested?
By CI.

Signed-off-by: wangli <wangli858794774@gmail.com>
juncgu pushed a commit to juncgu/vllm that referenced this pull request (May 8, 2025): Move new GPUModelRunner methods out of `execute_model` method
zyongye pushed a commit to zyongye/vllm that referenced this pull request (Aug 5, 2025):
* hf format
* better qkv concat

Signed-off-by: Chen Zhang <zhangch99@outlook.com>
zyongye pushed a commit to zyongye/vllm that referenced this pull request (Aug 6, 2025):
* hf format
* better qkv concat

Signed-off-by: Chen Zhang <zhangch99@outlook.com>
dik654 pushed a commit to dik654/vllm-for-study that referenced this pull request (Nov 18, 2025) [Closed]:
…ections

Manufacturing enhancements:
- Add complete Vision Inspection MCP with Vision AI defect detection
- Add Manufacturing MES MCP with PostgreSQL integration
- Include detailed defect classification and statistics
- Add ROI analysis showing 78% cost reduction and 99.6% time savings

Healthcare enhancements:
- Enhance existing Medical OCR, Drug Interaction, and EHR MCPs
- Add ROI analysis showing 97.2% time reduction
- Include medical accident prevention benefits (500 million KRW in annual savings)
- Demonstrate HIPAA-compliant prescription OCR workflow

Summary:
- Sections vllm-project#5-8: Fully detailed implementations (2,000+ lines each)
- Sections vllm-project#9-10: Enhanced with complete code + ROI
- Sections vllm-project#11-20+: Comprehensive summaries covering all major industries
- Total guide provides 20+ real-world MCP + Agent architecture patterns
chopper0126 pushed a commit to chopper0126/vllm that referenced this pull request (Dec 12, 2025): cam support aclgraph full-graph.
prashanth058 pushed a commit to prashanth058/vllm that referenced this pull request (Dec 12, 2025): …-fixes lora vision misc fixes
eble-amd pushed a commit to eble-amd/vllm that referenced this pull request (Mar 17, 2026):
- Make w_dequant non-optional in the W8A16 custom op, since it is always pre-computed at weight-load time; remove the dead inline dequant fallback.
- Add an explicit TORCH_CHECK for unsupported group_size in the wvSplitK_int4g_hf_sweep dispatch instead of a silent fallthrough.

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
This was referenced Mar 20, 2026
yuezhu1 pushed a commit to yuezhu1/vllm that referenced this pull request (Mar 25, 2026):
…llm-project#10, closes vllm-project#20

Implements reallocate_lora_weights(new_slots) so stacked GPU tensors can be resized at runtime without restarting the server.
- BaseLayerWithLoRA: single implementation with a _reallocate() helper that handles both tuple-of-tensors (linear layers) and plain-tensor (LogitsProcessorWithLoRA) storage via an isinstance check. All linear layer subclasses inherit this for free.
- FusedMoEWithLoRA: override to reallocate the four w13/w2 weight tuples, resize adapter_enabled, rebuild the flat lora_a/b_stacked views list, and update max_loras. FusedMoE3DWithLoRA inherits this override.
- 22 CPU-only unit tests in tests/lora/test_reallocate_lora_weights.py covering shape after grow/shrink, weight preservation for surviving slots, zero-init of new slots, no-op before create_lora_weights, and no empty_cache() call inside the method.

Pre-commit: ruff-check, ruff-format, and mypy-3.10 all pass. Tests: 22/22 pass on CPU. AI assistance was used (Claude Code); all changed lines were reviewed by @yuezhu1. This does not duplicate any existing upstream PR or issue.

Co-authored-by: Claude <noreply@anthropic.com>
Damon-Salvetore pushed a commit to Damon-Salvetore/vllm that referenced this pull request (Mar 31, 2026): …t-linear-fp8 Add cuSPARSELt FP8 Linear method analysis to fp8_gemm_integration_analysis.md
danisereb pushed a commit to de-inf/vllm that referenced this pull request (Apr 5, 2026): …dp-tcp-placement Port multi-node DP fixes from upstream PR vllm-project#38630
Add a FastAPI-based frontend to cacheflow while keeping the old script working.
Remaining TODOs: