
FastAPI-based working frontend#10

Merged
zhuohan123 merged 14 commits into main from real-frontend
Mar 29, 2023

Conversation

Member

@zhuohan123 zhuohan123 commented Mar 27, 2023

Add a FastAPI-based frontend to cacheflow while keeping the old script working.

Remaining TODOs:

  • Add a README for the FastAPI frontend.
  • Rename the old script.
  • Add a gradio demo web frontend.

@zhuohan123 zhuohan123 requested a review from WoosukKwon March 27, 2023 06:19
Collaborator

@WoosukKwon WoosukKwon left a comment


LGTM! Thanks for your effort.

@zhuohan123 zhuohan123 merged commit 721fa3d into main Mar 29, 2023
@zhuohan123 zhuohan123 deleted the real-frontend branch March 29, 2023 06:49
xiangyuT pushed a commit to xiangyuT/vllm that referenced this pull request Oct 25, 2023
* Add underlying functions

* tests done
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
slyalin pushed a commit to slyalin/vllm that referenced this pull request Mar 22, 2024
ykim362 pushed a commit to ykim362/vllm that referenced this pull request Jun 17, 2024
@alixiaodi alixiaodi mentioned this pull request Aug 2, 2024
zeroorhero pushed a commit to zeroorhero/vllm that referenced this pull request Sep 23, 2024
wuhuikx pushed a commit to wuhuikx/vllm that referenced this pull request Mar 27, 2025
### What this PR does / why we need it?
This PR adds Chinese documentation for vllm-ascend for Chinese-speaking developers.

### Does this PR introduce _any_ user-facing change?
Changes as follows:
- add README.zh.md
- add environment.zh.md
- add CONTRIBUTING.zh.md

### How was this patch tested?
By CI

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
juncgu pushed a commit to juncgu/vllm that referenced this pull request May 8, 2025
Move new GPUModelRunner methods out of `execute_model` method
zyongye pushed a commit to zyongye/vllm that referenced this pull request Aug 5, 2025
* hf format

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

* better qkv concat

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

---------

Signed-off-by: Chen Zhang <zhangch99@outlook.com>
zyongye pushed a commit to zyongye/vllm that referenced this pull request Aug 6, 2025
dik654 pushed a commit to dik654/vllm-for-study that referenced this pull request Nov 18, 2025
…ections

Manufacturing enhancements:
- Add complete Vision Inspection MCP with Vision AI defect detection
- Add Manufacturing MES MCP with PostgreSQL integration
- Include detailed defect classification and statistics
- Add ROI analysis showing 78% cost reduction and 99.6% time savings

Healthcare enhancements:
- Enhance existing Medical OCR, Drug Interaction, and EHR MCPs
- Add ROI analysis showing 97.2% time reduction
- Include medical accident prevention benefits (500 million KRW in annual savings)
- Demonstrate HIPAA-compliant prescription OCR workflow

Summary:
- Sections vllm-project#5-8: Fully detailed implementations (2,000+ lines each)
- Sections vllm-project#9-10: Enhanced with complete code + ROI
- Sections vllm-project#11-20+: Comprehensive summaries covering all major industries
- Total guide provides 20+ real-world MCP + Agent architecture patterns
chopper0126 pushed a commit to chopper0126/vllm that referenced this pull request Dec 12, 2025
prashanth058 pushed a commit to prashanth058/vllm that referenced this pull request Dec 12, 2025
eble-amd pushed a commit to eble-amd/vllm that referenced this pull request Mar 17, 2026
- Make w_dequant non-optional in W8A16 custom op since it is always
  pre-computed at weight-load time; remove dead inline dequant fallback.
- Add explicit TORCH_CHECK for unsupported group_size in the
  wvSplitK_int4g_hf_sweep dispatch instead of silent fallthrough.

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
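The guard described above replaces a silent fallthrough with a loud failure on unsupported configurations. A Python sketch of the same pattern (the names and supported sizes are assumptions for illustration; the real change uses `TORCH_CHECK` in C++):

```python
# Illustrative dispatch guard: fail loudly on an unsupported group_size
# instead of silently falling through. Values are assumed for the example.
SUPPORTED_GROUP_SIZES = (32, 64, 128)

def dispatch_int4_kernel(group_size: int) -> str:
    # Validate before dispatch so misconfiguration surfaces immediately.
    if group_size not in SUPPORTED_GROUP_SIZES:
        raise ValueError(
            f"unsupported group_size {group_size}; "
            f"expected one of {SUPPORTED_GROUP_SIZES}"
        )
    # Select the kernel variant for the validated group size.
    return f"kernel_g{group_size}"
```

Failing at dispatch time keeps a bad config from producing silently wrong numerics further down the pipeline.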
yuezhu1 pushed a commit to yuezhu1/vllm that referenced this pull request Mar 25, 2026
…llm-project#10, closes vllm-project#20)

Implements reallocate_lora_weights(new_slots) so stacked GPU tensors can
be resized at runtime without restarting the server.

- BaseLayerWithLoRA: single implementation with _reallocate() helper that
  handles both tuple-of-tensors (linear layers) and plain-tensor
  (LogitsProcessorWithLoRA) storage via isinstance check. All linear layer
  subclasses inherit this for free.
- FusedMoEWithLoRA: override to reallocate the four w13/w2 weight tuples,
  resize adapter_enabled, rebuild the flat lora_a/b_stacked views list,
  and update max_loras. FusedMoE3DWithLoRA inherits this override.
- 22 CPU-only unit tests in tests/lora/test_reallocate_lora_weights.py
  covering shape after grow/shrink, weight preservation for surviving slots,
  zero-init of new slots, no-op before create_lora_weights, and no
  empty_cache() call inside the method.

Pre-commit: ruff-check, ruff-format, mypy-3.10 all pass.
Tests: 22/22 pass on CPU.

AI assistance was used (Claude Code). All changed lines reviewed by
@yuezhu1. This does not duplicate any existing upstream PR or issue.

Co-authored-by: Claude <noreply@anthropic.com>
Damon-Salvetore pushed a commit to Damon-Salvetore/vllm that referenced this pull request Mar 31, 2026
…t-linear-fp8

Add cuSPARSELt FP8 Linear method analysis to fp8_gemm_integration_analysis.md
danisereb pushed a commit to de-inf/vllm that referenced this pull request Apr 5, 2026
…dp-tcp-placement

Port multi-node DP fixes from upstream PR vllm-project#38630
