
Implement LLaMA #9

Merged — zhuohan123 merged 16 commits into main from llama on Mar 30, 2023
Conversation

@WoosukKwon
Collaborator

@WoosukKwon WoosukKwon commented Mar 26, 2023

TODO:

  • Test against HF implementation
  • Add TP support (@zhuohan123)
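The first TODO item, testing against the HF implementation, amounts to a logit-parity check between the two model implementations. A minimal sketch of what such a comparison could look like, assuming hypothetical helper names (`max_abs_diff`, `check_parity` are illustrative, not from the PR):

```python
def max_abs_diff(ref_logits, test_logits):
    """Elementwise max absolute difference between two equal-length logit vectors."""
    assert len(ref_logits) == len(test_logits), "logit vectors must match in length"
    return max(abs(r - t) for r, t in zip(ref_logits, test_logits))

def check_parity(ref_logits, test_logits, atol=1e-3):
    """True when every logit agrees within atol (a loose, fp16-friendly bound)."""
    return max_abs_diff(ref_logits, test_logits) <= atol
```

In practice the reference vector would come from `transformers`' LLaMA forward pass and the test vector from the new implementation, compared position by position over a fixed prompt.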

@WoosukKwon
Collaborator Author

@zhuohan123 Please feel free to approve and merge this PR once you think it's ready.

@zhuohan123 zhuohan123 self-requested a review March 29, 2023 06:37
@zhuohan123 zhuohan123 merged commit 80a2f81 into main Mar 30, 2023
@WoosukKwon WoosukKwon deleted the llama branch April 12, 2023 03:12
v1nc3nt27 pushed a commit to v1nc3nt27/vllm that referenced this pull request Sep 12, 2023
Don't error if user doesn't have kernels installed
bigPYJ1151 pushed a commit to bigPYJ1151/vllm that referenced this pull request Dec 29, 2023
heheda12345 added a commit to heheda12345/vllm that referenced this pull request Sep 29, 2025
* code from ds

Signed-off-by: youkaichao <youkaichao@gmail.com>

* doc from ds

Signed-off-by: youkaichao <youkaichao@gmail.com>

* Fixes for support_materials/2-tilelang/

Signed-off-by: mgoin <mgoin64@gmail.com>

* Fix example 1

Signed-off-by: mgoin <mgoin64@gmail.com>

* Fix Einsum in deepgemm

* Fix `libc10.so` unimported error

* fix reference code

Signed-off-by: youkaichao <youkaichao@gmail.com>

* adding missing indexer args

* passing index args into the module

* init

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

* build indexer k cache metadata

* prefill indexer, but weight_proj will output -inf

* unquantized paged indexer, still have -inf issue

* remove support material

* adding topk_indices mask

* add weight scale

* unittest infrastructure and fix weight_proj, numeric error due to quantization

* varlen prefill passed

* paged prefill

* add indices mask

---------

Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
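The "adding topk_indices mask" and "add indices mask" steps in the commit message above boil down to keeping only the k highest-scoring positions. A plain-Python sketch of that selection (illustrative only — the real indexer operates on GPU tensors, and the function name is hypothetical):

```python
def topk_indices_mask(scores, k):
    """Boolean mask that is True at the k highest-scoring positions.

    Sort candidate indices by score (descending), keep the first k,
    and emit a per-position keep/drop mask.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    return [i in keep for i in range(len(scores))]
```

Masked-out positions would then be excluded from the indexer's attention computation rather than scored at `-inf` directly.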
isaick pushed a commit to isaick/vllm that referenced this pull request Oct 19, 2025
yma11 pushed a commit to yma11/vllm that referenced this pull request Nov 10, 2025
* add wf8af8 pass

Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>

* remove redundant func

Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>

* add env into vllm.envs

Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>

---------

Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
dik654 pushed a commit to dik654/vllm-for-study that referenced this pull request Nov 18, 2025
…ections

Manufacturing enhancements:
- Add complete Vision Inspection MCP with Vision AI defect detection
- Add Manufacturing MES MCP with PostgreSQL integration
- Include detailed defect classification and statistics
- Add ROI analysis showing 78% cost reduction and 99.6% time savings

Healthcare enhancements:
- Enhance existing Medical OCR, Drug Interaction, and EHR MCPs
- Add ROI analysis showing 97.2% time reduction
- Include medical accident prevention benefits (≈500 million KRW annual savings)
- Demonstrate HIPAA-compliant prescription OCR workflow

Summary:
- Sections vllm-project#5-8: Fully detailed implementations (2,000+ lines each)
- Sections vllm-project#9-10: Enhanced with complete code + ROI
- Sections vllm-project#11-20+: Comprehensive summaries covering all major industries
- Total guide provides 20+ real-world MCP + Agent architecture patterns
chopper0126 pushed a commit to chopper0126/vllm that referenced this pull request Dec 12, 2025
prashanth058 pushed a commit to prashanth058/vllm that referenced this pull request Dec 12, 2025
sriumcp referenced this pull request in inference-sim/vllm Jan 26, 2026
Update plan document to account for completed work:
- Document PR #0 (EngineCoreEvent removal) as completed prerequisite
- Clarify that do_tracing() is current OTEL mechanism (not legacy)
- Update PR #9 to keep RequestJourneyEvent dataclass (needed for Prometheus)
- Fix terminology: 'legacy' = EngineCoreEvent (removed), 'current' = RequestJourneyEvent
- Add PR #0 to dependencies, timeline, and progress tracking sections

Key corrections:
- do_tracing() will NOT be removed (it's the current system)
- RequestJourneyEvent dataclass will NOT be removed (needed for metrics)
- Only buffering LOGIC will be removed in PR #9

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp referenced this pull request in inference-sim/vllm Jan 26, 2026
/9) (#8)

* [Docs] Update journey tracing plan to reflect completed PR #0

Update plan document to account for completed work:
- Document PR #0 (EngineCoreEvent removal) as completed prerequisite
- Clarify that do_tracing() is current OTEL mechanism (not legacy)
- Update PR #9 to keep RequestJourneyEvent dataclass (needed for Prometheus)
- Fix terminology: 'legacy' = EngineCoreEvent (removed), 'current' = RequestJourneyEvent
- Add PR #0 to dependencies, timeline, and progress tracking sections

Key corrections:
- do_tracing() will NOT be removed (it's the current system)
- RequestJourneyEvent dataclass will NOT be removed (needed for metrics)
- Only buffering LOGIC will be removed in PR #9

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* [Feature] Initialize OTEL tracer in scheduler for journey tracing

Add tracer initialization in Scheduler.__init__() to support dual-stream
journey tracing architecture. This is the foundation for PR #2 which will
create and manage core spans.

Changes:
- Add defensive SpanAttributes import with None fallback
- Initialize tracer when enable_journey_tracing=True and endpoint configured
- Add try/except with warning log for graceful degradation
- Add otlp_traces_endpoint parameter to test utilities
- Add 4 comprehensive tests with proper mocking

Safety guarantees:
- Zero per-request state (tracer is class-level only)
- Zero overhead when disabled (boolean + endpoint guard)
- No spans created (initialization only)
- No cleanup needed (shared tracer instance)
- Backward compatible (all parameters optional)

Test results: All 85 tests passing (81 existing + 4 new)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
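The defensive tracer initialization described in the commit above — a boolean-plus-endpoint guard, then a try/except that logs a warning and degrades gracefully — might look roughly like this. The function name and the `tracer_factory` hook are hypothetical stand-ins for the real OTEL setup call:

```python
import logging

logger = logging.getLogger(__name__)

def init_journey_tracer(enable_journey_tracing, otlp_traces_endpoint, tracer_factory):
    """Return a tracer only when journey tracing is enabled AND an endpoint is set.

    When disabled, the only cost is this boolean/endpoint check (zero overhead).
    When initialization fails, log a warning and return None instead of raising,
    so the scheduler keeps running without tracing.
    """
    if not (enable_journey_tracing and otlp_traces_endpoint):
        return None
    try:
        return tracer_factory(otlp_traces_endpoint)
    except Exception as exc:
        logger.warning("Journey tracing disabled, tracer init failed: %s", exc)
        return None
```

Because the tracer is created once at scheduler construction, there is no per-request state to manage and no cleanup path needed.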
tjtanaa pushed a commit to tjtanaa/vllm that referenced this pull request Jan 29, 2026
Srinivasoo7 pushed a commit to Srinivasoo7/vllm that referenced this pull request Mar 4, 2026
…Manager

- Add store_threshold >= 2 validation in FilterReusedOffloadingManager
  constructor (mirrors the existing max_tracker_size >= 1 guard)
- Fix cpu.py gate from > 1 to >= 2; update comment to clarify that
  values < 2 disable filtering
- Add internal assertions to test_filter_reused_manager to verify
  tracker eviction and count reset (Comments vllm-project#8 and vllm-project#9)
- Remove tests/v1/kv_offload/__init__.py (not needed for pytest discovery)
- Remove accidentally tracked dev-workflow files (.patch, diff*.txt,
  error.txt, log files, mypy/test output files)

Signed-off-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com>
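The `store_threshold >= 2` constructor guard described in the commit above, mirroring the existing `max_tracker_size >= 1` check, could be sketched as follows (the class body is an illustrative skeleton, not the real offloading manager):

```python
class FilterReusedOffloadingManager:
    """Sketch of the validation described in the commit message:
    store_threshold values below 2 would disable filtering entirely,
    so the constructor rejects them up front."""

    def __init__(self, store_threshold: int, max_tracker_size: int):
        if store_threshold < 2:
            raise ValueError("store_threshold must be >= 2 (values < 2 disable filtering)")
        if max_tracker_size < 1:
            raise ValueError("max_tracker_size must be >= 1")
        self.store_threshold = store_threshold
        self.max_tracker_size = max_tracker_size
```

Failing fast in the constructor keeps the corresponding runtime gate (`>= 2` rather than `> 1`) from silently running with a configuration that filters nothing.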
danisereb pushed a commit to de-inf/vllm that referenced this pull request Apr 5, 2026
…-nongated-moe

[Bugfix] Fix BF16 trtllm-gen MoE weight corruption for non-gated models