[MFM-2025-02-21] Merge main to llama fp8, DeepSeekV3 and PTPC-FP8 by tjtanaa · Pull Request #445 · ROCm/vllm

tjtanaa · 2025-02-24T10:04:40Z

Please direct your PRs to the upstream vllm (https://github.com/vllm-project/vllm.git).

Accepting PRs into the ROCm fork (https://github.com/ROCm/vllm) will require a clear, previously communicated exception.

* Optimization for quantized gemm skinny sizes
* lint fix
* Add support for bf16/fp16
* code cleanup
* code cleanup
* lint fix2
* cleanup
* Moved the logic into tuned gemm to preserve API compatibility

---------

Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com> Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

* Removing gfx940 and gfx941 targets. These have been deprecated in favor of gfx942 for MI300X. Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
* Remove from custom kernels as well

---------

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

maleksan85 and others added 30 commits February 5, 2025 03:58

[ROCM][AMD][TRITON] Halving warps number for fw_prefill to reduce spilling (vllm-project#12713)

64862d1

Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>

Refactor Linear handling in TransformersModel (vllm-project#12727)

249824c

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

[VLM] Add MLA with pure RoPE support for deepseek-vl2 models (vllm-project#12729)

98fd089

[Misc] Bump the compressed-tensors version (vllm-project#12736)

686006a

[Model][Quant] Fix GLM, Fix fused module mappings for quantization (vllm-project#12634)

7ff7a63

Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Co-authored-by: mgoin <michael@neuralmagic.com>

[Doc] Update PR Reminder with link to Developer Slack (vllm-project#12748)

58b218d

[Bugfix] Fix OpenVINO model runner (vllm-project#12750)

fcf2e3d

[V1][Misc] Shorten FinishReason enum and use constant strings (vllm-project#12760)

3d09e59

[Doc] Remove performance warning for auto_awq.md (vllm-project#12743)

c53dc46

[Bugfix] Fix 'ModuleNotFoundError: No module named 'intel_extension_for_pytorch'' for --tensor-parallel-size more than 1 (vllm-project#12546)

022bcc7

[core][distributed] exact ray placement control (vllm-project#12732)

bc1bdec

Signed-off-by: youkaichao <youkaichao@gmail.com>

The code assumes WARP_SIZE to be equal to 32, which is not the case on ROCm (ROCm#406)

f65ecc9

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

Merging PR vllm-project#12536

4c3aac5

Merged via CLI script

[Hardware][Intel-Gaudi] Enable FusedSDPA support for Intel Gaudi (HPU)

af8486d

Add: Support for Sparse24Bitmask Compressed Models

3b2005e

[VLM] Use shared field to pass token ids to model

a4ce74c

[Docs] Drop duplicate [source] links

9a5b155

[VLM] Qwen2.5-VL

bf3b79e

[VLM] Update compatibility with transformers 4.49

75404d0

[ROCm][Kernel] Using the correct warp_size value

5b19b93

[Bugfix] Better FP8 supported defaults

76abd0c

[Misc][Easy] Remove the space from the file name

9cdea30

[Model] LoRA Support for Ultravox model (vllm-project#11253)

d88506d

[Bugfix] Fix the test_ultravox.py's license (vllm-project#12806)

56534cd

Signed-off-by: Lu Fang <lufang@fb.com>

Improve TransformersModel UX (vllm-project#12785)

1a6fcad

[Misc] Remove duplicated DeepSeek V2/V3 model definition (vllm-project#12793)

449d1bc

[Misc] Improve error message for incorrect pynvml (vllm-project#12809)

0408efc

Signed-off-by: youkaichao <youkaichao@gmail.com>

[Misc] Update w2 scale loading for GPTQMarlinMoE (vllm-project#12757)

7ca9934

[Docs] Add Google Cloud Slides (vllm-project#12814)

cefd56e

[Attention] Use FA3 for MLA on Hopper (vllm-project#12807)

c786e75

Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>

WoosukKwon and others added 25 commits February 16, 2025 10:02

[V1][PP] Cache Intermediate Tensors (vllm-project#13353)

e18227b

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

[Bugfix][Platform][CPU] Fix cuda platform detection on CPU backend edge case (vllm-project#13358)

d67cc21

Signed-off-by: Isotr0py <2037008807@qq.com>

[V1][BugFix] Clean up rejection sampler & Fix warning msg (vllm-project#13362)

69e1d23

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

[V1][Misc] Avoid unnecessary log output (vllm-project#13289)

2010f04

[Feature][Spec Decode] Simplify the use of Eagle Spec Decode (vllm-project#12304)

46cdd59

Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>

Fix spelling error in index.md (vllm-project#13369)

f857311

Run v1 benchmark and integrate with PyTorch OSS benchmark database (vllm-project#13068)

4518683

Signed-off-by: Huy Do <huydhn@gmail.com>

[MISC] tiny fixes (vllm-project#13378)

238dfc8

[VLM] Check required fields before initializing field config in `DictEmbeddingItems` (vllm-project#13380)

7b623fc

[Model] Support Mamba2 (Codestral Mamba) (vllm-project#9292)

1f69c4a

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

[Bugfix] fix xpu communicator (vllm-project#13368)

30513d1

Signed-off-by: yan ma <yan.ma@intel.com>

[Bugfix] Fix VLLM_USE_MODELSCOPE issue (vllm-project#13384)

ce77eb9

Merge remote-tracking branch 'upstream/main' into upstream_merge_25_02_17

ce342c7

Merge remote-tracking branch 'Isotr0py/local-lookup' into upstream_merge_25_02_17

669fc3f

Merge pull request ROCm#430 from ROCm/upstream_merge_25_02_17

365687d

Upstream merge 25 02 17

Updating PR template to point people to the upstream repo. Updating codeowners (ROCm#431)

4fd2f5b

Enabling the ROCm-vLLM CI on MI250 machines (ROCm#432)

17b26bd

* Enabling ROCm CI on MI250 machines:
  - correct build target
  - correct queue

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

---------

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

Restricting FP8 wvSplitk to MI300x (ROCm#439)

b63a984

resolve diff for mixtral8x7B configs (ROCm#437)

5a6afcc

Signed-off-by: Divakar Verma <divakar.verma@amd.com>

Torch version bump to fix tunable ops (ROCm#442)

ff13c7a

* Advance torch commit to be past pytorch/pytorch#144942 to fix tunable ops
* Make sure to use the submodule commit compatible with the main aiter commit

merge origin/main into merge-main-to-llama-fp8

32cc0fc

Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>

Merge remote-tracking branch 'origin/main' into merge-main-to-llama-fp8

9dceba0

Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>

bugfix: remove unused argument passed to the forward pass of ReplicatedLinear layer

fd88257

Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>

tjtanaa marked this pull request as ready for review February 24, 2025 10:04

hongxiayang approved these changes Feb 25, 2025

hongxiayang merged commit d7fefdf into ROCm:llama_fp8_12062024 Feb 25, 2025

tjtanaa mentioned this pull request Feb 25, 2025

[Quant] [Feature] Per-Token-Activation Per-Channel-Weight FP8 Quantization #412

Closed

vllmellm deleted the merge-main-to-llama-fp8 branch March 12, 2025 04:17

hongxiayang merged 1120 commits into ROCm:llama_fp8_12062024 from EmbeddedLLM:merge-main-to-llama-fp8

20 participants