Merge upstream vLLM code into gfx11 #983
Merged
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
…2537) Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
…llm-project#42766) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>
…[2/N] (vllm-project#43039) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: george <george@inferact.ai> Co-authored-by: george <george@inferact.ai>
…hase A and Phase B (vllm-project#42289) Signed-off-by: Qiuyang Yue <yueqiuyang1389@gmail.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: gemini-code-assist <noreply@google.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
vllm-project#42671) Signed-off-by: junyanxu <junyanxu5513@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Signed-off-by: Gracie Guo <gracieguo@Gracies-MacBook-Pro.local> Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Co-authored-by: Gracie Guo <gracieguo@Gracies-MacBook-Pro.local> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: Chaojun,Zhang <chaojun.zhang@intel.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
…42946) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
…llm-project#43073) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
…tion + clear_cache (vllm-project#42117) Signed-off-by: hao-aaron <ahao@anyscale.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
…oject#42828) Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
…ject#43077) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: ZhanqiuHu <zhu@redhat.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
…te` (vllm-project#42887) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
Signed-off-by: Wang Yiwen <121547057+yiwen101@users.noreply.github.com>
…tool parser (vllm-project#43025) Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
…vllm-project#42080) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
…-project#42994) Signed-off-by: Dao Le <Dao007forever@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>
…ous layers (vllm-project#42976) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
…r autotune (vllm-project#43119) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Signed-off-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
…casts in rotary path (vllm-project#42833) Signed-off-by: Aakif Nawaz <aakif.nawaz@amd.com>
…roject#43550) Signed-off-by: Aditya Singh <adisin650@gmail.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
… & non-streaming paths (vllm-project#43662) Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
…in-aligned W4A16 shapes (vllm-project#43731) Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com> Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
…rgs (vllm-project#43401) Signed-off-by: Ashwin Giridharan <girida@amazon.com> Signed-off-by: Chauncey <chaunceyjiang@gmail.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
…E=1 (vllm-project#39155) Signed-off-by: Injae Ryou <injaeryou@gmail.com>
Signed-off-by: chunyang.wen <chunyang.wen@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
…ORI_INTERNODE_KERNEL (vllm-project#41751) Signed-off-by: jatseng-ai <jatseng@amd.com>
…continued) (vllm-project#43361) Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com> Signed-off-by: Chris Leonard <chleonar@redhat.com> Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com> Co-authored-by: Shengqi Chen <harry-chen@outlook.com>
Signed-off-by: Minh Vu <vuhoangminh97@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
51987e3 to 6e4fc54
Signed-off-by: Callum Mitchell <callumm@amd.com> Signed-off-by: <callumm@amd.com>
6e4fc54 to 112c8cb
mgehre-amd
approved these changes
Jun 2, 2026
mgehre-amd
left a comment
Looks good, thanks!
Build is failing (OOM for skinny int4 GEMM?), please check
Author
Looking into this now.
Signed-off-by: <callumm@amd.com>
Author
Reducing MAX_JOBS from 2 to 1 in CI seemed to do the trick, at the cost of somewhat slower build times. I'll merge this if the rest of CI looks good.
@amd-callumm The performance test job failed on runner
Purpose
Merge all commits from vllm-project/vllm:main since the last common ancestor with ROCm/vllm:gfx11.
The commit count and change volume are large, and the merge required a fair number of conflict resolutions. Substantial testing is therefore needed to ensure that critical functionality and optimizations are not lost.
Customer safeguard: in the event that some functionality breaks or performance regresses as a result of this merge, the gfx11_20260528 tag can be used to obtain stable pre-merge code. This package is based on the May 28 nightly build which showed good numbers and test results during nightly regressions.
Test Plan
Run representative subsets of all of the following test suites on a Strix Halo machine pre- and post-merge:
Run attention benchmarking tests to check for any regressions
Run a sample of models/use cases from nightly gfx1151 benchmarks to check for severe performance regressions pre- and post-merge
For any regressions found, plan next steps
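The pre- and post-merge comparison in this plan can be sketched as a set difference over per-test outcomes. This is a minimal illustration only: the function name `compare_runs`, the outcome strings, and the test names are hypothetical and are not taken from the actual gfx11 test harness.

```python
def compare_runs(pre: dict[str, str], post: dict[str, str]) -> dict[str, list[str]]:
    """Compare per-test outcomes from a pre-merge and a post-merge run.

    Both arguments map a test name to an outcome string such as
    "passed" or "failed" (hypothetical format, assumed for this sketch).
    """
    pre_pass = {name for name, outcome in pre.items() if outcome == "passed"}
    post_pass = {name for name, outcome in post.items() if outcome == "passed"}
    return {
        # Passed before the merge but does not pass after it (includes
        # tests that are absent from the post-merge run).
        "newly_failing": sorted(pre_pass - post_pass),
        # Did not pass before the merge but passes after it.
        "newly_passing": sorted(post_pass - pre_pass),
        # Ran before the merge but has no post-merge result at all.
        "missing_post_merge": sorted(set(pre) - set(post)),
    }


# Hypothetical outcomes, for illustration only.
pre_merge = {"test_a": "passed", "test_b": "failed", "test_c": "passed"}
post_merge = {"test_a": "passed", "test_b": "failed", "test_c": "failed"}

report = compare_runs(pre_merge, post_merge)
print(report)
```

A merge that loses no functionality would leave `newly_failing` and `missing_post_merge` empty, while failures already present pre-merge (such as `test_b` here) show up in neither list.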
Test Result
For all correctness/functionality tests, exactly the same test cases are passing both pre- and post-merge. Out of ~1700 tests, 8 tests failed, all for pre-existing and low-risk reasons such as model gating.
Attention benchmark tests show prefill regressions of up to 33% (SmolLM2-1.7B-Instruct-AWQ). The average regression for pure prefill cases is ~10.9%. Decode performance is ~3.3% slower on average (range: 1.7% speedup to 9.4% slowdown).
However, across 16 end-to-end tests, most showed little difference in overall TPOT, TTFT, or end-to-end latency compared to pre-merge. A few short-context cases (~128 input tokens) showed a 1-3% TTFT slowdown; because the context is so short, the difference likely comes from something other than attention, e.g., GEMM or CPU overhead.
All of these end-to-end benchmarks successfully completed post-merge.
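For reference, the sketch below shows one way to derive an average and a range of slowdown percentages like those quoted above from paired pre- and post-merge measurements. It assumes the percentages are relative changes in latency; the PR does not say which metric was used. The helper names and sample numbers are hypothetical, not measured data.

```python
from statistics import mean


def slowdown_pct(pre_ms: float, post_ms: float) -> float:
    """Relative latency change in percent.

    Positive means slower post-merge, negative means a speedup.
    """
    return 100.0 * (post_ms - pre_ms) / pre_ms


def summarize(pairs: list[tuple[float, float]]) -> dict[str, float]:
    """Summarize (pre_ms, post_ms) latency pairs as mean/min/max slowdown."""
    pcts = [slowdown_pct(pre, post) for pre, post in pairs]
    return {"mean": mean(pcts), "min": min(pcts), "max": max(pcts)}


# Hypothetical latencies in milliseconds, for illustration only.
samples = [(100.0, 110.0), (200.0, 190.0), (50.0, 52.0)]
summary = summarize(samples)
print(summary)
```

With this sign convention, a negative `min` corresponds to the "speedup" end of a reported range and a positive `max` to the "slowdown" end.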