Merge QKV into one linear layer #15

Merged
WoosukKwon merged 4 commits into main from qkv_combined on Apr 2, 2023

Conversation

@zhuohan123
Member

@zhuohan123 zhuohan123 commented Mar 30, 2023

@WoosukKwon Feel free to merge this after your review.

@zhuohan123 zhuohan123 requested a review from WoosukKwon March 30, 2023 17:04
Collaborator

@WoosukKwon WoosukKwon left a comment

Thanks for your effort. Please check my comments.

@WoosukKwon WoosukKwon mentioned this pull request Apr 2, 2023
@WoosukKwon
Collaborator

The performance regression in this PR is fixed in #20. I will merge the two PRs together once PR #20 is approved.

@WoosukKwon WoosukKwon self-requested a review April 2, 2023 07:23
@WoosukKwon WoosukKwon merged commit 1f01a18 into main Apr 2, 2023
@zhuohan123 zhuohan123 deleted the qkv_combined branch June 18, 2023 07:22
bigPYJ1151 added a commit to bigPYJ1151/vllm that referenced this pull request Sep 12, 2023
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
slyalin pushed a commit to slyalin/vllm that referenced this pull request Mar 26, 2024
…no-model-executor-opt

[CPU] Avoid copy result and force allocation
z103cb referenced this pull request in z103cb/opendatahub_vllm Apr 22, 2024
This PR updates our grpc_server to add TGIS-style logs similar to
https://github.com/IBM/text-generation-inference/blob/main/router/src/grpc_server.rs#L504-L512

This also disables the vllm per-request logging so that we don't
double-log each request

The timing info collected here is pretty rough, it doesn't plumb into
the LLMEngine, it just times the generators to get the total time spent
in the engine. We could do better, but this is a start.
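
As a rough illustration of the approach described above (timing the generators from the outside rather than instrumenting the engine), here is a minimal Python sketch. The names `Timing`, `timed`, `fake_engine_stream`, and `run_demo` are hypothetical; they are not taken from the referenced commit or from vLLM, and the fake stream only stands in for an engine's async output generator.

```python
import asyncio
import time
from typing import AsyncIterator, TypeVar

T = TypeVar("T")


class Timing:
    """Hypothetical holder for coarse, outside-the-engine timings."""

    def __init__(self) -> None:
        self.total_s = 0.0  # wall-clock time spent draining the generator
        self.items = 0      # number of items the generator yielded

    @property
    def time_per_item_s(self) -> float:
        return self.total_s / self.items if self.items else 0.0


async def timed(gen: AsyncIterator[T], timing: Timing) -> AsyncIterator[T]:
    """Re-yield every item from `gen`, recording the total elapsed time."""
    start = time.perf_counter()
    try:
        async for item in gen:
            timing.items += 1
            yield item
    finally:
        # Covers queueing and inference together, since both happen inside `gen`.
        timing.total_s = time.perf_counter() - start


async def fake_engine_stream(n: int) -> AsyncIterator[str]:
    """Stand-in for an engine's output stream (an assumption for this demo)."""
    for i in range(n):
        await asyncio.sleep(0.001)
        yield f"token-{i}"


async def run_demo() -> Timing:
    timing = Timing()
    async for _ in timed(fake_engine_stream(3), timing):
        pass
    return timing


if __name__ == "__main__":
    t = asyncio.run(run_demo())
    print(f"items={t.items} total={t.total_s * 1e3:.2f}ms "
          f"per_item={t.time_per_item_s * 1e3:.2f}ms")
```

Because the wrapper only sees the generator from the outside, the measured time also includes any pauses by the consumer between items, so it is best read as an upper bound on time spent in the engine.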

Example logs:

```
INFO 04-09 21:51:01 logs.py:43] generate_stream{input=[b'This is the story of Obama ridin...'] prefix_id= input_chars=[70] params=sampling { } stopping { max_new_tokens: 200 min_new_tokens: 16 } response { } decoding { } tokenization_time=0.45ms queue_and_inference_time=1096.67ms time_per_token=5.48ms total_time=1097.12ms input_toks=16}: Streaming response generated 200 tokens before NOT_FINISHED, output 848 chars: b' California. The story is told i...'
INFO 04-09 21:51:08 logs.py:43] generate{input=[b'Lorem ipsum dolor sit amet, cons...', b'foooood man where is it'] prefix_id= input_chars=[469] params=sampling { } stopping { max_new_tokens: 20 min_new_tokens: 16 } response { } decoding { } tokenization_time=2.03ms queue_and_inference_time=122.23ms time_per_token=6.11ms total_time=124.26ms input_toks=124}: Sub-request 0 from batch of 2 generated 20 tokens before MAX_TOKENS, output 25 chars: b'?\\n\\n<!--\\n<!--\\n<!--\\n<!--\\n<!'
INFO 04-09 21:51:08 logs.py:43] generate{input=[b'Lorem ipsum dolor sit amet, cons...', b'foooood man where is it'] prefix_id= input_chars=[469] params=sampling { } stopping { max_new_tokens: 20 min_new_tokens: 16 } response { } decoding { } tokenization_time=2.07ms queue_and_inference_time=122.22ms time_per_token=6.11ms total_time=124.29ms input_toks=7}: Sub-request 1 from batch of 2 generated 20 tokens before MAX_TOKENS, output 70 chars: b"?\\nI don't know.\\nI don't know.\\nI ..."
```

---------

Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: Joe Runde <joe@joerun.de>
fxmarty pushed a commit to fxmarty/vllm-public that referenced this pull request May 31, 2024
Correctly calculating the same value for the required cache blocks num for all torchrun processes
ykim362 pushed a commit to ykim362/vllm that referenced this pull request Jun 17, 2024
…-wenxh/fp8-on-a100-v5-pr

Revert "0612 kernel of FP8 on A100"
@alixiaodi alixiaodi mentioned this pull request Aug 2, 2024
robertgshaw2-redhat referenced this pull request in robertgshaw2-redhat/vllm May 1, 2025
* stash fixed double free issue

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* fixed issue

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updatrd

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updatrd

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updatrd

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updatrd

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updatrd

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updatrd

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

---------

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
robertgshaw2-redhat referenced this pull request in robertgshaw2-redhat/vllm May 3, 2025
* [Update] LMcache connector v1 implementation

Signed-off-by: ApostaC <yihua98@uchicago.edu>

* [Add] examples for disaggregated prefill

Signed-off-by: ApostaC <yihua98@uchicago.edu>

* [add] extra information about evns

Signed-off-by: ApostaC <yihua98@uchicago.edu>

* Initial stubs for P/D scheduling changes

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Updates

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Rs branch (#3)

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Rs branch (#5)

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Remove Unneeded Arguments (#7)

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* stash

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* cleanup

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

---------

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Improve disagg-example.sh (#8)

- fix spelling
- CUDA_VISIBLE_DEVICES should be set externally

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* added connector

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* update

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* remove

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* seems to load properly

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Revert "updated"

This reverts commit 97316d9.

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* stash

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* added

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* diffs for local dev on macos

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* update

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updaed

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* Checkpoint.

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* Cleanup

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* WIP

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated on scheduler side

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* Hacking away

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* cleanup

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* ensure request removed from running list

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* Runs E2E. Garbage output. Crashes on 2nd request

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* update

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* rename files

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* update

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* Second request no longer crashes

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Remove gpu_model_runner hacks

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Clean up Justfile

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* [Bugfix] Stale finished requests in EMPTY_MODEL_RUNNER_OUTPUT

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* update

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* justfile edits

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Update

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Fixes - lm_eval gsm8k has correctness

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* "just delete the assert"

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* fixup precommit issues

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Fixes

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* updated (#12)

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Add Accuracy Test (#13)

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

---------

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Preemption Bugfixes (#15)

* stash fixed double free issue

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* fixed issue

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updatrd

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updatrd

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updatrd

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updatrd

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updatrd

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updatrd

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

---------

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated (#16)

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Fix Bad Merge | Fix Memory Leak in Upstream (#18)

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* fix merge

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

---------

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* clean up justfile, examples

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* more cleanup

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* more cleanup

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* more cleanup

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* more cleanup

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* More cleanup

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* more cleanup

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* more cleanup, precommit fixes

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* More cleanup

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* run_accuracy_test.sh UX

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* squash warnings

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* pre-commit

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* update

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Add get_finished to base kv connector

Signed-off-by: mgoin <mgoin64@gmail.com>

* revert test.txt

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* cleanup

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Cleanup

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* review comments

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

---------

Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: ApostaC <yihua98@uchicago.edu>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
robertgshaw2-redhat referenced this pull request in robertgshaw2-redhat/vllm May 4, 2025
* [Update] LMcache connector v1 implementation

Signed-off-by: ApostaC <yihua98@uchicago.edu>

* [Add] examples for disaggregated prefill

Signed-off-by: ApostaC <yihua98@uchicago.edu>

* [add] extra information about evns

Signed-off-by: ApostaC <yihua98@uchicago.edu>

* Initial stubs for P/D scheduling changes

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Updates

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Rs branch (#3)

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Rs branch (#5)

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Remove Unneeded Arguments (#7)

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* stash

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* cleanup

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

---------

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Improve disagg-example.sh (#8)

- fix spelling
- CUDA_VISIBLE_DEVICES should be set externally

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* added connector

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* update

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* remove

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* seems to load properly

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Revert "updated"

This reverts commit 97316d9.

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* stash

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* added

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* diffs for local dev on macos

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* update

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updaed

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* Checkpoint.

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* Cleanup

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* WIP

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated on scheduler side

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* Hacking away

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* cleanup

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* ensure request removed from running list

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* Runs E2E. Garbage output. Crashes on 2nd request

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* update

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* rename files

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* update

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* Second request no longer crashes

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Remove gpu_model_runner hacks

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Clean up Justfile

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* [Bugfix] Stale finished requests in EMPTY_MODEL_RUNNER_OUTPUT

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* update

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* justfile edits

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Update

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Fixes - lm_eval gsm8k has correctness

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* "just delete the assert"

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* fixup precommit issues

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Fixes

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* updated (#12)

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Add Accuracy Test (#13)

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

---------

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Preemption Bugfixes (#15)

* stash fixed double free issue

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* fixed issue

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updatrd

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updatrd

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updatrd

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updatrd

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updatrd

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updatrd

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

---------

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated (#16)

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Fix Bad Merge | Fix Memory Leak in Upstream (#18)

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* fix merge

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

---------

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* cleanup code

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* cleanup code

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* stash

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updatted

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* revert

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* more spurious changes

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Update vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>

* Update vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>

---------

Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
Co-authored-by: ApostaC <yihua98@uchicago.edu>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
robertgshaw2-redhat referenced this pull request in robertgshaw2-redhat/vllm May 6, 2025
* [Update] LMcache connector v1 implementation

Signed-off-by: ApostaC <yihua98@uchicago.edu>

* [Add] examples for disaggregated prefill

Signed-off-by: ApostaC <yihua98@uchicago.edu>

* [add] extra information about evns

Signed-off-by: ApostaC <yihua98@uchicago.edu>

* Initial stubs for P/D scheduling changes

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Updates

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Rs branch (#3)

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Rs branch (#5)

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Remove Unneeded Arguments (#7)

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* stash

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* cleanup

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

---------

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Improve disagg-example.sh (#8)

- fix spelling
- CUDA_VISIBLE_DEVICES should be set externally

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* added connector

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* update

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* remove

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* seems to load properly

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Revert "updated"

This reverts commit 97316d9.

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* stash

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* added

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* diffs for local dev on macos

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* update

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* Checkpoint.

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* Cleanup

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* WIP

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated on scheduler side

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* Hacking away

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* cleanup

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* ensure request removed from running list

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* Runs E2E. Garbage output. Crashes on 2nd request

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* update

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* rename files

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* updated

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* update

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

* Second request no longer crashes

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Remove gpu_model_runner hacks

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Clean up Justfile

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* [Bugfix] Stale finished requests in EMPTY_MODEL_RUNNER_OUTPUT

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* update

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* justfile edits

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Update

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Fixes - lm_eval gsm8k has correctness

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* "just delete the assert"

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* fixup precommit issues

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Fixes

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* updated (#12)

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Add Accuracy Test (#13)

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

---------

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Preemption Bugfixes (#15)

* stash fixed double free issue

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* fixed issue

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

---------

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated (#16)

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Fix Bad Merge | Fix Memory Leak in Upstream (#18)

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* fix merge

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

---------

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* cleanup code

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* cleanup code

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* stash

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* revert

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* more spurious changes

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Support MLA in NIXL connector

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* WIP adding tests

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* wip

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Fixes

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

---------

Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
Co-authored-by: ApostaC <yihua98@uchicago.edu>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
dcmaddix pushed a commit to dcmaddix/vllm that referenced this pull request Oct 17, 2025
yma11 pushed a commit to yma11/vllm that referenced this pull request Nov 10, 2025
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
tjtanaa pushed a commit to tjtanaa/vllm that referenced this pull request Jan 29, 2026
…d_model_runner

[Worker]Feat/ar gpu worker and model runner
roy-shih added a commit to UnieAI/vllm that referenced this pull request Mar 31, 2026
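Verified with grep, one item at a time, that the integration code for every completed item actually exists:
- #3 spec decode: _batch_precompute_spec_decode() is already in scheduler.py
- vllm-project#5 builtin hash: already in the config/cache.py Literal type
- vllm-project#15 batch spec decode: the _precomputed_spec fast path is already in the loop

Removed the strikethrough noise and unified everything into a clean two-table format: "completed" and "not completed".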

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 participants