[Model] Add num_cached_tokens for PoolingRequestOutput #27378
Merged
DarkLight1337 merged 4 commits into vllm-project:main on Oct 23, 2025
Conversation
noooop force-pushed from 0ee0fd4 to 11f60fc
Signed-off-by: wang.yuqi <noooop@126.com>
noooop (Collaborator, Author) commented Oct 23, 2025:
Start CI tests to check which CI failures in main still need to be fixed.
Review thread on the test code:

```python
vllm_outputs = vllm_model.classify(example_prompts)
...
# First Run
vllm_model.classify(example_prompts)
```
Member: Should we check that initially the number of cached tokens is zero?
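A rough sketch of the kind of check being suggested (an illustration, not the PR's actual test; the model name, the `runner`/`enable_prefix_caching` arguments, and the assumption that classification outputs expose `num_cached_tokens` are all assumptions):

```python
# Sketch only: assert zero cached tokens on a cold run and cache hits
# on a repeat run. Assumes prefix caching is enabled and that prompts
# are long enough to fill at least one KV-cache block.
from vllm import LLM

llm = LLM(model="jason9693/Qwen2.5-1.5B-apeach",  # assumed classifier model
          runner="pooling",
          enable_prefix_caching=True)
prompts = ["vLLM makes pooling models easy to serve."] * 4

first = llm.classify(prompts)
assert all(out.num_cached_tokens == 0 for out in first)  # nothing cached yet

second = llm.classify(prompts)
assert all(out.num_cached_tokens > 0 for out in second)  # prefix-cache hits
```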
Review thread on the output construction:

```python
PoolingRequestOutput[Any](
    request_id="",
    outputs=processed_outputs,
    num_cached_tokens=getattr(
```
Member: Why do we need getattr here? In what case is that not available?
noooop (Collaborator, Author): The result of io_processor might not have this value.
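To make the concern concrete, a minimal illustration with a hypothetical stand-in object (vLLM's real io_processor results are richer than this):

```python
# Hypothetical stand-in for an io_processor result that carries no
# cache information at all.
class ProcessedOutput:
    data = [0.1, 0.9]

processed_outputs = ProcessedOutput()

# Plain attribute access would raise AttributeError here; getattr with
# a default of 0 degrades gracefully instead.
num_cached = getattr(processed_outputs, "num_cached_tokens", 0)
assert num_cached == 0
```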
noooop (Collaborator, Author): Please unblock Language Models Test (Extended Pooling) and Language Models Test (MTEB) to check for CI failures in the main branch that still need to be fixed.
Member: Hmm... I think we should make this a property of PoolingRequestOutput itself?
Member: Something like

```python
@property
def num_cached_tokens(self) -> int:
    return getattr(self.processed_outputs, "num_cached_tokens", 0)
```
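A self-contained sketch of that pattern in context (illustrative names only; the real PoolingRequestOutput carries more fields):

```python
from typing import Any


class PoolingRequestOutputSketch:
    """Hypothetical stand-in for vLLM's PoolingRequestOutput."""

    def __init__(self, request_id: str, processed_outputs: Any = None):
        self.request_id = request_id
        self.processed_outputs = processed_outputs

    @property
    def num_cached_tokens(self) -> int:
        # Falls back to 0 whenever the wrapped outputs expose no cache info.
        return getattr(self.processed_outputs, "num_cached_tokens", 0)


out = PoolingRequestOutputSketch(request_id="req-0")
print(out.num_cached_tokens)  # -> 0: no processed outputs, so the default wins
```

Making this a property keeps every call site free of its own getattr fallback, at the cost of hiding whether the value was actually reported or merely defaulted.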
DarkLight1337 approved these changes on Oct 23, 2025.
usberkeley pushed a commit to usberkeley/vllm that referenced this pull request on Oct 23, 2025:
…27378) Signed-off-by: wang.yuqi <noooop@126.com>
845473182 pushed a commit to raindaywhu/vllm that referenced this pull request on Oct 24, 2025:
…o step_forward * 'step_forward' of https://github.com/raindaywhu/vllm: (148 commits) [Model] Add MoE support for NemotronH (vllm-project#25863) [Metrics] [KVConnector] Add connector prefix cache hit rate stats (vllm-project#26245) [CI] Reorganize entrypoints tests (vllm-project#27403) add SLA information into comparison graph for vLLM Benchmark Suite (vllm-project#25525) [CI/Build] Fix AMD CI: test_cpu_gpu.py (vllm-project#27388) [Bugfix] Fix args settings for guided decoding args (vllm-project#27375) [CI/Build] Fix Prithvi plugin test (vllm-project#27393) [Chore] Remove duplicate `has_` functions in vllm.utils (vllm-project#27372) [Model] Add num_cached_tokens for PoolingRequestOutput (vllm-project#27378) [V1][spec decode] return logprobs for spec decoding (vllm-project#26060) [CORE] Support Prefix Caching with Prompt Embeds (vllm-project#27219) [Bugfix][Core] running queue index leakage exception (vllm-project#26754) [Bugfix] Fix incorrect kv cache metrics in grafana.json (vllm-project#27133) [Bugfix] Fix SLA tuner initialization (vllm-project#27355) [Bugfix] Fix deepseek-ocr multi-image inference and add `merge_by_field_config=True` with tensor schema support (vllm-project#27361) [MLA] Bump FlashMLA (vllm-project#27354) [Chore] Separate out system utilities from vllm.utils (vllm-project#27201) [BugFix] bugfix for Flash Attention MLA with full cuda graph IMA following pr-25490 (vllm-project#27128) [Feature] publisher default set zmq in kv_event config (vllm-project#26915) [Prefix Cache] Use LoRA name for consistent KV-cache block hashing (vllm-project#27211) ...
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request on Oct 26, 2025:
…27378) Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
ilmarkov pushed a commit to neuralmagic/vllm that referenced this pull request on Nov 7, 2025:
…27378) Signed-off-by: wang.yuqi <noooop@126.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request on Nov 10, 2025:
…27378) Signed-off-by: wang.yuqi <noooop@126.com>
This was referenced Nov 11, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request on Nov 29, 2025:
…27378) Signed-off-by: wang.yuqi <noooop@126.com>
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.