[Model] Add num_cached_tokens for PoolingRequestOutput #27378
Merged
DarkLight1337 merged 4 commits into vllm-project:main on Oct 23, 2025
Conversation
noooop force-pushed from 0ee0fd4 to 11f60fc
Signed-off-by: wang.yuqi <noooop@126.com>
noooop (Collaborator, Author) commented Oct 23, 2025:
Start CI tests to check which CI failures in main still need to be fixed.
Review thread on the test code:

```python
vllm_outputs = vllm_model.classify(example_prompts)
...
# First Run
vllm_model.classify(example_prompts)
```
Member: Should we check that initially the number of cached tokens is zero?
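A rough sketch of the kind of check being suggested (an illustration, not the PR's actual test; the model name, the `runner`/`enable_prefix_caching` arguments, and the assumption that classification outputs expose `num_cached_tokens` are all assumptions):

```python
# Sketch only: assert zero cached tokens on a cold run and cache hits
# on a repeat run. Assumes prefix caching is enabled and that prompts
# are long enough to fill at least one KV-cache block.
from vllm import LLM

llm = LLM(model="jason9693/Qwen2.5-1.5B-apeach",  # assumed classifier model
          runner="pooling",
          enable_prefix_caching=True)
prompts = ["vLLM makes pooling models easy to serve."] * 4

first = llm.classify(prompts)
assert all(out.num_cached_tokens == 0 for out in first)  # nothing cached yet

second = llm.classify(prompts)
assert all(out.num_cached_tokens > 0 for out in second)  # prefix-cache hits
```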
Review thread on the output construction:

```python
PoolingRequestOutput[Any](
    request_id="",
    outputs=processed_outputs,
    num_cached_tokens=getattr(
```
Member: Why do we need getattr here? In what case is that not available?
noooop (Collaborator, Author): The result of io_processor might not have this value.
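To make the concern concrete, a minimal illustration with a hypothetical stand-in object (vLLM's real io_processor results are richer than this):

```python
# Hypothetical stand-in for an io_processor result that carries no
# cache information at all.
class ProcessedOutput:
    data = [0.1, 0.9]

processed_outputs = ProcessedOutput()

# Plain attribute access would raise AttributeError here; getattr with
# a default of 0 degrades gracefully instead.
num_cached = getattr(processed_outputs, "num_cached_tokens", 0)
assert num_cached == 0
```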
noooop (Collaborator, Author): Please unblock Language Models Test (Extended Pooling) and Language Models Test (MTEB) to check for CI failures in the main branch that still need to be fixed.
Member: Hmm... I think we should make this a property of PoolingRequestOutput itself?
Member: Something like

```python
@property
def num_cached_tokens(self) -> int:
    return getattr(self.processed_outputs, "num_cached_tokens", 0)
```
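A self-contained sketch of that pattern in context (illustrative names only; the real PoolingRequestOutput carries more fields):

```python
from typing import Any


class PoolingRequestOutputSketch:
    """Hypothetical stand-in for vLLM's PoolingRequestOutput."""

    def __init__(self, request_id: str, processed_outputs: Any = None):
        self.request_id = request_id
        self.processed_outputs = processed_outputs

    @property
    def num_cached_tokens(self) -> int:
        # Falls back to 0 whenever the wrapped outputs expose no cache info.
        return getattr(self.processed_outputs, "num_cached_tokens", 0)


out = PoolingRequestOutputSketch(request_id="req-0")
print(out.num_cached_tokens)  # -> 0: no processed outputs, so the default wins
```

Making this a property keeps every call site free of its own getattr fallback, at the cost of hiding whether the value was actually reported or merely defaulted.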
DarkLight1337 approved these changes on Oct 23, 2025.
usberkeley pushed a commit to usberkeley/vllm that referenced this pull request on Oct 23, 2025:
…27378) Signed-off-by: wang.yuqi <noooop@126.com>
845473182 pushed a commit to raindaywhu/vllm that referenced this pull request on Oct 24, 2025:
…o step_forward * 'step_forward' of https://github.com/raindaywhu/vllm: (148 commits) [Model] Add MoE support for NemotronH (vllm-project#25863) [Metrics] [KVConnector] Add connector prefix cache hit rate stats (vllm-project#26245) [CI] Reorganize entrypoints tests (vllm-project#27403) add SLA information into comparison graph for vLLM Benchmark Suite (vllm-project#25525) [CI/Build] Fix AMD CI: test_cpu_gpu.py (vllm-project#27388) [Bugfix] Fix args settings for guided decoding args (vllm-project#27375) [CI/Build] Fix Prithvi plugin test (vllm-project#27393) [Chore] Remove duplicate `has_` functions in vllm.utils (vllm-project#27372) [Model] Add num_cached_tokens for PoolingRequestOutput (vllm-project#27378) [V1][spec decode] return logprobs for spec decoding (vllm-project#26060) [CORE] Support Prefix Caching with Prompt Embeds (vllm-project#27219) [Bugfix][Core] running queue index leakage exception (vllm-project#26754) [Bugfix] Fix incorrect kv cache metrics in grafana.json (vllm-project#27133) [Bugfix] Fix SLA tuner initialization (vllm-project#27355) [Bugfix] Fix deepseek-ocr multi-image inference and add `merge_by_field_config=True` with tensor schema support (vllm-project#27361) [MLA] Bump FlashMLA (vllm-project#27354) [Chore] Separate out system utilities from vllm.utils (vllm-project#27201) [BugFix] bugfix for Flash Attention MLA with full cuda graph IMA following pr-25490 (vllm-project#27128) [Feature] publisher default set zmq in kv_event config (vllm-project#26915) [Prefix Cache] Use LoRA name for consistent KV-cache block hashing (vllm-project#27211) ...
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request on Oct 26, 2025:
…27378) Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
ilmarkov pushed a commit to neuralmagic/vllm that referenced this pull request on Nov 7, 2025:
…27378) Signed-off-by: wang.yuqi <noooop@126.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request on Nov 10, 2025:
…27378) Signed-off-by: wang.yuqi <noooop@126.com>
This was referenced Nov 11, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request on Nov 29, 2025:
…27378) Signed-off-by: wang.yuqi <noooop@126.com>
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.