[Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. by noooop · Pull Request #25524 · vllm-project/vllm

noooop · 2025-09-23T23:53:03Z

TL;DR

- pooling_task is now required for llm.encode.
- The /pooling endpoint supports all pooling tasks.
- The softmax and activation parameters are consolidated into use_activation.
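The effect of "pooling_task required" on callers can be sketched with a small validator. This is a hypothetical stand-in, not vLLM code. `resolve_pooling_task` and `KNOWN_POOLING_TASKS` are illustrative names, and the task set is only the subset named in this PR; vLLM may accept more, such as plugin tasks. With a vLLM build that includes this change, the offline call is expected to pass the task explicitly, e.g. `llm.encode(prompts, pooling_task="token_embed")`. Check the pooling-model docs for your version.

```python
from typing import Optional

# Hypothetical subset of task names, taken from this PR description only.
KNOWN_POOLING_TASKS = frozenset(
    {"embed", "classify", "token_embed", "token_classify"}
)


def resolve_pooling_task(pooling_task: Optional[str]) -> str:
    """Fail fast when the task is missing instead of guessing one."""
    if pooling_task is None:
        raise ValueError(
            "pooling_task is required, e.g. pooling_task='token_embed'"
        )
    if pooling_task not in KNOWN_POOLING_TASKS:
        raise ValueError(f"unknown pooling_task: {pooling_task!r}")
    return pooling_task


print(resolve_pooling_task("token_classify"))  # prints token_classify
```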

Improve all pooling task (0.11.1 cut)

[Model][0/N] Improve all pooling task | clean up #25817
[Frontend][1/N] Improve all pooling task | Support FP16 Embedding Base64 (Still uses fp32 by default). #26414
[Model][2/N] Improve all pooling task | Support multi-vector retrieval #25370
[Frontend][3/N] Improve all pooling task | Support binary embedding response #27066
[Frontend][4/N] Improve all pooling task | Add plugin pooling task #26973
[Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. #25524

These PRs mostly conflict with each other, so grouping them into a series makes it easier for reviewers to see what has changed and what remains to be done afterwards.

Purpose

Follow-up to #25370.

Split the encode task into two tasks, token_embed and token_classify:
- token_embed is the same as embed, using normalize as its activation.
- token_classify is the same as classify and uses softmax as its activation by default. (In fact, both classify and token_classify can use any activation function by setting act_fn.)
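A minimal, framework-free sketch of the two per-token defaults, assuming the model has already produced one hidden vector (token_embed) or one logit vector (token_classify) per token. `pool_tokens` and its `act_fn` override are illustrative and only mirror the act_fn idea described above. They are not vLLM's implementation.

```python
import math
from typing import Callable, List, Optional

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """token_embed default: scale each token vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def softmax(v: Vector) -> Vector:
    """token_classify default: turn each token's logits into probabilities."""
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


DEFAULT_ACTIVATION = {"token_embed": l2_normalize, "token_classify": softmax}


def pool_tokens(
    rows: List[Vector],
    task: str,
    act_fn: Optional[Callable[[Vector], Vector]] = None,
) -> List[Vector]:
    """Apply the task's activation (or a custom act_fn) to every token row."""
    fn = act_fn or DEFAULT_ACTIVATION[task]
    return [fn(row) for row in rows]
```

Assuming, as the names suggest, that the token-level tasks keep one output per token, the only difference from a sequence-level sketch is that the activation runs on every row rather than on a single pooled vector.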

Addresses: #27413 (comment)

The /pooling endpoint supports all pooling tasks.

Test Plan

Test Result

noooop · 2025-09-24T07:54:27Z

@DarkLight1337

Let's continue our previous discussion here.

Following #25370:

Split the encode task into two tasks, token_embed and token_classify:

token_embed is the same as embed, using normalize as its activation.
token_classify is the same as classify, using softmax as its activation by default.

For online scenarios (/pooling):

- Keep a single API (/pooling) for now, and adaptively select token_embed or token_classify using something like an encode2pooling_task method somewhere.
- Split the /pooling API into /pooling_token_embed and /pooling_token_classify. (Personally, I think /pooling_token_embed and /pooling_token_classify look terrible, and the online /pooling API is not ready for major changes yet. We can collect usage scenarios for a while.)
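The first option hinges on something like an encode2pooling_task method. One possible shape for it, assuming the server knows which tasks the loaded model supports, is to pick the single token-level task the model offers and refuse to guess otherwise. The signature and body are hypothetical; the PR only names the idea.

```python
from typing import Collection

TOKEN_TASKS = ("token_embed", "token_classify")


def encode2pooling_task(supported_tasks: Collection[str]) -> str:
    """Map a generic 'encode' request to one concrete token-level task.

    Hypothetical sketch: succeeds only when the model supports exactly one
    of the token-level tasks; otherwise the caller must name the task.
    """
    candidates = [t for t in TOKEN_TASKS if t in supported_tasks]
    if len(candidates) != 1:
        raise ValueError(
            f"cannot infer a token-level task from {sorted(supported_tasks)}; "
            "pass the task explicitly"
        )
    return candidates[0]
```

Raising on ambiguity, rather than silently preferring one task, keeps the behaviour predictable when a model supports both.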

DarkLight1337 · 2025-09-24T08:30:39Z

I think that for the online API, users should be able to pass the task in the request.
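Passing the task in the request could look like the following request-parser sketch. The `task` field name, the allowed set, and the optional default are assumptions for illustration. Consult the /pooling request schema of the vLLM version you deploy for the real field names.

```python
import json
from typing import Any, Dict, Optional, Tuple

# Hypothetical task set; the real server may accept more (e.g. plugin tasks).
ALLOWED_TASKS = frozenset({"embed", "classify", "token_embed", "token_classify"})


def parse_pooling_request(
    body: str, default_task: Optional[str] = None
) -> Tuple[str, Dict[str, Any]]:
    """Split the requested task off a JSON body and validate it."""
    payload = json.loads(body)
    task = payload.pop("task", default_task)
    if task is None:
        raise ValueError("request must specify 'task'")
    if task not in ALLOWED_TASKS:
        raise ValueError(f"unsupported task: {task!r}")
    return task, payload
```

For example, `parse_pooling_request('{"model": "m", "input": "hi", "task": "token_classify"}')` returns the task plus the remaining fields, which the server would then forward to the matching pooler.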

mergify · 2025-10-08T14:41:56Z

Documentation preview: https://vllm--25524.org.readthedocs.build/en/25524/

Signed-off-by: wang.yuqi <noooop@126.com>

noooop · 2025-10-28T10:29:47Z

/gemini review

hmellor

Since activation was user-facing, we should probably deprecate it when changing it to use_activation.

noooop · 2025-10-29T13:06:08Z

> Since activation was user-facing, we should probably deprecate it when changing it to use_activation.

Either way, I'd like to merge this PR quickly, before the release. It mostly contains documentation changes.

This feature has been a bit of a mess, and it’s been changing with every recent release. It should finally be stable now after this latest change.

DarkLight1337

I agree with @hmellor; we should still keep the old field and deprecate it.

DarkLight1337 · 2025-10-29T13:08:25Z

Otherwise we would break backward compatibility, which is definitely not something a "doc PR" should do.
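Keeping the old field with a deprecation, as suggested in this thread, can be sketched with a dataclass that accepts both names, warns on the old one, and forwards its value. `PoolingOptions` is a hypothetical stand-in, not vLLM's PoolingParams, and rejecting conflicting values is a design choice of this sketch.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class PoolingOptions:  # hypothetical stand-in, not vLLM's PoolingParams
    use_activation: Optional[bool] = None
    activation: Optional[bool] = None  # deprecated alias kept for back-compat

    def __post_init__(self) -> None:
        if self.activation is None:
            return
        # stacklevel=3 points the warning at the caller of __init__.
        warnings.warn(
            "'activation' is deprecated; use 'use_activation' instead",
            DeprecationWarning,
            stacklevel=3,
        )
        if self.use_activation is None:
            self.use_activation = self.activation
        elif self.use_activation != self.activation:
            raise ValueError("'activation' and 'use_activation' disagree")
        self.activation = None  # only the new field is read downstream
```

Old callers keep working for a release or two while seeing the warning, and new code only ever reads use_activation.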

noooop · 2025-10-30T10:23:49Z

cc @DarkLight1337

Let’s land this.

…g) api & Document. (vllm-project#25524)

Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

vLLM renamed guided_decoding to structured_outputs and changed the embedding API:
- SamplingParams: GuidedDecodingParams -> StructuredOutputsParams, guided_decoding -> structured_outputs (vllm-project/vllm#22772, vllm-project/vllm#29326)
- Embedding: use encode(pooling_params=...) instead of generate(sampling_params=...) for pooling tasks (vllm-project/vllm#16188, vllm-project/vllm#25524)
- EngineArgs: guided_decoding_backend -> structured_outputs_config

User-facing "guided_decoding" key in sampling_params dict preserved for backwards compatibility.

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
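Preserving the user-facing "guided_decoding" key amounts to a small key-renaming shim at the point where user dicts enter the code. The helper below is a hypothetical sketch, not code from that commit; it copies the dict and rejects inputs that set both names.

```python
from typing import Any, Dict


def normalize_sampling_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Accept the legacy 'guided_decoding' key and forward it under the new name."""
    params = dict(params)  # never mutate the caller's dict
    if "guided_decoding" in params:
        if "structured_outputs" in params:
            raise ValueError(
                "pass either 'guided_decoding' or 'structured_outputs', not both"
            )
        params["structured_outputs"] = params.pop("guided_decoding")
    return params
```

Doing the rename once at the boundary means the rest of the code only has to know the new name.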

mergify bot added the documentation label Sep 23, 2025

noooop mentioned this pull request Sep 24, 2025 in [Model][2/N] Improve all pooling task | Support multi-vector retrieval #25370 (Merged)

noooop changed the title ~~[Frontend][Doc] Consolidate encode (pooling) api & Document.~~ [Frontend][Doc][Last/N] Improve all pooling task | Polish encode (pooling) api & Document. Oct 11, 2025

This was referenced Oct 11, 2025:
- [Model][0/N] Improve all pooling task | clean up #25817 (Merged)
- [Frontend][1/N] Improve all pooling task | Support FP16 Embedding Base64 (Still uses fp32 by default). #26414 (Merged)

noooop changed the title ~~[Frontend][Doc][Last/N] Improve all pooling task | Polish encode (pooling) api & Document.~~ [Frontend][Doc][3/N] Improve all pooling task | Polish encode (pooling) api & Document. Oct 14, 2025

noooop changed the title ~~[Frontend][Doc][3/N] Improve all pooling task | Polish encode (pooling) api & Document.~~ [Frontend][Doc][Last/N] Improve all pooling task | Polish encode (pooling) api & Document. Oct 16, 2025

noooop mentioned this pull request Oct 24, 2025 in [Usage]: how to request a qwen2.5-VL-7B classify model served by vllm using openai SDK? #27413 (Closed)

noooop force-pushed the update_pooling_docs branch from 18b0002 to b6b2e12 October 28, 2025 06:13

noooop added 2 commits October 28, 2025 14:13:
- token_embed & token_classify (b6b2e12)
- Merge branch 'main' into update_pooling_docs (fb5fdfa)

mergify bot added the frontend label Oct 28, 2025

noooop changed the title ~~[Frontend][Doc][Last/N] Improve all pooling task | Polish encode (pooling) api & Document.~~ [Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. Oct 28, 2025

noooop force-pushed the update_pooling_docs branch from 976793c to d98bf46 October 28, 2025 08:09

Commit: /pooling endpoint support all pooling tasks (dd06fe1)

noooop force-pushed the update_pooling_docs branch from e326cde to dd06fe1 October 28, 2025 08:19

noooop added 3 commits October 28, 2025 17:44:
- update (0643461)
- update examples (ce69d7b)
- update examples (f9d85cf)

noooop marked this pull request as ready for review October 28, 2025 10:27

noooop requested review from aarnphm and chaunceyjiang as code owners October 28, 2025 10:27

Commit: fix (794669d)

noooop added the ready label Oct 28, 2025

noooop added 3 commits October 29, 2025 00:43:
- Merge branch 'main' into update_pooling_docs (f43249e)
- Merge branch 'main' into update_pooling_docs (37137cf)
- Merge branch 'main' into update_pooling_docs (0124f4f)

noooop mentioned this pull request Oct 29, 2025 in [CI Failure]: torch._inductor.exc.InductorError in Nightly build to run all tests #27724 (Closed)

hmellor reviewed Oct 29, 2025

DarkLight1337 reviewed Oct 29, 2025

noooop added 2 commits October 30, 2025 15:58:
- add deprecated waring (4c2a98e)
- Merge branch 'main' into update_pooling_docs (95e014b)

DarkLight1337 approved these changes Oct 30, 2025

DarkLight1337 enabled auto-merge (squash) October 30, 2025 08:08

DarkLight1337 merged commit 4464723 into vllm-project:main Oct 30, 2025
61 checks passed

noooop deleted the update_pooling_docs branch October 30, 2025 22:38

This was referenced Nov 2, 2025:
- [Frontend] Added chat-style multimodal support to /classify. #27516 (Merged)
- [Doc][Last/N] Improve all pooling task | Refactor pooling-related documentation #27963 (Closed)

noooop mentioned this pull request Nov 21, 2025 in Improve enable chunked_prefill & prefix_caching logic. #26623 (Merged)

This was referenced Dec 8, 2025:
- [Model][7/N] Improve all pooling task | Deprecation as_reward_model. Extract hidden states prefer using new multi-vector retrieval API #26686 (Merged)
- [Bug]: Skywork Reward Model series not supported for llm.reward #30312 (Open)

noooop mentioned this pull request Dec 15, 2025 in [Model][Last/N] Improve all pooling task | Generate runner supports using embed and token_embed tasks. #30672 (Closed)

DarkLight1337 merged 30 commits into vllm-project:main from noooop:update_pooling_docs
