[Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. #25524
We continue our previous discussion here, following #25370. Split the encode task into two tasks: token_embed and token_classify.
For online scenarios (/pooling):
I think that for the online API, the user should be able to pass the task in the request.
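As a rough illustration of the suggestion above, a client could select the pooling task per request. This is only a sketch: the `task` request field, the helper name `build_pooling_request`, and the model name are assumptions for illustration and are not confirmed by this thread. Only the task names token_embed and token_classify come from the discussion.

```python
import json

# Task names taken from the discussion; everything else here is hypothetical.
POOLING_TASKS = {"token_embed", "token_classify"}


def build_pooling_request(model: str, text: str, task: str) -> str:
    """Build a JSON body for a /pooling request with an explicit task.

    The "task" field is an assumed name for the per-request task selector.
    """
    if task not in POOLING_TASKS:
        raise ValueError(f"unknown pooling task: {task!r}")
    return json.dumps({"model": model, "input": text, "task": task})


# Build (but do not send) a request body for token-level embeddings.
body = build_pooling_request("my-model", "Hello world", "token_embed")
print(body)
```

Validating the task on the client side keeps a typo from turning into a server error, but the server remains the authority on which tasks a given model supports.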
Documentation preview: https://vllm--25524.org.readthedocs.build/en/25524/
Signed-off-by: wang.yuqi <noooop@126.com>
/gemini review
hmellor left a comment:
Since activation was user-facing, we should probably deprecate it when renaming it to use_activation.
Anyway, I’d like to merge this PR quickly before the release. It’s mainly documentation changes. This feature has been a bit of a mess, and it has changed with every recent release. It should finally be stable after this latest change.
DarkLight1337 left a comment:
I agree with @hmellor, we should still keep the old field with a deprecation. Otherwise we would break backward compatibility, which is definitely not what a "doc PR" should do.
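The deprecation the reviewers ask for could look roughly like the following. This is a minimal sketch, not the code that landed in this PR: the class name `PoolingOptions` is hypothetical, and only the field names activation and use_activation come from the comments above.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class PoolingOptions:
    """Hypothetical parameter container used only for illustration."""

    use_activation: Optional[bool] = None
    # Deprecated alias, kept so existing callers do not break.
    activation: Optional[bool] = None

    def __post_init__(self) -> None:
        if self.activation is not None:
            warnings.warn(
                "`activation` is deprecated; use `use_activation` instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            # The new field wins if both were supplied.
            if self.use_activation is None:
                self.use_activation = self.activation


# Old-style call keeps working and populates the new field.
opts = PoolingOptions(activation=False)
print(opts.use_activation)
```

Keeping the old field for a deprecation period lets existing callers migrate without a hard break, which is the concern raised in both review comments.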
Let’s land this.
…g) api & Document. (vllm-project#25524) Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
vLLM renamed guided_decoding to structured_outputs and changed the embedding API:
- SamplingParams: GuidedDecodingParams -> StructuredOutputsParams, guided_decoding -> structured_outputs (vllm-project/vllm#22772, vllm-project/vllm#29326)
- Embedding: use encode(pooling_params=...) instead of generate(sampling_params=...) for pooling tasks (vllm-project/vllm#16188, vllm-project/vllm#25524)
- EngineArgs: guided_decoding_backend -> structured_outputs_config

User-facing "guided_decoding" key in sampling_params dict preserved for backwards compatibility.

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
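The backwards-compatibility note in the commit message above could be implemented along these lines. This is a sketch under assumptions, not the code from that commit: the function name `normalize_sampling_params` is hypothetical, and only the key names guided_decoding and structured_outputs come from the message.

```python
from typing import Any, Dict


def normalize_sampling_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Map the legacy "guided_decoding" key onto "structured_outputs".

    Returns a new dict and leaves the caller's dict untouched.
    """
    out = dict(params)
    if "guided_decoding" in out:
        legacy = out.pop("guided_decoding")
        # Prefer an explicitly supplied new-style key over the legacy one.
        out.setdefault("structured_outputs", legacy)
    return out


# A user-supplied dict that still uses the old key.
user_params = {"temperature": 0.0, "guided_decoding": {"regex": "[0-9]+"}}
print(normalize_sampling_params(user_params))
```

Translating the key at the boundary means user configurations keep working while the rest of the code only deals with the new name.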
TL;DR
Improve all pooling task (0.11.1 cut)
These PRs mostly conflict with each other, so combining them into a series better informs reviewers about what has happened and what still needs to be done afterwards.
Purpose
Following #25370
Address: #27413 (comment)
Test Plan
Test Result