[Entrypoints] Split the pooling offline API into PoolingOfflineMixin. #42267
Code Review
This pull request refactors the LLM class by moving offline pooling inference logic into a new PoolingOfflineMixin class in vllm/entrypoints/pooling/llm.py. The LLM class now inherits from this mixin, and methods such as encode, embed, classify, reward, and score have been relocated to improve modularity. A compatibility issue was identified regarding the use of the default parameter in TypeVar, which is only available in Python 3.13+ and requires importing from typing_extensions to maintain support for older Python versions.
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Documentation preview: https://vllm--42267.org.readthedocs.build/en/42267/
Hi @noooop, the pre-commit checks have failed. Please run:
uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
Purpose
This pull request refactors the LLM class by moving offline pooling inference logic into a new PoolingOfflineMixin class in vllm/entrypoints/pooling/offline.py.
vllm/entrypoints/llm.py: 1957 -> 1539
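The mixin split described above can be illustrated with a minimal sketch. It is not vLLM's actual code: the class names PoolingOfflineMixinSketch and LLMSketch, the _pool_one hook, and the toy "embedding" are all hypothetical, chosen only to show how pooling methods can live in a separate class while remaining callable on the main entrypoint class.

```python
from typing import Sequence


class PoolingOfflineMixinSketch:
    """Hypothetical stand-in for the pooling mixin; not vLLM's real code."""

    def _pool_one(self, prompt: str) -> list[float]:
        # The host class is expected to supply the actual pooling computation.
        raise NotImplementedError

    def encode(self, prompts: Sequence[str]) -> list[list[float]]:
        # Shared entry point: pool each prompt through the host's hook.
        return [self._pool_one(p) for p in prompts]

    def embed(self, prompts: Sequence[str]) -> list[list[float]]:
        # Task-specific wrapper that reuses encode(), mirroring how several
        # pooling methods can sit together in one mixin.
        return self.encode(prompts)


class LLMSketch(PoolingOfflineMixinSketch):
    """Hypothetical host class that inherits the pooling methods."""

    def _pool_one(self, prompt: str) -> list[float]:
        # Toy "embedding" (an assumption for illustration): prompt length.
        return [float(len(prompt))]


llm = LLMSketch()
vectors = llm.embed(["hello", "hi"])
```

Under these assumptions, callers still invoke llm.embed(...) on the host class, while the method bodies live in the mixin's module, which is what allows the host file to shrink without changing its public surface.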
Test Plan
pytest tests/entrypoints/pooling/
Test Result
pass